[PATCH v4 0/7] add fuse-over-io

* [PATCH v4 0/7] add fuse-over-io_uring support
@ 2026-02-07 12:08 Brian Song
  2026-02-07 12:08 ` [Patch v4 1/7] aio-posix: enable 128-byte SQEs Brian Song
                   ` (6 more replies)
  0 siblings, 7 replies; 21+ messages in thread
From: Brian Song @ 2026-02-07 12:08 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, hibriansong, hreitz, kwolf, eblake, armbru, stefanha,
	fam, bernd

Hello everyone,

This patch series adds native io_uring support to the FUSE storage
export to overcome the scalability limits of the /dev/fuse interface.
By using shared-memory ring buffers and per-core queues, it reduces
context-switch overhead and lock contention, which lets FUSE export
daemons reach higher throughput and lower latency (see the fio results
below).

More details on Fuse-over-io_uring:
https://docs.kernel.org/filesystems/fuse/fuse-io-uring.html


Changes in this version:

- Reorganized patch structure.
- Unified naming of Uring data structures (e.g. FuseRing -> FuseUring)
- Refactored FUSE_IN/OUT_OP_STRUCT_LEGACY
- Code cleanup and logic simplification:
	- Used the io_uring flag to indicate the intention to enable
	  Fuse-over-io_uring.
	- Used uring_started to track the active state.
	- Removed unnecessary #ifdef CONFIG_LINUX_IO_URING guards.
- Moved fuse_fd closing to BH in uring mode to prevent data races.
- Updated tests: they now use mount to verify that the test image mount
  is fully gone.

More detail in the v3 cover letter:
https://lists.nongnu.org/archive/html/qemu-block/2025-08/msg00325.html

V2 cover letter:
https://lists.nongnu.org/archive/html/qemu-block/2025-08/msg00140.html

V1 cover letter:
https://lists.nongnu.org/archive/html/qemu-block/2025-07/msg00280.html

We used fio to test a 1GB file in both legacy FUSE and
FUSE-over-io_uring modes. The runs used the following iodepth-numjobs
configurations: 1-1, 64-1, 1-4, and 64-4, each with a 70% read / 30%
write mix. With four configurations in each of the two modes, this
gives 8 test cases in total, measuring both latency and throughput.

Performance Results:

[Bandwidth (MiB/s)]
| Config (Job/QD)  | Read (Leg -> Uring) | Write (Leg -> Uring)|
|------------------|---------------------|---------------------|
| 1 Job, QD=1      | 72.2 -> 104         | 30.9 -> 44.7        |
| 1 Job, QD=64     | 114  -> 181         | 48.8 -> 77.7        |
| 4 Jobs, QD=1     | 109  -> 159         | 47.0 -> 68.5        |
| 4 Jobs, QD=64    | 106  -> 160         | 45.7 -> 68.9        |

[Latency (usec)]
| Config (Job/QD)  | Read (Leg -> Uring) | Write (Leg -> Uring)|
|------------------|---------------------|---------------------|
| 1 Job, QD=1      | 37.0 -> 23.7        | 36.9 -> 29.5        |
| 1 Job, QD=64     | 1537 -> 964         | 1535 -> 967         |
| 4 Jobs, QD=1     | 96.6 -> 66.4        | 114.2 -> 71.9       |
| 4 Jobs, QD=64    | 6560 -> 4234        | 6600 -> 4280        |
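
To read the tables as relative changes, the usual formula is
(uring - legacy) / legacy. A minimal sketch, where ex_percent_change()
is a hypothetical helper and not part of this series:

```c
#include <assert.h>

/*
 * Relative change from the legacy value to the io_uring value, in
 * percent. Positive means the io_uring number is larger (good for
 * bandwidth); negative means it is smaller (good for latency).
 */
double ex_percent_change(double legacy, double uring)
{
    assert(legacy != 0.0);
    return (uring - legacy) / legacy * 100.0;
}
```

For the "1 Job, QD=1" row, ex_percent_change(72.2, 104) is roughly +44%
for read bandwidth and ex_percent_change(37.0, 23.7) is roughly -36% for
read latency.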

Brian Song (7):
  [Patch v4 1/7] aio-posix: enable 128-byte SQEs
  [Patch v4 2/7] fuse: io_uring mode init
  [Patch v4 3/7] fuse: uring support for write ops
  [Patch v4 4/7] fuse: refactor FUSE request handler
  [Patch v4 5/7] fuse: safe termination for io_uring
  [Patch v4 6/7] fuse: add 'io-uring' option
  [Patch v4 7/7] fuse: add io_uring test support

 block/export/fuse.c                  | 958 +++++++++++++++++++++++----
 docs/tools/qemu-storage-daemon.rst   |   7 +-
 qapi/block-export.json               |   5 +-
 storage-daemon/qemu-storage-daemon.c |   1 +
 tests/qemu-iotests/check             |   2 +
 tests/qemu-iotests/common.rc         |  47 +-
 util/fdmon-io_uring.c                |   7 +-
 7 files changed, 879 insertions(+), 148 deletions(-)

--
2.43.0




* [Patch v4 1/7] aio-posix: enable 128-byte SQEs
  2026-02-07 12:08 [PATCH v4 0/7] add fuse-over-io_uring support Brian Song
@ 2026-02-07 12:08 ` Brian Song
  2026-02-11 20:28   ` Stefan Hajnoczi
  2026-02-07 12:08 ` [Patch v4 2/7] fuse: io_uring mode init Brian Song
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Brian Song @ 2026-02-07 12:08 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, hibriansong, hreitz, kwolf, eblake, armbru, stefanha,
	fam, bernd

This patch enables the IORING_SETUP_SQE128 flag during io_uring
initialization to support the FUSE protocol requirements.

The FUSE-over-io_uring implementation embeds a protocol-specific
structure directly into the Submission Queue Entry (SQE)
to pass metadata such as the queue ID and commit ID.

Enabling SQE128 expands the SQE size to 128 bytes, providing 80 bytes
of available command space. This ensures sufficient room for the FUSE
headers and future protocol extensions.
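
The 80-byte figure follows from the SQE layout. A minimal sketch of the
arithmetic, assuming the command area starts at byte offset 48 of
struct io_uring_sqe (as in the Linux UAPI headers); the constant and
function names below are illustrative and not part of this patch:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative constants mirroring the layout of struct io_uring_sqe:
 * the fixed fields occupy the first 48 bytes and the command area
 * follows them.
 */
enum {
    EX_SQE_SIZE_DEFAULT = 64,   /* without IORING_SETUP_SQE128 */
    EX_SQE_SIZE_128     = 128,  /* with IORING_SETUP_SQE128 */
    EX_SQE_CMD_OFFSET   = 48,   /* assumed offset of the cmd area */
};

/* Bytes available for an embedded command in an SQE of the given size. */
size_t ex_sqe_cmd_space(size_t sqe_size)
{
    assert(sqe_size >= EX_SQE_CMD_OFFSET);
    return sqe_size - EX_SQE_CMD_OFFSET;
}
```

Under that assumption a default 64-byte SQE leaves 16 bytes for the
command, while a 128-byte SQE leaves 80.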

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Brian Song <hibriansong@gmail.com>
---
 util/fdmon-io_uring.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c
index d2433d1d99..e6efc8d8f7 100644
--- a/util/fdmon-io_uring.c
+++ b/util/fdmon-io_uring.c
@@ -452,10 +452,15 @@ static const FDMonOps fdmon_io_uring_ops = {
 void fdmon_io_uring_setup(AioContext *ctx, Error **errp)
 {
     int ret;
+    int flags;

     ctx->io_uring_fd_tag = NULL;

-    ret = io_uring_queue_init(FDMON_IO_URING_ENTRIES, &ctx->fdmon_io_uring, 0);
+    /* Needed by FUSE-over-io_uring */
+    flags = IORING_SETUP_SQE128;
+
+    ret = io_uring_queue_init(FDMON_IO_URING_ENTRIES,
+                              &ctx->fdmon_io_uring, flags);
     if (ret != 0) {
         error_setg_errno(errp, -ret, "Failed to initialize io_uring");
         return;
--
2.43.0




* [Patch v4 2/7] fuse: io_uring mode init
  2026-02-07 12:08 [PATCH v4 0/7] add fuse-over-io_uring support Brian Song
  2026-02-07 12:08 ` [Patch v4 1/7] aio-posix: enable 128-byte SQEs Brian Song
@ 2026-02-07 12:08 ` Brian Song
  2026-02-11 20:56   ` Stefan Hajnoczi
  2026-02-07 12:08 ` [Patch v4 3/7] fuse: uring support for write ops Brian Song
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Brian Song @ 2026-02-07 12:08 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, hibriansong, hreitz, kwolf, eblake, armbru, stefanha,
	fam, bernd

The kernel documentation describes in detail how FUSE-over-io_uring
works: https://docs.kernel.org/filesystems/fuse/fuse-io-uring.html

This patch utilizes the legacy FUSE interface (/dev/fuse) during the
initialization phase to perform a protocol handshake with the kernel
driver. Once FUSE-over-io_uring support is negotiated, the
FUSE_IO_URING_CMD_REGISTER command is submitted to register the
io_uring queues.
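
The handshake boils down to intersecting the kernel's offered feature
flags with the ones the server supports. The sketch below is a
simplified, standalone model of that step; the EX_ constants are
believed to match the corresponding values in linux/fuse.h but are
repeated here only for illustration, and the helper names are
hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror FUSE_ASYNC_READ, FUSE_ASYNC_DIO, FUSE_INIT_EXT and
 * FUSE_OVER_IO_URING from linux/fuse.h. */
#define EX_FUSE_ASYNC_READ    (1ULL << 0)
#define EX_FUSE_ASYNC_DIO     (1ULL << 15)
#define EX_FUSE_INIT_EXT      (1ULL << 30)
#define EX_FUSE_OVER_IO_URING (1ULL << 41)

/* Combine the two 32-bit halves of the INIT flags into one 64-bit word.
 * The upper half is only valid if the extended-flags bit is set. */
uint64_t ex_init_flags(uint32_t flags, uint32_t flags2)
{
    uint64_t all = flags;

    if (flags & EX_FUSE_INIT_EXT) {
        all |= (uint64_t)flags2 << 32;
    }
    return all;
}

/*
 * Store the flags to send back in *reply and return true on success.
 * Return false if io_uring was requested but the kernel did not offer it.
 */
bool ex_negotiate(uint64_t kernel_flags, bool want_uring, uint64_t *reply)
{
    uint64_t supported = EX_FUSE_ASYNC_READ | EX_FUSE_ASYNC_DIO |
                         EX_FUSE_INIT_EXT;

    if (want_uring) {
        if (!(kernel_flags & EX_FUSE_OVER_IO_URING)) {
            return false;
        }
        supported |= EX_FUSE_OVER_IO_URING;
    }

    *reply = kernel_flags & supported;
    return true;
}
```

The reply is then split again into a lower and an upper 32-bit half
before it is written to the INIT response.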

Support for multiple IOThreads is also added to boost concurrency.
Since the current Linux kernel implementation requires registering one
uring queue per CPU core, we allocate the required number of
queues (nproc) and distribute them across the user-specified IOThreads
in a round-robin manner. To support concurrent in-flight requests per
io_uring queue, each uring queue is configured with
FUSE_DEFAULT_URING_QUEUE_DEPTH entries.
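
The round-robin mapping can be modelled in isolation as a modulo over
the number of IOThreads. A minimal sketch with hypothetical helper names
(the patch itself does this inline while setting up the queues):

```c
#include <assert.h>
#include <stddef.h>

/* Index of the IOThread (FUSE queue) that serves uring queue `rqid`. */
size_t ex_iothread_for_uring_queue(size_t rqid, size_t num_iothreads)
{
    assert(num_iothreads > 0);
    return rqid % num_iothreads;
}

/* Number of uring queues that end up on IOThread `t`. */
size_t ex_uring_queues_on_iothread(size_t t, size_t num_uring_queues,
                                   size_t num_iothreads)
{
    size_t count = 0;

    for (size_t rqid = 0; rqid < num_uring_queues; rqid++) {
        if (ex_iothread_for_uring_queue(rqid, num_iothreads) == t) {
            count++;
        }
    }
    return count;
}
```

With 8 uring queues and 3 IOThreads, the threads serve 3, 3 and 2 queues
respectively. With more IOThreads than uring queues, the excess
IOThreads get no uring queue at all.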

Specifically, the workflow is as follows:

- Initialize the io_uring queue depth when creating storage exports.
- Upon receiving a FUSE initialization request via the legacy path:
    - Perform the protocol handshake to confirm feature support.
    - Complete FUSE-over-io_uring registration, which includes:
        - Pre-allocating uring queue entries and payload buffers.
        - Binding the CQE handler to process incoming file operations.
        - Initializing Submission Queue Entries (SQEs).
        - Distributing the uring queues across FUSE IOThreads using a
          round-robin strategy.

After successful registration, the FUSE-over-io_uring CQE handler
takes over the processing of FUSE requests.
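
For the payload buffers mentioned in the workflow, the size follows from
the negotiated limits. The sketch below is a simplified, standalone
model of that calculation; the EX_ constants copy the values of
FUSE_BUFFER_HEADER_SIZE and FUSE_DEFAULT_MAX_PAGES_PER_REQ from this
patch, and the function name is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_BUFFER_HEADER_SIZE        0x1000
#define EX_DEFAULT_MAX_PAGES_PER_REQ 32

/*
 * Size of the payload buffer attached to each ring entry.
 * If FUSE_MAX_PAGES was not negotiated, the kernel default of
 * EX_DEFAULT_MAX_PAGES_PER_REQ pages per request applies; otherwise the
 * buffer is sized from max_write.
 */
size_t ex_payload_size(uint32_t max_write, bool max_pages_negotiated,
                       size_t page_size)
{
    size_t bufsize = (size_t)max_write + EX_BUFFER_HEADER_SIZE;

    if (!max_pages_negotiated) {
        bufsize = EX_DEFAULT_MAX_PAGES_PER_REQ * page_size +
                  EX_BUFFER_HEADER_SIZE;
    }

    /* The header travels in its own iovec, so subtract it again. */
    return bufsize - EX_BUFFER_HEADER_SIZE;
}
```

With 4 KiB pages and FUSE_MAX_PAGES not negotiated, each entry gets a
128 KiB payload buffer.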

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Brian Song <hibriansong@gmail.com>
---
 block/export/fuse.c | 430 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 414 insertions(+), 16 deletions(-)

diff --git a/block/export/fuse.c b/block/export/fuse.c
index c0ad4696ce..ae7490b2a1 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -2,6 +2,7 @@
  * Present a block device as a raw image through FUSE
  *
  * Copyright (c) 2020, 2025 Hanna Czenczek <hreitz@redhat.com>
+ * Copyright (c) 2025, 2026 Brian Song <hibriansong@gmail.com>
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License as published by
@@ -39,6 +40,7 @@

 #include "standard-headers/linux/fuse.h"
 #include <sys/ioctl.h>
+#include <sys/sysinfo.h>

 #if defined(CONFIG_FALLOCATE_ZERO_RANGE)
 #include <linux/falloc.h>
@@ -63,12 +65,69 @@
     (FUSE_MAX_WRITE_BYTES - FUSE_IN_PLACE_WRITE_BYTES)

 typedef struct FuseExport FuseExport;
+typedef struct FuseQueue FuseQueue;
+
+#ifdef CONFIG_LINUX_IO_URING
+#define FUSE_DEFAULT_URING_QUEUE_DEPTH 64
+#define FUSE_DEFAULT_MAX_PAGES_PER_REQ 32
+/*
+ * Under FUSE-over-io_uring mode:
+ *
+ * Each FuseUringEnt represents a FUSE request. It exposes two iovec entries for
+ * communication between the kernel driver and the userspace server:
+ *
+ * - The first iovec contains the request header (FUSE_BUFFER_HEADER_SIZE),
+ *   holding metadata describing the request.
+ * - The second iovec contains the payload, used for READ/WRITE operations.
+ */
+#define FUSE_BUFFER_HEADER_SIZE 0x1000
+
+typedef struct FuseUringQueue FuseUringQueue;
+
+typedef struct FuseUringEnt {
+    /* back pointer */
+    FuseUringQueue *rq;
+
+    /* commit id of a fuse request */
+    uint64_t req_commit_id;
+
+    /* fuse request header and payload */
+    struct fuse_uring_req_header req_header;
+    void *req_payload;
+    size_t req_payload_sz;
+
+    /* used for retry */
+    enum fuse_uring_cmd last_cmd;
+
+    /* The vector passed to the kernel */
+    struct iovec iov[2];
+
+    CqeHandler fuse_cqe_handler;
+} FuseUringEnt;
+
+/*
+ * In the current Linux kernel, FUSE-over-io_uring requires registering one
+ * FuseUringQueue per host CPU. These queues are allocated during setup
+ * and distributed to user-specified IOThreads (FuseQueue) in a round-robin
+ * fashion.
+ */
+struct FuseUringQueue {
+    int rqid;
+
+    /* back pointer */
+    FuseQueue *q;
+    FuseUringEnt *ent;
+
+    /* List entry for uring_queues */
+    QLIST_ENTRY(FuseUringQueue) next;
+};
+#endif /* CONFIG_LINUX_IO_URING */

 /*
  * One FUSE "queue", representing one FUSE FD from which requests are fetched
  * and processed.  Each queue is tied to an AioContext.
  */
-typedef struct FuseQueue {
+struct FuseQueue {
     FuseExport *exp;

     AioContext *ctx;
@@ -109,7 +168,11 @@ typedef struct FuseQueue {
      * Free this buffer with qemu_vfree().
      */
     void *spillover_buf;
-} FuseQueue;
+
+#ifdef CONFIG_LINUX_IO_URING
+    QLIST_HEAD(, FuseUringQueue) uring_queue_list;
+#endif
+};

 /*
  * Verify that FuseQueue.request_buf plus the spill-over buffer together
@@ -133,7 +196,7 @@ struct FuseExport {
      */
     bool halted;

-    int num_queues;
+    int num_fuse_queues;
     FuseQueue *queues;
     /*
      * True if this export should follow the generic export's AioContext.
@@ -149,6 +212,17 @@ struct FuseExport {
     /* Whether allow_other was used as a mount option or not */
     bool allow_other;

+    /* Whether to enable FUSE-over-io_uring */
+    bool is_uring;
+    /* Whether FUSE-over-io_uring is active */
+    bool uring_started;
+
+#ifdef CONFIG_LINUX_IO_URING
+    int uring_queue_depth;
+    int num_uring_queues;
+    FuseUringQueue *uring_queues;
+#endif
+
     mode_t st_mode;
     uid_t st_uid;
     gid_t st_gid;
@@ -205,7 +279,7 @@ static void fuse_attach_handlers(FuseExport *exp)
         return;
     }

-    for (int i = 0; i < exp->num_queues; i++) {
+    for (int i = 0; i < exp->num_fuse_queues; i++) {
         aio_set_fd_handler(exp->queues[i].ctx, exp->queues[i].fuse_fd,
                            read_from_fuse_fd, NULL, NULL, NULL,
                            &exp->queues[i]);
@@ -218,7 +292,7 @@ static void fuse_attach_handlers(FuseExport *exp)
  */
 static void fuse_detach_handlers(FuseExport *exp)
 {
-    for (int i = 0; i < exp->num_queues; i++) {
+    for (int i = 0; i < exp->num_fuse_queues; i++) {
         aio_set_fd_handler(exp->queues[i].ctx, exp->queues[i].fuse_fd,
                            NULL, NULL, NULL, NULL, NULL);
     }
@@ -237,7 +311,7 @@ static void fuse_export_drained_end(void *opaque)
     /* Refresh AioContext in case it changed */
     exp->common.ctx = blk_get_aio_context(exp->common.blk);
     if (exp->follow_aio_context) {
-        assert(exp->num_queues == 1);
+        assert(exp->num_fuse_queues == 1);
         exp->queues[0].ctx = exp->common.ctx;
     }

@@ -257,6 +331,248 @@ static const BlockDevOps fuse_export_blk_dev_ops = {
     .drained_poll  = fuse_export_drained_poll,
 };

+#ifdef CONFIG_LINUX_IO_URING
+static void coroutine_fn fuse_uring_co_process_request(FuseUringEnt *ent);
+static void fuse_uring_resubmit(struct io_uring_sqe *sqe, void *opaque);
+
+/**
+ * fuse_inc_in_flight() / fuse_dec_in_flight():
+ * Wrap the lifecycle of FUSE requests being processed. This ensures the
+ * block layer's drain operation waits for active requests to complete
+ * and prevents the export from being deleted prematurely.
+ *
+ * blk_exp_ref() / blk_exp_unref():
+ * Prevent the export from being deleted while there are outstanding
+ * dependencies.
+ *
+ * FUSE-over-io_uring mapping details:
+ *
+ * 1. SQE/CQE Lifecycle:
+ * blk_exp_ref() is called on SQE submission, and blk_exp_unref() on
+ * CQE completion. This protects the export until the kernel is done
+ * with the entry.
+ *
+ * 2. Request Processing:
+ * The coroutine processing a FUSE request must allow the drain operation
+ * to track it.
+ *
+ * - fuse_inc_in_flight() must be called *before* the coroutine starts
+ * (i.e., before qemu_coroutine_enter).
+ * - fuse_dec_in_flight() is called after processing ends.
+ *
+ * There is a small window where a CQE is pending in an iothread
+ * but the coroutine hasn't started yet. If we don't increment in_flight
+ * early, the main thread's drain operation might see zero in-flight
+ * requests and return early, falsely assuming the section is drained
+ * even though a request is about to be processed.
+ */
+static void coroutine_fn co_fuse_uring_queue_handle_cqe(void *opaque)
+{
+    FuseUringEnt *ent = opaque;
+    FuseExport *exp = ent->rq->q->exp;
+
+    /* A uring entry returned */
+    blk_exp_unref(&exp->common);
+
+    fuse_uring_co_process_request(ent);
+
+    /* Request is no longer in flight */
+    fuse_dec_in_flight(exp);
+}
+
+static void fuse_uring_cqe_handler(CqeHandler *cqe_handler)
+{
+    Coroutine *co;
+    FuseUringEnt *ent =
+        container_of(cqe_handler, FuseUringEnt, fuse_cqe_handler);
+    FuseExport *exp = ent->rq->q->exp;
+
+    if (unlikely(exp->halted)) {
+        return;
+    }
+
+    int err = cqe_handler->cqe.res;
+
+    if (unlikely(err != 0)) {
+        switch (err) {
+        case -EAGAIN:
+        case -EINTR:
+            aio_add_sqe(fuse_uring_resubmit, ent, &ent->fuse_cqe_handler);
+            break;
+        case -ENOTCONN:
+            /* Connection already gone */
+            break;
+        default:
+            fuse_export_halt(exp);
+            break;
+        }
+
+        /* A uring entry returned */
+        blk_exp_unref(&exp->common);
+    } else {
+        co = qemu_coroutine_create(co_fuse_uring_queue_handle_cqe, ent);
+        /* Account this request as in-flight */
+        fuse_inc_in_flight(exp);
+        qemu_coroutine_enter(co);
+    }
+}
+
+static void
+fuse_uring_sqe_set_req_data(struct fuse_uring_cmd_req *req,
+                            const unsigned int rqid,
+                            const unsigned int commit_id)
+{
+    req->qid = rqid;
+    req->commit_id = commit_id;
+    req->flags = 0;
+}
+
+static void
+fuse_uring_sqe_prepare(struct io_uring_sqe *sqe, FuseQueue *q, __u32 cmd_op)
+{
+    sqe->opcode = IORING_OP_URING_CMD;
+
+    sqe->fd = q->fuse_fd;
+    sqe->rw_flags = 0;
+    sqe->ioprio = 0;
+    sqe->off = 0;
+
+    sqe->cmd_op = cmd_op;
+    sqe->__pad1 = 0;
+}
+
+static void fuse_uring_prep_sqe_register(struct io_uring_sqe *sqe, void *opaque)
+{
+    FuseUringEnt *ent = opaque;
+    struct fuse_uring_cmd_req *req = (void *)&sqe->cmd[0];
+
+    ent->last_cmd = FUSE_IO_URING_CMD_REGISTER;
+    fuse_uring_sqe_prepare(sqe, ent->rq->q, ent->last_cmd);
+
+    sqe->addr = (uint64_t)(ent->iov);
+    sqe->len = 2;
+
+    fuse_uring_sqe_set_req_data(req, ent->rq->rqid, 0);
+}
+
+static void fuse_uring_resubmit(struct io_uring_sqe *sqe, void *opaque)
+{
+    FuseUringEnt *ent = opaque;
+    struct fuse_uring_cmd_req *req = (void *)&sqe->cmd[0];
+
+    fuse_uring_sqe_prepare(sqe, ent->rq->q, ent->last_cmd);
+
+    switch (ent->last_cmd) {
+    case FUSE_IO_URING_CMD_REGISTER:
+        sqe->addr = (uint64_t)(ent->iov);
+        sqe->len = 2;
+        fuse_uring_sqe_set_req_data(req, ent->rq->rqid, 0);
+        break;
+    case FUSE_IO_URING_CMD_COMMIT_AND_FETCH:
+        fuse_uring_sqe_set_req_data(req, ent->rq->rqid, ent->req_commit_id);
+        break;
+    default:
+        error_report("Unknown command type: %d", ent->last_cmd);
+        break;
+    }
+}
+
+static void fuse_uring_submit_register(void *opaque)
+{
+    FuseUringQueue *rq = opaque;
+    FuseExport *exp = rq->q->exp;
+
+    for (int j = 0; j < exp->uring_queue_depth; j++) {
+        /* Register a uring entry */
+        blk_exp_ref(&exp->common);
+
+        aio_add_sqe(fuse_uring_prep_sqe_register, &rq->ent[j],
+                    &rq->ent[j].fuse_cqe_handler);
+    }
+}
+
+/**
+ * Distribute uring queues across FUSE queues in a round-robin manner.
+ * This ensures even distribution of kernel uring queues across user-specified
+ * FUSE queues.
+ *
+ * num_uring_queues > num_fuse_queues: Each IOThread manages multiple uring
+ * queues (multi-queue mapping).
+ * num_uring_queues < num_fuse_queues: Excess IOThreads remain idle with no
+ * assigned uring queues.
+ */
+static void fuse_uring_setup_queues(FuseExport *exp, size_t bufsize)
+{
+    int num_uring_queues = get_nprocs_conf();
+
+    exp->num_uring_queues = num_uring_queues;
+    exp->uring_queues = g_new(FuseUringQueue, num_uring_queues);
+
+    for (int i = 0; i < num_uring_queues; i++) {
+        FuseUringQueue *rq = &exp->uring_queues[i];
+        rq->rqid = i;
+        rq->ent = g_new(FuseUringEnt, exp->uring_queue_depth);
+
+        for (int j = 0; j < exp->uring_queue_depth; j++) {
+            FuseUringEnt *ent = &rq->ent[j];
+            ent->rq = rq;
+            ent->req_payload_sz = bufsize - FUSE_BUFFER_HEADER_SIZE;
+            ent->req_payload = g_malloc0(ent->req_payload_sz);
+
+            ent->iov[0] = (struct iovec) {
+                &ent->req_header,
+                sizeof(struct fuse_uring_req_header)
+            };
+            ent->iov[1] = (struct iovec) {
+                ent->req_payload,
+                ent->req_payload_sz
+            };
+
+            ent->fuse_cqe_handler.cb = fuse_uring_cqe_handler;
+        }
+
+        /* Distribute uring queues across FUSE queues */
+        rq->q = &exp->queues[i % exp->num_fuse_queues];
+        QLIST_INSERT_HEAD(&(rq->q->uring_queue_list), rq, next);
+    }
+}
+
+static void
+fuse_schedule_ring_queue_registrations(FuseExport *exp)
+{
+    for (int i = 0; i < exp->num_fuse_queues; i++) {
+        FuseQueue *q = &exp->queues[i];
+        FuseUringQueue *rq;
+
+        QLIST_FOREACH(rq, &q->uring_queue_list, next) {
+            aio_bh_schedule_oneshot(q->ctx, fuse_uring_submit_register, rq);
+        }
+    }
+}
+
+static void fuse_uring_start(FuseExport *exp, struct fuse_init_out *out)
+{
+    assert(!exp->uring_started);
+    exp->uring_started = true;
+
+    /*
+     * Since we don't enable the FUSE_MAX_PAGES feature, the value of
+     * fc->max_pages should be FUSE_DEFAULT_MAX_PAGES_PER_REQ, which is set by
+     * the kernel by default. Also, max_write should not exceed
+     * FUSE_DEFAULT_MAX_PAGES_PER_REQ * PAGE_SIZE.
+     */
+    size_t bufsize = out->max_write + FUSE_BUFFER_HEADER_SIZE;
+
+    if (!(out->flags & FUSE_MAX_PAGES)) {
+        bufsize = FUSE_DEFAULT_MAX_PAGES_PER_REQ * qemu_real_host_page_size()
+                         + FUSE_BUFFER_HEADER_SIZE;
+    }
+
+    fuse_uring_setup_queues(exp, bufsize);
+    fuse_schedule_ring_queue_registrations(exp);
+}
+#endif /* CONFIG_LINUX_IO_URING */
+
 static int fuse_export_create(BlockExport *blk_exp,
                               BlockExportOptions *blk_exp_args,
                               AioContext *const *multithread,
@@ -270,12 +586,24 @@ static int fuse_export_create(BlockExport *blk_exp,

     assert(blk_exp_args->type == BLOCK_EXPORT_TYPE_FUSE);

+#ifdef CONFIG_LINUX_IO_URING
+    /* TODO Add FUSE-over-io_uring Option */
+    exp->is_uring = false;
+    exp->uring_queue_depth = FUSE_DEFAULT_URING_QUEUE_DEPTH;
+#else
+    if (args->io_uring) {
+        error_setg(errp, "FUSE-over-io_uring requires CONFIG_LINUX_IO_URING");
+        return -ENOTSUP;
+    }
+    exp->is_uring = false;
+#endif
+
     if (multithread) {
         /* Guaranteed by common export code */
         assert(mt_count >= 1);

         exp->follow_aio_context = false;
-        exp->num_queues = mt_count;
+        exp->num_fuse_queues = mt_count;
         exp->queues = g_new(FuseQueue, mt_count);

         for (size_t i = 0; i < mt_count; i++) {
@@ -283,6 +611,10 @@ static int fuse_export_create(BlockExport *blk_exp,
                 .exp = exp,
                 .ctx = multithread[i],
                 .fuse_fd = -1,
+#ifdef CONFIG_LINUX_IO_URING
+                .uring_queue_list =
+                    QLIST_HEAD_INITIALIZER(exp->queues[i].uring_queue_list),
+#endif
             };
         }
     } else {
@@ -290,12 +622,16 @@ static int fuse_export_create(BlockExport *blk_exp,
         assert(mt_count == 0);

         exp->follow_aio_context = true;
-        exp->num_queues = 1;
+        exp->num_fuse_queues = 1;
         exp->queues = g_new(FuseQueue, 1);
         exp->queues[0] = (FuseQueue) {
             .exp = exp,
             .ctx = exp->common.ctx,
             .fuse_fd = -1,
+#ifdef CONFIG_LINUX_IO_URING
+            .uring_queue_list =
+                QLIST_HEAD_INITIALIZER(exp->queues[0].uring_queue_list),
+#endif
         };
     }

@@ -383,7 +719,7 @@ static int fuse_export_create(BlockExport *blk_exp,

     g_hash_table_insert(exports, g_strdup(exp->mountpoint), NULL);

-    assert(exp->num_queues >= 1);
+    assert(exp->num_fuse_queues >= 1);
     exp->queues[0].fuse_fd = fuse_session_fd(exp->fuse_session);
     ret = qemu_fcntl_addfl(exp->queues[0].fuse_fd, O_NONBLOCK);
     if (ret < 0) {
@@ -391,7 +727,7 @@ static int fuse_export_create(BlockExport *blk_exp,
         goto fail;
     }

-    for (int i = 1; i < exp->num_queues; i++) {
+    for (int i = 1; i < exp->num_fuse_queues; i++) {
         int fd = clone_fuse_fd(exp->queues[0].fuse_fd, errp);
         if (fd < 0) {
             ret = fd;
@@ -618,7 +954,7 @@ static void fuse_export_delete(BlockExport *blk_exp)
 {
     FuseExport *exp = container_of(blk_exp, FuseExport, common);

-    for (int i = 0; i < exp->num_queues; i++) {
+    for (int i = 0; i < exp->num_fuse_queues; i++) {
         FuseQueue *q = &exp->queues[i];

         /* Queue 0's FD belongs to the FUSE session */
@@ -685,17 +1021,37 @@ static bool is_regular_file(const char *path, Error **errp)
  */
 static ssize_t coroutine_fn
 fuse_co_init(FuseExport *exp, struct fuse_init_out *out,
-             uint32_t max_readahead, uint32_t flags)
+             uint32_t max_readahead, const struct fuse_init_in *in)
 {
-    const uint32_t supported_flags = FUSE_ASYNC_READ | FUSE_ASYNC_DIO;
+    uint64_t supported_flags = FUSE_ASYNC_READ | FUSE_ASYNC_DIO
+                                     | FUSE_INIT_EXT;
+    uint64_t outargflags = 0;
+    uint64_t inargflags = in->flags;
+
+    if (inargflags & FUSE_INIT_EXT) {
+        inargflags = inargflags | (uint64_t) in->flags2 << 32;
+    }
+
+#ifdef CONFIG_LINUX_IO_URING
+    if (exp->is_uring) {
+        if (inargflags & FUSE_OVER_IO_URING) {
+            supported_flags |= FUSE_OVER_IO_URING;
+        } else {
+            exp->is_uring = false;
+            return -EOPNOTSUPP;
+        }
+    }
+#endif
+
+    outargflags = inargflags & supported_flags;

     *out = (struct fuse_init_out) {
         .major = FUSE_KERNEL_VERSION,
         .minor = FUSE_KERNEL_MINOR_VERSION,
         .max_readahead = max_readahead,
         .max_write = FUSE_MAX_WRITE_BYTES,
-        .flags = flags & supported_flags,
-        .flags2 = 0,
+        .flags = outargflags,
+        .flags2 = outargflags >> 32,

         /* libfuse maximum: 2^16 - 1 */
         .max_background = UINT16_MAX,
@@ -1404,11 +1760,24 @@ fuse_co_process_request(FuseQueue *q, void *spillover_buf)
         req_id = in_hdr->unique;
     }

+#ifdef CONFIG_LINUX_IO_URING
+    /*
+     * Enable FUSE-over-io_uring mode if supported.
+     * FUSE_INIT is only handled in legacy mode.
+     * Failure returns -EOPNOTSUPP; success switches to io_uring path.
+     */
+    bool uring_initially_enabled = false;
+
+    if (unlikely(opcode == FUSE_INIT)) {
+        uring_initially_enabled = exp->is_uring;
+    }
+#endif
+
     switch (opcode) {
     case FUSE_INIT: {
         const struct fuse_init_in *in = FUSE_IN_OP_STRUCT(init, q);
         ret = fuse_co_init(exp, FUSE_OUT_OP_STRUCT(init, out_buf),
-                           in->max_readahead, in->flags);
+                           in->max_readahead, in);
         break;
     }

@@ -1513,7 +1882,36 @@ fuse_co_process_request(FuseQueue *q, void *spillover_buf)
     }

     qemu_vfree(spillover_buf);
+
+#ifdef CONFIG_LINUX_IO_URING
+    if (unlikely(opcode == FUSE_INIT) && uring_initially_enabled) {
+        if (exp->is_uring && !exp->uring_started) {
+            /*
+             * Handle FUSE-over-io_uring initialization.
+             * If io_uring mode was requested for this export but it has not
+             * been started yet, start it now.
+             */
+            struct fuse_init_out *out = FUSE_OUT_OP_STRUCT(init, out_buf);
+            fuse_uring_start(exp, out);
+        } else if (ret == -EOPNOTSUPP) {
+            /*
+             * If io_uring was requested but the kernel does not support it,
+             * halt the export.
+             */
+            error_report("System doesn't support FUSE-over-io_uring");
+            fuse_export_halt(exp);
+        }
+    }
+#endif
+}
+
+#ifdef CONFIG_LINUX_IO_URING
+static void coroutine_fn fuse_uring_co_process_request(FuseUringEnt *ent)
+{
+    /* TODO */
+    (void)ent;
 }
+#endif /* CONFIG_LINUX_IO_URING */

 const BlockExportDriver blk_exp_fuse = {
     .type               = BLOCK_EXPORT_TYPE_FUSE,
--
2.43.0




* [Patch v4 3/7] fuse: uring support for write ops
  2026-02-07 12:08 [PATCH v4 0/7] add fuse-over-io_uring support Brian Song
  2026-02-07 12:08 ` [Patch v4 1/7] aio-posix: enable 128-byte SQEs Brian Song
  2026-02-07 12:08 ` [Patch v4 2/7] fuse: io_uring mode init Brian Song
@ 2026-02-07 12:08 ` Brian Song
  2026-02-11 21:08   ` Stefan Hajnoczi
  2026-02-07 12:08 ` [Patch v4 4/7] fuse: refactor FUSE request handler Brian Song
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Brian Song @ 2026-02-07 12:08 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, hibriansong, hreitz, kwolf, eblake, armbru, stefanha,
	fam, bernd

The payload of each uring entry serves as the buffer holding file data
for read/write operations. In this patch, fuse_co_write is refactored
to support using different buffer sources for write operations in
legacy and io_uring modes.
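
The buffer selection can be sketched as follows. This is a simplified
model, not the patch code: it omits the bounce-buffer copy that legacy
mode performs before yielding, EX_IN_PLACE_WRITE_BYTES is a placeholder
value standing in for FUSE_IN_PLACE_WRITE_BYTES, and the function name
is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

/* Placeholder for FUSE_IN_PLACE_WRITE_BYTES; the real value is defined
 * in block/export/fuse.c. */
#define EX_IN_PLACE_WRITE_BYTES 4096

/*
 * Fill iov[] for a write of `size` bytes and return the number of
 * entries used.
 *
 * Legacy mode (in_place != NULL): the data starts in the in-place
 * buffer and continues in the spill-over buffer.
 * io_uring mode (in_place == NULL): all data is in `spillover`, which
 * is the payload buffer of the ring entry.
 */
int ex_build_write_iov(struct iovec iov[2], void *in_place,
                       void *spillover, size_t size)
{
    size_t head;

    if (!in_place) {
        iov[0] = (struct iovec) { .iov_base = spillover, .iov_len = size };
        return 1;
    }

    head = size < EX_IN_PLACE_WRITE_BYTES ? size : EX_IN_PLACE_WRITE_BYTES;
    iov[0] = (struct iovec) { .iov_base = in_place, .iov_len = head };
    if (size > head) {
        iov[1] = (struct iovec) {
            .iov_base = spillover,
            .iov_len = size - head,
        };
        return 2;
    }
    return 1;
}
```

Passing NULL for the in-place buffer is what selects the io_uring
layout, so a single write path can serve both modes.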

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Brian Song <hibriansong@gmail.com>
---
 block/export/fuse.c | 55 ++++++++++++++++++++++++++++++---------------
 1 file changed, 37 insertions(+), 18 deletions(-)

diff --git a/block/export/fuse.c b/block/export/fuse.c
index ae7490b2a1..867752555a 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -1299,6 +1299,9 @@ fuse_co_read(FuseExport *exp, void **bufptr, uint64_t offset, uint32_t size)
  * Data in @in_place_buf is assumed to be overwritten after yielding, so will
  * be copied to a bounce buffer beforehand.  @spillover_buf in contrast is
  * assumed to be exclusively owned and will be used as-is.
+ * In FUSE-over-io_uring mode, the actual req_payload content is stored in
+ * @spillover_buf. To ensure this buffer is used for writing, @in_place_buf
+ * is explicitly set to NULL.
  * Return the number of bytes written to *out on success, and -errno on error.
  */
 static ssize_t coroutine_fn
@@ -1306,8 +1309,8 @@ fuse_co_write(FuseExport *exp, struct fuse_write_out *out,
               uint64_t offset, uint32_t size,
               const void *in_place_buf, const void *spillover_buf)
 {
-    size_t in_place_size;
-    void *copied;
+    size_t in_place_size = 0;
+    void *copied = NULL;
     int64_t blk_len;
     int ret;
     struct iovec iov[2];
@@ -1322,10 +1325,12 @@ fuse_co_write(FuseExport *exp, struct fuse_write_out *out,
         return -EACCES;
     }

-    /* Must copy to bounce buffer before potentially yielding */
-    in_place_size = MIN(size, FUSE_IN_PLACE_WRITE_BYTES);
-    copied = blk_blockalign(exp->common.blk, in_place_size);
-    memcpy(copied, in_place_buf, in_place_size);
+    if (in_place_buf) {
+        /* Must copy to bounce buffer before potentially yielding */
+        in_place_size = MIN(size, FUSE_IN_PLACE_WRITE_BYTES);
+        copied = blk_blockalign(exp->common.blk, in_place_size);
+        memcpy(copied, in_place_buf, in_place_size);
+    }

     /**
      * Clients will expect short writes at EOF, so we have to limit
@@ -1349,26 +1354,38 @@ fuse_co_write(FuseExport *exp, struct fuse_write_out *out,
         }
     }

-    iov[0] = (struct iovec) {
-        .iov_base = copied,
-        .iov_len = in_place_size,
-    };
-    if (size > FUSE_IN_PLACE_WRITE_BYTES) {
-        assert(size - FUSE_IN_PLACE_WRITE_BYTES <= FUSE_SPILLOVER_BUF_SIZE);
-        iov[1] = (struct iovec) {
-            .iov_base = (void *)spillover_buf,
-            .iov_len = size - FUSE_IN_PLACE_WRITE_BYTES,
+    if (in_place_buf) {
+        iov[0] = (struct iovec) {
+            .iov_base = copied,
+            .iov_len = in_place_size,
         };
-        qemu_iovec_init_external(&qiov, iov, 2);
+        if (size > FUSE_IN_PLACE_WRITE_BYTES) {
+            assert(size - FUSE_IN_PLACE_WRITE_BYTES <= FUSE_SPILLOVER_BUF_SIZE);
+            iov[1] = (struct iovec) {
+                .iov_base = (void *)spillover_buf,
+                .iov_len = size - FUSE_IN_PLACE_WRITE_BYTES,
+            };
+            qemu_iovec_init_external(&qiov, iov, 2);
+        } else {
+            qemu_iovec_init_external(&qiov, iov, 1);
+        }
     } else {
+        /* fuse over io_uring */
+        iov[0] = (struct iovec) {
+            .iov_base = (void *)spillover_buf,
+            .iov_len = size,
+        };
         qemu_iovec_init_external(&qiov, iov, 1);
     }
+
     ret = blk_co_pwritev(exp->common.blk, offset, size, &qiov, 0);
     if (ret < 0) {
         goto fail_free_buffer;
     }

-    qemu_vfree(copied);
+    if (in_place_buf) {
+        qemu_vfree(copied);
+    }

     *out = (struct fuse_write_out) {
         .size = size,
@@ -1376,7 +1393,9 @@ fuse_co_write(FuseExport *exp, struct fuse_write_out *out,
     return sizeof(*out);

 fail_free_buffer:
-    qemu_vfree(copied);
+    if (in_place_buf) {
+        qemu_vfree(copied);
+    }
     return ret;
 }

--
2.43.0




* [Patch v4 4/7] fuse: refactor FUSE request handler
  2026-02-07 12:08 [PATCH v4 0/7] add fuse-over-io_uring support Brian Song
                   ` (2 preceding siblings ...)
  2026-02-07 12:08 ` [Patch v4 3/7] fuse: uring support for write ops Brian Song
@ 2026-02-07 12:08 ` Brian Song
  2026-02-11 21:21   ` Stefan Hajnoczi
  2026-02-07 12:08 ` [Patch v4 5/6] fuse: safe termination for io_uring Brian Song
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 21+ messages in thread
From: Brian Song @ 2026-02-07 12:08 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, hibriansong, hreitz, kwolf, eblake, armbru, stefanha,
	fam, bernd

This patch implements the CQE handler for FUSE-over-io_uring. Upon
receiving a FUSE request via a Completion Queue Entry (CQE), the
handler processes the request and submits the response back to the
kernel via the FUSE_IO_URING_CMD_COMMIT_AND_FETCH command.

Additionally, the request processing logic shared between legacy and
io_uring modes has been extracted into fuse_co_process_request_common().
The execution flow now dispatches requests to the appropriate
mode-specific logic based on the uring_started flag.
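
For readers who want the shape of this refactoring without the FUSE
details, here is a minimal sketch of the pattern: one shared handler
computes the result, and a caller-supplied callback plus an opaque
pointer decide how the response is delivered. All names in it
(LegacyQueue, UringEntry, process_common, the OP_* values) are
hypothetical stand-ins and not the identifiers used in block/export/fuse.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two transports; not QEMU types. */
typedef struct { uint64_t last_id; int last_ret; } LegacyQueue;
typedef struct { uint64_t last_id; int last_ret; } UringEntry;

/* Transport-specific "send the reply" hook. */
typedef void (*SendResponseFn)(void *opaque, uint64_t req_id, int ret);

static void send_legacy(void *opaque, uint64_t req_id, int ret)
{
    LegacyQueue *q = opaque;   /* real code would write to /dev/fuse */
    q->last_id = req_id;
    q->last_ret = ret;
}

static void send_uring(void *opaque, uint64_t req_id, int ret)
{
    UringEntry *ent = opaque;  /* real code would queue a commit SQE */
    ent->last_id = req_id;
    ent->last_ret = ret;
}

/* Illustrative opcodes only; real FUSE opcodes live in linux/fuse.h. */
enum { OP_RELEASE = 1, OP_UNKNOWN = 99 };

/* Shared request logic: independent of how the reply is sent. */
static void process_common(uint32_t opcode, uint64_t req_id,
                           SendResponseFn send, void *opaque)
{
    int ret;

    switch (opcode) {
    case OP_RELEASE:
        ret = 0;
        break;
    default:
        ret = -ENOSYS;
        break;
    }

    send(opaque, req_id, ret);
}
```

A legacy caller would invoke process_common(opcode, id, send_legacy, q)
and a uring caller process_common(opcode, id, send_uring, ent); the
opcode switch itself is written only once.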

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Brian Song <hibriansong@gmail.com>
---
 block/export/fuse.c | 400 +++++++++++++++++++++++++++++++++-----------
 1 file changed, 301 insertions(+), 99 deletions(-)

diff --git a/block/export/fuse.c b/block/export/fuse.c
index 867752555a..c117e081cd 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -138,8 +138,8 @@ struct FuseQueue {
      * FUSE_MIN_READ_BUFFER (from linux/fuse.h) bytes.
      * This however is just the first part of the buffer; every read is given
      * a vector of this buffer (which should be enough for all normal requests,
-     * which we check via the static assertion in FUSE_IN_OP_STRUCT()) and the
-     * spill-over buffer below.
+     * which we check via the static assertion in FUSE_IN_OP_STRUCT_LEGACY())
+     * and the spill-over buffer below.
      * Therefore, the size of this buffer plus FUSE_SPILLOVER_BUF_SIZE must be
      * FUSE_MIN_READ_BUFFER or more (checked via static assertion below).
      */
@@ -912,6 +912,7 @@ static void coroutine_fn co_read_from_fuse_fd(void *opaque)
     }

     fuse_co_process_request(q, spillover_buf);
+    qemu_vfree(spillover_buf);

 no_request:
     fuse_dec_in_flight(exp);
@@ -1684,100 +1685,75 @@ static int fuse_write_buf_response(int fd, uint32_t req_id,
 }

 /*
- * For use in fuse_co_process_request():
+ * For use in fuse_co_process_request_common():
  * Returns a pointer to the parameter object for the given operation (inside of
- * queue->request_buf, which is assumed to hold a fuse_in_header first).
- * Verifies that the object is complete (queue->request_buf is large enough to
- * hold it in one piece, and the request length includes the whole object).
+ * in_buf, which is assumed to hold a fuse_in_header first).
+ * Verifies that the object is complete (in_buf is large enough to hold it in
+ * one piece, and the request length includes the whole object).
+ * Only performs verification for legacy FUSE.
  *
  * Note that queue->request_buf may be overwritten after yielding, so the
  * returned pointer must not be used across a function that may yield!
  */
-#define FUSE_IN_OP_STRUCT(op_name, queue) \
+#define FUSE_IN_OP_STRUCT_LEGACY(op_name, queue) \
     ({ \
         const struct fuse_in_header *__in_hdr = \
             (const struct fuse_in_header *)(queue)->request_buf; \
         const struct fuse_##op_name##_in *__in = \
             (const struct fuse_##op_name##_in *)(__in_hdr + 1); \
         const size_t __param_len = sizeof(*__in_hdr) + sizeof(*__in); \
-        uint32_t __req_len; \
         \
-        QEMU_BUILD_BUG_ON(sizeof((queue)->request_buf) < __param_len); \
+        QEMU_BUILD_BUG_ON(sizeof((queue)->request_buf) < \
+                          (sizeof(struct fuse_in_header) + \
+                           sizeof(struct fuse_##op_name##_in))); \
         \
-        __req_len = __in_hdr->len; \
+        uint32_t __req_len = __in_hdr->len; \
         if (__req_len < __param_len) { \
             warn_report("FUSE request truncated (%" PRIu32 " < %zu)", \
                         __req_len, __param_len); \
             ret = -EINVAL; \
-            break; \
+            __in = NULL; \
         } \
         __in; \
     })

 /*
- * For use in fuse_co_process_request():
+ * For use in fuse_co_process_request_common():
  * Returns a pointer to the return object for the given operation (inside of
  * out_buf, which is assumed to hold a fuse_out_header first).
- * Verifies that out_buf is large enough to hold the whole object.
+ * Only performs verification for legacy FUSE.
+ * Note: Buffer size verification is done via static assertions in the caller
+ * (fuse_co_process_request) where out_buf is a local array.
  *
- * (out_buf should be a char[] array.)
+ * (out_buf should be a char[] array in the caller.)
  */
-#define FUSE_OUT_OP_STRUCT(op_name, out_buf) \
+#define FUSE_OUT_OP_STRUCT_LEGACY(op_name, out_buf) \
     ({ \
         struct fuse_out_header *__out_hdr = \
             (struct fuse_out_header *)(out_buf); \
         struct fuse_##op_name##_out *__out = \
             (struct fuse_##op_name##_out *)(__out_hdr + 1); \
         \
-        QEMU_BUILD_BUG_ON(sizeof(*__out_hdr) + sizeof(*__out) > \
-                          sizeof(out_buf)); \
-        \
         __out; \
     })

 /**
- * Process a FUSE request, incl. writing the response.
- *
- * Note that yielding in any request-processing function can overwrite the
- * contents of q->request_buf.  Anything that takes a buffer needs to take
- * care that the content is copied before yielding.
- *
- * @spillover_buf can contain the tail of a write request too large to fit into
- * q->request_buf.  This function takes ownership of it (i.e. will free it),
- * which assumes that its contents will not be overwritten by concurrent
- * requests (as opposed to q->request_buf).
+ * Shared helper for FUSE request processing. Handles both legacy and io_uring
+ * paths.
  */
-static void coroutine_fn
-fuse_co_process_request(FuseQueue *q, void *spillover_buf)
+static void coroutine_fn fuse_co_process_request_common(
+    FuseExport *exp,
+    uint32_t opcode,
+    uint64_t req_id,
+    void *in_buf,
+    void *spillover_buf,
+    void *out_buf,
+    void (*send_response)(void *opaque, uint32_t req_id, int ret,
+                          const void *buf, void *out_buf),
+    void *opaque /* FuseQueue* or FuseUringEnt* */)
 {
-    FuseExport *exp = q->exp;
-    uint32_t opcode;
-    uint64_t req_id;
-    /*
-     * Return buffer.  Must be large enough to hold all return headers, but does
-     * not include space for data returned by read requests.
-     * (FUSE_IN_OP_STRUCT() verifies at compile time that out_buf is indeed
-     * large enough.)
-     */
-    char out_buf[sizeof(struct fuse_out_header) +
-                 MAX_CONST(sizeof(struct fuse_init_out),
-                 MAX_CONST(sizeof(struct fuse_open_out),
-                 MAX_CONST(sizeof(struct fuse_attr_out),
-                 MAX_CONST(sizeof(struct fuse_write_out),
-                           sizeof(struct fuse_lseek_out)))))];
-    struct fuse_out_header *out_hdr = (struct fuse_out_header *)out_buf;
-    /* For read requests: Data to be returned */
     void *out_data_buffer = NULL;
-    ssize_t ret;
-
-    /* Limit scope to ensure pointer is no longer used after yielding */
-    {
-        const struct fuse_in_header *in_hdr =
-            (const struct fuse_in_header *)q->request_buf;
-
-        opcode = in_hdr->opcode;
-        req_id = in_hdr->unique;
-    }
+    int ret = 0;

 #ifdef CONFIG_LINUX_IO_URING
     /*
@@ -1794,15 +1770,32 @@ fuse_co_process_request(FuseQueue *q, void *spillover_buf)

     switch (opcode) {
     case FUSE_INIT: {
-        const struct fuse_init_in *in = FUSE_IN_OP_STRUCT(init, q);
-        ret = fuse_co_init(exp, FUSE_OUT_OP_STRUCT(init, out_buf),
-                           in->max_readahead, in);
+        FuseQueue *q = opaque;
+        const struct fuse_init_in *in =
+            FUSE_IN_OP_STRUCT_LEGACY(init, q);
+        if (!in) {
+            break;
+        }
+
+        struct fuse_init_out *out =
+            FUSE_OUT_OP_STRUCT_LEGACY(init, out_buf);
+
+        ret = fuse_co_init(exp, out, in->max_readahead, in);
         break;
     }

-    case FUSE_OPEN:
-        ret = fuse_co_open(exp, FUSE_OUT_OP_STRUCT(open, out_buf));
+    case FUSE_OPEN: {
+        struct fuse_open_out *out;
+
+        if (exp->uring_started) {
+            out = out_buf;
+        } else {
+            out = FUSE_OUT_OP_STRUCT_LEGACY(open, out_buf);
+        }
+
+        ret = fuse_co_open(exp, out);
         break;
+    }

     case FUSE_RELEASE:
         ret = 0;
@@ -1812,37 +1805,105 @@ fuse_co_process_request(FuseQueue *q, void *spillover_buf)
         ret = -ENOENT; /* There is no node but the root node */
         break;

-    case FUSE_GETATTR:
-        ret = fuse_co_getattr(exp, FUSE_OUT_OP_STRUCT(attr, out_buf));
+    case FUSE_GETATTR: {
+        struct fuse_attr_out *out;
+
+        if (exp->uring_started) {
+            out = out_buf;
+        } else {
+            out = FUSE_OUT_OP_STRUCT_LEGACY(attr, out_buf);
+        }
+
+        ret = fuse_co_getattr(exp, out);
         break;
+    }

     case FUSE_SETATTR: {
-        const struct fuse_setattr_in *in = FUSE_IN_OP_STRUCT(setattr, q);
-        ret = fuse_co_setattr(exp, FUSE_OUT_OP_STRUCT(attr, out_buf),
-                              in->valid, in->size, in->mode, in->uid, in->gid);
+        const struct fuse_setattr_in *in;
+        struct fuse_attr_out *out;
+
+        if (exp->uring_started) {
+            in = in_buf;
+            out = out_buf;
+        } else {
+            FuseQueue *q = opaque;
+            in = FUSE_IN_OP_STRUCT_LEGACY(setattr, q);
+            if (!in) {
+                break;
+            }
+
+            out = FUSE_OUT_OP_STRUCT_LEGACY(attr, out_buf);
+        }
+
+        ret = fuse_co_setattr(exp, out, in->valid, in->size, in->mode,
+                              in->uid, in->gid);
         break;
     }

     case FUSE_READ: {
-        const struct fuse_read_in *in = FUSE_IN_OP_STRUCT(read, q);
+        const struct fuse_read_in *in;
+
+        if (exp->uring_started) {
+            in = in_buf;
+        } else {
+            FuseQueue *q = opaque;
+            in = FUSE_IN_OP_STRUCT_LEGACY(read, q);
+            if (!in) {
+                break;
+            }
+        }
+
         ret = fuse_co_read(exp, &out_data_buffer, in->offset, in->size);
         break;
     }

     case FUSE_WRITE: {
-        const struct fuse_write_in *in = FUSE_IN_OP_STRUCT(write, q);
-        uint32_t req_len;
-
-        req_len = ((const struct fuse_in_header *)q->request_buf)->len;
-        if (unlikely(req_len < sizeof(struct fuse_in_header) + sizeof(*in) +
-                               in->size)) {
-            warn_report("FUSE WRITE truncated; received %zu bytes of %" PRIu32,
-                        req_len - sizeof(struct fuse_in_header) - sizeof(*in),
-                        in->size);
-            ret = -EINVAL;
-            break;
-        }
+        const struct fuse_write_in *in;
+        struct fuse_write_out *out;
+        const void *in_place_buf;
+        const void *spill_buf;
+
+        if (exp->uring_started) {
+            FuseUringEnt *ent = opaque;
+
+            in = in_buf;
+            out = out_buf;
+
+            assert(in->size <= ent->req_header.ring_ent_in_out.payload_sz);

+            /*
+             * In uring mode, out_buf (ent->req_payload) actually holds the
+             * input data for WRITE requests.
+             */
+            in_place_buf = NULL;
+            spill_buf = out_buf;
+        } else {
+            FuseQueue *q = opaque;
+            in = FUSE_IN_OP_STRUCT_LEGACY(write, q);
+            if (!in) {
+                break;
+            }
+
+            out = FUSE_OUT_OP_STRUCT_LEGACY(write, out_buf);
+
+            /* Additional check for WRITE: verify the request includes data */
+            uint32_t req_len =
+                ((const struct fuse_in_header *)(q->request_buf))->len;
+
+            if (unlikely(req_len < sizeof(struct fuse_in_header) + sizeof(*in) +
+                        in->size)) {
+                warn_report("FUSE WRITE truncated; received %zu bytes of %"
+                    PRIu32,
+                    req_len - sizeof(struct fuse_in_header) - sizeof(*in),
+                    in->size);
+                ret = -EINVAL;
+                break;
+            }
+
+            /* Legacy buffer setup */
+            in_place_buf = in + 1;
+            spill_buf = spillover_buf;
+        }
         /*
          * poll_fuse_fd() has checked that in_hdr->len matches the number of
          * bytes read, which cannot exceed the max_write value we set
@@ -1856,13 +1917,24 @@ fuse_co_process_request(FuseQueue *q, void *spillover_buf)
          * fuse_co_write() takes care to copy its contents before potentially
          * yielding.
          */
-        ret = fuse_co_write(exp, FUSE_OUT_OP_STRUCT(write, out_buf),
-                            in->offset, in->size, in + 1, spillover_buf);
+        ret = fuse_co_write(exp, out, in->offset, in->size,
+                            in_place_buf, spill_buf);
         break;
     }

     case FUSE_FALLOCATE: {
-        const struct fuse_fallocate_in *in = FUSE_IN_OP_STRUCT(fallocate, q);
+        const struct fuse_fallocate_in *in;
+
+        if (exp->uring_started) {
+            in = in_buf;
+        } else {
+            FuseQueue *q = opaque;
+            in = FUSE_IN_OP_STRUCT_LEGACY(fallocate, q);
+            if (!in) {
+                break;
+            }
+        }
+
         ret = fuse_co_fallocate(exp, in->offset, in->length, in->mode);
         break;
     }
@@ -1877,9 +1949,23 @@ fuse_co_process_request(FuseQueue *q, void *spillover_buf)

 #ifdef CONFIG_FUSE_LSEEK
     case FUSE_LSEEK: {
-        const struct fuse_lseek_in *in = FUSE_IN_OP_STRUCT(lseek, q);
-        ret = fuse_co_lseek(exp, FUSE_OUT_OP_STRUCT(lseek, out_buf),
-                            in->offset, in->whence);
+        const struct fuse_lseek_in *in;
+        struct fuse_lseek_out *out;
+
+        if (exp->uring_started) {
+            in = in_buf;
+            out = out_buf;
+        } else {
+            FuseQueue *q = opaque;
+            in = FUSE_IN_OP_STRUCT_LEGACY(lseek, q);
+            if (!in) {
+                break;
+            }
+
+            out = FUSE_OUT_OP_STRUCT_LEGACY(lseek, out_buf);
+        }
+
+        ret = fuse_co_lseek(exp, out, in->offset, in->whence);
         break;
     }
 #endif
@@ -1888,20 +1974,12 @@ fuse_co_process_request(FuseQueue *q, void *spillover_buf)
         ret = -ENOSYS;
     }

-    /* Ignore errors from fuse_write*(), nothing we can do anyway */
+    send_response(opaque, req_id, ret, out_data_buffer, out_buf);
+
     if (out_data_buffer) {
-        assert(ret >= 0);
-        fuse_write_buf_response(q->fuse_fd, req_id, out_hdr,
-                                out_data_buffer, ret);
         qemu_vfree(out_data_buffer);
-    } else {
-        fuse_write_response(q->fuse_fd, req_id, out_hdr,
-                            ret < 0 ? ret : 0,
-                            ret < 0 ? 0 : ret);
     }

-    qemu_vfree(spillover_buf);
-
 #ifdef CONFIG_LINUX_IO_URING
     if (unlikely(opcode == FUSE_INIT) && uring_initially_enabled) {
         if (exp->is_uring && !exp->uring_started) {
@@ -1910,7 +1988,8 @@ fuse_co_process_request(FuseQueue *q, void *spillover_buf)
              * If io_uring mode was requested for this export but it has not
              * been started yet, start it now.
              */
-            struct fuse_init_out *out = FUSE_OUT_OP_STRUCT(init, out_buf);
+            struct fuse_init_out *out =
+                FUSE_OUT_OP_STRUCT_LEGACY(init, out_buf);
             fuse_uring_start(exp, out);
         } else if (ret == -EOPNOTSUPP) {
             /*
@@ -1923,12 +2002,135 @@ fuse_co_process_request(FuseQueue *q, void *spillover_buf)
     }
 #endif
 }
+/* Helper to send response for legacy */
+static void send_response_legacy(void *opaque, uint32_t req_id, int ret,
+                                 const void *buf, void *out_buf)
+{
+    FuseQueue *q = (FuseQueue *)opaque;
+    struct fuse_out_header *out_hdr = (struct fuse_out_header *)out_buf;
+    if (buf) {
+        assert(ret >= 0);
+        fuse_write_buf_response(q->fuse_fd, req_id, out_hdr, buf, ret);
+    } else {
+        fuse_write_response(q->fuse_fd, req_id, out_hdr,
+                            ret < 0 ? ret : 0,
+                            ret < 0 ? 0 : ret);
+    }
+}
+
+static void coroutine_fn
+fuse_co_process_request(FuseQueue *q, void *spillover_buf)
+{
+    FuseExport *exp = q->exp;
+    uint32_t opcode;
+    uint64_t req_id;
+
+    /*
+     * Return buffer.  Must be large enough to hold all return headers, but does
+     * not include space for data returned by read requests.
+     */
+    char out_buf[sizeof(struct fuse_out_header) +
+        MAX_CONST(sizeof(struct fuse_init_out),
+        MAX_CONST(sizeof(struct fuse_open_out),
+        MAX_CONST(sizeof(struct fuse_attr_out),
+        MAX_CONST(sizeof(struct fuse_write_out),
+                  sizeof(struct fuse_lseek_out)))))] = {0};
+
+    /* Verify that out_buf is large enough for all output structures */
+    QEMU_BUILD_BUG_ON(sizeof(struct fuse_out_header) +
+        sizeof(struct fuse_init_out) > sizeof(out_buf));
+    QEMU_BUILD_BUG_ON(sizeof(struct fuse_out_header) +
+        sizeof(struct fuse_open_out) > sizeof(out_buf));
+    QEMU_BUILD_BUG_ON(sizeof(struct fuse_out_header) +
+        sizeof(struct fuse_attr_out) > sizeof(out_buf));
+    QEMU_BUILD_BUG_ON(sizeof(struct fuse_out_header) +
+        sizeof(struct fuse_write_out) > sizeof(out_buf));
+#ifdef CONFIG_FUSE_LSEEK
+    QEMU_BUILD_BUG_ON(sizeof(struct fuse_out_header) +
+        sizeof(struct fuse_lseek_out) > sizeof(out_buf));
+#endif
+
+    /* Limit scope to ensure pointer is no longer used after yielding */
+    {
+        const struct fuse_in_header *in_hdr =
+            (const struct fuse_in_header *)q->request_buf;
+
+        opcode = in_hdr->opcode;
+        req_id = in_hdr->unique;
+    }
+
+    fuse_co_process_request_common(exp, opcode, req_id, NULL, spillover_buf,
+                                   out_buf, send_response_legacy, q);
+}

 #ifdef CONFIG_LINUX_IO_URING
+static void fuse_uring_prep_sqe_commit(struct io_uring_sqe *sqe, void *opaque)
+{
+    FuseUringEnt *ent = opaque;
+    struct fuse_uring_cmd_req *req = (void *)&sqe->cmd[0];
+
+    ent->last_cmd = FUSE_IO_URING_CMD_COMMIT_AND_FETCH;
+
+    fuse_uring_sqe_prepare(sqe, ent->rq->q, ent->last_cmd);
+    fuse_uring_sqe_set_req_data(req, ent->rq->rqid, ent->req_commit_id);
+}
+
+static void
+fuse_uring_send_response(FuseUringEnt *ent, uint32_t req_id, int ret,
+                         const void *out_data_buffer)
+{
+    FuseExport *exp = ent->rq->q->exp;
+
+    struct fuse_uring_req_header *rrh = &ent->req_header;
+    struct fuse_out_header *out_header = (struct fuse_out_header *)&rrh->in_out;
+    struct fuse_uring_ent_in_out *ent_in_out =
+        (struct fuse_uring_ent_in_out *)&rrh->ring_ent_in_out;
+
+    /* FUSE_READ */
+    if (out_data_buffer && ret > 0) {
+        memcpy(ent->req_payload, out_data_buffer, ret);
+    }
+
+    out_header->error  = ret < 0 ? ret : 0;
+    out_header->unique = req_id;
+    ent_in_out->payload_sz = ret > 0 ? ret : 0;
+
+    /* Commit and fetch a uring entry */
+    blk_exp_ref(&exp->common);
+    aio_add_sqe(fuse_uring_prep_sqe_commit, ent, &ent->fuse_cqe_handler);
+}
+
+/* Helper to send response for uring */
+static void send_response_uring(void *opaque, uint32_t req_id, int ret,
+                                const void *out_data_buffer, void *payload)
+{
+    FuseUringEnt *ent = (FuseUringEnt *)opaque;
+
+    fuse_uring_send_response(ent, req_id, ret, out_data_buffer);
+}
+
 static void coroutine_fn fuse_uring_co_process_request(FuseUringEnt *ent)
 {
-    /* TODO */
-    (void)ent;
+    FuseExport *exp = ent->rq->q->exp;
+    struct fuse_uring_req_header *rrh = &ent->req_header;
+    struct fuse_uring_ent_in_out *ent_in_out =
+        (struct fuse_uring_ent_in_out *)&rrh->ring_ent_in_out;
+    struct fuse_in_header *in_hdr =
+        (struct fuse_in_header *)&rrh->in_out;
+    uint32_t opcode = in_hdr->opcode;
+    uint64_t req_id = in_hdr->unique;
+
+    ent->req_commit_id = ent_in_out->commit_id;
+
+    if (unlikely(ent->req_commit_id == 0)) {
+        error_report("FUSE-over-io_uring: request has commit_id 0, so the "
+            "kernel could not match our response to it; halting the export");
+        fuse_export_halt(exp);
+        return;
+    }
+
+    fuse_co_process_request_common(exp, opcode, req_id, &rrh->op_in,
+        NULL, ent->req_payload, send_response_uring, ent);
 }
 #endif /* CONFIG_LINUX_IO_URING */

--
2.43.0




* [Patch v4 5/6] fuse: safe termination for io_uring
  2026-02-07 12:08 [PATCH v4 0/7] add fuse-over-io_uring support Brian Song
                   ` (3 preceding siblings ...)
  2026-02-07 12:08 ` [Patch v4 4/7] fuse: refactor FUSE request handler Brian Song
@ 2026-02-07 12:08 ` Brian Song
  2026-02-11 21:52   ` Stefan Hajnoczi
  2026-02-07 12:09 ` [Patch v4 6/7] fuse: add 'io-uring' option Brian Song
  2026-02-07 12:09 ` [Patch v4 7/7] fuse: add io_uring test support Brian Song
  6 siblings, 1 reply; 21+ messages in thread
From: Brian Song @ 2026-02-07 12:08 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, hibriansong, hreitz, kwolf, eblake, armbru, stefanha,
	fam, bernd

When a termination signal is received, qemu-storage-daemon stops the
export, exits the main loop (main_loop_wait), and begins resource
cleanup. However, some FUSE_IO_URING_CMD_COMMIT_AND_FETCH SQEs may
remain pending in the kernel, waiting for incoming FUSE requests.

Currently, there is no way to manually cancel these pending SQEs in the
kernel. As a result, after export termination, the related data
structures might be deleted before the corresponding CQEs arrive,
causing the CQE handler to be invoked after it has been freed, which
may lead to a segfault.

As a workaround, when submitting an SQE to the kernel, we take a
reference on the block export (blk_exp_ref) to prevent the CQE handler
from being deleted during export termination. Once the CQE is received,
we drop that reference (blk_exp_unref).

However, this introduces a new issue: if no new FUSE requests arrive,
the pending SQEs held by the kernel will never complete. Consequently,
the export reference count never drops to zero, preventing the export
from shutting down cleanly.

To resolve this, we schedule a Bottom Half (BH) for each FUSE queue
other than queue 0 (whose FD belongs to the FUSE session) during the
export shutdown phase. Each BH closes its queue's fuse_fd in that
queue's AioContext, so the close cannot race with request handling,
while the session is unmounted and destroyed in fuse_export_shutdown().
This aborts all pending SQEs in the kernel, forcing the corresponding
CQEs to return, which releases the held references and allows the
export to be freed safely.
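
The reference-counting scheme described above can be modelled in a few
lines. This is only a sketch under simplifying assumptions: the Export
struct and the helper names are hypothetical, and abort_connection()
merely simulates the kernel completing every pending SQE once the
connection is torn down; it is not how QEMU or the kernel implement it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an export; not the QEMU BlockExport type. */
typedef struct {
    int refcount;      /* 1 for the owner, plus 1 per in-flight SQE */
    int pending_sqes;  /* SQEs the kernel has not completed yet */
    bool freed;        /* set when the last reference is dropped */
} Export;

static void export_ref(Export *e)
{
    e->refcount++;
}

static void export_unref(Export *e)
{
    assert(e->refcount > 0);
    if (--e->refcount == 0) {
        e->freed = true;   /* real code would release resources here */
    }
}

/* Submitting an SQE pins the export until its CQE is handled. */
static void submit_sqe(Export *e)
{
    export_ref(e);
    e->pending_sqes++;
}

/* The CQE handler releases the reference taken at submission. */
static void complete_cqe(Export *e)
{
    assert(e->pending_sqes > 0);
    e->pending_sqes--;
    export_unref(e);
}

/* Simulates teardown: every pending SQE is aborted and completes. */
static void abort_connection(Export *e)
{
    while (e->pending_sqes > 0) {
        complete_cqe(e);
    }
}
```

With two SQEs in flight, dropping the owner's reference alone leaves the
export alive; only once abort_connection() has drained the pending SQEs
does the count reach zero.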

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Brian Song <hibriansong@gmail.com>
---
 block/export/fuse.c | 100 +++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 90 insertions(+), 10 deletions(-)

diff --git a/block/export/fuse.c b/block/export/fuse.c
index c117e081cd..abae83041b 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -934,6 +934,57 @@ static void read_from_fuse_fd(void *opaque)
     qemu_coroutine_enter(co);
 }

+#ifdef CONFIG_LINUX_IO_URING
+static void fuse_export_delete_uring(FuseExport *exp)
+{
+    exp->is_uring = false;
+    exp->uring_started = false;
+
+    for (int i = 0; i < exp->num_uring_queues; i++) {
+        FuseUringQueue *rq = &exp->uring_queues[i];
+
+        for (int j = 0; j < FUSE_DEFAULT_URING_QUEUE_DEPTH; j++) {
+            g_free(rq->ent[j].req_payload);
+        }
+        g_free(rq->ent);
+    }
+
+    g_free(exp->uring_queues);
+}
+#endif
+
+/**
+ * The Linux kernel currently lacks support for asynchronous cancellation
+ * of FUSE-over-io_uring SQEs. This can lead to a race where an IOThread may
+ * access fuse_fd after it is closed but before pending SQEs are canceled,
+ * potentially operating on a newly reused file descriptor.
+ *
+ * Therefore, schedule a BH in the IOThread to close and invalidate fuse_fd,
+ * to avoid races on fuse_fd.
+ */
+#ifdef CONFIG_LINUX_IO_URING
+static void close_fuse_fd(void *opaque)
+{
+    FuseQueue *q = opaque;
+
+    if (q->fuse_fd >= 0) {
+        close(q->fuse_fd);
+        q->fuse_fd = -1;
+    }
+}
+#endif
+
+/**
+ * During exit in FUSE-over-io_uring mode, qemu-storage-daemon requests
+ * shutdown in main() and then immediately tears down the block export.
+ * However, SQEs already submitted under FUSE-over-io_uring may still complete
+ * and generate CQEs that continue to hold references to the block export,
+ * preventing it from being freed cleanly.
+ *
+ * Since the Linux kernel currently lacks support for asynchronous cancellation
+ * of FUSE-over-io_uring SQEs, this function aborts the connection and cancels
+ * all pending SQEs to ensure a safe teardown.
+ */
 static void fuse_export_shutdown(BlockExport *blk_exp)
 {
     FuseExport *exp = container_of(blk_exp, FuseExport, common);
@@ -949,18 +1000,42 @@ static void fuse_export_shutdown(BlockExport *blk_exp)
          */
         g_hash_table_remove(exports, exp->mountpoint);
     }
+
+#ifdef CONFIG_LINUX_IO_URING
+    if (exp->uring_started) {
+        for (size_t i = 0; i < exp->num_fuse_queues; i++) {
+            FuseQueue *q = &exp->queues[i];
+
+            /* Queue 0's FD belongs to the FUSE session */
+            if (i > 0) {
+                aio_bh_schedule_oneshot(q->ctx, close_fuse_fd, q);
+            }
+        }
+
+        /* To cancel all pending SQEs */
+        if (exp->fuse_session) {
+            if (exp->mounted) {
+                fuse_session_unmount(exp->fuse_session);
+            }
+            fuse_session_destroy(exp->fuse_session);
+        }
+        g_free(exp->mountpoint);
+    }
+#endif
 }

 static void fuse_export_delete(BlockExport *blk_exp)
 {
     FuseExport *exp = container_of(blk_exp, FuseExport, common);

-    for (int i = 0; i < exp->num_fuse_queues; i++) {
+    for (size_t i = 0; i < exp->num_fuse_queues; i++) {
         FuseQueue *q = &exp->queues[i];

-        /* Queue 0's FD belongs to the FUSE session */
-        if (i > 0 && q->fuse_fd >= 0) {
-            close(q->fuse_fd);
+        if (!exp->uring_started) {
+            /* Queue 0's FD belongs to the FUSE session */
+            if (i > 0 && q->fuse_fd >= 0) {
+                close(q->fuse_fd);
+            }
         }
         if (q->spillover_buf) {
             qemu_vfree(q->spillover_buf);
@@ -968,15 +1043,20 @@ static void fuse_export_delete(BlockExport *blk_exp)
     }
     g_free(exp->queues);

-    if (exp->fuse_session) {
-        if (exp->mounted) {
-            fuse_session_unmount(exp->fuse_session);
+    if (exp->uring_started) {
+#ifdef CONFIG_LINUX_IO_URING
+        fuse_export_delete_uring(exp);
+#endif
+    } else {
+        if (exp->fuse_session) {
+            if (exp->mounted) {
+                fuse_session_unmount(exp->fuse_session);
+            }
+            fuse_session_destroy(exp->fuse_session);
         }

-        fuse_session_destroy(exp->fuse_session);
+        g_free(exp->mountpoint);
     }
-
-    g_free(exp->mountpoint);
 }

 /**
--
2.43.0




* [Patch v4 6/7] fuse: add 'io-uring' option
  2026-02-07 12:08 [PATCH v4 0/7] add fuse-over-io_uring support Brian Song
                   ` (4 preceding siblings ...)
  2026-02-07 12:08 ` [Patch v4 5/6] fuse: safe termination for io_uring Brian Song
@ 2026-02-07 12:09 ` Brian Song
  2026-02-09  5:24   ` Markus Armbruster
  2026-02-11 22:50   ` Stefan Hajnoczi
  2026-02-07 12:09 ` [Patch v4 7/7] fuse: add io_uring test support Brian Song
  6 siblings, 2 replies; 21+ messages in thread
From: Brian Song @ 2026-02-07 12:09 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, hibriansong, hreitz, kwolf, eblake, armbru, stefanha,
	fam, bernd

This patch adds a new export option for qemu-storage-daemon to enable
FUSE-over-io_uring via 'io-uring=on|off' (default: off).

The initialization phase performs a protocol handshake via the legacy
/dev/fuse interface before transitioning to io_uring mode.
If multiple IOThreads are configured, the export distributes the uring
queues across them to handle requests concurrently.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Brian Song <hibriansong@gmail.com>
---
 block/export/fuse.c                  | 3 +--
 docs/tools/qemu-storage-daemon.rst   | 7 +++++--
 qapi/block-export.json               | 5 ++++-
 storage-daemon/qemu-storage-daemon.c | 1 +
 4 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/block/export/fuse.c b/block/export/fuse.c
index abae83041b..09642ccf5a 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -587,8 +587,7 @@ static int fuse_export_create(BlockExport *blk_exp,
     assert(blk_exp_args->type == BLOCK_EXPORT_TYPE_FUSE);

 #ifdef CONFIG_LINUX_IO_URING
-    /* TODO Add FUSE-over-io_uring Option */
-    exp->is_uring = false;
+    exp->is_uring = args->io_uring;
     exp->uring_queue_depth = FUSE_DEFAULT_URING_QUEUE_DEPTH;
 #else
     if (args->io_uring) {
diff --git a/docs/tools/qemu-storage-daemon.rst b/docs/tools/qemu-storage-daemon.rst
index 35ab2d7807..ad0f41a78f 100644
--- a/docs/tools/qemu-storage-daemon.rst
+++ b/docs/tools/qemu-storage-daemon.rst
@@ -78,7 +78,7 @@ Standard options:
 .. option:: --export [type=]nbd,id=<id>,node-name=<node-name>[,name=<export-name>][,writable=on|off][,bitmap=<name>]
   --export [type=]vhost-user-blk,id=<id>,node-name=<node-name>,addr.type=unix,addr.path=<socket-path>[,writable=on|off][,logical-block-size=<block-size>][,num-queues=<num-queues>]
   --export [type=]vhost-user-blk,id=<id>,node-name=<node-name>,addr.type=fd,addr.str=<fd>[,writable=on|off][,logical-block-size=<block-size>][,num-queues=<num-queues>]
-  --export [type=]fuse,id=<id>,node-name=<node-name>,mountpoint=<file>[,growable=on|off][,writable=on|off][,allow-other=on|off|auto]
+  --export [type=]fuse,id=<id>,node-name=<node-name>,mountpoint=<file>[,growable=on|off][,writable=on|off][,allow-other=on|off|auto][,io-uring=on|off]
   --export [type=]vduse-blk,id=<id>,node-name=<node-name>,name=<vduse-name>[,writable=on|off][,num-queues=<num-queues>][,queue-size=<queue-size>][,logical-block-size=<block-size>][,serial=<serial-number>]

   is a block export definition. ``node-name`` is the block node that should be
@@ -111,7 +111,10 @@ Standard options:
   that enabling this option as a non-root user requires enabling the
   user_allow_other option in the global fuse.conf configuration file.  Setting
   ``allow-other`` to auto (the default) will try enabling this option, and on
-  error fall back to disabling it.
+  error fall back to disabling it.  Setting ``io-uring`` to on (default: off)
+  makes the export exchange FUSE requests and responses with the kernel via
+  io_uring (FUSE-over-io_uring) instead of reading and writing /dev/fuse,
+  once the initial FUSE handshake has completed.

   The ``vduse-blk`` export type takes a ``name`` (must be unique across the host)
   to create the VDUSE device.
diff --git a/qapi/block-export.json b/qapi/block-export.json
index 9ae703ad01..37f2fc47e2 100644
--- a/qapi/block-export.json
+++ b/qapi/block-export.json
@@ -184,12 +184,15 @@
 #     mount the export with allow_other, and if that fails, try again
 #     without.  (since 6.1; default: auto)
 #
+# @io-uring: Use FUSE-over-io-uring.  (since 10.2; default: false)
+#
 # Since: 6.0
 ##
 { 'struct': 'BlockExportOptionsFuse',
   'data': { 'mountpoint': 'str',
             '*growable': 'bool',
-            '*allow-other': 'FuseExportAllowOther' },
+            '*allow-other': 'FuseExportAllowOther',
+            '*io-uring': 'bool' },
   'if': 'CONFIG_FUSE' }

 ##
diff --git a/storage-daemon/qemu-storage-daemon.c b/storage-daemon/qemu-storage-daemon.c
index eb72561358..0cd4cd2b58 100644
--- a/storage-daemon/qemu-storage-daemon.c
+++ b/storage-daemon/qemu-storage-daemon.c
@@ -107,6 +107,7 @@ static void help(void)
 #ifdef CONFIG_FUSE
 "  --export [type=]fuse,id=<id>,node-name=<node-name>,mountpoint=<file>\n"
 "           [,growable=on|off][,writable=on|off][,allow-other=on|off|auto]\n"
+"           [,io-uring=on|off]\n"
 "                         export the specified block node over FUSE\n"
 "\n"
 #endif /* CONFIG_FUSE */
--
2.43.0




* [Patch v4 7/7] fuse: add io_uring test support
  2026-02-07 12:08 [PATCH v4 0/7] add fuse-over-io_uring support Brian Song
                   ` (5 preceding siblings ...)
  2026-02-07 12:09 ` [Patch v4 6/7] fuse: add 'io-uring' option Brian Song
@ 2026-02-07 12:09 ` Brian Song
  2026-02-11 21:53   ` Stefan Hajnoczi
  6 siblings, 1 reply; 21+ messages in thread
From: Brian Song @ 2026-02-07 12:09 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, hibriansong, hreitz, kwolf, eblake, armbru, stefanha,
	fam, bernd

This patch adds support for testing FUSE-over-io_uring. It can be
enabled by setting the environment variable FUSE_OVER_IO_URING=1.

    $ FUSE_OVER_IO_URING=1 ./check -fuse

Additionally, the unmount detection logic is switched from `df` to
`mount`. Using `df` triggers `statfs()`, which attempts to communicate
with the FUSE daemon. During test teardown (when the daemon is being
killed), this often fails or returns stale data, causing the test to
misjudge the mount status.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Brian Song <hibriansong@gmail.com>
---
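Note on the `df` -> `mount` switch described above: the point is to consult
the kernel's mount table instead of issuing statfs() against a FUSE mount
whose daemon may already be gone. The same idea in C, as a minimal sketch
only. is_mounted_in() is a hypothetical helper and not part of this series;
it assumes a mount table in the usual fstab/mtab format, such as
/proc/self/mounts on Linux.

```c
#include <assert.h>
#include <mntent.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return true if @mountpoint is listed as a mount
 * directory in @mount_table.  Only the table is read; no statfs() is
 * issued on the mount point, so a dead FUSE daemon cannot make this hang
 * or fail with ENOTCONN.
 */
static bool is_mounted_in(const char *mount_table, const char *mountpoint)
{
    FILE *f;
    struct mntent *ent;
    bool found = false;

    assert(mount_table && mountpoint);

    f = setmntent(mount_table, "r");
    if (!f) {
        return false; /* table not readable: treat as "not mounted" */
    }

    while ((ent = getmntent(f)) != NULL) {
        if (strcmp(ent->mnt_dir, mountpoint) == 0) {
            found = true;
            break;
        }
    }

    endmntent(f);
    return found;
}
```

A teardown loop could poll is_mounted_in("/proc/self/mounts", img) until it
returns false. Unlike `mount | grep`, this compares the mount directory
exactly rather than matching a substring.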
 tests/qemu-iotests/check     |  2 ++
 tests/qemu-iotests/common.rc | 47 +++++++++++++++++++++++++-----------
 2 files changed, 35 insertions(+), 14 deletions(-)

diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index 545f9ec7bd..c6fa0f9e3d 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -94,6 +94,8 @@ def make_argparser() -> argparse.ArgumentParser:
         mg.add_argument('-' + fmt, dest='imgfmt', action='store_const',
                         const=fmt, help=f'test {fmt}')

+    # To test FUSE-over-io_uring, set the environment variable
+    # FUSE_OVER_IO_URING=1. This applies only when using the 'fuse' protocol.
     protocol_list = ['file', 'rbd', 'nbd', 'ssh', 'nfs', 'fuse']
     g_prt = p.add_argument_group(
         '  image protocol options',
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index e977cb4eb6..a3e0ccb3d2 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -546,10 +546,37 @@ _make_test_img()
         # iotests.  The default allow-other=auto has the downside of printing a
         # fusermount error on its first attempt if allow_other is not
         # permissible, which we would need to filter.
-        QSD_NEED_PID=y $QSD \
-              --blockdev file,node-name=export-node,filename=$img_name,discard=unmap \
-              --export fuse,id=fuse-export,node-name=export-node,mountpoint="$export_mp",writable=on,growable=on,allow-other=off \
-              &
+        if [ -n "$FUSE_OVER_IO_URING" ]; then
+            # The current Linux kernel requires registering `nr_cpu` Ring Queues
+            # when FUSE-over-io_uring is enabled. Here, we set half of the
+            # Ring Queues to FUSE Queues (`nr_iothreads`) to test the queue
+            # distribution feature.
+            # See the comments in fuse.c regarding the round-robin distribution
+            # between Ring Queues and FUSE Queues.
+            nr_cpu=$(nproc 2>/dev/null || echo 1)
+            nr_iothreads=$((nr_cpu / 2))
+            if [ $nr_iothreads -lt 1 ]; then
+                nr_iothreads=1
+            fi
+
+            iothread_args=""
+            iothread_export_args=""
+            for ((i=0; i<$nr_iothreads; i++)); do
+                iothread_args="$iothread_args --object iothread,id=iothread$i"
+                iothread_export_args="$iothread_export_args,iothread.$i=iothread$i"
+            done
+
+            QSD_NEED_PID=y $QSD \
+                    $iothread_args \
+                    --blockdev file,node-name=export-node,filename=$img_name,discard=unmap \
+                    --export fuse,id=fuse-export,node-name=export-node,mountpoint="$export_mp",writable=on,growable=on,allow-other=off,io-uring=on$iothread_export_args \
+                &
+        else
+            QSD_NEED_PID=y $QSD \
+                --blockdev file,node-name=export-node,filename=$img_name,discard=unmap \
+                --export fuse,id=fuse-export,node-name=export-node,mountpoint="$export_mp",writable=on,growable=on,allow-other=off \
+                &
+        fi

         pidfile="$QEMU_TEST_DIR/qemu-storage-daemon.pid"

@@ -595,16 +622,8 @@ _rm_test_img()
         # Wait until the mount is gone
         timeout=10 # *0.5 s
         while true; do
-            # Will show the mount point; if the mount is still there,
-            # it will be $img.
-            df_output=$(df "$img" 2>/dev/null)
-
-            # But df may also show an error ("Transpoint endpoint not
-            # connected"), so retry in such cases
-            if [ -n "$df_output" ]; then
-                if ! echo "$df_output" | grep -q "$img"; then
-                    break
-                fi
+            if ! mount | grep -q "$img"; then
+                break
             fi

             sleep 0.5
--
2.43.0




* Re: [Patch v4 6/7] fuse: add 'io-uring' option
  2026-02-07 12:09 ` [Patch v4 6/7] fuse: add 'io-uring' option Brian Song
@ 2026-02-09  5:24   ` Markus Armbruster
  2026-02-11 22:50   ` Stefan Hajnoczi
  1 sibling, 0 replies; 21+ messages in thread
From: Markus Armbruster @ 2026-02-09  5:24 UTC (permalink / raw)
  To: Brian Song
  Cc: qemu-block, qemu-devel, hreitz, kwolf, eblake, stefanha, fam,
	bernd

Brian Song <hibriansong@gmail.com> writes:

> This patch adds a new storage export option for storage-export-daemon
> to enable FUSE-over-io_uring via 'io-uring=on|off' (default: off).
>
> The initialization phase performs a protocol handshake via the legacy
> /dev/fuse interface before transitioning to the io_uring mode.
> If multiple IOThreads are configured, the export distributes the uring
> queues to handle requests concurrently.
>
> Suggested-by: Kevin Wolf <kwolf@redhat.com>
> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Brian Song <hibriansong@gmail.com>

[...]

> diff --git a/qapi/block-export.json b/qapi/block-export.json
> index 9ae703ad01..37f2fc47e2 100644
> --- a/qapi/block-export.json
> +++ b/qapi/block-export.json
> @@ -184,12 +184,15 @@
>  #     mount the export with allow_other, and if that fails, try again
>  #     without.  (since 6.1; default: auto)
>  #
> +# @io-uring: Use FUSE-over-io-uring.  (since 10.2; default: false)

since 11.0

> +#
>  # Since: 6.0
>  ##
>  { 'struct': 'BlockExportOptionsFuse',
>    'data': { 'mountpoint': 'str',
>              '*growable': 'bool',
> -            '*allow-other': 'FuseExportAllowOther' },
> +            '*allow-other': 'FuseExportAllowOther',
> +            '*io-uring': 'bool' },
>    'if': 'CONFIG_FUSE' }
>
>  ##

With that
Acked-by: Markus Armbruster <armbru@redhat.com>

[...]




* Re: [Patch v4 1/7] aio-posix: enable 128-byte SQEs
  2026-02-07 12:08 ` [Patch v4 1/7] aio-posix: enable 128-byte SQEs Brian Song
@ 2026-02-11 20:28   ` Stefan Hajnoczi
  0 siblings, 0 replies; 21+ messages in thread
From: Stefan Hajnoczi @ 2026-02-11 20:28 UTC (permalink / raw)
  To: Brian Song
  Cc: qemu-block, qemu-devel, hreitz, kwolf, eblake, armbru, fam, bernd


On Sat, Feb 07, 2026 at 08:08:55PM +0800, Brian Song wrote:
> This patch enables the IORING_SETUP_SQE128 flag during io_uring
> initialization to support the FUSE protocol requirements.
> 
> The FUSE-over-io_uring implementation embeds a protocol-specific
> structure directly into the Submission Queue Entry (SQE)
> to pass metadata such as the queue ID and commit ID.
> 
> Enabling SQE128 expands the SQE size to 128 bytes, providing 80 bytes
> of available command space. This ensures sufficient room for the FUSE
> headers and future protocol extensions.
> 
> Suggested-by: Kevin Wolf <kwolf@redhat.com>
> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Brian Song <hibriansong@gmail.com>
> ---
>  util/fdmon-io_uring.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
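
For reference, the 80-byte figure quoted above can be checked with a short
calculation. This is a sketch under assumptions that do not come from the
patch itself: a regular SQE is taken to be 64 bytes, and the command area
used by IORING_OP_URING_CMD is taken to start at offset 48. The constant
and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout constants (see the note above; not defined by this series) */
enum {
    SQE_SIZE_DEFAULT = 64,  /* SQE size without IORING_SETUP_SQE128 */
    SQE_SIZE_128     = 128, /* SQE size with IORING_SETUP_SQE128 */
    SQE_CMD_OFFSET   = 48,  /* assumed offset of the command area */
};

/* Bytes available to a uring command for a given SQE size */
static inline size_t sqe_cmd_space(size_t sqe_size)
{
    return sqe_size - SQE_CMD_OFFSET;
}

/* 128 - 48 = 80, matching the commit message */
_Static_assert(SQE_SIZE_128 - SQE_CMD_OFFSET == 80,
               "SQE128 leaves 80 bytes of command space");
```

Under the same assumptions a default 64-byte SQE leaves only 16 bytes, which
is the motivation for enabling SQE128 here.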


* Re: [Patch v4 2/7] fuse: io_uring mode init
  2026-02-07 12:08 ` [Patch v4 2/7] fuse: io_uring mode init Brian Song
@ 2026-02-11 20:56   ` Stefan Hajnoczi
  2026-02-13  8:40     ` Brian Song
  2026-02-13  9:03     ` Brian Song
  0 siblings, 2 replies; 21+ messages in thread
From: Stefan Hajnoczi @ 2026-02-11 20:56 UTC (permalink / raw)
  To: Brian Song
  Cc: qemu-block, qemu-devel, hreitz, kwolf, eblake, armbru, fam, bernd


On Sat, Feb 07, 2026 at 08:08:56PM +0800, Brian Song wrote:
> +typedef struct FuseUringEnt {
> +    /* back pointer */
> +    FuseUringQueue *rq;
> +
> +    /* commit id of a fuse request */
> +    uint64_t req_commit_id;

This field is only read by fuse_uring_resubmit() in this patch, never
assigned. Maybe some code later in the patch series should be squashed
into this patch so that this field is initialized somewhere?

Also, is it possible to drop req_commit_id and read from the existing
req_header.ring_ent_in_out.commit_id field instead?

> +static void fuse_uring_cqe_handler(CqeHandler *cqe_handler)
> +{
> +    Coroutine *co;
> +    FuseUringEnt *ent =
> +        container_of(cqe_handler, FuseUringEnt, fuse_cqe_handler);
> +    FuseExport *exp = ent->rq->q->exp;
> +
> +    if (unlikely(exp->halted)) {
> +        return;

Missing blk_exp_unref()?

> +    }
> +
> +    int err = cqe_handler->cqe.res;
> +
> +    if (unlikely(err != 0)) {
> +        switch (err) {
> +        case -EAGAIN:
> +        case -EINTR:
> +            aio_add_sqe(fuse_uring_resubmit, ent, &ent->fuse_cqe_handler);
> +            break;
> +        case -ENOTCONN:
> +            /* Connection already gone */
> +            break;
> +        default:
> +            fuse_export_halt(exp);
> +            break;
> +        }
> +
> +        /* A uring entry returned */
> +        blk_exp_unref(&exp->common);
> +    } else {
> +        co = qemu_coroutine_create(co_fuse_uring_queue_handle_cqe, ent);
> +        /* Account this request as in-flight */
> +        fuse_inc_in_flight(exp);
> +        qemu_coroutine_enter(co);
> +    }
> +}
> +
> +static void
> +fuse_uring_sqe_set_req_data(struct fuse_uring_cmd_req *req,
> +                            const unsigned int rqid,
> +                            const unsigned int commit_id)
> +{
> +    req->qid = rqid;
> +    req->commit_id = commit_id;
> +    req->flags = 0;
> +}
> +
> +static void
> +fuse_uring_sqe_prepare(struct io_uring_sqe *sqe, FuseQueue *q, __u32 cmd_op)
> +{
> +    sqe->opcode = IORING_OP_URING_CMD;
> +
> +    sqe->fd = q->fuse_fd;
> +    sqe->rw_flags = 0;
> +    sqe->ioprio = 0;
> +    sqe->off = 0;
> +
> +    sqe->cmd_op = cmd_op;
> +    sqe->__pad1 = 0;
> +}
> +
> +static void fuse_uring_prep_sqe_register(struct io_uring_sqe *sqe, void *opaque)
> +{
> +    FuseUringEnt *ent = opaque;
> +    struct fuse_uring_cmd_req *req = (void *)&sqe->cmd[0];
> +
> +    ent->last_cmd = FUSE_IO_URING_CMD_REGISTER;
> +    fuse_uring_sqe_prepare(sqe, ent->rq->q, ent->last_cmd);
> +
> +    sqe->addr = (uint64_t)(ent->iov);
> +    sqe->len = 2;
> +
> +    fuse_uring_sqe_set_req_data(req, ent->rq->rqid, 0);
> +}
> +
> +static void fuse_uring_resubmit(struct io_uring_sqe *sqe, void *opaque)
> +{
> +    FuseUringEnt *ent = opaque;
> +    struct fuse_uring_cmd_req *req = (void *)&sqe->cmd[0];
> +
> +    fuse_uring_sqe_prepare(sqe, ent->rq->q, ent->last_cmd);
> +
> +    switch (ent->last_cmd) {
> +    case FUSE_IO_URING_CMD_REGISTER:
> +        sqe->addr = (uint64_t)(ent->iov);
> +        sqe->len = 2;
> +        fuse_uring_sqe_set_req_data(req, ent->rq->rqid, 0);
> +        break;
> +    case FUSE_IO_URING_CMD_COMMIT_AND_FETCH:
> +        fuse_uring_sqe_set_req_data(req, ent->rq->rqid, ent->req_commit_id);
> +        break;
> +    default:
> +        error_report("Unknown command type: %d", ent->last_cmd);
> +        break;
> +    }
> +}
> +
> +static void fuse_uring_submit_register(void *opaque)
> +{
> +    FuseUringQueue *rq = opaque;
> +    FuseExport *exp = rq->q->exp;
> +
> +    for (int j = 0; j < exp->uring_queue_depth; j++) {
> +        /* Register a uring entry */
> +        blk_exp_ref(&exp->common);
> +
> +        aio_add_sqe(fuse_uring_prep_sqe_register, &rq->ent[j],
> +                    &rq->ent[j].fuse_cqe_handler);
> +    }
> +}
> +
> +/**
> + * Distribute uring queues across FUSE queues in the round-robin manner.
> + * This ensures even distribution of kernel uring queues across user-specified
> + * FUSE queues.
> + *
> + * num_uring_queues > num_fuse_queues: Each IOThread manages multiple uring
> + * queues (multi-queue mapping).
> + * num_uring_queues < num_fuse_queues: Excess IOThreads remain idle with no
> + * assigned uring queues.
> + */
> +static void fuse_uring_setup_queues(FuseExport *exp, size_t bufsize)
> +{
> +    int num_uring_queues = get_nprocs_conf();
> +
> +    exp->num_uring_queues = num_uring_queues;
> +    exp->uring_queues = g_new(FuseUringQueue, num_uring_queues);
> +
> +    for (int i = 0; i < num_uring_queues; i++) {
> +        FuseUringQueue *rq = &exp->uring_queues[i];
> +        rq->rqid = i;
> +        rq->ent = g_new(FuseUringEnt, exp->uring_queue_depth);
> +
> +        for (int j = 0; j < exp->uring_queue_depth; j++) {
> +            FuseUringEnt *ent = &rq->ent[j];
> +            ent->rq = rq;
> +            ent->req_payload_sz = bufsize - FUSE_BUFFER_HEADER_SIZE;
> +            ent->req_payload = g_malloc0(ent->req_payload_sz);

I don't see a corresponding g_free() in this patch? Exports can be
deleted at runtime, so this memory must be freed.

> +
> +            ent->iov[0] = (struct iovec) {
> +                &ent->req_header,
> +                sizeof(struct fuse_uring_req_header)
> +            };
> +            ent->iov[1] = (struct iovec) {
> +                ent->req_payload,
> +                ent->req_payload_sz
> +            };
> +
> +            ent->fuse_cqe_handler.cb = fuse_uring_cqe_handler;
> +        }
> +
> +        /* Distribute uring queues across FUSE queues */
> +        rq->q = &exp->queues[i % exp->num_fuse_queues];
> +        QLIST_INSERT_HEAD(&(rq->q->uring_queue_list), rq, next);
> +    }
> +}
> +
> +static void
> +fuse_schedule_ring_queue_registrations(FuseExport *exp)
> +{
> +    for (int i = 0; i < exp->num_fuse_queues; i++) {
> +        FuseQueue *q = &exp->queues[i];
> +        FuseUringQueue *rq;
> +
> +        QLIST_FOREACH(rq, &q->uring_queue_list, next) {
> +            aio_bh_schedule_oneshot(q->ctx, fuse_uring_submit_register, rq);
> +        }
> +    }
> +}
> +
> +static void fuse_uring_start(FuseExport *exp, struct fuse_init_out *out)
> +{
> +    assert(!exp->uring_started);
> +    exp->uring_started = true;
> +
> +    /*
> +     * Since we dont't enable the FUSE_MAX_PAGES feature, the value of

s/dont't/don't/
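
To illustrate the round-robin mapping described in the comment above
fuse_uring_setup_queues() (rq->q = &exp->queues[i % exp->num_fuse_queues]),
here is a standalone sketch of the index arithmetic only. Both helpers are
hypothetical and exist purely for illustration; they are not part of the
patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: index of the FUSE queue that serves uring queue @uring_idx */
static size_t fuse_queue_for_uring_queue(size_t uring_idx,
                                         size_t num_fuse_queues)
{
    assert(num_fuse_queues > 0);
    return uring_idx % num_fuse_queues;
}

/* Hypothetical: how many uring queues end up on FUSE queue @fuse_idx */
static size_t uring_queues_on_fuse_queue(size_t fuse_idx,
                                         size_t num_uring_queues,
                                         size_t num_fuse_queues)
{
    size_t count = 0;

    for (size_t i = 0; i < num_uring_queues; i++) {
        if (fuse_queue_for_uring_queue(i, num_fuse_queues) == fuse_idx) {
            count++;
        }
    }
    return count;
}
```

With 8 uring queues and 4 FUSE queues, each FUSE queue gets 2 uring queues.
With 2 uring queues and 4 FUSE queues, FUSE queues 2 and 3 get none, which
is the "excess IOThreads remain idle" case from the comment.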


* Re: [Patch v4 3/7] fuse: uring support for write ops
  2026-02-07 12:08 ` [Patch v4 3/7] fuse: uring support for write ops Brian Song
@ 2026-02-11 21:08   ` Stefan Hajnoczi
  0 siblings, 0 replies; 21+ messages in thread
From: Stefan Hajnoczi @ 2026-02-11 21:08 UTC (permalink / raw)
  To: Brian Song
  Cc: qemu-block, qemu-devel, hreitz, kwolf, eblake, armbru, fam, bernd


On Sat, Feb 07, 2026 at 08:08:57PM +0800, Brian Song wrote:
> The payload of each uring entry serves as the buffer holding file data
> for read/write operations. In this patch, fuse_co_write is refactored
> to support using different buffer sources for write operations in
> legacy and io_uring modes.
> 
> Suggested-by: Kevin Wolf <kwolf@redhat.com>
> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Brian Song <hibriansong@gmail.com>
> ---
>  block/export/fuse.c | 55 ++++++++++++++++++++++++++++++---------------
>  1 file changed, 37 insertions(+), 18 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [Patch v4 4/7] fuse: refactor FUSE request handler
  2026-02-07 12:08 ` [Patch v4 4/7] fuse: refactor FUSE request handler Brian Song
@ 2026-02-11 21:21   ` Stefan Hajnoczi
  2026-02-13  8:07     ` Brian Song
  0 siblings, 1 reply; 21+ messages in thread
From: Stefan Hajnoczi @ 2026-02-11 21:21 UTC (permalink / raw)
  To: Brian Song
  Cc: qemu-block, qemu-devel, hreitz, kwolf, eblake, armbru, fam, bernd


On Sat, Feb 07, 2026 at 08:08:58PM +0800, Brian Song wrote:
> This patch implements the CQE handler for FUSE-over-io_uring. Upon
> receiving a FUSE request via a Completion Queue Entry (CQE), the
> handler processes the request and submits the response back to the
> kernel via the FUSE_IO_URING_CMD_COMMIT_AND_FETCH command.
> 
> Additionally, the request processing logic shared between legacy and
> io_uring modes has been extracted into fuse_co_process_request_common().
> The execution flow now dispatches requests to the appropriate
> mode-specific logic based on the uring_started flag.
> 
> Suggested-by: Kevin Wolf <kwolf@redhat.com>
> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Brian Song <hibriansong@gmail.com>
> ---
>  block/export/fuse.c | 400 +++++++++++++++++++++++++++++++++-----------
>  1 file changed, 301 insertions(+), 99 deletions(-)
> 
> diff --git a/block/export/fuse.c b/block/export/fuse.c
> index 867752555a..c117e081cd 100644
> --- a/block/export/fuse.c
> +++ b/block/export/fuse.c
> @@ -138,8 +138,8 @@ struct FuseQueue {
>       * FUSE_MIN_READ_BUFFER (from linux/fuse.h) bytes.
>       * This however is just the first part of the buffer; every read is given
>       * a vector of this buffer (which should be enough for all normal requests,
> -     * which we check via the static assertion in FUSE_IN_OP_STRUCT()) and the
> -     * spill-over buffer below.
> +     * which we check via the static assertion in FUSE_IN_OP_STRUCT_LEGACY())
> +     * and the spill-over buffer below.
>       * Therefore, the size of this buffer plus FUSE_SPILLOVER_BUF_SIZE must be
>       * FUSE_MIN_READ_BUFFER or more (checked via static assertion below).
>       */
> @@ -912,6 +912,7 @@ static void coroutine_fn co_read_from_fuse_fd(void *opaque)
>      }
> 
>      fuse_co_process_request(q, spillover_buf);
> +    qemu_vfree(spillover_buf);
> 
>  no_request:
>      fuse_dec_in_flight(exp);
> @@ -1684,100 +1685,75 @@ static int fuse_write_buf_response(int fd, uint32_t req_id,
>  }
> 
>  /*
> - * For use in fuse_co_process_request():
> + * For use in fuse_co_process_request_common():
>   * Returns a pointer to the parameter object for the given operation (inside of
> - * queue->request_buf, which is assumed to hold a fuse_in_header first).
> - * Verifies that the object is complete (queue->request_buf is large enough to
> - * hold it in one piece, and the request length includes the whole object).
> + * in_buf, which is assumed to hold a fuse_in_header first).
> + * Verifies that the object is complete (in_buf is large enough to hold it in
> + * one piece, and the request length includes the whole object).
> + * Only performs verification for legacy FUSE.
>   *
>   * Note that queue->request_buf may be overwritten after yielding, so the
>   * returned pointer must not be used across a function that may yield!
>   */
> -#define FUSE_IN_OP_STRUCT(op_name, queue) \
> +#define FUSE_IN_OP_STRUCT_LEGACY(op_name, queue) \
>      ({ \
>          const struct fuse_in_header *__in_hdr = \
>              (const struct fuse_in_header *)(queue)->request_buf; \
>          const struct fuse_##op_name##_in *__in = \
>              (const struct fuse_##op_name##_in *)(__in_hdr + 1); \
>          const size_t __param_len = sizeof(*__in_hdr) + sizeof(*__in); \
> -        uint32_t __req_len; \
>          \
> -        QEMU_BUILD_BUG_ON(sizeof((queue)->request_buf) < __param_len); \
> +        QEMU_BUILD_BUG_ON(sizeof((queue)->request_buf) < \
> +                          (sizeof(struct fuse_in_header) + \
> +                           sizeof(struct fuse_##op_name##_in))); \
>          \
> -        __req_len = __in_hdr->len; \
> +        uint32_t __req_len = __in_hdr->len; \
>          if (__req_len < __param_len) { \
>              warn_report("FUSE request truncated (%" PRIu32 " < %zu)", \
>                          __req_len, __param_len); \
>              ret = -EINVAL; \
> -            break; \
> +            __in = NULL; \
>          } \
>          __in; \
>      })
> 
>  /*
> - * For use in fuse_co_process_request():
> + * For use in fuse_co_process_request_common():
>   * Returns a pointer to the return object for the given operation (inside of
>   * out_buf, which is assumed to hold a fuse_out_header first).
> - * Verifies that out_buf is large enough to hold the whole object.
> + * Only performs verification for legacy FUSE.
> + * Note: Buffer size verification is done via static assertions in the caller
> + * (fuse_co_process_request) where out_buf is a local array.
>   *
> - * (out_buf should be a char[] array.)
> + * (out_buf should be a char[] array in the caller.)
>   */
> -#define FUSE_OUT_OP_STRUCT(op_name, out_buf) \
> +#define FUSE_OUT_OP_STRUCT_LEGACY(op_name, out_buf) \
>      ({ \
>          struct fuse_out_header *__out_hdr = \
>              (struct fuse_out_header *)(out_buf); \
>          struct fuse_##op_name##_out *__out = \
>              (struct fuse_##op_name##_out *)(__out_hdr + 1); \
>          \
> -        QEMU_BUILD_BUG_ON(sizeof(*__out_hdr) + sizeof(*__out) > \
> -                          sizeof(out_buf)); \
> -        \
>          __out; \
>      })
> 
>  /**
> - * Process a FUSE request, incl. writing the response.
> - *
> - * Note that yielding in any request-processing function can overwrite the
> - * contents of q->request_buf.  Anything that takes a buffer needs to take
> - * care that the content is copied before yielding.
> - *
> - * @spillover_buf can contain the tail of a write request too large to fit into
> - * q->request_buf.  This function takes ownership of it (i.e. will free it),
> - * which assumes that its contents will not be overwritten by concurrent
> - * requests (as opposed to q->request_buf).
> + * Shared helper for FUSE request processing. Handles both legacy and io_uring
> + * paths.
>   */
> -static void coroutine_fn
> -fuse_co_process_request(FuseQueue *q, void *spillover_buf)
> +static void coroutine_fn fuse_co_process_request_common(
> +    FuseExport *exp,
> +    uint32_t opcode,
> +    uint64_t req_id,
> +    void *in_buf,
> +    void *spillover_buf,
> +    void *out_buf,
> +    void (*send_response)(void *opaque, uint32_t req_id, int ret,
> +                          const void *buf, void *out_buf),
> +    void *opaque /* FuseQueue* or FuseUringEnt* */)
>  {
> -    FuseExport *exp = q->exp;
> -    uint32_t opcode;
> -    uint64_t req_id;
> -    /*
> -     * Return buffer.  Must be large enough to hold all return headers, but does
> -     * not include space for data returned by read requests.
> -     * (FUSE_IN_OP_STRUCT() verifies at compile time that out_buf is indeed
> -     * large enough.)
> -     */
> -    char out_buf[sizeof(struct fuse_out_header) +
> -                 MAX_CONST(sizeof(struct fuse_init_out),
> -                 MAX_CONST(sizeof(struct fuse_open_out),
> -                 MAX_CONST(sizeof(struct fuse_attr_out),
> -                 MAX_CONST(sizeof(struct fuse_write_out),
> -                           sizeof(struct fuse_lseek_out)))))];
> -    struct fuse_out_header *out_hdr = (struct fuse_out_header *)out_buf;
> -    /* For read requests: Data to be returned */
>      void *out_data_buffer = NULL;
> -    ssize_t ret;
> -
> -    /* Limit scope to ensure pointer is no longer used after yielding */
> -    {
> -        const struct fuse_in_header *in_hdr =
> -            (const struct fuse_in_header *)q->request_buf;
> -
> -        opcode = in_hdr->opcode;
> -        req_id = in_hdr->unique;
> -    }
> +    int ret = 0;
> 
>  #ifdef CONFIG_LINUX_IO_URING
>      /*
> @@ -1794,15 +1770,32 @@ fuse_co_process_request(FuseQueue *q, void *spillover_buf)
> 
>      switch (opcode) {
>      case FUSE_INIT: {
> -        const struct fuse_init_in *in = FUSE_IN_OP_STRUCT(init, q);
> -        ret = fuse_co_init(exp, FUSE_OUT_OP_STRUCT(init, out_buf),
> -                           in->max_readahead, in);
> +        FuseQueue *q = opaque;
> +        const struct fuse_init_in *in =
> +            FUSE_IN_OP_STRUCT_LEGACY(init, q);
> +        if (!in) {
> +            break;
> +        }
> +
> +        struct fuse_init_out *out =
> +            FUSE_OUT_OP_STRUCT_LEGACY(init, out_buf);
> +
> +        ret = fuse_co_init(exp, out, in->max_readahead, in);
>          break;
>      }
> 
> -    case FUSE_OPEN:
> -        ret = fuse_co_open(exp, FUSE_OUT_OP_STRUCT(open, out_buf));
> +    case FUSE_OPEN: {
> +        struct fuse_open_out *out;
> +
> +        if (exp->uring_started) {
> +            out = out_buf;
> +        } else {
> +            out = FUSE_OUT_OP_STRUCT_LEGACY(open, out_buf);
> +        }

It would be nice to avoid these repetitive code changes. How about
moving the if (exp->uring_started) logic inside FUSE_IN_OP_STRUCT() and
FUSE_OUT_OP_STRUCT()?

Also, is it really necessary to make FUSE_IN_OP_STRUCT() return NULL
instead of using the break statement on error?

> +
> +        ret = fuse_co_open(exp, out);
>          break;
> +    }
> 
>      case FUSE_RELEASE:
>          ret = 0;
> @@ -1812,37 +1805,105 @@ fuse_co_process_request(FuseQueue *q, void *spillover_buf)
>          ret = -ENOENT; /* There is no node but the root node */
>          break;
> 
> -    case FUSE_GETATTR:
> -        ret = fuse_co_getattr(exp, FUSE_OUT_OP_STRUCT(attr, out_buf));
> +    case FUSE_GETATTR: {
> +        struct fuse_attr_out *out;
> +
> +        if (exp->uring_started) {
> +            out = out_buf;
> +        } else {
> +            out = FUSE_OUT_OP_STRUCT_LEGACY(attr, out_buf);
> +        }
> +
> +        ret = fuse_co_getattr(exp, out);
>          break;
> +    }
> 
>      case FUSE_SETATTR: {
> -        const struct fuse_setattr_in *in = FUSE_IN_OP_STRUCT(setattr, q);
> -        ret = fuse_co_setattr(exp, FUSE_OUT_OP_STRUCT(attr, out_buf),
> -                              in->valid, in->size, in->mode, in->uid, in->gid);
> +        const struct fuse_setattr_in *in;
> +        struct fuse_attr_out *out;
> +
> +        if (exp->uring_started) {
> +            in = in_buf;
> +            out = out_buf;
> +        } else {
> +            FuseQueue *q = opaque;
> +            in = FUSE_IN_OP_STRUCT_LEGACY(setattr, q);
> +            if (!in) {
> +                break;
> +            }
> +
> +            out = FUSE_OUT_OP_STRUCT_LEGACY(attr, out_buf);
> +        }
> +
> +        ret = fuse_co_setattr(exp, out, in->valid, in->size, in->mode,
> +                              in->uid, in->gid);
>          break;
>      }
> 
>      case FUSE_READ: {
> -        const struct fuse_read_in *in = FUSE_IN_OP_STRUCT(read, q);
> +        const struct fuse_read_in *in;
> +
> +        if (exp->uring_started) {
> +            in = in_buf;
> +        } else {
> +            FuseQueue *q = opaque;
> +            in = FUSE_IN_OP_STRUCT_LEGACY(read, q);
> +            if (!in) {
> +                break;
> +            }
> +        }
> +
>          ret = fuse_co_read(exp, &out_data_buffer, in->offset, in->size);
>          break;
>      }
> 
>      case FUSE_WRITE: {
> -        const struct fuse_write_in *in = FUSE_IN_OP_STRUCT(write, q);
> -        uint32_t req_len;
> -
> -        req_len = ((const struct fuse_in_header *)q->request_buf)->len;
> -        if (unlikely(req_len < sizeof(struct fuse_in_header) + sizeof(*in) +
> -                               in->size)) {
> -            warn_report("FUSE WRITE truncated; received %zu bytes of %" PRIu32,
> -                        req_len - sizeof(struct fuse_in_header) - sizeof(*in),
> -                        in->size);
> -            ret = -EINVAL;
> -            break;
> -        }
> +        const struct fuse_write_in *in;
> +        struct fuse_write_out *out;
> +        const void *in_place_buf;
> +        const void *spill_buf;
> +
> +        if (exp->uring_started) {
> +            FuseUringEnt *ent = opaque;
> +
> +            in = in_buf;
> +            out = out_buf;
> +
> +            assert(in->size <= ent->req_header.ring_ent_in_out.payload_sz);
> 
> +            /*
> +             * In uring mode, the "out_buf" (ent->payload) actually holds the
> +             * input data for WRITE requests.
> +             */
> +            in_place_buf = NULL;
> +            spill_buf = out_buf;
> +        } else {
> +            FuseQueue *q = opaque;
> +            in = FUSE_IN_OP_STRUCT_LEGACY(write, q);
> +            if (!in) {
> +                break;
> +            }
> +
> +            out = FUSE_OUT_OP_STRUCT_LEGACY(write, out_buf);
> +
> +            /* Additional check for WRITE: verify the request includes data */
> +            uint32_t req_len =
> +                ((const struct fuse_in_header *)(q->request_buf))->len;
> +
> +            if (unlikely(req_len < sizeof(struct fuse_in_header) + sizeof(*in) +
> +                        in->size)) {
> +                warn_report("FUSE WRITE truncated; received %zu bytes of %"
> +                    PRIu32,
> +                    req_len - sizeof(struct fuse_in_header) - sizeof(*in),
> +                    in->size);
> +                ret = -EINVAL;
> +                break;
> +            }
> +
> +            /* Legacy buffer setup */
> +            in_place_buf = in + 1;
> +            spill_buf = spillover_buf;
> +        }
>          /*
>           * poll_fuse_fd() has checked that in_hdr->len matches the number of
>           * bytes read, which cannot exceed the max_write value we set
> @@ -1856,13 +1917,24 @@ fuse_co_process_request(FuseQueue *q, void *spillover_buf)
>           * fuse_co_write() takes care to copy its contents before potentially
>           * yielding.
>           */
> -        ret = fuse_co_write(exp, FUSE_OUT_OP_STRUCT(write, out_buf),
> -                            in->offset, in->size, in + 1, spillover_buf);
> +        ret = fuse_co_write(exp, out, in->offset, in->size,
> +                            in_place_buf, spill_buf);
>          break;
>      }
> 
>      case FUSE_FALLOCATE: {
> -        const struct fuse_fallocate_in *in = FUSE_IN_OP_STRUCT(fallocate, q);
> +        const struct fuse_fallocate_in *in;
> +
> +        if (exp->uring_started) {
> +            in = in_buf;
> +        } else {
> +            FuseQueue *q = opaque;
> +            in = FUSE_IN_OP_STRUCT_LEGACY(fallocate, q);
> +            if (!in) {
> +                break;
> +            }
> +        }
> +
>          ret = fuse_co_fallocate(exp, in->offset, in->length, in->mode);
>          break;
>      }
> @@ -1877,9 +1949,23 @@ fuse_co_process_request(FuseQueue *q, void *spillover_buf)
> 
>  #ifdef CONFIG_FUSE_LSEEK
>      case FUSE_LSEEK: {
> -        const struct fuse_lseek_in *in = FUSE_IN_OP_STRUCT(lseek, q);
> -        ret = fuse_co_lseek(exp, FUSE_OUT_OP_STRUCT(lseek, out_buf),
> -                            in->offset, in->whence);
> +        const struct fuse_lseek_in *in;
> +        struct fuse_lseek_out *out;
> +
> +        if (exp->uring_started) {
> +            in = in_buf;
> +            out = out_buf;
> +        } else {
> +            FuseQueue *q = opaque;
> +            in = FUSE_IN_OP_STRUCT_LEGACY(lseek, q);
> +            if (!in) {
> +                break;
> +            }
> +
> +            out = FUSE_OUT_OP_STRUCT_LEGACY(lseek, out_buf);
> +        }
> +
> +        ret = fuse_co_lseek(exp, out, in->offset, in->whence);
>          break;
>      }
>  #endif
> @@ -1888,20 +1974,12 @@ fuse_co_process_request(FuseQueue *q, void *spillover_buf)
>          ret = -ENOSYS;
>      }
> 
> -    /* Ignore errors from fuse_write*(), nothing we can do anyway */
> +    send_response(opaque, req_id, ret, out_data_buffer, out_buf);
> +
>      if (out_data_buffer) {
> -        assert(ret >= 0);
> -        fuse_write_buf_response(q->fuse_fd, req_id, out_hdr,
> -                                out_data_buffer, ret);
>          qemu_vfree(out_data_buffer);
> -    } else {
> -        fuse_write_response(q->fuse_fd, req_id, out_hdr,
> -                            ret < 0 ? ret : 0,
> -                            ret < 0 ? 0 : ret);
>      }
> 
> -    qemu_vfree(spillover_buf);
> -
>  #ifdef CONFIG_LINUX_IO_URING
>      if (unlikely(opcode == FUSE_INIT) && uring_initially_enabled) {
>          if (exp->is_uring && !exp->uring_started) {
> @@ -1910,7 +1988,8 @@ fuse_co_process_request(FuseQueue *q, void *spillover_buf)
>               * If io_uring mode was requested for this export but it has not
>               * been started yet, start it now.
>               */
> -            struct fuse_init_out *out = FUSE_OUT_OP_STRUCT(init, out_buf);
> +            struct fuse_init_out *out =
> +                FUSE_OUT_OP_STRUCT_LEGACY(init, out_buf);
>              fuse_uring_start(exp, out);
>          } else if (ret == -EOPNOTSUPP) {
>              /*
> @@ -1923,12 +2002,135 @@ fuse_co_process_request(FuseQueue *q, void *spillover_buf)
>      }
>  #endif
>  }
> +/* Helper to send response for legacy */
> +static void send_response_legacy(void *opaque, uint32_t req_id, int ret,
> +                                 const void *buf, void *out_buf)
> +{
> +    FuseQueue *q = (FuseQueue *)opaque;
> +    struct fuse_out_header *out_hdr = (struct fuse_out_header *)out_buf;
> +    if (buf) {
> +        assert(ret >= 0);
> +        fuse_write_buf_response(q->fuse_fd, req_id, out_hdr, buf, ret);
> +    } else {
> +        fuse_write_response(q->fuse_fd, req_id, out_hdr,
> +                            ret < 0 ? ret : 0,
> +                            ret < 0 ? 0 : ret);
> +    }
> +}
> +
> +static void coroutine_fn
> +fuse_co_process_request(FuseQueue *q, void *spillover_buf)
> +{
> +    FuseExport *exp = q->exp;
> +    uint32_t opcode;
> +    uint64_t req_id;
> +
> +    /*
> +     * Return buffer.  Must be large enough to hold all return headers, but does
> +     * not include space for data returned by read requests.
> +     */
> +    char out_buf[sizeof(struct fuse_out_header) +
> +        MAX_CONST(sizeof(struct fuse_init_out),
> +        MAX_CONST(sizeof(struct fuse_open_out),
> +        MAX_CONST(sizeof(struct fuse_attr_out),
> +        MAX_CONST(sizeof(struct fuse_write_out),
> +                  sizeof(struct fuse_lseek_out)))))] = {0};
> +
> +    /* Verify that out_buf is large enough for all output structures */
> +    QEMU_BUILD_BUG_ON(sizeof(struct fuse_out_header) +
> +        sizeof(struct fuse_init_out) > sizeof(out_buf));
> +    QEMU_BUILD_BUG_ON(sizeof(struct fuse_out_header) +
> +        sizeof(struct fuse_open_out) > sizeof(out_buf));
> +    QEMU_BUILD_BUG_ON(sizeof(struct fuse_out_header) +
> +        sizeof(struct fuse_attr_out) > sizeof(out_buf));
> +    QEMU_BUILD_BUG_ON(sizeof(struct fuse_out_header) +
> +        sizeof(struct fuse_write_out) > sizeof(out_buf));
> +#ifdef CONFIG_FUSE_LSEEK
> +    QEMU_BUILD_BUG_ON(sizeof(struct fuse_out_header) +
> +        sizeof(struct fuse_lseek_out) > sizeof(out_buf));
> +#endif
> +
> +    /* Limit scope to ensure pointer is no longer used after yielding */
> +    {
> +        const struct fuse_in_header *in_hdr =
> +            (const struct fuse_in_header *)q->request_buf;
> +
> +        opcode = in_hdr->opcode;
> +        req_id = in_hdr->unique;
> +    }
> +
> +    fuse_co_process_request_common(exp, opcode, req_id, NULL, spillover_buf,
> +                                   out_buf, send_response_legacy, q);
> +}
> 
>  #ifdef CONFIG_LINUX_IO_URING
> +static void fuse_uring_prep_sqe_commit(struct io_uring_sqe *sqe, void *opaque)
> +{
> +    FuseUringEnt *ent = opaque;
> +    struct fuse_uring_cmd_req *req = (void *)&sqe->cmd[0];
> +
> +    ent->last_cmd = FUSE_IO_URING_CMD_COMMIT_AND_FETCH;
> +
> +    fuse_uring_sqe_prepare(sqe, ent->rq->q, ent->last_cmd);
> +    fuse_uring_sqe_set_req_data(req, ent->rq->rqid, ent->req_commit_id);
> +}
> +
> +static void
> +fuse_uring_send_response(FuseUringEnt *ent, uint32_t req_id, int ret,
> +                         const void *out_data_buffer)
> +{
> +    FuseExport *exp = ent->rq->q->exp;
> +
> +    struct fuse_uring_req_header *rrh = &ent->req_header;
> +    struct fuse_out_header *out_header = (struct fuse_out_header *)&rrh->in_out;
> +    struct fuse_uring_ent_in_out *ent_in_out =
> +        (struct fuse_uring_ent_in_out *)&rrh->ring_ent_in_out;
> +
> +    /* FUSE_READ */
> +    if (out_data_buffer && ret > 0) {
> +        memcpy(ent->req_payload, out_data_buffer, ret);
> +    }
> +
> +    out_header->error  = ret < 0 ? ret : 0;
> +    out_header->unique = req_id;
> +    ent_in_out->payload_sz = ret > 0 ? ret : 0;
> +
> +    /* Commit and fetch a uring entry */
> +    blk_exp_ref(&exp->common);
> +    aio_add_sqe(fuse_uring_prep_sqe_commit, ent, &ent->fuse_cqe_handler);
> +}
> +
> +/* Helper to send response for uring */
> +static void send_response_uring(void *opaque, uint32_t req_id, int ret,
> +                                const void *out_data_buffer, void *payload)
> +{
> +    FuseUringEnt *ent = (FuseUringEnt *)opaque;
> +
> +    fuse_uring_send_response(ent, req_id, ret, out_data_buffer);
> +}
> +
>  static void coroutine_fn fuse_uring_co_process_request(FuseUringEnt *ent)
>  {
> -    /* TODO */
> -    (void)ent;
> +    FuseExport *exp = ent->rq->q->exp;
> +    struct fuse_uring_req_header *rrh = &ent->req_header;
> +    struct fuse_uring_ent_in_out *ent_in_out =
> +        (struct fuse_uring_ent_in_out *)&rrh->ring_ent_in_out;
> +    struct fuse_in_header *in_hdr =
> +        (struct fuse_in_header *)&rrh->in_out;
> +    uint32_t opcode = in_hdr->opcode;
> +    uint64_t req_id = in_hdr->unique;
> +
> +    ent->req_commit_id = ent_in_out->commit_id;
> +
> +    if (unlikely(ent->req_commit_id == 0)) {
> +        error_report("If this happens kernel will not find the response - "
> +            "it will be stuck forever - better to abort immediately.");
> +        fuse_export_halt(exp);
> +        return;
> +    }
> +
> +    fuse_co_process_request_common(exp, opcode, req_id, &rrh->op_in,
> +        NULL, ent->req_payload, send_response_uring, ent);
>  }
>  #endif /* CONFIG_LINUX_IO_URING */
> 
> --
> 2.43.0
> 


* Re: [Patch v4 5/6] fuse: safe termination for io_uring
  2026-02-07 12:08 ` [Patch v4 5/6] fuse: safe termination for io_uring Brian Song
@ 2026-02-11 21:52   ` Stefan Hajnoczi
  0 siblings, 0 replies; 21+ messages in thread
From: Stefan Hajnoczi @ 2026-02-11 21:52 UTC (permalink / raw)
  To: Brian Song
  Cc: qemu-block, qemu-devel, hreitz, kwolf, eblake, armbru, fam, bernd

On Sat, Feb 07, 2026 at 08:08:59PM +0800, Brian Song wrote:
> When a termination signal is received, the storage-export-daemon stops
> the export, exits the main loop (main_loop_wait), and begins resource
> cleanup. However, some FUSE_IO_URING_CMD_COMMIT_AND_FETCH SQEs may
> remain pending in the kernel, waiting for incoming FUSE requests.
> 
> Currently, there is no way to manually cancel these pending CQEs in the
> kernel. As a result, after export termination, the related data
> structures might be deleted before the pending CQEs return, causing the
> CQE handler to be invoked after it has been freed, which may lead to a
> segfault.
> 
> As a workaround, when submitting an SQE to the kernel, we increment the
> block reference (blk_exp_ref) to prevent the CQE handler from being
> deleted during export termination. Once the CQE is received, we
> decrement the reference (blk_exp_unref).
> 
> However, this introduces a new issue: if no new FUSE requests arrive,
> the pending SQEs held by the kernel will never complete. Consequently,
> the export reference count never drops to zero, preventing the export
> from shutting down cleanly.
> 
> To resolve this, we schedule a Bottom Half (BH) for each FUSE queue
> during the export shutdown phase. The BH closes the fuse_fd to prevent
> race conditions, while the session is unmounted during the remainder of
> the shutdown sequence. This explicitly aborts all pending SQEs in the
> kernel, forcing the corresponding CQEs to return. This triggers the
> release of held references, allowing the export to be freed safely.
> 
> Suggested-by: Kevin Wolf <kwolf@redhat.com>
> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Brian Song <hibriansong@gmail.com>
> ---
>  block/export/fuse.c | 100 +++++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 90 insertions(+), 10 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
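
The reference-counting scheme described in the quoted commit message can be sketched as a toy model. This is only an illustration under stated assumptions: ToyExport and every toy_* function are hypothetical stand-ins, not QEMU's FuseExport, blk_exp_ref() or blk_exp_unref(), and the fd close is modelled as a synchronous loop rather than a bottom half.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the export; not QEMU's FuseExport. */
typedef struct {
    int refcount;      /* starts at 1: the owner's reference */
    int pending_sqes;  /* SQEs parked in the (simulated) kernel */
    bool fd_open;
    bool freed;        /* set when the last reference is dropped */
} ToyExport;

static void toy_ref(ToyExport *e)
{
    e->refcount++;
}

static void toy_unref(ToyExport *e)
{
    assert(e->refcount > 0);
    if (--e->refcount == 0) {
        e->freed = true;
    }
}

/* Submitting an SQE pins the export until the matching CQE arrives. */
static void toy_submit_sqe(ToyExport *e)
{
    toy_ref(e);
    e->pending_sqes++;
}

/* CQE handler: runs once per completed or aborted SQE. */
static void toy_handle_cqe(ToyExport *e)
{
    assert(e->pending_sqes > 0);
    e->pending_sqes--;
    toy_unref(e);
}

/* Closing the fd aborts every pending SQE, so each one yields a CQE. */
static void toy_close_fd(ToyExport *e)
{
    e->fd_open = false;
    while (e->pending_sqes > 0) {
        toy_handle_cqe(e);
    }
}

/* Shutdown: force the CQEs out first, then drop the owner's reference. */
static void toy_shutdown(ToyExport *e)
{
    toy_close_fd(e);
    toy_unref(e);
}
```

Without the toy_close_fd() step, the pending SQEs would keep their references forever and the count could never reach zero, which is the hang the commit message describes.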


* Re: [Patch v4 7/7] fuse: add io_uring test support
  2026-02-07 12:09 ` [Patch v4 7/7] fuse: add io_uring test support Brian Song
@ 2026-02-11 21:53   ` Stefan Hajnoczi
  0 siblings, 0 replies; 21+ messages in thread
From: Stefan Hajnoczi @ 2026-02-11 21:53 UTC (permalink / raw)
  To: Brian Song
  Cc: qemu-block, qemu-devel, hreitz, kwolf, eblake, armbru, fam, bernd

On Sat, Feb 07, 2026 at 08:09:01PM +0800, Brian Song wrote:
> This patch adds support for testing FUSE-over-io_uring. It can be
> enabled by setting the environment variable FUSE_OVER_IO_URING=1.
> 
>     $ FUSE_OVER_IO_URING=1 ./check -fuse
> 
> Additionally, the unmount detection logic is switched from `df` to
> `mount`. Using `df` triggers `statfs()`, which attempts to communicate
> with the FUSE daemon. During test teardown (when the daemon is being
> killed), this often fails or returns stale data, causing the test to
> misjudge the mount status.
> 
> Suggested-by: Kevin Wolf <kwolf@redhat.com>
> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Brian Song <hibriansong@gmail.com>
> ---
>  tests/qemu-iotests/check     |  2 ++
>  tests/qemu-iotests/common.rc | 47 +++++++++++++++++++++++++-----------
>  2 files changed, 35 insertions(+), 14 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
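
The reason a mount-table lookup is safer than `df` during teardown can be illustrated in C: scanning the table is a plain text search and never issues a request to the FUSE daemon. A minimal sketch follows, assuming input in the /proc/mounts layout; mount_table_has() is a hypothetical helper, not part of the iotests, which use the `mount` command from shell.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true if mount_point is the second field of any line in
 * mounts_text, which is assumed to use the /proc/mounts layout:
 *   <source> <mount point> <fstype> <options> <dump> <pass>
 * Escaped characters in paths (e.g. \040 for a space) are not decoded.
 */
static bool mount_table_has(const char *mounts_text, const char *mount_point)
{
    const char *line = mounts_text;

    while (*line) {
        size_t len = strcspn(line, "\n");
        char buf[1024], source[256], target[256];

        /* Copy one line so sscanf() cannot read into the next one. */
        if (len < sizeof(buf)) {
            memcpy(buf, line, len);
            buf[len] = '\0';
            if (sscanf(buf, "%255s %255s", source, target) == 2 &&
                strcmp(target, mount_point) == 0) {
                return true;
            }
        }

        line += len;
        if (*line == '\n') {
            line++;
        }
    }
    return false;
}
```

A caller would read the table into a string and poll this function until it returns false, instead of calling statfs() on the mount point.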


* Re: [Patch v4 6/7] fuse: add 'io-uring' option
  2026-02-07 12:09 ` [Patch v4 6/7] fuse: add 'io-uring' option Brian Song
  2026-02-09  5:24   ` Markus Armbruster
@ 2026-02-11 22:50   ` Stefan Hajnoczi
  1 sibling, 0 replies; 21+ messages in thread
From: Stefan Hajnoczi @ 2026-02-11 22:50 UTC (permalink / raw)
  To: Brian Song
  Cc: qemu-block, qemu-devel, hreitz, kwolf, eblake, armbru, fam, bernd

On Sat, Feb 07, 2026 at 08:09:00PM +0800, Brian Song wrote:
> This patch adds a new storage export option for storage-export-daemon
> to enable FUSE-over-io_uring via 'io-uring=on|off' (default: off).
> 
> The initialization phase performs a protocol handshake via the legacy
> /dev/fuse interface before transitioning to the io_uring mode.
> If multiple IOThreads are configured, the export distributes the uring
> queues to handle requests concurrently.
> 
> Suggested-by: Kevin Wolf <kwolf@redhat.com>
> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Brian Song <hibriansong@gmail.com>
> ---
>  block/export/fuse.c                  | 3 +--
>  docs/tools/qemu-storage-daemon.rst   | 7 +++++--
>  qapi/block-export.json               | 5 ++++-
>  storage-daemon/qemu-storage-daemon.c | 1 +
>  4 files changed, 11 insertions(+), 5 deletions(-)

Aside from Markus' comment:

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>


* Re: [Patch v4 4/7] fuse: refactor FUSE request handler
  2026-02-11 21:21   ` Stefan Hajnoczi
@ 2026-02-13  8:07     ` Brian Song
  2026-02-19 21:14       ` [PATCH] fuse: unify op_in for io_uring and classic FUSE Stefan Hajnoczi
  0 siblings, 1 reply; 21+ messages in thread
From: Brian Song @ 2026-02-13  8:07 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-block, qemu-devel, hreitz, kwolf, eblake, armbru, fam, bernd



> On Feb 12, 2026, at 05:21, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Sat, Feb 07, 2026 at 08:08:58PM +0800, Brian Song wrote:
>> This patch implements the CQE handler for FUSE-over-io_uring. Upon
>> receiving a FUSE request via a Completion Queue Entry (CQE), the
>> handler processes the request and submits the response back to the
>> kernel via the FUSE_IO_URING_CMD_COMMIT_AND_FETCH command.
>> 
>> Additionally, the request processing logic shared between legacy and
>> io_uring modes has been extracted into fuse_co_process_request_common().
>> The execution flow now dispatches requests to the appropriate
>> mode-specific logic based on the uring_started flag.
>> 
>> Suggested-by: Kevin Wolf <kwolf@redhat.com>
>> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
>> Signed-off-by: Brian Song <hibriansong@gmail.com>
>> ---
>> block/export/fuse.c | 400 +++++++++++++++++++++++++++++++++-----------
>> 1 file changed, 301 insertions(+), 99 deletions(-)
>> 
>> diff --git a/block/export/fuse.c b/block/export/fuse.c
>> index 867752555a..c117e081cd 100644
>> --- a/block/export/fuse.c
>> +++ b/block/export/fuse.c
>> @@ -138,8 +138,8 @@ struct FuseQueue {
>>      * FUSE_MIN_READ_BUFFER (from linux/fuse.h) bytes.
>>      * This however is just the first part of the buffer; every read is given
>>      * a vector of this buffer (which should be enough for all normal requests,
>> -     * which we check via the static assertion in FUSE_IN_OP_STRUCT()) and the
>> -     * spill-over buffer below.
>> +     * which we check via the static assertion in FUSE_IN_OP_STRUCT_LEGACY())
>> +     * and the spill-over buffer below.
>>      * Therefore, the size of this buffer plus FUSE_SPILLOVER_BUF_SIZE must be
>>      * FUSE_MIN_READ_BUFFER or more (checked via static assertion below).
>>      */
>> @@ -912,6 +912,7 @@ static void coroutine_fn co_read_from_fuse_fd(void *opaque)
>>     }
>> 
>>     fuse_co_process_request(q, spillover_buf);
>> +    qemu_vfree(spillover_buf);
>> 
>> no_request:
>>     fuse_dec_in_flight(exp);
>> @@ -1684,100 +1685,75 @@ static int fuse_write_buf_response(int fd, uint32_t req_id,
>> }
>> 
>> /*
>> - * For use in fuse_co_process_request():
>> + * For use in fuse_co_process_request_common():
>>  * Returns a pointer to the parameter object for the given operation (inside of
>> - * queue->request_buf, which is assumed to hold a fuse_in_header first).
>> - * Verifies that the object is complete (queue->request_buf is large enough to
>> - * hold it in one piece, and the request length includes the whole object).
>> + * in_buf, which is assumed to hold a fuse_in_header first).
>> + * Verifies that the object is complete (in_buf is large enough to hold it in
>> + * one piece, and the request length includes the whole object).
>> + * Only performs verification for legacy FUSE.
>>  *
>>  * Note that queue->request_buf may be overwritten after yielding, so the
>>  * returned pointer must not be used across a function that may yield!
>>  */
>> -#define FUSE_IN_OP_STRUCT(op_name, queue) \
>> +#define FUSE_IN_OP_STRUCT_LEGACY(op_name, queue) \
>>     ({ \
>>         const struct fuse_in_header *__in_hdr = \
>>             (const struct fuse_in_header *)(queue)->request_buf; \
>>         const struct fuse_##op_name##_in *__in = \
>>             (const struct fuse_##op_name##_in *)(__in_hdr + 1); \
>>         const size_t __param_len = sizeof(*__in_hdr) + sizeof(*__in); \
>> -        uint32_t __req_len; \
>>         \
>> -        QEMU_BUILD_BUG_ON(sizeof((queue)->request_buf) < __param_len); \
>> +        QEMU_BUILD_BUG_ON(sizeof((queue)->request_buf) < \
>> +                          (sizeof(struct fuse_in_header) + \
>> +                           sizeof(struct fuse_##op_name##_in))); \
>>         \
>> -        __req_len = __in_hdr->len; \
>> +        uint32_t __req_len = __in_hdr->len; \
>>         if (__req_len < __param_len) { \
>>             warn_report("FUSE request truncated (%" PRIu32 " < %zu)", \
>>                         __req_len, __param_len); \
>>             ret = -EINVAL; \
>> -            break; \
>> +            __in = NULL; \
>>         } \
>>         __in; \
>>     })
>> 
>> /*
>> - * For use in fuse_co_process_request():
>> + * For use in fuse_co_process_request_common():
>>  * Returns a pointer to the return object for the given operation (inside of
>>  * out_buf, which is assumed to hold a fuse_out_header first).
>> - * Verifies that out_buf is large enough to hold the whole object.
>> + * Only performs verification for legacy FUSE.
>> + * Note: Buffer size verification is done via static assertions in the caller
>> + * (fuse_co_process_request) where out_buf is a local array.
>>  *
>> - * (out_buf should be a char[] array.)
>> + * (out_buf should be a char[] array in the caller.)
>>  */
>> -#define FUSE_OUT_OP_STRUCT(op_name, out_buf) \
>> +#define FUSE_OUT_OP_STRUCT_LEGACY(op_name, out_buf) \
>>     ({ \
>>         struct fuse_out_header *__out_hdr = \
>>             (struct fuse_out_header *)(out_buf); \
>>         struct fuse_##op_name##_out *__out = \
>>             (struct fuse_##op_name##_out *)(__out_hdr + 1); \
>>         \
>> -        QEMU_BUILD_BUG_ON(sizeof(*__out_hdr) + sizeof(*__out) > \
>> -                          sizeof(out_buf)); \
>> -        \
>>         __out; \
>>     })
>> 
>> /**
>> - * Process a FUSE request, incl. writing the response.
>> - *
>> - * Note that yielding in any request-processing function can overwrite the
>> - * contents of q->request_buf.  Anything that takes a buffer needs to take
>> - * care that the content is copied before yielding.
>> - *
>> - * @spillover_buf can contain the tail of a write request too large to fit into
>> - * q->request_buf.  This function takes ownership of it (i.e. will free it),
>> - * which assumes that its contents will not be overwritten by concurrent
>> - * requests (as opposed to q->request_buf).
>> + * Shared helper for FUSE request processing. Handles both legacy and io_uring
>> + * paths.
>>  */
>> -static void coroutine_fn
>> -fuse_co_process_request(FuseQueue *q, void *spillover_buf)
>> +static void coroutine_fn fuse_co_process_request_common(
>> +    FuseExport *exp,
>> +    uint32_t opcode,
>> +    uint64_t req_id,
>> +    void *in_buf,
>> +    void *spillover_buf,
>> +    void *out_buf,
>> +    void (*send_response)(void *opaque, uint32_t req_id, int ret,
>> +                          const void *buf, void *out_buf),
>> +    void *opaque /* FuseQueue* or FuseUringEnt* */)
>> {
>> -    FuseExport *exp = q->exp;
>> -    uint32_t opcode;
>> -    uint64_t req_id;
>> -    /*
>> -     * Return buffer.  Must be large enough to hold all return headers, but does
>> -     * not include space for data returned by read requests.
>> -     * (FUSE_IN_OP_STRUCT() verifies at compile time that out_buf is indeed
>> -     * large enough.)
>> -     */
>> -    char out_buf[sizeof(struct fuse_out_header) +
>> -                 MAX_CONST(sizeof(struct fuse_init_out),
>> -                 MAX_CONST(sizeof(struct fuse_open_out),
>> -                 MAX_CONST(sizeof(struct fuse_attr_out),
>> -                 MAX_CONST(sizeof(struct fuse_write_out),
>> -                           sizeof(struct fuse_lseek_out)))))];
>> -    struct fuse_out_header *out_hdr = (struct fuse_out_header *)out_buf;
>> -    /* For read requests: Data to be returned */
>>     void *out_data_buffer = NULL;
>> -    ssize_t ret;
>> -
>> -    /* Limit scope to ensure pointer is no longer used after yielding */
>> -    {
>> -        const struct fuse_in_header *in_hdr =
>> -            (const struct fuse_in_header *)q->request_buf;
>> -
>> -        opcode = in_hdr->opcode;
>> -        req_id = in_hdr->unique;
>> -    }
>> +    int ret = 0;
>> 
>> #ifdef CONFIG_LINUX_IO_URING
>>     /*
>> @@ -1794,15 +1770,32 @@ fuse_co_process_request(FuseQueue *q, void *spillover_buf)
>> 
>>     switch (opcode) {
>>     case FUSE_INIT: {
>> -        const struct fuse_init_in *in = FUSE_IN_OP_STRUCT(init, q);
>> -        ret = fuse_co_init(exp, FUSE_OUT_OP_STRUCT(init, out_buf),
>> -                           in->max_readahead, in);
>> +        FuseQueue *q = opaque;
>> +        const struct fuse_init_in *in =
>> +            FUSE_IN_OP_STRUCT_LEGACY(init, q);
>> +        if (!in) {
>> +            break;
>> +        }
>> +
>> +        struct fuse_init_out *out =
>> +            FUSE_OUT_OP_STRUCT_LEGACY(init, out_buf);
>> +
>> +        ret = fuse_co_init(exp, out, in->max_readahead, in);
>>         break;
>>     }
>> 
>> -    case FUSE_OPEN:
>> -        ret = fuse_co_open(exp, FUSE_OUT_OP_STRUCT(open, out_buf));
>> +    case FUSE_OPEN: {
>> +        struct fuse_open_out *out;
>> +
>> +        if (exp->uring_started) {
>> +            out = out_buf;
>> +        } else {
>> +            out = FUSE_OUT_OP_STRUCT_LEGACY(open, out_buf);
>> +        }
> 
> It would be nice to avoid these repetitive code changes. How about
> moving the if (exp->uring_started) logic inside FUSE_IN_OP_STRUCT() and
> FUSE_OUT_OP_STRUCT()?

I would prefer to leave it as is. Changing FUSE_OUT_OP_STRUCT() is easy, but doing the same for FUSE_IN_OP_STRUCT() complicates it, because we would have to pass both in_buf and opaque to handle the assignment.

Since opaque represents either a FuseQueue* or a FuseUringEnt*, wouldn't it be confusing to integrate the if (exp->uring_started) check into FUSE_IN_OP_STRUCT()? Every call site would then pass a variable whose type depends on the mode, together with in_buf (which is only used in uring mode).

> 
> Also, is it really necessary to make FUSE_IN_OP_STRUCT() return NULL
> instead of using the break statement on error?

I thought it would be more elegant to put the break outside of FUSE_IN_OP_STRUCT(), so that it looks more like a function call. But we can definitely put it back.

In the same spirit, if we move the if (exp->uring_started) logic inside FUSE_OUT_OP_STRUCT(), it would be better to pass exp as a parameter rather than referencing it implicitly inside the macro.
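
To make the trade-off concrete, here is a minimal sketch of a function-like accessor that returns NULL on a truncated request and takes the mode as an explicit parameter. All toy_* names and structures are hypothetical stand-ins for illustration; this is not the code from the patch and not the layout from linux/fuse.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; hypothetical, not the structures from linux/fuse.h. */
struct toy_in_header { uint32_t len; uint32_t opcode; };
struct toy_read_in   { uint64_t offset; uint32_t size; };

/*
 * Returns the op struct, or NULL with *ret set on a truncated request.
 * The mode is passed in explicitly instead of being read from a
 * variable that merely happens to be in scope at the call site.
 */
static const struct toy_read_in *
toy_read_in_get(bool uring_started, const void *in_buf,
                const void *request_buf, int *ret)
{
    if (uring_started) {
        /* uring mode: the op struct arrives in its own buffer */
        return in_buf;
    }

    /* legacy mode: the op struct follows the header in request_buf */
    const struct toy_in_header *hdr = request_buf;

    if (hdr->len < sizeof(*hdr) + sizeof(struct toy_read_in)) {
        *ret = -EINVAL;
        return NULL;
    }
    return (const struct toy_read_in *)(hdr + 1);
}

/* Call site: the early exit is visible instead of hidden in a macro. */
static int toy_handle_read(bool uring_started, const void *in_buf,
                           const void *request_buf)
{
    int ret = 0;
    const struct toy_read_in *in =
        toy_read_in_get(uring_started, in_buf, request_buf, &ret);

    if (!in) {
        return ret;
    }
    return (int)in->size;
}
```

The cost is the one raised above: the accessor needs both in_buf and the legacy request buffer, even though only one of them is used in any given mode.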

>> +
>> +        ret = fuse_co_open(exp, out);
>>         break;
>> +    }
>> 
>>     case FUSE_RELEASE:
>>         ret = 0;
>> @@ -1812,37 +1805,105 @@ fuse_co_process_request(FuseQueue *q, void *spillover_buf)
>>         ret = -ENOENT; /* There is no node but the root node */
>>         break;
>> 
>> -    case FUSE_GETATTR:
>> -        ret = fuse_co_getattr(exp, FUSE_OUT_OP_STRUCT(attr, out_buf));
>> +    case FUSE_GETATTR: {
>> +        struct fuse_attr_out *out;
>> +
>> +        if (exp->uring_started) {
>> +            out = out_buf;
>> +        } else {
>> +            out = FUSE_OUT_OP_STRUCT_LEGACY(attr, out_buf);
>> +        }
>> +
>> +        ret = fuse_co_getattr(exp, out);
>>         break;
>> +    }
>> 
>>     case FUSE_SETATTR: {
>> -        const struct fuse_setattr_in *in = FUSE_IN_OP_STRUCT(setattr, q);
>> -        ret = fuse_co_setattr(exp, FUSE_OUT_OP_STRUCT(attr, out_buf),
>> -                              in->valid, in->size, in->mode, in->uid, in->gid);
>> +        const struct fuse_setattr_in *in;
>> +        struct fuse_attr_out *out;
>> +
>> +        if (exp->uring_started) {
>> +            in = in_buf;
>> +            out = out_buf;
>> +        } else {
>> +            FuseQueue *q = opaque;
>> +            in = FUSE_IN_OP_STRUCT_LEGACY(setattr, q);
>> +            if (!in) {
>> +                break;
>> +            }
>> +
>> +            out = FUSE_OUT_OP_STRUCT_LEGACY(attr, out_buf);
>> +        }
>> +
>> +        ret = fuse_co_setattr(exp, out, in->valid, in->size, in->mode,
>> +                              in->uid, in->gid);
>>         break;
>>     }
>> 
>>     case FUSE_READ: {
>> -        const struct fuse_read_in *in = FUSE_IN_OP_STRUCT(read, q);
>> +        const struct fuse_read_in *in;
>> +
>> +        if (exp->uring_started) {
>> +            in = in_buf;
>> +        } else {
>> +            FuseQueue *q = opaque;
>> +            in = FUSE_IN_OP_STRUCT_LEGACY(read, q);
>> +            if (!in) {
>> +                break;
>> +            }
>> +        }
>> +
>>         ret = fuse_co_read(exp, &out_data_buffer, in->offset, in->size);
>>         break;
>>     }
>> 
>>     case FUSE_WRITE: {
>> -        const struct fuse_write_in *in = FUSE_IN_OP_STRUCT(write, q);
>> -        uint32_t req_len;
>> -
>> -        req_len = ((const struct fuse_in_header *)q->request_buf)->len;
>> -        if (unlikely(req_len < sizeof(struct fuse_in_header) + sizeof(*in) +
>> -                               in->size)) {
>> -            warn_report("FUSE WRITE truncated; received %zu bytes of %" PRIu32,
>> -                        req_len - sizeof(struct fuse_in_header) - sizeof(*in),
>> -                        in->size);
>> -            ret = -EINVAL;
>> -            break;
>> -        }
>> +        const struct fuse_write_in *in;
>> +        struct fuse_write_out *out;
>> +        const void *in_place_buf;
>> +        const void *spill_buf;
>> +
>> +        if (exp->uring_started) {
>> +            FuseUringEnt *ent = opaque;
>> +
>> +            in = in_buf;
>> +            out = out_buf;
>> +
>> +            assert(in->size <= ent->req_header.ring_ent_in_out.payload_sz);
>> 
>> +            /*
>> +             * In uring mode, the "out_buf" (ent->payload) actually holds the
>> +             * input data for WRITE requests.
>> +             */
>> +            in_place_buf = NULL;
>> +            spill_buf = out_buf;
>> +        } else {
>> +            FuseQueue *q = opaque;
>> +            in = FUSE_IN_OP_STRUCT_LEGACY(write, q);
>> +            if (!in) {
>> +                break;
>> +            }
>> +
>> +            out = FUSE_OUT_OP_STRUCT_LEGACY(write, out_buf);
>> +
>> +            /* Additional check for WRITE: verify the request includes data */
>> +            uint32_t req_len =
>> +                ((const struct fuse_in_header *)(q->request_buf))->len;
>> +
>> +            if (unlikely(req_len < sizeof(struct fuse_in_header) + sizeof(*in) +
>> +                        in->size)) {
>> +                warn_report("FUSE WRITE truncated; received %zu bytes of %"
>> +                    PRIu32,
>> +                    req_len - sizeof(struct fuse_in_header) - sizeof(*in),
>> +                    in->size);
>> +                ret = -EINVAL;
>> +                break;
>> +            }
>> +
>> +            /* Legacy buffer setup */
>> +            in_place_buf = in + 1;
>> +            spill_buf = spillover_buf;
>> +        }
>>         /*
>>          * poll_fuse_fd() has checked that in_hdr->len matches the number of
>>          * bytes read, which cannot exceed the max_write value we set
>> @@ -1856,13 +1917,24 @@ fuse_co_process_request(FuseQueue *q, void *spillover_buf)
>>          * fuse_co_write() takes care to copy its contents before potentially
>>          * yielding.
>>          */
>> -        ret = fuse_co_write(exp, FUSE_OUT_OP_STRUCT(write, out_buf),
>> -                            in->offset, in->size, in + 1, spillover_buf);
>> +        ret = fuse_co_write(exp, out, in->offset, in->size,
>> +                            in_place_buf, spill_buf);
>>         break;
>>     }
>> 
>>     case FUSE_FALLOCATE: {
>> -        const struct fuse_fallocate_in *in = FUSE_IN_OP_STRUCT(fallocate, q);
>> +        const struct fuse_fallocate_in *in;
>> +
>> +        if (exp->uring_started) {
>> +            in = in_buf;
>> +        } else {
>> +            FuseQueue *q = opaque;
>> +            in = FUSE_IN_OP_STRUCT_LEGACY(fallocate, q);
>> +            if (!in) {
>> +                break;
>> +            }
>> +        }
>> +
>>         ret = fuse_co_fallocate(exp, in->offset, in->length, in->mode);
>>         break;
>>     }
>> @@ -1877,9 +1949,23 @@ fuse_co_process_request(FuseQueue *q, void *spillover_buf)
>> 
>> #ifdef CONFIG_FUSE_LSEEK
>>     case FUSE_LSEEK: {
>> -        const struct fuse_lseek_in *in = FUSE_IN_OP_STRUCT(lseek, q);
>> -        ret = fuse_co_lseek(exp, FUSE_OUT_OP_STRUCT(lseek, out_buf),
>> -                            in->offset, in->whence);
>> +        const struct fuse_lseek_in *in;
>> +        struct fuse_lseek_out *out;
>> +
>> +        if (exp->uring_started) {
>> +            in = in_buf;
>> +            out = out_buf;
>> +        } else {
>> +            FuseQueue *q = opaque;
>> +            in = FUSE_IN_OP_STRUCT_LEGACY(lseek, q);
>> +            if (!in) {
>> +                break;
>> +            }
>> +
>> +            out = FUSE_OUT_OP_STRUCT_LEGACY(lseek, out_buf);
>> +        }
>> +
>> +        ret = fuse_co_lseek(exp, out, in->offset, in->whence);
>>         break;
>>     }
>> #endif
>> @@ -1888,20 +1974,12 @@ fuse_co_process_request(FuseQueue *q, void *spillover_buf)
>>         ret = -ENOSYS;
>>     }
>> 
>> -    /* Ignore errors from fuse_write*(), nothing we can do anyway */
>> +    send_response(opaque, req_id, ret, out_data_buffer, out_buf);
>> +
>>     if (out_data_buffer) {
>> -        assert(ret >= 0);
>> -        fuse_write_buf_response(q->fuse_fd, req_id, out_hdr,
>> -                                out_data_buffer, ret);
>>         qemu_vfree(out_data_buffer);
>> -    } else {
>> -        fuse_write_response(q->fuse_fd, req_id, out_hdr,
>> -                            ret < 0 ? ret : 0,
>> -                            ret < 0 ? 0 : ret);
>>     }
>> 
>> -    qemu_vfree(spillover_buf);
>> -
>> #ifdef CONFIG_LINUX_IO_URING
>>     if (unlikely(opcode == FUSE_INIT) && uring_initially_enabled) {
>>         if (exp->is_uring && !exp->uring_started) {
>> @@ -1910,7 +1988,8 @@ fuse_co_process_request(FuseQueue *q, void *spillover_buf)
>>              * If io_uring mode was requested for this export but it has not
>>              * been started yet, start it now.
>>              */
>> -            struct fuse_init_out *out = FUSE_OUT_OP_STRUCT(init, out_buf);
>> +            struct fuse_init_out *out =
>> +                FUSE_OUT_OP_STRUCT_LEGACY(init, out_buf);
>>             fuse_uring_start(exp, out);
>>         } else if (ret == -EOPNOTSUPP) {
>>             /*
>> @@ -1923,12 +2002,135 @@ fuse_co_process_request(FuseQueue *q, void *spillover_buf)
>>     }
>> #endif
>> }
>> +/* Helper to send response for legacy */
>> +static void send_response_legacy(void *opaque, uint32_t req_id, int ret,
>> +                                 const void *buf, void *out_buf)
>> +{
>> +    FuseQueue *q = (FuseQueue *)opaque;
>> +    struct fuse_out_header *out_hdr = (struct fuse_out_header *)out_buf;
>> +    if (buf) {
>> +        assert(ret >= 0);
>> +        fuse_write_buf_response(q->fuse_fd, req_id, out_hdr, buf, ret);
>> +    } else {
>> +        fuse_write_response(q->fuse_fd, req_id, out_hdr,
>> +                            ret < 0 ? ret : 0,
>> +                            ret < 0 ? 0 : ret);
>> +    }
>> +}
>> +
>> +static void coroutine_fn
>> +fuse_co_process_request(FuseQueue *q, void *spillover_buf)
>> +{
>> +    FuseExport *exp = q->exp;
>> +    uint32_t opcode;
>> +    uint64_t req_id;
>> +
>> +    /*
>> +     * Return buffer.  Must be large enough to hold all return headers, but does
>> +     * not include space for data returned by read requests.
>> +     */
>> +    char out_buf[sizeof(struct fuse_out_header) +
>> +        MAX_CONST(sizeof(struct fuse_init_out),
>> +        MAX_CONST(sizeof(struct fuse_open_out),
>> +        MAX_CONST(sizeof(struct fuse_attr_out),
>> +        MAX_CONST(sizeof(struct fuse_write_out),
>> +                  sizeof(struct fuse_lseek_out)))))] = {0};
>> +
>> +    /* Verify that out_buf is large enough for all output structures */
>> +    QEMU_BUILD_BUG_ON(sizeof(struct fuse_out_header) +
>> +        sizeof(struct fuse_init_out) > sizeof(out_buf));
>> +    QEMU_BUILD_BUG_ON(sizeof(struct fuse_out_header) +
>> +        sizeof(struct fuse_open_out) > sizeof(out_buf));
>> +    QEMU_BUILD_BUG_ON(sizeof(struct fuse_out_header) +
>> +        sizeof(struct fuse_attr_out) > sizeof(out_buf));
>> +    QEMU_BUILD_BUG_ON(sizeof(struct fuse_out_header) +
>> +        sizeof(struct fuse_write_out) > sizeof(out_buf));
>> +#ifdef CONFIG_FUSE_LSEEK
>> +    QEMU_BUILD_BUG_ON(sizeof(struct fuse_out_header) +
>> +        sizeof(struct fuse_lseek_out) > sizeof(out_buf));
>> +#endif
>> +
>> +    /* Limit scope to ensure pointer is no longer used after yielding */
>> +    {
>> +        const struct fuse_in_header *in_hdr =
>> +            (const struct fuse_in_header *)q->request_buf;
>> +
>> +        opcode = in_hdr->opcode;
>> +        req_id = in_hdr->unique;
>> +    }
>> +
>> +    fuse_co_process_request_common(exp, opcode, req_id, NULL, spillover_buf,
>> +                                   out_buf, send_response_legacy, q);
>> +}
>> 
>> #ifdef CONFIG_LINUX_IO_URING
>> +static void fuse_uring_prep_sqe_commit(struct io_uring_sqe *sqe, void *opaque)
>> +{
>> +    FuseUringEnt *ent = opaque;
>> +    struct fuse_uring_cmd_req *req = (void *)&sqe->cmd[0];
>> +
>> +    ent->last_cmd = FUSE_IO_URING_CMD_COMMIT_AND_FETCH;
>> +
>> +    fuse_uring_sqe_prepare(sqe, ent->rq->q, ent->last_cmd);
>> +    fuse_uring_sqe_set_req_data(req, ent->rq->rqid, ent->req_commit_id);
>> +}
>> +
>> +static void
>> +fuse_uring_send_response(FuseUringEnt *ent, uint32_t req_id, int ret,
>> +                         const void *out_data_buffer)
>> +{
>> +    FuseExport *exp = ent->rq->q->exp;
>> +
>> +    struct fuse_uring_req_header *rrh = &ent->req_header;
>> +    struct fuse_out_header *out_header = (struct fuse_out_header *)&rrh->in_out;
>> +    struct fuse_uring_ent_in_out *ent_in_out =
>> +        (struct fuse_uring_ent_in_out *)&rrh->ring_ent_in_out;
>> +
>> +    /* FUSE_READ */
>> +    if (out_data_buffer && ret > 0) {
>> +        memcpy(ent->req_payload, out_data_buffer, ret);
>> +    }
>> +
>> +    out_header->error  = ret < 0 ? ret : 0;
>> +    out_header->unique = req_id;
>> +    ent_in_out->payload_sz = ret > 0 ? ret : 0;
>> +
>> +    /* Commit and fetch a uring entry */
>> +    blk_exp_ref(&exp->common);
>> +    aio_add_sqe(fuse_uring_prep_sqe_commit, ent, &ent->fuse_cqe_handler);
>> +}
>> +
>> +/* Helper to send response for uring */
>> +static void send_response_uring(void *opaque, uint32_t req_id, int ret,
>> +                                const void *out_data_buffer, void *payload)
>> +{
>> +    FuseUringEnt *ent = (FuseUringEnt *)opaque;
>> +
>> +    fuse_uring_send_response(ent, req_id, ret, out_data_buffer);
>> +}
>> +
>> static void coroutine_fn fuse_uring_co_process_request(FuseUringEnt *ent)
>> {
>> -    /* TODO */
>> -    (void)ent;
>> +    FuseExport *exp = ent->rq->q->exp;
>> +    struct fuse_uring_req_header *rrh = &ent->req_header;
>> +    struct fuse_uring_ent_in_out *ent_in_out =
>> +        (struct fuse_uring_ent_in_out *)&rrh->ring_ent_in_out;
>> +    struct fuse_in_header *in_hdr =
>> +        (struct fuse_in_header *)&rrh->in_out;
>> +    uint32_t opcode = in_hdr->opcode;
>> +    uint64_t req_id = in_hdr->unique;
>> +
>> +    ent->req_commit_id = ent_in_out->commit_id;
>> +
>> +    if (unlikely(ent->req_commit_id == 0)) {
>> +        error_report("If this happens kernel will not find the response - "
>> +            "it will be stuck forever - better to abort immediately.");
>> +        fuse_export_halt(exp);
>> +        return;
>> +    }
>> +
>> +    fuse_co_process_request_common(exp, opcode, req_id, &rrh->op_in,
>> +        NULL, ent->req_payload, send_response_uring, ent);
>> }
>> #endif /* CONFIG_LINUX_IO_URING */
>> 
>> --
>> 2.43.0





* Re: [Patch v4 2/7] fuse: io_uring mode init
  2026-02-11 20:56   ` Stefan Hajnoczi
@ 2026-02-13  8:40     ` Brian Song
  2026-02-13  9:03     ` Brian Song
  1 sibling, 0 replies; 21+ messages in thread
From: Brian Song @ 2026-02-13  8:40 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-block, qemu-devel, hreitz, kwolf, eblake, armbru, fam, bernd



> On Feb 12, 2026, at 04:56, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Sat, Feb 07, 2026 at 08:08:56PM +0800, Brian Song wrote:
>> +typedef struct FuseUringEnt {
>> +    /* back pointer */
>> +    FuseUringQueue *rq;
>> +
>> +    /* commit id of a fuse request */
>> +    uint64_t req_commit_id;
> 
> This field is only read by fuse_uring_resubmit() in this patch, never
> assigned. Maybe some code later in the patch series should be squashed
> into this patch so that this field is initialized somewhere?
> 

Yes, req_commit_id is assigned in fuse_uring_co_process_request(), so I left it unassigned in this patch.
For the same reason, I will probably keep fuse_uring_resubmit() empty for now and introduce req_commit_id later.

> Also, is it possible to drop req_commit_id and read from the existing
> req_header.ring_ent_in_out.commit_id field instead?

Of course. I just did the same as in libfuse.

Both members, req_commit_id and payload_sz, of struct fuse_ring_ent are already present in req_header, but we need to dereference multiple times to access them. That is probably why these duplicated members exist here.

I think I will remove req_commit_id and req_payload_sz from struct FuseUringEnt and add a comment indicating where they can be found.
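
A minimal sketch of that idea, assuming simplified stand-ins for the kernel header structs (the sketch_* and Sketch* names are hypothetical and only model the members used here; the real layout comes from linux/fuse.h). It shows the commit id being read through a small accessor instead of being cached in the entry:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct fuse_uring_ent_in_out. */
struct sketch_ent_in_out {
    uint64_t flags;
    uint64_t commit_id;
    uint32_t payload_sz;
    uint32_t padding;
};

/* Hypothetical, simplified stand-in for struct fuse_uring_req_header. */
struct sketch_req_header {
    char in_out[128];
    char op_in[128];
    struct sketch_ent_in_out ring_ent_in_out;
};

typedef struct SketchUringEnt {
    struct sketch_req_header req_header;
    /* commit id: see req_header.ring_ent_in_out.commit_id */
} SketchUringEnt;

/* Read the commit id from the header rather than from a duplicate field. */
static inline uint64_t sketch_ent_commit_id(const SketchUringEnt *ent)
{
    assert(ent != NULL);
    return ent->req_header.ring_ent_in_out.commit_id;
}
```

An accessor like this keeps the call sites short without a second copy of the value that could go stale.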

> 
>> +static void fuse_uring_cqe_handler(CqeHandler *cqe_handler)
>> +{
>> +    Coroutine *co;
>> +    FuseUringEnt *ent =
>> +        container_of(cqe_handler, FuseUringEnt, fuse_cqe_handler);
>> +    FuseExport *exp = ent->rq->q->exp;
>> +
>> +    if (unlikely(exp->halted)) {
>> +        return;
> 
> Missing blk_exp_unref()?

Yes, I missed one in this patch.
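
A minimal sketch of the balance being asked for, assuming a plain integer counter in place of blk_exp_ref()/blk_exp_unref() (SketchExport and the sketch_* helpers are hypothetical). Every early return of the completion handler drops the reference that was taken when the SQE was queued; reference handling on the success path is not modelled here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the export and its reference count. */
typedef struct SketchExport {
    int refcnt;
    bool halted;
} SketchExport;

static void sketch_ref(SketchExport *exp)
{
    exp->refcnt++;
}

static void sketch_unref(SketchExport *exp)
{
    assert(exp->refcnt > 0);
    exp->refcnt--;
}

/*
 * Returns true when the completion should be processed as a request.
 * Both early-return paths release the reference taken at submission.
 */
static bool sketch_cqe_handler(SketchExport *exp, int res)
{
    if (exp->halted) {
        sketch_unref(exp);   /* the unref that was missing */
        return false;
    }
    if (res != 0) {
        sketch_unref(exp);   /* a uring entry returned with an error */
        return false;
    }
    return true;             /* success path, not modelled further */
}
```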

> 
>> +    }
>> +
>> +    int err = cqe_handler->cqe.res;
>> +
>> +    if (unlikely(err != 0)) {
>> +        switch (err) {
>> +        case -EAGAIN:
>> +        case -EINTR:
>> +            aio_add_sqe(fuse_uring_resubmit, ent, &ent->fuse_cqe_handler);
>> +            break;
>> +        case -ENOTCONN:
>> +            /* Connection already gone */
>> +            break;
>> +        default:
>> +            fuse_export_halt(exp);
>> +            break;
>> +        }
>> +
>> +        /* A uring entry returned */
>> +        blk_exp_unref(&exp->common);
>> +    } else {
>> +        co = qemu_coroutine_create(co_fuse_uring_queue_handle_cqe, ent);
>> +        /* Account this request as in-flight */
>> +        fuse_inc_in_flight(exp);
>> +        qemu_coroutine_enter(co);
>> +    }
>> +}
>> +
>> +static void
>> +fuse_uring_sqe_set_req_data(struct fuse_uring_cmd_req *req,
>> +                            const unsigned int rqid,
>> +                            const unsigned int commit_id)
>> +{
>> +    req->qid = rqid;
>> +    req->commit_id = commit_id;
>> +    req->flags = 0;
>> +}
>> +
>> +static void
>> +fuse_uring_sqe_prepare(struct io_uring_sqe *sqe, FuseQueue *q, __u32 cmd_op)
>> +{
>> +    sqe->opcode = IORING_OP_URING_CMD;
>> +
>> +    sqe->fd = q->fuse_fd;
>> +    sqe->rw_flags = 0;
>> +    sqe->ioprio = 0;
>> +    sqe->off = 0;
>> +
>> +    sqe->cmd_op = cmd_op;
>> +    sqe->__pad1 = 0;
>> +}
>> +
>> +static void fuse_uring_prep_sqe_register(struct io_uring_sqe *sqe, void *opaque)
>> +{
>> +    FuseUringEnt *ent = opaque;
>> +    struct fuse_uring_cmd_req *req = (void *)&sqe->cmd[0];
>> +
>> +    ent->last_cmd = FUSE_IO_URING_CMD_REGISTER;
>> +    fuse_uring_sqe_prepare(sqe, ent->rq->q, ent->last_cmd);
>> +
>> +    sqe->addr = (uint64_t)(ent->iov);
>> +    sqe->len = 2;
>> +
>> +    fuse_uring_sqe_set_req_data(req, ent->rq->rqid, 0);
>> +}
>> +
>> +static void fuse_uring_resubmit(struct io_uring_sqe *sqe, void *opaque)
>> +{
>> +    FuseUringEnt *ent = opaque;
>> +    struct fuse_uring_cmd_req *req = (void *)&sqe->cmd[0];
>> +
>> +    fuse_uring_sqe_prepare(sqe, ent->rq->q, ent->last_cmd);
>> +
>> +    switch (ent->last_cmd) {
>> +    case FUSE_IO_URING_CMD_REGISTER:
>> +        sqe->addr = (uint64_t)(ent->iov);
>> +        sqe->len = 2;
>> +        fuse_uring_sqe_set_req_data(req, ent->rq->rqid, 0);
>> +        break;
>> +    case FUSE_IO_URING_CMD_COMMIT_AND_FETCH:
>> +        fuse_uring_sqe_set_req_data(req, ent->rq->rqid, ent->req_commit_id);
>> +        break;
>> +    default:
>> +        error_report("Unknown command type: %d", ent->last_cmd);
>> +        break;
>> +    }
>> +}
>> +
>> +static void fuse_uring_submit_register(void *opaque)
>> +{
>> +    FuseUringQueue *rq = opaque;
>> +    FuseExport *exp = rq->q->exp;
>> +
>> +    for (int j = 0; j < exp->uring_queue_depth; j++) {
>> +        /* Register a uring entry */
>> +        blk_exp_ref(&exp->common);
>> +
>> +        aio_add_sqe(fuse_uring_prep_sqe_register, &rq->ent[j],
>> +                    &rq->ent[j].fuse_cqe_handler);
>> +    }
>> +}
>> +
>> +/**
>> + * Distribute uring queues across FUSE queues in the round-robin manner.
>> + * This ensures even distribution of kernel uring queues across user-specified
>> + * FUSE queues.
>> + *
>> + * num_uring_queues > num_fuse_queues: Each IOThread manages multiple uring
>> + * queues (multi-queue mapping).
>> + * num_uring_queues < num_fuse_queues: Excess IOThreads remain idle with no
>> + * assigned uring queues.
>> + */
>> +static void fuse_uring_setup_queues(FuseExport *exp, size_t bufsize)
>> +{
>> +    int num_uring_queues = get_nprocs_conf();
>> +
>> +    exp->num_uring_queues = num_uring_queues;
>> +    exp->uring_queues = g_new(FuseUringQueue, num_uring_queues);
>> +
>> +    for (int i = 0; i < num_uring_queues; i++) {
>> +        FuseUringQueue *rq = &exp->uring_queues[i];
>> +        rq->rqid = i;
>> +        rq->ent = g_new(FuseUringEnt, exp->uring_queue_depth);
>> +
>> +        for (int j = 0; j < exp->uring_queue_depth; j++) {
>> +            FuseUringEnt *ent = &rq->ent[j];
>> +            ent->rq = rq;
>> +            ent->req_payload_sz = bufsize - FUSE_BUFFER_HEADER_SIZE;
>> +            ent->req_payload = g_malloc0(ent->req_payload_sz);
> 
> I don't see a corresponding g_free() in this patch? Exports can be
> deleted at runtime, so this memory must be freed.
> 
>> +
>> +            ent->iov[0] = (struct iovec) {
>> +                &ent->req_header,
>> +                sizeof(struct fuse_uring_req_header)
>> +            };
>> +            ent->iov[1] = (struct iovec) {
>> +                ent->req_payload,
>> +                ent->req_payload_sz
>> +            };
>> +
>> +            ent->fuse_cqe_handler.cb = fuse_uring_cqe_handler;
>> +        }
>> +
>> +        /* Distribute uring queues across FUSE queues */
>> +        rq->q = &exp->queues[i % exp->num_fuse_queues];
>> +        QLIST_INSERT_HEAD(&(rq->q->uring_queue_list), rq, next);
>> +    }
>> +}
>> +
>> +static void
>> +fuse_schedule_ring_queue_registrations(FuseExport *exp)
>> +{
>> +    for (int i = 0; i < exp->num_fuse_queues; i++) {
>> +        FuseQueue *q = &exp->queues[i];
>> +        FuseUringQueue *rq;
>> +
>> +        QLIST_FOREACH(rq, &q->uring_queue_list, next) {
>> +            aio_bh_schedule_oneshot(q->ctx, fuse_uring_submit_register, rq);
>> +        }
>> +    }
>> +}
>> +
>> +static void fuse_uring_start(FuseExport *exp, struct fuse_init_out *out)
>> +{
>> +    assert(!exp->uring_started);
>> +    exp->uring_started = true;
>> +
>> +    /*
>> +     * Since we dont't enable the FUSE_MAX_PAGES feature, the value of
> 
> s/dont't/don't/




* Re: [Patch v4 2/7] fuse: io_uring mode init
  2026-02-11 20:56   ` Stefan Hajnoczi
  2026-02-13  8:40     ` Brian Song
@ 2026-02-13  9:03     ` Brian Song
  2026-02-19 14:42       ` Stefan Hajnoczi
  1 sibling, 1 reply; 21+ messages in thread
From: Brian Song @ 2026-02-13  9:03 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-block, qemu-devel, hreitz, kwolf, eblake, armbru, fam, bernd



> On Feb 12, 2026, at 04:56, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> On Sat, Feb 07, 2026 at 08:08:56PM +0800, Brian Song wrote:
>> +typedef struct FuseUringEnt {
>> +    /* back pointer */
>> +    FuseUringQueue *rq;
>> +
>> +    /* commit id of a fuse request */
>> +    uint64_t req_commit_id;

[...]

>> +
>> +/**
>> + * Distribute uring queues across FUSE queues in the round-robin manner.
>> + * This ensures even distribution of kernel uring queues across user-specified
>> + * FUSE queues.
>> + *
>> + * num_uring_queues > num_fuse_queues: Each IOThread manages multiple uring
>> + * queues (multi-queue mapping).
>> + * num_uring_queues < num_fuse_queues: Excess IOThreads remain idle with no
>> + * assigned uring queues.
>> + */
>> +static void fuse_uring_setup_queues(FuseExport *exp, size_t bufsize)
>> +{
>> +    int num_uring_queues = get_nprocs_conf();
>> +
>> +    exp->num_uring_queues = num_uring_queues;
>> +    exp->uring_queues = g_new(FuseUringQueue, num_uring_queues);
>> +
>> +    for (int i = 0; i < num_uring_queues; i++) {
>> +        FuseUringQueue *rq = &exp->uring_queues[i];
>> +        rq->rqid = i;
>> +        rq->ent = g_new(FuseUringEnt, exp->uring_queue_depth);
>> +
>> +        for (int j = 0; j < exp->uring_queue_depth; j++) {
>> +            FuseUringEnt *ent = &rq->ent[j];
>> +            ent->rq = rq;
>> +            ent->req_payload_sz = bufsize - FUSE_BUFFER_HEADER_SIZE;
>> +            ent->req_payload = g_malloc0(ent->req_payload_sz);
> 
> I don't see a corresponding g_free() in this patch? Exports can be
> deleted at runtime, so this memory must be freed.
> 

ent->req_payload is deleted in fuse_export_delete_uring() in a later patch.
Should we merge patch 5 into this one?

>> +
>> +            ent->iov[0] = (struct iovec) {
>> +                &ent->req_header,
>> +                sizeof(struct fuse_uring_req_header)
>> +            };
>> +            ent->iov[1] = (struct iovec) {
>> +                ent->req_payload,
>> +                ent->req_payload_sz
>> +            };
>> +
>> +            ent->fuse_cqe_handler.cb = fuse_uring_cqe_handler;
>> +        }
>> +
>> +        /* Distribute uring queues across FUSE queues */
>> +        rq->q = &exp->queues[i % exp->num_fuse_queues];
>> +        QLIST_INSERT_HEAD(&(rq->q->uring_queue_list), rq, next);
>> +    }
>> +}
>> +
>> +static void
>> +fuse_schedule_ring_queue_registrations(FuseExport *exp)
>> +{
>> +    for (int i = 0; i < exp->num_fuse_queues; i++) {
>> +        FuseQueue *q = &exp->queues[i];
>> +        FuseUringQueue *rq;
>> +
>> +        QLIST_FOREACH(rq, &q->uring_queue_list, next) {
>> +            aio_bh_schedule_oneshot(q->ctx, fuse_uring_submit_register, rq);
>> +        }
>> +    }
>> +}
>> +
>> +static void fuse_uring_start(FuseExport *exp, struct fuse_init_out *out)
>> +{
>> +    assert(!exp->uring_started);
>> +    exp->uring_started = true;
>> +
>> +    /*
>> +     * Since we dont't enable the FUSE_MAX_PAGES feature, the value of
> 
> s/dont't/don't/




* Re: [Patch v4 2/7] fuse: io_uring mode init
  2026-02-13  9:03     ` Brian Song
@ 2026-02-19 14:42       ` Stefan Hajnoczi
  0 siblings, 0 replies; 21+ messages in thread
From: Stefan Hajnoczi @ 2026-02-19 14:42 UTC (permalink / raw)
  To: Brian Song
  Cc: qemu-block, qemu-devel, hreitz, kwolf, eblake, armbru, fam, bernd


On Fri, Feb 13, 2026 at 05:03:23PM +0800, Brian Song wrote:
> 
> 
> > On Feb 12, 2026, at 04:56, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> > 
> > On Sat, Feb 07, 2026 at 08:08:56PM +0800, Brian Song wrote:
> >> +typedef struct FuseUringEnt {
> >> +    /* back pointer */
> >> +    FuseUringQueue *rq;
> >> +
> >> +    /* commit id of a fuse request */
> >> +    uint64_t req_commit_id;
> 
> [...]
> 
> >> +
> >> +/**
> >> + * Distribute uring queues across FUSE queues in the round-robin manner.
> >> + * This ensures even distribution of kernel uring queues across user-specified
> >> + * FUSE queues.
> >> + *
> >> + * num_uring_queues > num_fuse_queues: Each IOThread manages multiple uring
> >> + * queues (multi-queue mapping).
> >> + * num_uring_queues < num_fuse_queues: Excess IOThreads remain idle with no
> >> + * assigned uring queues.
> >> + */
> >> +static void fuse_uring_setup_queues(FuseExport *exp, size_t bufsize)
> >> +{
> >> +    int num_uring_queues = get_nprocs_conf();
> >> +
> >> +    exp->num_uring_queues = num_uring_queues;
> >> +    exp->uring_queues = g_new(FuseUringQueue, num_uring_queues);
> >> +
> >> +    for (int i = 0; i < num_uring_queues; i++) {
> >> +        FuseUringQueue *rq = &exp->uring_queues[i];
> >> +        rq->rqid = i;
> >> +        rq->ent = g_new(FuseUringEnt, exp->uring_queue_depth);
> >> +
> >> +        for (int j = 0; j < exp->uring_queue_depth; j++) {
> >> +            FuseUringEnt *ent = &rq->ent[j];
> >> +            ent->rq = rq;
> >> +            ent->req_payload_sz = bufsize - FUSE_BUFFER_HEADER_SIZE;
> >> +            ent->req_payload = g_malloc0(ent->req_payload_sz);
> > 
> > I don't see a corresponding g_free() in this patch? Exports can be
> > deleted at runtime, so this memory must be freed.
> > 
> 
> ent->req_payload is deleted in fuse_export_delete_uring() in a later patch.
> Should we merge patch 5 into this one?

Unless there is a strong reason why the free needs to be deferred to a
later patch, it's safer to include it in the same patch that allocates the
memory. It makes code review easier and backporting patches safer (no
chance of forgetting to backport the other patch that frees the memory).
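
A minimal sketch of what "allocate and free in the same patch" could look like, assuming calloc()/free() in place of g_malloc0()/g_free() and hypothetical Sketch* types. The point is only that the teardown helper ships next to the setup helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, reduced versions of the ring entry and ring queue. */
typedef struct SketchEnt {
    void *req_payload;
    size_t req_payload_sz;
} SketchEnt;

typedef struct SketchQueue {
    SketchEnt *ent;
    int depth;
} SketchQueue;

/* Frees everything sketch_queue_setup() allocated; safe after partial setup. */
static void sketch_queue_teardown(SketchQueue *rq)
{
    for (int j = 0; j < rq->depth; j++) {
        free(rq->ent[j].req_payload);   /* free(NULL) is a no-op */
    }
    free(rq->ent);
    rq->ent = NULL;
    rq->depth = 0;
}

/* Allocates one zeroed payload buffer per entry; returns 0 on success. */
static int sketch_queue_setup(SketchQueue *rq, int depth, size_t payload_sz)
{
    assert(depth > 0 && payload_sz > 0);

    rq->ent = calloc((size_t)depth, sizeof(*rq->ent));
    if (!rq->ent) {
        rq->depth = 0;
        return -1;
    }
    rq->depth = depth;

    for (int j = 0; j < depth; j++) {
        rq->ent[j].req_payload_sz = payload_sz;
        rq->ent[j].req_payload = calloc(1, payload_sz);
        if (!rq->ent[j].req_payload) {
            sketch_queue_teardown(rq);
            return -1;
        }
    }
    return 0;
}
```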

Stefan



* [PATCH] fuse: unify op_in for io_uring and classic FUSE
  2026-02-13  8:07     ` Brian Song
@ 2026-02-19 21:14       ` Stefan Hajnoczi
  0 siblings, 0 replies; 21+ messages in thread
From: Stefan Hajnoczi @ 2026-02-19 21:14 UTC (permalink / raw)
  To: Brian Song
  Cc: qemu-block, qemu-devel, hreitz, kwolf, eblake, armbru, fam, bernd,
	Stefan Hajnoczi

Handle the memory layout differences for op_in in
fuse_co_process_request_common()'s callers. This way it's not necessary
to check whether uring is started.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/export/fuse.c | 118 ++++++++++++--------------------------------
 1 file changed, 32 insertions(+), 86 deletions(-)

Hi Brian,
You mentioned that FUSE_IN_OP_STRUCT_LEGACY() is harder to eliminate so
I took a look at it this morning. This patch compiles but I have not
tested it.

If you like this approach, please include it in your series and write a
similar patch for FUSE_OUT_OP_STRUCT_LEGACY().
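
For illustration only, a minimal standalone sketch of the approach, assuming simplified stand-in structs (all sketch_* names are hypothetical; the 128-byte slots only mimic the uring header layout). Each caller resolves req_len and op_in for its own memory layout, so the common handler never needs to know which mode is active:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced stand-ins for fuse_in_header and fuse_read_in. */
struct sketch_in_header { uint32_t len; uint32_t opcode; uint64_t unique; };
struct sketch_read_in { uint64_t offset; uint32_t size; uint32_t pad; };

/* Stand-in for the uring request header: separate fixed-size slots. */
struct sketch_uring_header { char in_out[128]; char op_in[128]; };

/* Common handler: layout-agnostic, sees only req_len and op_in. */
static int sketch_process_read(uint32_t req_len, const void *op_in,
                               uint64_t *offset)
{
    struct sketch_read_in in;

    assert(op_in != NULL && offset != NULL);
    if (req_len < sizeof(struct sketch_in_header) + sizeof(in)) {
        return -EINVAL;   /* truncated request */
    }
    memcpy(&in, op_in, sizeof(in));
    *offset = in.offset;
    return 0;
}

/* Legacy caller: the op struct directly follows the header in one buffer. */
static int sketch_legacy(const unsigned char *request_buf, uint64_t *offset)
{
    struct sketch_in_header hdr;

    memcpy(&hdr, request_buf, sizeof(hdr));
    return sketch_process_read(hdr.len, request_buf + sizeof(hdr), offset);
}

/* uring caller: header and op struct live in separate slots. */
static int sketch_uring(const struct sketch_uring_header *rrh,
                        uint64_t *offset)
{
    struct sketch_in_header hdr;

    memcpy(&hdr, rrh->in_out, sizeof(hdr));
    return sketch_process_read(hdr.len, rrh->op_in, offset);
}
```

The sketch copies with memcpy() to sidestep alignment concerns in a standalone example; the patch below uses pointer casts on the existing buffers instead.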

diff --git a/block/export/fuse.c b/block/export/fuse.c
index e5b753e717..d4f393c2eb 100644
--- a/block/export/fuse.c
+++ b/block/export/fuse.c
@@ -137,8 +137,7 @@ struct FuseQueue {
      * The request buffer must be able to hold a full write, and/or at least
      * FUSE_MIN_READ_BUFFER (from linux/fuse.h) bytes.
      * This however is just the first part of the buffer; every read is given
-     * a vector of this buffer (which should be enough for all normal requests,
-     * which we check via the static assertion in FUSE_IN_OP_STRUCT_LEGACY())
+     * a vector of this buffer (which should be enough for all normal requests)
      * and the spill-over buffer below.
      * Therefore, the size of this buffer plus FUSE_SPILLOVER_BUF_SIZE must be
      * FUSE_MIN_READ_BUFFER or more (checked via static assertion below).
@@ -1765,33 +1764,24 @@ static int fuse_write_buf_response(int fd, uint32_t req_id,
 
 /*
  * For use in fuse_co_process_request_common():
- * Returns a pointer to the parameter object for the given operation (inside of
- * in_buf, which is assumed to hold a fuse_in_header first).
- * Verifies that the object is complete (in_buf is large enough to hold it in
- * one piece, and the request length includes the whole object).
- * Only performs verification for legacy FUSE.
+ * Returns a pointer to the parameter object for the given operation.
+ * Verifies that the object is complete (the request length includes the whole
+ * object).
  *
  * Note that queue->request_buf may be overwritten after yielding, so the
  * returned pointer must not be used across a function that may yield!
  */
-#define FUSE_IN_OP_STRUCT_LEGACY(op_name, queue) \
+#define FUSE_IN_OP_STRUCT(op_name) \
     ({ \
-        const struct fuse_in_header *__in_hdr = \
-            (const struct fuse_in_header *)(queue)->request_buf; \
-        const struct fuse_##op_name##_in *__in = \
-            (const struct fuse_##op_name##_in *)(__in_hdr + 1); \
-        const size_t __param_len = sizeof(*__in_hdr) + sizeof(*__in); \
-        \
-        QEMU_BUILD_BUG_ON(sizeof((queue)->request_buf) < \
-                          (sizeof(struct fuse_in_header) + \
-                           sizeof(struct fuse_##op_name##_in))); \
-        \
-        uint32_t __req_len = __in_hdr->len; \
-        if (__req_len < __param_len) { \
-            warn_report("FUSE request truncated (%" PRIu32 " < %zu)", \
-                        __req_len, __param_len); \
+        const struct fuse_##op_name##_in *__in = op_in; \
+        const size_t __needed = sizeof(struct fuse_in_header) + \
+                                sizeof(*__in); \
+        QEMU_BUILD_BUG_ON(sizeof(((FuseQueue *)0)->request_buf) < __needed); \
+        if (req_len < __needed) { \
+            warn_report("FUSE request with opcode %u truncated (%u < %zu)", \
+                        opcode, req_len, __needed); \
             ret = -EINVAL; \
-            __in = NULL; \
+            break; \
         } \
         __in; \
     })
@@ -1822,9 +1812,10 @@ static int fuse_write_buf_response(int fd, uint32_t req_id,
  */
 static void coroutine_fn fuse_co_process_request_common(
     FuseExport *exp,
+    uint32_t req_len,
     uint32_t opcode,
     uint64_t req_id,
-    void *in_buf,
+    const void *op_in,
     void *spillover_buf,
     void *out_buf,
     void (*send_response)(void *opaque, uint32_t req_id, int ret,
@@ -1849,13 +1840,7 @@ static void coroutine_fn fuse_co_process_request_common(
 
     switch (opcode) {
     case FUSE_INIT: {
-        FuseQueue *q = opaque;
-        const struct fuse_init_in *in =
-            FUSE_IN_OP_STRUCT_LEGACY(init, q);
-        if (!in) {
-            break;
-        }
-
+        const struct fuse_init_in *in = FUSE_IN_OP_STRUCT(init);
         struct fuse_init_out *out =
             FUSE_OUT_OP_STRUCT_LEGACY(init, out_buf);
 
@@ -1898,19 +1883,12 @@ static void coroutine_fn fuse_co_process_request_common(
     }
 
     case FUSE_SETATTR: {
-        const struct fuse_setattr_in *in;
+        const struct fuse_setattr_in *in = FUSE_IN_OP_STRUCT(setattr);
         struct fuse_attr_out *out;
 
         if (exp->uring_started) {
-            in = in_buf;
             out = out_buf;
         } else {
-            FuseQueue *q = opaque;
-            in = FUSE_IN_OP_STRUCT_LEGACY(setattr, q);
-            if (!in) {
-                break;
-            }
-
             out = FUSE_OUT_OP_STRUCT_LEGACY(attr, out_buf);
         }
 
@@ -1920,24 +1898,13 @@ static void coroutine_fn fuse_co_process_request_common(
     }
 
     case FUSE_READ: {
-        const struct fuse_read_in *in;
-
-        if (exp->uring_started) {
-            in = in_buf;
-        } else {
-            FuseQueue *q = opaque;
-            in = FUSE_IN_OP_STRUCT_LEGACY(read, q);
-            if (!in) {
-                break;
-            }
-        }
-
+        const struct fuse_read_in *in = FUSE_IN_OP_STRUCT(read);
         ret = fuse_co_read(exp, &out_data_buffer, in->offset, in->size);
         break;
     }
 
     case FUSE_WRITE: {
-        const struct fuse_write_in *in;
+        const struct fuse_write_in *in = FUSE_IN_OP_STRUCT(write);
         struct fuse_write_out *out;
         const void *in_place_buf;
         const void *spill_buf;
@@ -1945,7 +1912,6 @@ static void coroutine_fn fuse_co_process_request_common(
         if (exp->uring_started) {
             FuseUringEnt *ent = opaque;
 
-            in = in_buf;
             out = out_buf;
 
             assert(in->size <= ent->req_header.ring_ent_in_out.payload_sz);
@@ -1957,17 +1923,8 @@ static void coroutine_fn fuse_co_process_request_common(
             in_place_buf = NULL;
             spill_buf = out_buf;
         } else {
-            FuseQueue *q = opaque;
-            in = FUSE_IN_OP_STRUCT_LEGACY(write, q);
-            if (!in) {
-                break;
-            }
-
-            out = FUSE_OUT_OP_STRUCT_LEGACY(write, out_buf);
-
             /* Additional check for WRITE: verify the request includes data */
-            uint32_t req_len =
-                ((const struct fuse_in_header *)(q->request_buf))->len;
+            out = FUSE_OUT_OP_STRUCT_LEGACY(write, out_buf);
 
             if (unlikely(req_len < sizeof(struct fuse_in_header) + sizeof(*in) +
                         in->size)) {
@@ -2002,18 +1959,7 @@ static void coroutine_fn fuse_co_process_request_common(
     }
 
     case FUSE_FALLOCATE: {
-        const struct fuse_fallocate_in *in;
-
-        if (exp->uring_started) {
-            in = in_buf;
-        } else {
-            FuseQueue *q = opaque;
-            in = FUSE_IN_OP_STRUCT_LEGACY(fallocate, q);
-            if (!in) {
-                break;
-            }
-        }
-
+        const struct fuse_fallocate_in *in = FUSE_IN_OP_STRUCT(fallocate);
         ret = fuse_co_fallocate(exp, in->offset, in->length, in->mode);
         break;
     }
@@ -2028,19 +1974,12 @@ static void coroutine_fn fuse_co_process_request_common(
 
 #ifdef CONFIG_FUSE_LSEEK
     case FUSE_LSEEK: {
-        const struct fuse_lseek_in *in;
+        const struct fuse_lseek_in *in = FUSE_IN_OP_STRUCT(lseek);
         struct fuse_lseek_out *out;
 
         if (exp->uring_started) {
-            in = in_buf;
             out = out_buf;
         } else {
-            FuseQueue *q = opaque;
-            in = FUSE_IN_OP_STRUCT_LEGACY(lseek, q);
-            if (!in) {
-                break;
-            }
-
             out = FUSE_OUT_OP_STRUCT_LEGACY(lseek, out_buf);
         }
 
@@ -2081,6 +2020,7 @@ static void coroutine_fn fuse_co_process_request_common(
     }
 #endif
 }
+
 /* Helper to send response for legacy */
 static void send_response_legacy(void *opaque, uint32_t req_id, int ret,
                                  const void *buf, void *out_buf)
@@ -2101,8 +2041,10 @@ static void coroutine_fn
 fuse_co_process_request(FuseQueue *q, void *spillover_buf)
 {
     FuseExport *exp = q->exp;
+    uint32_t req_len;
     uint32_t opcode;
     uint64_t req_id;
+    const void *op_in;
 
     /*
      * Return buffer.  Must be large enough to hold all return headers, but does
@@ -2134,12 +2076,15 @@ fuse_co_process_request(FuseQueue *q, void *spillover_buf)
         const struct fuse_in_header *in_hdr =
             (const struct fuse_in_header *)q->request_buf;
 
+        req_len = in_hdr->len;
         opcode = in_hdr->opcode;
         req_id = in_hdr->unique;
+        op_in = in_hdr + 1;
     }
 
-    fuse_co_process_request_common(exp, opcode, req_id, NULL, spillover_buf,
-                                   out_buf, send_response_legacy, q);
+    fuse_co_process_request_common(exp, req_len, opcode, req_id, op_in,
+            spillover_buf, out_buf,
+            send_response_legacy, q);
 }
 
 #ifdef CONFIG_LINUX_IO_URING
@@ -2196,6 +2141,7 @@ static void coroutine_fn fuse_uring_co_process_request(FuseUringEnt *ent)
         (struct fuse_uring_ent_in_out *)&rrh->ring_ent_in_out;
     struct fuse_in_header *in_hdr =
         (struct fuse_in_header *)&rrh->in_out;
+    uint32_t req_len = in_hdr->len;
     uint32_t opcode = in_hdr->opcode;
     uint64_t req_id = in_hdr->unique;
 
@@ -2208,7 +2154,7 @@ static void coroutine_fn fuse_uring_co_process_request(FuseUringEnt *ent)
         return;
     }
 
-    fuse_co_process_request_common(exp, opcode, req_id, &rrh->op_in,
+    fuse_co_process_request_common(exp, req_len, opcode, req_id, &rrh->op_in,
         NULL, ent->req_payload, send_response_uring, ent);
 }
 #endif /* CONFIG_LINUX_IO_URING */
-- 
2.53.0




Thread overview: 21+ messages
2026-02-07 12:08 [PATCH v4 0/7] add fuse-over-io_uring support Brian Song
2026-02-07 12:08 ` [Patch v4 1/7] aio-posix: enable 128-byte SQEs Brian Song
2026-02-11 20:28   ` Stefan Hajnoczi
2026-02-07 12:08 ` [Patch v4 2/7] fuse: io_uring mode init Brian Song
2026-02-11 20:56   ` Stefan Hajnoczi
2026-02-13  8:40     ` Brian Song
2026-02-13  9:03     ` Brian Song
2026-02-19 14:42       ` Stefan Hajnoczi
2026-02-07 12:08 ` [Patch v4 3/7] fuse: uring support for write ops Brian Song
2026-02-11 21:08   ` Stefan Hajnoczi
2026-02-07 12:08 ` [Patch v4 4/7] fuse: refactor FUSE request handler Brian Song
2026-02-11 21:21   ` Stefan Hajnoczi
2026-02-13  8:07     ` Brian Song
2026-02-19 21:14       ` [PATCH] fuse: unify op_in for io_uring and classic FUSE Stefan Hajnoczi
2026-02-07 12:08 ` [Patch v4 5/6] fuse: safe termination for io_uring Brian Song
2026-02-11 21:52   ` Stefan Hajnoczi
2026-02-07 12:09 ` [Patch v4 6/7] fuse: add 'io-uring' option Brian Song
2026-02-09  5:24   ` Markus Armbruster
2026-02-11 22:50   ` Stefan Hajnoczi
2026-02-07 12:09 ` [Patch v4 7/7] fuse: add io_uring test support Brian Song
2026-02-11 21:53   ` Stefan Hajnoczi
