From: Brian Song <hibriansong@gmail.com>
To: qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, armbru@redhat.com, bernd@bsbernd.com,
	fam@euphon.net, hreitz@redhat.com, kwolf@redhat.com,
	stefanha@redhat.com
Subject: Re: [PATCH 1/3] fuse: add FUSE-over-io_uring enable opt and init
Date: Sat, 16 Aug 2025 19:13:53 -0400
Message-ID: <beb43845-a761-4031-a7b7-aaca56abb6de@gmail.com>
In-Reply-To: <20250815034619.51980-2-hizhisong@gmail.com>



On 8/14/25 11:46 PM, Brian Song wrote:
> From: Brian Song <hibriansong@gmail.com>
> 
> This patch adds a new export option for storage-export-daemon to enable
> or disable FUSE-over-io_uring via the switch io-uring=on|off (disabled
> by default). It also implements the protocol handshake with the Linux
> kernel during the FUSE-over-io_uring initialization phase.
> 
> See: https://docs.kernel.org/filesystems/fuse-io-uring.html
> 
> The kernel documentation describes in detail how FUSE-over-io_uring
> works. This patch implements the Initial SQE stage shown in the diagram:
> it initializes one queue per IOThread, each currently supporting a
> single submission queue entry (SQE). When the FUSE driver sends the
> first FUSE request (FUSE_INIT), storage-export-daemon calls
> fuse_uring_start() to complete initialization, ultimately submitting
> the SQE with the FUSE_IO_URING_CMD_REGISTER command to confirm
> successful initialization with the kernel.
> 
> Suggested-by: Kevin Wolf <kwolf@redhat.com>
> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
> Signed-off-by: Brian Song <hibriansong@gmail.com>
> ---
>   block/export/fuse.c                  | 161 ++++++++++++++++++++++++---
>   docs/tools/qemu-storage-daemon.rst   |  11 +-
>   qapi/block-export.json               |   5 +-
>   storage-daemon/qemu-storage-daemon.c |   1 +
>   util/fdmon-io_uring.c                |   5 +-
>   5 files changed, 159 insertions(+), 24 deletions(-)
> 
> diff --git a/block/export/fuse.c b/block/export/fuse.c
> index c0ad4696ce..59fa79f486 100644
> --- a/block/export/fuse.c
> +++ b/block/export/fuse.c
> @@ -48,6 +48,11 @@
>   #include <linux/fs.h>
>   #endif
> 
> +#define FUSE_DEFAULT_MAX_PAGES_PER_REQ 32
> +
> +/* room needed in buffer to accommodate header */
> +#define FUSE_BUFFER_HEADER_SIZE 0x1000
> +
>   /* Prevent overly long bounce buffer allocations */
>   #define FUSE_MAX_READ_BYTES (MIN(BDRV_REQUEST_MAX_BYTES, 1 * 1024 * 1024))
>   /*
> @@ -63,12 +68,31 @@
>       (FUSE_MAX_WRITE_BYTES - FUSE_IN_PLACE_WRITE_BYTES)
> 
>   typedef struct FuseExport FuseExport;
> +typedef struct FuseQueue FuseQueue;
> +
> +typedef struct FuseRingEnt {
> +    /* back pointer */
> +    FuseQueue *q;
> +
> +    /* commit id of a fuse request */
> +    uint64_t req_commit_id;
> +
> +    /* fuse request header and payload */
> +    struct fuse_uring_req_header req_header;
> +    void *op_payload;
> +    size_t req_payload_sz;
> +
> +    /* The vector passed to the kernel */
> +    struct iovec iov[2];
> +
> +    CqeHandler fuse_cqe_handler;
> +} FuseRingEnt;
> 
>   /*
>    * One FUSE "queue", representing one FUSE FD from which requests are fetched
>    * and processed.  Each queue is tied to an AioContext.
>    */
> -typedef struct FuseQueue {
> +struct FuseQueue {
>       FuseExport *exp;
> 
>       AioContext *ctx;
> @@ -109,7 +133,12 @@ typedef struct FuseQueue {
>        * Free this buffer with qemu_vfree().
>        */
>       void *spillover_buf;
> -} FuseQueue;
> +
> +#ifdef CONFIG_LINUX_IO_URING
> +    int qid;
> +    FuseRingEnt ent;
> +#endif
> +};
> 
>   /*
>    * Verify that FuseQueue.request_buf plus the spill-over buffer together
> @@ -148,6 +177,7 @@ struct FuseExport {
>       bool growable;
>       /* Whether allow_other was used as a mount option or not */
>       bool allow_other;
> +    bool is_uring;
> 
>       mode_t st_mode;
>       uid_t st_uid;
> @@ -257,6 +287,93 @@ static const BlockDevOps fuse_export_blk_dev_ops = {
>       .drained_poll  = fuse_export_drained_poll,
>   };
> 
> +#ifdef CONFIG_LINUX_IO_URING
> +
> +static void fuse_uring_sqe_set_req_data(struct fuse_uring_cmd_req *req,
> +                    const unsigned int qid,
> +                    const unsigned int commit_id)
> +{
> +    req->qid = qid;
> +    req->commit_id = commit_id;
> +    req->flags = 0;
> +}
> +
> +static void fuse_uring_sqe_prepare(struct io_uring_sqe *sqe, FuseQueue *q,
> +               __u32 cmd_op)
> +{
> +    sqe->opcode = IORING_OP_URING_CMD;
> +
> +    sqe->fd = q->fuse_fd;
> +    sqe->rw_flags = 0;
> +    sqe->ioprio = 0;
> +    sqe->off = 0;
> +
> +    sqe->cmd_op = cmd_op;
> +    sqe->__pad1 = 0;
> +}
> +
> +static void fuse_uring_prep_sqe_register(struct io_uring_sqe *sqe, void *opaque)
> +{
> +    FuseQueue *q = opaque;
> +    struct fuse_uring_cmd_req *req = (void *)&sqe->cmd[0];
> +
> +    fuse_uring_sqe_prepare(sqe, q, FUSE_IO_URING_CMD_REGISTER);
> +
> +    sqe->addr = (uint64_t)(q->ent.iov);
> +    sqe->len = 2;
> +
> +    fuse_uring_sqe_set_req_data(req, q->qid, 0);
> +}
> +
> +static void fuse_uring_submit_register(void *opaque)
> +{
> +    FuseQueue *q = opaque;
> +    FuseExport *exp = q->exp;
> +
> +
> +    aio_add_sqe(fuse_uring_prep_sqe_register, q, &(q->ent.fuse_cqe_handler));

I think there might be a tricky issue with the io_uring integration in
QEMU. Currently, once the number of IOThreads goes above ~6 or 7, there
is a pretty high chance of a hang. I added some debug logging to the
registration path of the kernel’s fuse_uring_cmd() and noticed that the
number of register calls is lower than the total number of ring entries
across all queues. In theory, every entry of every queue should be
registered.

On the userspace side, everything seems normal: the number of
aio_add_sqe() calls matches the number of IOThreads. But here’s the
weird part: if I add an fprintf inside the do/while loop in
fdmon-io_uring.c::fdmon_io_uring_wait(), suddenly everything works fine,
and the kernel receives registration requests for all entries as expected.

     do {
         ret = io_uring_submit_and_wait(&ctx->fdmon_io_uring, wait_nr);
         fprintf(stderr, "io_uring_submit_and_wait ret: %d\n", ret);
     } while (ret == -EINTR);

My guess is that the fprintf is just slowing down the loop, or maybe it
introduces some implicit memory barrier. Obviously, the right fix isn’t
to sprinkle fprintfs around. I suspect there might be a subtle
synchronization/race issue here.
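
Since the fprintf itself changes the timing, a cheaper way to compare
the userspace and kernel counts could be a per-queue atomic counter that
is only read once at the end. The following is a minimal sketch, not
existing QEMU code: dbg_count_sqe(), dbg_count_mismatches() and
DBG_MAX_QUEUES are hypothetical names, and the suggested call site next
to aio_add_sqe() is an assumption.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical upper bound on queues (one per IOThread) for this sketch. */
#define DBG_MAX_QUEUES 64

/* Hypothetical per-queue counters; static storage starts out as zero. */
static atomic_uint dbg_sqe_queued[DBG_MAX_QUEUES];

/*
 * Record one queued SQE for queue qid. Assumed call site: right next to
 * aio_add_sqe() in fuse_uring_submit_register(). Out-of-range qids are
 * ignored.
 */
static void dbg_count_sqe(unsigned int qid)
{
    if (qid < DBG_MAX_QUEUES) {
        atomic_fetch_add_explicit(&dbg_sqe_queued[qid], 1,
                                  memory_order_relaxed);
    }
}

/*
 * Return how many of the first nqueues queues have a count that differs
 * from expected. Meant to be called once, after every IOThread has had a
 * chance to register.
 */
static unsigned int dbg_count_mismatches(unsigned int nqueues,
                                         unsigned int expected)
{
    unsigned int bad = 0;

    for (unsigned int i = 0; i < nqueues && i < DBG_MAX_QUEUES; i++) {
        if (atomic_load_explicit(&dbg_sqe_queued[i],
                                 memory_order_relaxed) != expected) {
            bad++;
        }
    }
    return bad;
}
```

With one entry per queue, as in this patch, expected would be 1, and the
per-queue numbers can be compared against the kernel-side count of
FUSE_IO_URING_CMD_REGISTER calls. A relaxed increment is far cheaper than
an fprintf, but it is still not guaranteed to leave the timing untouched.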

Brian

