From: Brian Song <hibriansong@gmail.com>
To: Bernd Schubert <bernd@bsbernd.com>, qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, armbru@redhat.com, fam@euphon.net,
	hreitz@redhat.com, kwolf@redhat.com, stefanha@redhat.com
Subject: Re: [PATCH 1/3] fuse: add FUSE-over-io_uring enable opt and init
Date: Mon, 18 Aug 2025 21:15:44 -0400
Message-ID: <a76de187-f181-47d3-a36e-6eb1a865cc5e@gmail.com>
In-Reply-To: <105d583b-e931-4e60-a113-9a571c18c726@bsbernd.com>



On 8/18/25 7:04 PM, Bernd Schubert wrote:
> 
> 
> On 8/17/25 01:13, Brian Song wrote:
>>
>>
>> On 8/14/25 11:46 PM, Brian Song wrote:
>>> From: Brian Song <hibriansong@gmail.com>
>>>
>>> This patch adds a new export option for storage-export-daemon to enable
>>> or disable FUSE-over-io_uring via the switch io-uring=on|off (disable
>>> by default). It also implements the protocol handshake with the Linux
>>> kernel during the FUSE-over-io_uring initialization phase.
>>>
>>> See: https://docs.kernel.org/filesystems/fuse-io-uring.html
>>>
>>> The kernel documentation describes in detail how FUSE-over-io_uring
>>> works. This patch implements the Initial SQE stage shown in the diagram:
>>> it initializes one queue per IOThread, each currently supporting a
>>> single submission queue entry (SQE). When the FUSE driver sends the
>>> first FUSE request (FUSE_INIT), storage-export-daemon calls
>>> fuse_uring_start() to complete initialization, ultimately submitting
>>> the SQE with the FUSE_IO_URING_CMD_REGISTER command to confirm
>>> successful initialization with the kernel.
>>>
>>> Suggested-by: Kevin Wolf <kwolf@redhat.com>
>>> Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
>>> Signed-off-by: Brian Song <hibriansong@gmail.com>
>>> ---
>>>    block/export/fuse.c                  | 161 ++++++++++++++++++++++++---
>>>    docs/tools/qemu-storage-daemon.rst   |  11 +-
>>>    qapi/block-export.json               |   5 +-
>>>    storage-daemon/qemu-storage-daemon.c |   1 +
>>>    util/fdmon-io_uring.c                |   5 +-
>>>    5 files changed, 159 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/block/export/fuse.c b/block/export/fuse.c
>>> index c0ad4696ce..59fa79f486 100644
>>> --- a/block/export/fuse.c
>>> +++ b/block/export/fuse.c
>>> @@ -48,6 +48,11 @@
>>>    #include <linux/fs.h>
>>>    #endif
>>>
>>> +#define FUSE_DEFAULT_MAX_PAGES_PER_REQ 32
>>> +
>>> +/* room needed in buffer to accommodate header */
>>> +#define FUSE_BUFFER_HEADER_SIZE 0x1000
>>> +
>>>    /* Prevent overly long bounce buffer allocations */
>>>    #define FUSE_MAX_READ_BYTES (MIN(BDRV_REQUEST_MAX_BYTES, 1 * 1024 * 1024))
>>>    /*
>>> @@ -63,12 +68,31 @@
>>>        (FUSE_MAX_WRITE_BYTES - FUSE_IN_PLACE_WRITE_BYTES)
>>>
>>>    typedef struct FuseExport FuseExport;
>>> +typedef struct FuseQueue FuseQueue;
>>> +
>>> +typedef struct FuseRingEnt {
>>> +    /* back pointer */
>>> +    FuseQueue *q;
>>> +
>>> +    /* commit id of a fuse request */
>>> +    uint64_t req_commit_id;
>>> +
>>> +    /* fuse request header and payload */
>>> +    struct fuse_uring_req_header req_header;
>>> +    void *op_payload;
>>> +    size_t req_payload_sz;
>>> +
>>> +    /* The vector passed to the kernel */
>>> +    struct iovec iov[2];
>>> +
>>> +    CqeHandler fuse_cqe_handler;
>>> +} FuseRingEnt;
>>>
>>>    /*
>>>     * One FUSE "queue", representing one FUSE FD from which requests are fetched
>>>     * and processed.  Each queue is tied to an AioContext.
>>>     */
>>> -typedef struct FuseQueue {
>>> +struct FuseQueue {
>>>        FuseExport *exp;
>>>
>>>        AioContext *ctx;
>>> @@ -109,7 +133,12 @@ typedef struct FuseQueue {
>>>         * Free this buffer with qemu_vfree().
>>>         */
>>>        void *spillover_buf;
>>> -} FuseQueue;
>>> +
>>> +#ifdef CONFIG_LINUX_IO_URING
>>> +    int qid;
>>> +    FuseRingEnt ent;
>>> +#endif
>>> +};
>>>
>>>    /*
>>>     * Verify that FuseQueue.request_buf plus the spill-over buffer together
>>> @@ -148,6 +177,7 @@ struct FuseExport {
>>>        bool growable;
>>>        /* Whether allow_other was used as a mount option or not */
>>>        bool allow_other;
>>> +    bool is_uring;
>>>
>>>        mode_t st_mode;
>>>        uid_t st_uid;
>>> @@ -257,6 +287,93 @@ static const BlockDevOps fuse_export_blk_dev_ops = {
>>>        .drained_poll  = fuse_export_drained_poll,
>>>    };
>>>
>>> +#ifdef CONFIG_LINUX_IO_URING
>>> +
>>> +static void fuse_uring_sqe_set_req_data(struct fuse_uring_cmd_req *req,
>>> +                    const unsigned int qid,
>>> +                    const unsigned int commit_id)
>>> +{
>>> +    req->qid = qid;
>>> +    req->commit_id = commit_id;
>>> +    req->flags = 0;
>>> +}
>>> +
>>> +static void fuse_uring_sqe_prepare(struct io_uring_sqe *sqe, FuseQueue *q,
>>> +               __u32 cmd_op)
>>> +{
>>> +    sqe->opcode = IORING_OP_URING_CMD;
>>> +
>>> +    sqe->fd = q->fuse_fd;
>>> +    sqe->rw_flags = 0;
>>> +    sqe->ioprio = 0;
>>> +    sqe->off = 0;
>>> +
>>> +    sqe->cmd_op = cmd_op;
>>> +    sqe->__pad1 = 0;
>>> +}
>>> +
>>> +static void fuse_uring_prep_sqe_register(struct io_uring_sqe *sqe, void *opaque)
>>> +{
>>> +    FuseQueue *q = opaque;
>>> +    struct fuse_uring_cmd_req *req = (void *)&sqe->cmd[0];
>>> +
>>> +    fuse_uring_sqe_prepare(sqe, q, FUSE_IO_URING_CMD_REGISTER);
>>> +
>>> +    sqe->addr = (uint64_t)(q->ent.iov);
>>> +    sqe->len = 2;
>>> +
>>> +    fuse_uring_sqe_set_req_data(req, q->qid, 0);
>>> +}
>>> +
>>> +static void fuse_uring_submit_register(void *opaque)
>>> +{
>>> +    FuseQueue *q = opaque;
>>> +    FuseExport *exp = q->exp;
>>> +
>>> +
>>> +    aio_add_sqe(fuse_uring_prep_sqe_register, q, &(q->ent.fuse_cqe_handler));
>>
>> I think there might be a tricky issue with the io_uring integration in
>> QEMU. Currently, when the number of IOThreads goes above ~6 or 7,
>> there’s a pretty high chance of a hang. I added some debug logging in
>> the kernel’s fuse_uring_cmd() registration part, and noticed that the
>> number of register calls is less than the total number of entries in the
>> queue. In theory, we should be registering each entry for each queue.
> 
> Did you also try to add logging at the top of fuse_uring_cmd()? I wonder
> if there is a start up race and if initial commands are just getting
> refused. I had run into issues you are describing in some versions of
> the -rfc patches, but thought that everything was fixed for that.
> I.e. not excluded that there is still a kernel issue left.
> 
> Thanks,
> Bernd
> 
> 

Yes. I added a printk at the beginning of fuse_uring_cmd(), another at
the beginning of fuse_uring_register(), and one more at the end of
fuse_uring_do_register(). Then I created and registered 20 queues, each
with a single ring entry. fuse_uring_cmd() was called 37 times with
opcode FUSE_IO_URING_CMD_REGISTER (the count differs on every run; I
would expect 20), and only 6 queues were registered successfully. The
other 31 calls exited inside the if (!fc->initialized) branch in
fuse_uring_cmd().
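
To make the counts above concrete, here is a toy userspace model of the
gate. It is not kernel code. struct model_conn, model_register_cmd() and
model_run() are hypothetical names made up for illustration; the only
thing taken from the kernel is the idea of an early exit while
fc->initialized is still false. The model also assumes that every refused
call arrives before initialization completes, which the log does not
prove.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the connection state; only the flag matters. */
struct model_conn {
    bool initialized;   /* models fc->initialized */
    size_t seen;        /* REGISTER commands that reached the handler */
    size_t refused;     /* commands that bailed out at the gate */
    size_t registered;  /* commands that completed registration */
};

/* Models one REGISTER command arriving; returns true if it registered. */
static bool model_register_cmd(struct model_conn *c)
{
    c->seen++;
    if (!c->initialized) {
        /* Early exit, like the if (!fc->initialized) branch. */
        c->refused++;
        return false;
    }
    c->registered++;
    return true;
}

/*
 * Deliver `early` commands before initialization completes and `late`
 * commands afterwards. The ordering is an assumption of this model.
 */
static void model_run(struct model_conn *c, size_t early, size_t late)
{
    for (size_t i = 0; i < early; i++) {
        model_register_cmd(c);
    }
    c->initialized = true;
    for (size_t i = 0; i < late; i++) {
        model_register_cmd(c);
    }
}
```

Calling model_run() on a zero-initialized struct model_conn with
early = 31 and late = 6 reproduces the numbers from this run: 37 calls
seen, 31 refused, 6 registered. It does not explain why 37 calls show up
for 20 queues.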

dmesg: https://gist.github.com/hibriansong/4eda6e7e92601df497282dcd56fd5470

Thanks,
Brian

