All of lore.kernel.org
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: Hanna Czenczek <hreitz@redhat.com>
Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Markus Armbruster <armbru@redhat.com>,
	Brian Song <hibriansong@gmail.com>
Subject: Re: [PATCH v3 13/21] fuse: Manually process requests (without libfuse)
Date: Wed, 22 Oct 2025 14:19:23 +0200	[thread overview]
Message-ID: <aPjLywlk8LyerNna@redhat.com> (raw)
In-Reply-To: <20250701114437.207419-14-hreitz@redhat.com>

Am 01.07.2025 um 13:44 hat Hanna Czenczek geschrieben:
> Manually read requests from the /dev/fuse FD and process them, without
> using libfuse.  This allows us to safely add parallel request processing
> in coroutines later, without having to worry about libfuse internals.
> (Technically, we already have exactly that problem with
> read_from_fuse_export()/read_from_fuse_fd() nesting.)
> [...]

> +/**
> + * Write a FUSE response to the given @fd, using a single buffer consecutively
> + * containing both the response header and data: Initialize *out_hdr, and write
> + * it plus @response_data_length consecutive bytes to @fd.
> + *
> + * @fd: FUSE file descriptor
> + * @req_id: Corresponding request ID
> + * @out_hdr: Pointer to buffer that will hold the output header, and
> + *           additionally already contains @response_data_length data bytes
> + *           starting at *out_hdr + 1.
> + * @err: Error code (-errno, or 0 in case of success)
> + * @response_data_length: Length of data to return (following *out_hdr)
> + */
> +static int fuse_write_response(int fd, uint32_t req_id,
> +                               struct fuse_out_header *out_hdr, int err,
> +                               size_t response_data_length)
> +{
> +    void *write_ptr = out_hdr;
> +    size_t to_write = sizeof(*out_hdr) + response_data_length;
> +    ssize_t ret;
> +
> +    *out_hdr = (struct fuse_out_header) {
> +        .len = to_write,
> +        .error = err,
> +        .unique = req_id,
> +    };
> +
> +    while (true) {
> +        ret = RETRY_ON_EINTR(write(fd, write_ptr, to_write));
> +        if (ret < 0) {
> +            ret = -errno;
> +            error_report("Failed to write to FUSE device: %s", strerror(-ret));
> +            return ret;
> +        } else {
> +            to_write -= ret;
> +            if (to_write > 0) {
> +                write_ptr += ret;
> +            } else {
> +                return 0; /* success */
> +            }
> +        }
> +    }
> +}

Two thoughts on this one:

This looks like essentially a reimplementation of qemu_write_full(),
except that it doesn't count how many bytes were successfully written
until the error happened. Is this sufficiently different to exist?

And do we even need to consider short writes? I would be surprised if
FUSE let that happen. libfuse doesn't seem to consider it an option, it
just calls writev() once in fuse_write_msg_dev(). And indeed, the kernel
returns either an error or the full byte count in fuse_dev_do_write().

> +/**
> + * Write a FUSE response to the given @fd, using separate buffers for the
> + * response header and data: Initialize *out_hdr, and write it plus the data in
> + * *buf to @fd.
> + *
> + * In contrast to fuse_write_response(), this function cannot return errors, and
> + * will always return success (error code 0).
> + *
> + * @fd: FUSE file descriptor
> + * @req_id: Corresponding request ID
> + * @out_hdr: Pointer to buffer that will hold the output header
> + * @buf: Pointer to response data
> + * @buflen: Length of response data
> + */
> +static int fuse_write_buf_response(int fd, uint32_t req_id,
> +                                   struct fuse_out_header *out_hdr,
> +                                   const void *buf, size_t buflen)
> +{
> +    struct iovec iov[2] = {
> +        { out_hdr, sizeof(*out_hdr) },
> +        { (void *)buf, buflen },
> +    };
> +    struct iovec *iovp = iov;
> +    unsigned iov_count = ARRAY_SIZE(iov);
> +    size_t to_write = sizeof(*out_hdr) + buflen;
> +    ssize_t ret;
> +
> +    *out_hdr = (struct fuse_out_header) {
> +        .len = to_write,
> +        .unique = req_id,
> +    };
> +
> +    while (true) {
> +        ret = RETRY_ON_EINTR(writev(fd, iovp, iov_count));
> +        if (ret < 0) {
> +            ret = -errno;
> +            error_report("Failed to write to FUSE device: %s", strerror(-ret));
> +            return ret;
> +        } else {
> +            to_write -= ret;
> +            if (to_write > 0) {
> +                iov_discard_front(&iovp, &iov_count, ret);
> +            } else {
> +                return 0; /* success */
> +            }
> +        }
> +    }
> +}

Same question as above (except that a qemu_writev_full() doesn't exist
yet if we decided that for some reason short writes really are a
concern).

> +/*
> + * For use in fuse_process_request():
> + * Returns a pointer to the parameter object for the given operation (inside of
> + * export->request_buf, which is assumed to hold a fuse_in_header first).
> + * Verifies that the object is complete (export->request_buf is large enough to
> + * hold it in one piece, and the request length includes the whole object).
> + *
> + * Note that export->request_buf may be overwritten after polling, so the
> + * returned pointer must not be used across a function that may poll!
> + */
> +#define FUSE_IN_OP_STRUCT(op_name, export) \
> +    ({ \
> +        const struct fuse_in_header *__in_hdr = \
> +            (const struct fuse_in_header *)(export)->request_buf; \
> +        const struct fuse_##op_name##_in *__in = \
> +            (const struct fuse_##op_name##_in *)(__in_hdr + 1); \
> +        const size_t __param_len = sizeof(*__in_hdr) + sizeof(*__in); \
> +        uint32_t __req_len; \
> +        \
> +        QEMU_BUILD_BUG_ON(sizeof((export)->request_buf) < __param_len); \
> +        \
> +        __req_len = __in_hdr->len; \
> +        if (__req_len < __param_len) { \
> +            warn_report("FUSE request truncated (%" PRIu32 " < %zu)", \
> +                        __req_len, __param_len); \
> +            ret = -EINVAL; \
> +            break; \
> +        } \
> +        __in; \
> +    })
> +
> +/*
> + * For use in fuse_process_request():
> + * Returns a pointer to the return object for the given operation (inside of
> + * out_buf, which is assumed to hold a fuse_out_header first).
> + * Verifies that out_buf is large enough to hold the whole object.
> + *
> + * (out_buf should be a char[] array.)
> + */
> +#define FUSE_OUT_OP_STRUCT(op_name, out_buf) \
> +    ({ \
> +        struct fuse_out_header *__out_hdr = \
> +            (struct fuse_out_header *)(out_buf); \
> +        struct fuse_##op_name##_out *__out = \
> +            (struct fuse_##op_name##_out *)(__out_hdr + 1); \
> +        \
> +        QEMU_BUILD_BUG_ON(sizeof(*__out_hdr) + sizeof(*__out) > \
> +                          sizeof(out_buf)); \
> +        \
> +        __out; \
> +    })
> +
> +/**
> + * Process a FUSE request, incl. writing the response.
> + *
> + * Note that polling in any request-processing function can lead to a nested
> + * read_from_fuse_fd() call, which will overwrite the contents of
> + * exp->request_buf.  Anything that takes a buffer needs to take care that the
> + * content is copied before potentially polling.
> + */
> +static void fuse_process_request(FuseExport *exp)
> +{
> +    uint32_t opcode;
> +    uint64_t req_id;
> +    /*
> +     * Return buffer.  Must be large enough to hold all return headers, but does
> +     * not include space for data returned by read requests.
> +     * (FUSE_IN_OP_STRUCT() verifies at compile time that out_buf is indeed
> +     * large enough.)
> +     */
> +    char out_buf[sizeof(struct fuse_out_header) +
> +                 MAX_CONST(sizeof(struct fuse_init_out),
> +                 MAX_CONST(sizeof(struct fuse_open_out),
> +                 MAX_CONST(sizeof(struct fuse_attr_out),
> +                 MAX_CONST(sizeof(struct fuse_write_out),
> +                           sizeof(struct fuse_lseek_out)))))];

This and the macros above are a little ugly. Can't we just define a type
for this? Something like this:

struct {
    struct fuse_out_header header;
    union {
        struct fuse_init_out init;
        struct fuse_open_out open;
        struct fuse_attr_out attr;
        struct fuse_write_out write;
        struct fuse_lseek_out lseek;
    };
};

And then have a variable of that type here. Instead of
FUSE_OUT_OP_STRUCT(), you can now just access the union members.

For FUSE_IN_OP_STRUCT(), there's the additional length check that you
have, so maybe for that one, the macro still has its use.

> +    struct fuse_out_header *out_hdr = (struct fuse_out_header *)out_buf;
> +    /* For read requests: Data to be returned */
> +    void *out_data_buffer = NULL;
> +    ssize_t ret;
> +
> +    /* Limit scope to ensure pointer is no longer used after polling */
> +    {
> +        const struct fuse_in_header *in_hdr =
> +            (const struct fuse_in_header *)exp->request_buf;
> +
> +        opcode = in_hdr->opcode;
> +        req_id = in_hdr->unique;
> +    }
> +
> +    switch (opcode) {
> +    case FUSE_INIT: {
> +        const struct fuse_init_in *in = FUSE_IN_OP_STRUCT(init, exp);
> +        ret = fuse_init(exp, FUSE_OUT_OP_STRUCT(init, out_buf),
> +                        in->max_readahead, in->flags);
> +        break;
> +    }
> +
> +    case FUSE_OPEN:
> +        ret = fuse_open(exp, FUSE_OUT_OP_STRUCT(open, out_buf));
> +        break;
> +
> +    case FUSE_RELEASE:
> +        ret = 0;
> +        break;
> +
> +    case FUSE_LOOKUP:
> +        ret = -ENOENT; /* There is no node but the root node */
> +        break;
> +
> +    case FUSE_GETATTR:
> +        ret = fuse_getattr(exp, FUSE_OUT_OP_STRUCT(attr, out_buf));
> +        break;
> +
> +    case FUSE_SETATTR: {
> +        const struct fuse_setattr_in *in = FUSE_IN_OP_STRUCT(setattr, exp);
> +        ret = fuse_setattr(exp, FUSE_OUT_OP_STRUCT(attr, out_buf),
> +                           in->valid, in->size, in->mode, in->uid, in->gid);
> +        break;
> +    }
> +
> +    case FUSE_READ: {
> +        const struct fuse_read_in *in = FUSE_IN_OP_STRUCT(read, exp);
> +        ret = fuse_read(exp, &out_data_buffer, in->offset, in->size);
> +        break;
> +    }
> +
> +    case FUSE_WRITE: {
> +        const struct fuse_write_in *in = FUSE_IN_OP_STRUCT(write, exp);
> +        uint32_t req_len;
> +
> +        req_len = ((const struct fuse_in_header *)exp->request_buf)->len;

Hm, was that the idea when you limited the in_hdr scope above? :-)

> +        if (unlikely(req_len < sizeof(struct fuse_in_header) + sizeof(*in) +
> +                               in->size)) {
> +            warn_report("FUSE WRITE truncated; received %zu bytes of %" PRIu32,
> +                        req_len - sizeof(struct fuse_in_header) - sizeof(*in),
> +                        in->size);
> +            ret = -EINVAL;
> +            break;
> +        }
> +
> +        /*
> +         * poll_fuse_fd() has checked that in_hdr->len matches the number of
> +         * bytes read, which cannot exceed the max_write value we set
> +         * (FUSE_MAX_WRITE_BYTES).  So we know that FUSE_MAX_WRITE_BYTES >=
> +         * in_hdr->len >= in->size + X, so this assertion must hold.
> +         */
> +        assert(in->size <= FUSE_MAX_WRITE_BYTES);
> +
> +        /*
> +         * Passing a pointer to `in` (i.e. the request buffer) is fine because
> +         * fuse_write() takes care to copy its contents before potentially
> +         * polling.
> +         */
> +        ret = fuse_write(exp, FUSE_OUT_OP_STRUCT(write, out_buf),
> +                         in->offset, in->size, in + 1);
> +        break;
> +    }
> +
> +    case FUSE_FALLOCATE: {
> +        const struct fuse_fallocate_in *in = FUSE_IN_OP_STRUCT(fallocate, exp);
> +        ret = fuse_fallocate(exp, in->offset, in->length, in->mode);
> +        break;
> +    }
> +
> +    case FUSE_FSYNC:
> +        ret = fuse_fsync(exp);
> +        break;
> +
> +    case FUSE_FLUSH:
> +        ret = fuse_flush(exp);
> +        break;
> +
>  #ifdef CONFIG_FUSE_LSEEK
> -    .lseek      = fuse_lseek,
> +    case FUSE_LSEEK: {
> +        const struct fuse_lseek_in *in = FUSE_IN_OP_STRUCT(lseek, exp);
> +        ret = fuse_lseek(exp, FUSE_OUT_OP_STRUCT(lseek, out_buf),
> +                         in->offset, in->whence);
> +        break;
> +    }
>  #endif
> -};
> +
> +    default:
> +        ret = -ENOSYS;
> +    }
> +
> +    /* Ignore errors from fuse_write*(), nothing we can do anyway */
> +    if (out_data_buffer) {
> +        assert(ret >= 0);
> +        fuse_write_buf_response(exp->fuse_fd, req_id, out_hdr,
> +                                out_data_buffer, ret);
> +        qemu_vfree(out_data_buffer);
> +    } else {
> +        fuse_write_response(exp->fuse_fd, req_id, out_hdr,
> +                            ret < 0 ? ret : 0,
> +                            ret < 0 ? 0 : ret);
> +    }
> +}

In summary, I don't really see anything wrong, but just some things that
maybe look more complicated than strictly necessarily.

Kevin



  reply	other threads:[~2025-10-22 12:22 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-07-01 11:44 [PATCH v3 00/21] export/fuse: Use coroutines and multi-threading Hanna Czenczek
2025-07-01 11:44 ` [PATCH v3 01/21] fuse: Copy write buffer content before polling Hanna Czenczek
2025-10-13 13:48   ` Kevin Wolf
2025-07-01 11:44 ` [PATCH v3 02/21] fuse: Ensure init clean-up even with error_fatal Hanna Czenczek
2025-07-01 11:44 ` [PATCH v3 03/21] fuse: Remove superfluous empty line Hanna Czenczek
2025-07-01 11:44 ` [PATCH v3 04/21] fuse: Explicitly set inode ID to 1 Hanna Czenczek
2025-07-01 11:44 ` [PATCH v3 05/21] fuse: Change setup_... to mount_fuse_export() Hanna Czenczek
2025-10-13 13:50   ` Kevin Wolf
2025-07-01 11:44 ` [PATCH v3 06/21] fuse: Fix mount options Hanna Czenczek
2025-07-01 11:44 ` [PATCH v3 07/21] fuse: Set direct_io and parallel_direct_writes Hanna Czenczek
2025-07-01 11:44 ` [PATCH v3 08/21] fuse: Introduce fuse_{at,de}tach_handlers() Hanna Czenczek
2025-07-01 11:44 ` [PATCH v3 09/21] fuse: Introduce fuse_{inc,dec}_in_flight() Hanna Czenczek
2025-07-01 11:44 ` [PATCH v3 10/21] fuse: Add halted flag Hanna Czenczek
2025-10-13 16:18   ` Kevin Wolf
2025-07-01 11:44 ` [PATCH v3 11/21] fuse: Rename length to blk_len in fuse_write() Hanna Czenczek
2025-07-01 11:44 ` [PATCH v3 12/21] block: Move qemu_fcntl_addfl() into osdep.c Hanna Czenczek
2025-07-30 17:10   ` Stefan Hajnoczi
2025-07-01 11:44 ` [PATCH v3 13/21] fuse: Manually process requests (without libfuse) Hanna Czenczek
2025-10-22 12:19   ` Kevin Wolf [this message]
2025-10-31 11:55     ` Hanna Czenczek
2025-07-01 11:44 ` [PATCH v3 14/21] fuse: Reduce max read size Hanna Czenczek
2025-07-01 11:44 ` [PATCH v3 15/21] fuse: Process requests in coroutines Hanna Czenczek
2025-10-22 12:53   ` Kevin Wolf
2025-10-31 11:55     ` Hanna Czenczek
2025-07-01 11:44 ` [PATCH v3 16/21] block/export: Add multi-threading interface Hanna Czenczek
2025-10-22 12:57   ` Kevin Wolf
2025-10-31 11:56     ` Hanna Czenczek
2025-07-01 11:44 ` [PATCH v3 17/21] iotests/307: Test multi-thread export interface Hanna Czenczek
2025-07-30 17:12   ` Stefan Hajnoczi
2025-07-01 11:44 ` [PATCH v3 18/21] fuse: Implement multi-threading Hanna Czenczek
2025-07-30 17:18   ` Stefan Hajnoczi
2025-10-23 11:47   ` Kevin Wolf
2025-10-31 12:05     ` Hanna Czenczek
2025-07-01 11:44 ` [PATCH v3 19/21] qapi/block-export: Document FUSE's multi-threading Hanna Czenczek
2025-07-30 17:19   ` Stefan Hajnoczi
2025-07-01 11:44 ` [PATCH v3 20/21] iotests/308: Add multi-threading sanity test Hanna Czenczek
2025-07-01 11:44 ` [PATCH v3 21/21] fuse: Increase MAX_WRITE_SIZE with a second buffer Hanna Czenczek
2025-10-23 15:11   ` Kevin Wolf
2025-10-31 12:13     ` Hanna Czenczek
2025-10-31 12:33       ` Kevin Wolf
2025-10-31 15:03         ` Hanna Czenczek
2025-10-31 15:50           ` Kevin Wolf
2025-07-30 17:19 ` [PATCH v3 00/21] export/fuse: Use coroutines and multi-threading Stefan Hajnoczi
2025-10-23 15:15 ` Kevin Wolf
2025-10-31 12:14   ` Hanna Czenczek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aPjLywlk8LyerNna@redhat.com \
    --to=kwolf@redhat.com \
    --cc=armbru@redhat.com \
    --cc=hibriansong@gmail.com \
    --cc=hreitz@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.