From: Stefan Hajnoczi <stefanha@gmail.com>
To: Hanna Czenczek <hreitz@redhat.com>
Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org,
Kevin Wolf <kwolf@redhat.com>,
qemu-stable@nongnu.org
Subject: Re: [PATCH 01/15] fuse: Copy write buffer content before polling
Date: Thu, 27 Mar 2025 10:47:31 -0400 [thread overview]
Message-ID: <20250327144731.GA37458@fedora> (raw)
In-Reply-To: <20250325160635.118812-1-hreitz@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 5513 bytes --]
On Tue, Mar 25, 2025 at 05:06:35PM +0100, Hanna Czenczek wrote:
> Polling in I/O functions can lead to nested read_from_fuse_export()
"Polling" means several different things. "aio_poll()" or "nested event
loop" would be clearer.
> calls, overwriting the request buffer's content. The only function
> affected by this is fuse_write(), which therefore must use a bounce
> buffer or corruption may occur.
>
> Note that in addition we do not know whether libfuse-internal structures
> can cope with this nesting, and even if we did, we probably cannot rely
> on it in the future. This is the main reason why we want to remove
> libfuse from the I/O path.
>
> I do not have a good reproducer for this other than:
>
> $ dd if=/dev/urandom of=image bs=1M count=4096
> $ dd if=/dev/zero of=copy bs=1M count=4096
> $ touch fuse-export
> $ qemu-storage-daemon \
> --blockdev file,node-name=file,filename=copy \
> --export \
> fuse,id=exp,node-name=file,mountpoint=fuse-export,writable=true \
> &
>
> Other shell:
> $ qemu-img convert -p -n -f raw -O raw -t none image fuse-export
> $ killall -SIGINT qemu-storage-daemon
> $ qemu-img compare image copy
> Content mismatch at offset 0!
>
> (The -t none in qemu-img convert is important.)
>
> I tried reproducing this with throttle and small aio_write requests from
> another qemu-io instance, but for some reason all requests are perfectly
> serialized then.
>
> I think in theory we should get parallel writes only if we set
> fi->parallel_direct_writes in fuse_open(). In fact, I can confirm that
> if we do that, that throttle-based reproducer works (i.e. does get
> parallel (nested) write requests). I have no idea why we still get
> parallel requests with qemu-img convert anyway.
>
> Also, a later patch in this series will set fi->parallel_direct_writes
> and note that it makes basically no difference when running fio on the
> current libfuse-based version of our code. It does make a difference
> without libfuse. So something quite fishy is going on.
>
> I will try to investigate further what the root cause is, but I think
> for now let's assume that calling blk_pwrite() can invalidate the buffer
> contents through nested polling.
>
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> ---
> block/export/fuse.c | 24 +++++++++++++++++++++---
> 1 file changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/block/export/fuse.c b/block/export/fuse.c
> index 465cc9891d..a12f479492 100644
> --- a/block/export/fuse.c
> +++ b/block/export/fuse.c
> @@ -301,6 +301,12 @@ static void read_from_fuse_export(void *opaque)
> goto out;
> }
>
> + /*
> + * Note that polling in any request-processing function can lead to a nested
> + * read_from_fuse_export() call, which will overwrite the contents of
> + * exp->fuse_buf. Anything that takes a buffer needs to take care that the
> + * content is copied before potentially polling.
> + */
> fuse_session_process_buf(exp->fuse_session, &exp->fuse_buf);
It seems safer to allocate a fuse_buf per request instead copying the
data buffer only for write requests. Other request types might be
affected too (e.g. nested reads of different sizes).
I guess later on in this series a per-request fuse_buf will be
introduced anyway, so it doesn't matter what we do in this commit.
>
> out:
> @@ -624,6 +630,7 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, const char *buf,
> size_t size, off_t offset, struct fuse_file_info *fi)
> {
> FuseExport *exp = fuse_req_userdata(req);
> + void *copied;
> int64_t length;
> int ret;
>
> @@ -638,6 +645,14 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, const char *buf,
> return;
> }
>
> + /*
> + * Heed the note on read_from_fuse_export(): If we poll (which any blk_*()
> + * I/O function may do), read_from_fuse_export() may be nested, overwriting
> + * the request buffer content. Therefore, we must copy it here.
> + */
> + copied = blk_blockalign(exp->common.blk, size);
> + memcpy(copied, buf, size);
> +
> /**
> * Clients will expect short writes at EOF, so we have to limit
> * offset+size to the image length.
> @@ -645,7 +660,7 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, const char *buf,
> length = blk_getlength(exp->common.blk);
> if (length < 0) {
> fuse_reply_err(req, -length);
> - return;
> + goto free_buffer;
> }
>
> if (offset + size > length) {
> @@ -653,19 +668,22 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, const char *buf,
> ret = fuse_do_truncate(exp, offset + size, true, PREALLOC_MODE_OFF);
> if (ret < 0) {
> fuse_reply_err(req, -ret);
> - return;
> + goto free_buffer;
> }
> } else {
> size = length - offset;
> }
> }
>
> - ret = blk_pwrite(exp->common.blk, offset, size, buf, 0);
> + ret = blk_pwrite(exp->common.blk, offset, size, copied, 0);
> if (ret >= 0) {
> fuse_reply_write(req, size);
> } else {
> fuse_reply_err(req, -ret);
> }
> +
> +free_buffer:
> + qemu_vfree(copied);
> }
>
> /**
> --
> 2.48.1
>
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
next prev parent reply other threads:[~2025-03-27 14:48 UTC|newest]
Thread overview: 59+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-25 16:05 [PATCH 00/15] export/fuse: Use coroutines and multi-threading Hanna Czenczek
2025-03-25 16:06 ` [PATCH 01/15] fuse: Copy write buffer content before polling Hanna Czenczek
2025-03-27 14:47 ` Stefan Hajnoczi [this message]
2025-04-04 11:17 ` Hanna Czenczek
2025-04-01 13:44 ` Eric Blake
2025-04-04 11:18 ` Hanna Czenczek
2025-03-25 16:06 ` [PATCH 02/15] fuse: Ensure init clean-up even with error_fatal Hanna Czenczek
2025-03-26 5:47 ` Markus Armbruster
2025-03-26 9:49 ` Hanna Czenczek
2025-03-27 14:51 ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 03/15] fuse: Remove superfluous empty line Hanna Czenczek
2025-03-27 14:53 ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 04/15] fuse: Explicitly set inode ID to 1 Hanna Czenczek
2025-03-27 14:54 ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 05/15] fuse: Change setup_... to mount_fuse_export() Hanna Czenczek
2025-03-27 14:55 ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 06/15] fuse: Fix mount options Hanna Czenczek
2025-03-27 14:58 ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 07/15] fuse: Set direct_io and parallel_direct_writes Hanna Czenczek
2025-03-27 15:09 ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 08/15] fuse: Introduce fuse_{at,de}tach_handlers() Hanna Czenczek
2025-03-27 15:12 ` Stefan Hajnoczi
2025-04-01 13:55 ` Eric Blake
2025-04-04 11:24 ` Hanna Czenczek
2025-03-25 16:06 ` [PATCH 09/15] fuse: Introduce fuse_{inc,dec}_in_flight() Hanna Czenczek
2025-03-27 15:13 ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 10/15] fuse: Add halted flag Hanna Czenczek
2025-03-27 15:15 ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 11/15] fuse: Manually process requests (without libfuse) Hanna Czenczek
2025-03-27 15:35 ` Stefan Hajnoczi
2025-04-04 12:36 ` Hanna Czenczek
2025-04-01 14:35 ` Eric Blake
2025-04-04 11:30 ` Hanna Czenczek
2025-04-04 11:42 ` Hanna Czenczek
2025-03-25 16:06 ` [PATCH 12/15] fuse: Reduce max read size Hanna Czenczek
2025-03-27 15:35 ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 13/15] fuse: Process requests in coroutines Hanna Czenczek
2025-03-27 15:38 ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 14/15] fuse: Implement multi-threading Hanna Czenczek
2025-03-26 5:38 ` Markus Armbruster
2025-03-26 9:55 ` Hanna Czenczek
2025-03-26 11:41 ` Markus Armbruster
2025-03-26 13:56 ` Hanna Czenczek
2025-03-27 12:18 ` Markus Armbruster via
2025-03-27 13:45 ` Hanna Czenczek
2025-04-01 12:05 ` Kevin Wolf
2025-04-01 20:31 ` Eric Blake
2025-04-04 12:45 ` Hanna Czenczek
2025-03-27 15:55 ` Stefan Hajnoczi
2025-04-01 20:36 ` Eric Blake
2025-04-02 13:20 ` Stefan Hajnoczi
2025-04-03 17:59 ` Eric Blake
2025-04-04 12:49 ` Hanna Czenczek
2025-04-07 14:02 ` Stefan Hajnoczi
2025-04-01 14:58 ` Eric Blake
2025-03-25 16:06 ` [PATCH 15/15] fuse: Increase MAX_WRITE_SIZE with a second buffer Hanna Czenczek
2025-03-27 15:59 ` Stefan Hajnoczi
2025-04-01 20:24 ` Eric Blake
2025-04-04 13:04 ` Hanna Czenczek
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250327144731.GA37458@fedora \
--to=stefanha@gmail.com \
--cc=hreitz@redhat.com \
--cc=kwolf@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=qemu-stable@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.