From: Stefan Hajnoczi <stefanha@gmail.com>
To: Hanna Czenczek <hreitz@redhat.com>
Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org,
Kevin Wolf <kwolf@redhat.com>,
qemu-stable@nongnu.org
Subject: Re: [PATCH 01/15] fuse: Copy write buffer content before polling
Date: Thu, 27 Mar 2025 10:47:31 -0400 [thread overview]
Message-ID: <20250327144731.GA37458@fedora> (raw)
In-Reply-To: <20250325160635.118812-1-hreitz@redhat.com>
[-- Attachment #1: Type: text/plain, Size: 5513 bytes --]
On Tue, Mar 25, 2025 at 05:06:35PM +0100, Hanna Czenczek wrote:
> Polling in I/O functions can lead to nested read_from_fuse_export()
"Polling" means several different things. "aio_poll()" or "nested event
loop" would be clearer.
> calls, overwriting the request buffer's content. The only function
> affected by this is fuse_write(), which therefore must use a bounce
> buffer or corruption may occur.
>
> Note that in addition we do not know whether libfuse-internal structures
> can cope with this nesting, and even if we did, we probably cannot rely
> on it in the future. This is the main reason why we want to remove
> libfuse from the I/O path.
>
> I do not have a good reproducer for this other than:
>
> $ dd if=/dev/urandom of=image bs=1M count=4096
> $ dd if=/dev/zero of=copy bs=1M count=4096
> $ touch fuse-export
> $ qemu-storage-daemon \
> --blockdev file,node-name=file,filename=copy \
> --export \
> fuse,id=exp,node-name=file,mountpoint=fuse-export,writable=true \
> &
>
> Other shell:
> $ qemu-img convert -p -n -f raw -O raw -t none image fuse-export
> $ killall -SIGINT qemu-storage-daemon
> $ qemu-img compare image copy
> Content mismatch at offset 0!
>
> (The -t none in qemu-img convert is important.)
>
> I tried reproducing this with throttle and small aio_write requests from
> another qemu-io instance, but for some reason all requests are perfectly
> serialized then.
>
> I think in theory we should get parallel writes only if we set
> fi->parallel_direct_writes in fuse_open(). In fact, I can confirm that
> if we do that, that throttle-based reproducer works (i.e. does get
> parallel (nested) write requests). I have no idea why we still get
> parallel requests with qemu-img convert anyway.
>
> Also, a later patch in this series will set fi->parallel_direct_writes
> and note that it makes basically no difference when running fio on the
> current libfuse-based version of our code. It does make a difference
> without libfuse. So something quite fishy is going on.
>
> I will try to investigate further what the root cause is, but I think
> for now let's assume that calling blk_pwrite() can invalidate the buffer
> contents through nested polling.
>
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> ---
> block/export/fuse.c | 24 +++++++++++++++++++++---
> 1 file changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/block/export/fuse.c b/block/export/fuse.c
> index 465cc9891d..a12f479492 100644
> --- a/block/export/fuse.c
> +++ b/block/export/fuse.c
> @@ -301,6 +301,12 @@ static void read_from_fuse_export(void *opaque)
> goto out;
> }
>
> + /*
> + * Note that polling in any request-processing function can lead to a nested
> + * read_from_fuse_export() call, which will overwrite the contents of
> + * exp->fuse_buf. Anything that takes a buffer needs to take care that the
> + * content is copied before potentially polling.
> + */
> fuse_session_process_buf(exp->fuse_session, &exp->fuse_buf);
It seems safer to allocate a fuse_buf per request instead copying the
data buffer only for write requests. Other request types might be
affected too (e.g. nested reads of different sizes).
I guess later on in this series a per-request fuse_buf will be
introduced anyway, so it doesn't matter what we do in this commit.
>
> out:
> @@ -624,6 +630,7 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, const char *buf,
> size_t size, off_t offset, struct fuse_file_info *fi)
> {
> FuseExport *exp = fuse_req_userdata(req);
> + void *copied;
> int64_t length;
> int ret;
>
> @@ -638,6 +645,14 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, const char *buf,
> return;
> }
>
> + /*
> + * Heed the note on read_from_fuse_export(): If we poll (which any blk_*()
> + * I/O function may do), read_from_fuse_export() may be nested, overwriting
> + * the request buffer content. Therefore, we must copy it here.
> + */
> + copied = blk_blockalign(exp->common.blk, size);
> + memcpy(copied, buf, size);
> +
> /**
> * Clients will expect short writes at EOF, so we have to limit
> * offset+size to the image length.
> @@ -645,7 +660,7 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, const char *buf,
> length = blk_getlength(exp->common.blk);
> if (length < 0) {
> fuse_reply_err(req, -length);
> - return;
> + goto free_buffer;
> }
>
> if (offset + size > length) {
> @@ -653,19 +668,22 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, const char *buf,
> ret = fuse_do_truncate(exp, offset + size, true, PREALLOC_MODE_OFF);
> if (ret < 0) {
> fuse_reply_err(req, -ret);
> - return;
> + goto free_buffer;
> }
> } else {
> size = length - offset;
> }
> }
>
> - ret = blk_pwrite(exp->common.blk, offset, size, buf, 0);
> + ret = blk_pwrite(exp->common.blk, offset, size, copied, 0);
> if (ret >= 0) {
> fuse_reply_write(req, size);
> } else {
> fuse_reply_err(req, -ret);
> }
> +
> +free_buffer:
> + qemu_vfree(copied);
> }
>
> /**
> --
> 2.48.1
>
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
next prev parent reply other threads:[~2025-03-27 14:48 UTC|newest]
Thread overview: 59+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-03-25 16:05 [PATCH 00/15] export/fuse: Use coroutines and multi-threading Hanna Czenczek
2025-03-25 16:06 ` [PATCH 01/15] fuse: Copy write buffer content before polling Hanna Czenczek
2025-03-27 14:47 ` Stefan Hajnoczi [this message]
2025-04-04 11:17 ` Hanna Czenczek
2025-04-01 13:44 ` Eric Blake
2025-04-04 11:18 ` Hanna Czenczek
2025-03-25 16:06 ` [PATCH 02/15] fuse: Ensure init clean-up even with error_fatal Hanna Czenczek
2025-03-26 5:47 ` Markus Armbruster
2025-03-26 9:49 ` Hanna Czenczek
2025-03-27 14:51 ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 03/15] fuse: Remove superfluous empty line Hanna Czenczek
2025-03-27 14:53 ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 04/15] fuse: Explicitly set inode ID to 1 Hanna Czenczek
2025-03-27 14:54 ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 05/15] fuse: Change setup_... to mount_fuse_export() Hanna Czenczek
2025-03-27 14:55 ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 06/15] fuse: Fix mount options Hanna Czenczek
2025-03-27 14:58 ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 07/15] fuse: Set direct_io and parallel_direct_writes Hanna Czenczek
2025-03-27 15:09 ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 08/15] fuse: Introduce fuse_{at,de}tach_handlers() Hanna Czenczek
2025-03-27 15:12 ` Stefan Hajnoczi
2025-04-01 13:55 ` Eric Blake
2025-04-04 11:24 ` Hanna Czenczek
2025-03-25 16:06 ` [PATCH 09/15] fuse: Introduce fuse_{inc,dec}_in_flight() Hanna Czenczek
2025-03-27 15:13 ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 10/15] fuse: Add halted flag Hanna Czenczek
2025-03-27 15:15 ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 11/15] fuse: Manually process requests (without libfuse) Hanna Czenczek
2025-03-27 15:35 ` Stefan Hajnoczi
2025-04-04 12:36 ` Hanna Czenczek
2025-04-01 14:35 ` Eric Blake
2025-04-04 11:30 ` Hanna Czenczek
2025-04-04 11:42 ` Hanna Czenczek
2025-03-25 16:06 ` [PATCH 12/15] fuse: Reduce max read size Hanna Czenczek
2025-03-27 15:35 ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 13/15] fuse: Process requests in coroutines Hanna Czenczek
2025-03-27 15:38 ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 14/15] fuse: Implement multi-threading Hanna Czenczek
2025-03-26 5:38 ` Markus Armbruster
2025-03-26 9:55 ` Hanna Czenczek
2025-03-26 11:41 ` Markus Armbruster
2025-03-26 13:56 ` Hanna Czenczek
2025-03-27 12:18 ` Markus Armbruster via
2025-03-27 13:45 ` Hanna Czenczek
2025-04-01 12:05 ` Kevin Wolf
2025-04-01 20:31 ` Eric Blake
2025-04-04 12:45 ` Hanna Czenczek
2025-03-27 15:55 ` Stefan Hajnoczi
2025-04-01 20:36 ` Eric Blake
2025-04-02 13:20 ` Stefan Hajnoczi
2025-04-03 17:59 ` Eric Blake
2025-04-04 12:49 ` Hanna Czenczek
2025-04-07 14:02 ` Stefan Hajnoczi
2025-04-01 14:58 ` Eric Blake
2025-03-25 16:06 ` [PATCH 15/15] fuse: Increase MAX_WRITE_SIZE with a second buffer Hanna Czenczek
2025-03-27 15:59 ` Stefan Hajnoczi
2025-04-01 20:24 ` Eric Blake
2025-04-04 13:04 ` Hanna Czenczek
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250327144731.GA37458@fedora \
--to=stefanha@gmail.com \
--cc=hreitz@redhat.com \
--cc=kwolf@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=qemu-stable@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).