qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Hanna Czenczek <hreitz@redhat.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org,
	Kevin Wolf <kwolf@redhat.com>,
	qemu-stable@nongnu.org
Subject: Re: [PATCH 01/15] fuse: Copy write buffer content before polling
Date: Fri, 4 Apr 2025 13:17:17 +0200	[thread overview]
Message-ID: <68d0bd72-3a72-4370-8178-45dda9889a56@redhat.com> (raw)
In-Reply-To: <20250327144731.GA37458@fedora>

On 27.03.25 15:47, Stefan Hajnoczi wrote:
> On Tue, Mar 25, 2025 at 05:06:35PM +0100, Hanna Czenczek wrote:
>> Polling in I/O functions can lead to nested read_from_fuse_export()
> "Polling" means several different things. "aio_poll()" or "nested event
> loop" would be clearer.

Sure!

>> calls, overwriting the request buffer's content.  The only function
>> affected by this is fuse_write(), which therefore must use a bounce
>> buffer or corruption may occur.
>>
>> Note that in addition we do not know whether libfuse-internal structures
>> can cope with this nesting, and even if we did, we probably cannot rely
>> on it in the future.  This is the main reason why we want to remove
>> libfuse from the I/O path.
>>
>> I do not have a good reproducer for this other than:
>>
>> $ dd if=/dev/urandom of=image bs=1M count=4096
>> $ dd if=/dev/zero of=copy bs=1M count=4096
>> $ touch fuse-export
>> $ qemu-storage-daemon \
>>      --blockdev file,node-name=file,filename=copy \
>>      --export \
>>      fuse,id=exp,node-name=file,mountpoint=fuse-export,writable=true \
>>      &
>>
>> Other shell:
>> $ qemu-img convert -p -n -f raw -O raw -t none image fuse-export
>> $ killall -SIGINT qemu-storage-daemon
>> $ qemu-img compare image copy
>> Content mismatch at offset 0!
>>
>> (The -t none in qemu-img convert is important.)
>>
>> I tried reproducing this with throttle and small aio_write requests from
>> another qemu-io instance, but for some reason all requests are perfectly
>> serialized then.
>>
>> I think in theory we should get parallel writes only if we set
>> fi->parallel_direct_writes in fuse_open().  In fact, I can confirm that
>> if we do that, that throttle-based reproducer works (i.e. does get
>> parallel (nested) write requests).  I have no idea why we still get
>> parallel requests with qemu-img convert anyway.
>>
>> Also, a later patch in this series will set fi->parallel_direct_writes
>> and note that it makes basically no difference when running fio on the
>> current libfuse-based version of our code.  It does make a difference
>> without libfuse.  So something quite fishy is going on.
>>
>> I will try to investigate further what the root cause is, but I think
>> for now let's assume that calling blk_pwrite() can invalidate the buffer
>> contents through nested polling.
>>
>> Cc: qemu-stable@nongnu.org
>> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
>> ---
>>   block/export/fuse.c | 24 +++++++++++++++++++++---
>>   1 file changed, 21 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/export/fuse.c b/block/export/fuse.c
>> index 465cc9891d..a12f479492 100644
>> --- a/block/export/fuse.c
>> +++ b/block/export/fuse.c
>> @@ -301,6 +301,12 @@ static void read_from_fuse_export(void *opaque)
>>           goto out;
>>       }
>>   
>> +    /*
>> +     * Note that polling in any request-processing function can lead to a nested
>> +     * read_from_fuse_export() call, which will overwrite the contents of
>> +     * exp->fuse_buf.  Anything that takes a buffer needs to take care that the
>> +     * content is copied before potentially polling.
>> +     */
>>       fuse_session_process_buf(exp->fuse_session, &exp->fuse_buf);
> It seems safer to allocate a fuse_buf per request instead copying the
> data buffer only for write requests. Other request types might be
> affected too (e.g. nested reads of different sizes).

I don’t think anything else is affected, but I absolutely agree that it 
would be more obviously safe to have a dedicated buffer for each request.

However, I decided to do it this way so I wouldn’t negatively affect 
performance before converting to coroutines – I felt it would be a bit 
unfair.  But if you think that’s alright, I’m happy to use a dedicated 
buffer per request!

>
> I guess later on in this series a per-request fuse_buf will be
> introduced anyway, so it doesn't matter what we do in this commit.

Kind of; this one is CC-ed to qemu-stable (the rest is not), so it does 
matter for stable releases.

Hanna

>
>>   
>>   out:
>> @@ -624,6 +630,7 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, const char *buf,
>>                          size_t size, off_t offset, struct fuse_file_info *fi)
>>   {
>>       FuseExport *exp = fuse_req_userdata(req);
>> +    void *copied;
>>       int64_t length;
>>       int ret;
>>   
>> @@ -638,6 +645,14 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, const char *buf,
>>           return;
>>       }
>>   
>> +    /*
>> +     * Heed the note on read_from_fuse_export(): If we poll (which any blk_*()
>> +     * I/O function may do), read_from_fuse_export() may be nested, overwriting
>> +     * the request buffer content.  Therefore, we must copy it here.
>> +     */
>> +    copied = blk_blockalign(exp->common.blk, size);
>> +    memcpy(copied, buf, size);
>> +
>>       /**
>>        * Clients will expect short writes at EOF, so we have to limit
>>        * offset+size to the image length.
>> @@ -645,7 +660,7 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, const char *buf,
>>       length = blk_getlength(exp->common.blk);
>>       if (length < 0) {
>>           fuse_reply_err(req, -length);
>> -        return;
>> +        goto free_buffer;
>>       }
>>   
>>       if (offset + size > length) {
>> @@ -653,19 +668,22 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, const char *buf,
>>               ret = fuse_do_truncate(exp, offset + size, true, PREALLOC_MODE_OFF);
>>               if (ret < 0) {
>>                   fuse_reply_err(req, -ret);
>> -                return;
>> +                goto free_buffer;
>>               }
>>           } else {
>>               size = length - offset;
>>           }
>>       }
>>   
>> -    ret = blk_pwrite(exp->common.blk, offset, size, buf, 0);
>> +    ret = blk_pwrite(exp->common.blk, offset, size, copied, 0);
>>       if (ret >= 0) {
>>           fuse_reply_write(req, size);
>>       } else {
>>           fuse_reply_err(req, -ret);
>>       }
>> +
>> +free_buffer:
>> +    qemu_vfree(copied);
>>   }
>>   
>>   /**
>> -- 
>> 2.48.1
>>
>>



  reply	other threads:[~2025-04-04 11:18 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-03-25 16:05 [PATCH 00/15] export/fuse: Use coroutines and multi-threading Hanna Czenczek
2025-03-25 16:06 ` [PATCH 01/15] fuse: Copy write buffer content before polling Hanna Czenczek
2025-03-27 14:47   ` Stefan Hajnoczi
2025-04-04 11:17     ` Hanna Czenczek [this message]
2025-04-01 13:44   ` Eric Blake
2025-04-04 11:18     ` Hanna Czenczek
2025-03-25 16:06 ` [PATCH 02/15] fuse: Ensure init clean-up even with error_fatal Hanna Czenczek
2025-03-26  5:47   ` Markus Armbruster
2025-03-26  9:49     ` Hanna Czenczek
2025-03-27 14:51   ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 03/15] fuse: Remove superfluous empty line Hanna Czenczek
2025-03-27 14:53   ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 04/15] fuse: Explicitly set inode ID to 1 Hanna Czenczek
2025-03-27 14:54   ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 05/15] fuse: Change setup_... to mount_fuse_export() Hanna Czenczek
2025-03-27 14:55   ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 06/15] fuse: Fix mount options Hanna Czenczek
2025-03-27 14:58   ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 07/15] fuse: Set direct_io and parallel_direct_writes Hanna Czenczek
2025-03-27 15:09   ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 08/15] fuse: Introduce fuse_{at,de}tach_handlers() Hanna Czenczek
2025-03-27 15:12   ` Stefan Hajnoczi
2025-04-01 13:55   ` Eric Blake
2025-04-04 11:24     ` Hanna Czenczek
2025-03-25 16:06 ` [PATCH 09/15] fuse: Introduce fuse_{inc,dec}_in_flight() Hanna Czenczek
2025-03-27 15:13   ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 10/15] fuse: Add halted flag Hanna Czenczek
2025-03-27 15:15   ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 11/15] fuse: Manually process requests (without libfuse) Hanna Czenczek
2025-03-27 15:35   ` Stefan Hajnoczi
2025-04-04 12:36     ` Hanna Czenczek
2025-04-01 14:35   ` Eric Blake
2025-04-04 11:30     ` Hanna Czenczek
2025-04-04 11:42     ` Hanna Czenczek
2025-03-25 16:06 ` [PATCH 12/15] fuse: Reduce max read size Hanna Czenczek
2025-03-27 15:35   ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 13/15] fuse: Process requests in coroutines Hanna Czenczek
2025-03-27 15:38   ` Stefan Hajnoczi
2025-03-25 16:06 ` [PATCH 14/15] fuse: Implement multi-threading Hanna Czenczek
2025-03-26  5:38   ` Markus Armbruster
2025-03-26  9:55     ` Hanna Czenczek
2025-03-26 11:41       ` Markus Armbruster
2025-03-26 13:56         ` Hanna Czenczek
2025-03-27 12:18           ` Markus Armbruster via
2025-03-27 13:45             ` Hanna Czenczek
2025-04-01 12:05               ` Kevin Wolf
2025-04-01 20:31                 ` Eric Blake
2025-04-04 12:45                 ` Hanna Czenczek
2025-03-27 15:55   ` Stefan Hajnoczi
2025-04-01 20:36     ` Eric Blake
2025-04-02 13:20       ` Stefan Hajnoczi
2025-04-03 17:59         ` Eric Blake
2025-04-04 12:49     ` Hanna Czenczek
2025-04-07 14:02       ` Stefan Hajnoczi
2025-04-01 14:58   ` Eric Blake
2025-03-25 16:06 ` [PATCH 15/15] fuse: Increase MAX_WRITE_SIZE with a second buffer Hanna Czenczek
2025-03-27 15:59   ` Stefan Hajnoczi
2025-04-01 20:24   ` Eric Blake
2025-04-04 13:04     ` Hanna Czenczek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=68d0bd72-3a72-4370-8178-45dda9889a56@redhat.com \
    --to=hreitz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=qemu-stable@nongnu.org \
    --cc=stefanha@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).