From: Kevin Wolf <kwolf@redhat.com>
To: Hanna Czenczek <hreitz@redhat.com>
Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org,
Aarushi Mehta <mehta.aaru20@gmail.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
Stefano Garzarella <sgarzare@redhat.com>
Subject: Re: [PATCH for-11.0 3/3] io-uring: Resubmit tails of short writes
Date: Mon, 23 Mar 2026 20:05:10 +0100
Message-ID: <acGO2kZ6Z5cHeWiu@redhat.com>
In-Reply-To: <20260318153206.171494-4-hreitz@redhat.com>
On 18.03.2026 at 16:32, Hanna Czenczek wrote:
> Short writes can happen, too, not just short reads. The difference to
> aio=native is that the kernel will actually retry the tail of short
> requests internally already -- so it is harder to reproduce. But if the
> tail of a short request returns an error to the kernel, we will see it
> in userspace still. To reproduce this, apply the following patch on top
> of the one shown in HEAD^ (again %s/escaped // to apply):
>
> escaped diff --git a/block/export/fuse.c b/block/export/fuse.c
> escaped index 67dc50a412..2b98489a32 100644
> escaped --- a/block/export/fuse.c
> escaped +++ b/block/export/fuse.c
> @@ -1059,8 +1059,15 @@ fuse_co_read(FuseExport *exp, void **bufptr, uint64_t offset, uint32_t size)
> int64_t blk_len;
> void *buf;
> int ret;
> + static uint32_t error_size;
>
> - size = MIN(size, 4096);
> + if (error_size == size) {
> + error_size = 0;
> + return -EIO;
> + } else if (size > 4096) {
> + error_size = size - 4096;
> + size = 4096;
> + }
>
> /* Limited by max_read, should not happen */
> if (size > FUSE_MAX_READ_BYTES) {
> @@ -1111,8 +1118,15 @@ fuse_co_write(FuseExport *exp, struct fuse_write_out *out,
> {
> int64_t blk_len;
> int ret;
> + static uint32_t error_size;
>
> - size = MIN(size, 4096);
> + if (error_size == size) {
> + error_size = 0;
> + return -EIO;
> + } else if (size > 4096) {
> + error_size = size - 4096;
> + size = 4096;
> + }
>
> QEMU_BUILD_BUG_ON(FUSE_MAX_WRITE_BYTES > BDRV_REQUEST_MAX_BYTES);
> /* Limited by max_write, should not happen */
>
> I know this is a bit artificial because to produce this, there must be
> an I/O error somewhere anyway, but if it does happen, qemu will
> understand it to mean ENOSPC for short writes, which is incorrect. So I
> believe we need to resubmit the tail to maybe have it succeed now, or at
> least get the correct error code.
>
> Reproducer as before:
> $ ./qemu-img create -f raw test.raw 8k
> Formatting 'test.raw', fmt=raw size=8192
> $ ./qemu-io -f raw -c 'write -P 42 0 8k' test.raw
> wrote 8192/8192 bytes at offset 0
> 8 KiB, 1 ops; 00.00 sec (64.804 MiB/sec and 8294.9003 ops/sec)
> $ hexdump -C test.raw
> 00000000 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a |****************|
> *
> 00002000
> $ storage-daemon/qemu-storage-daemon \
> --blockdev file,node-name=test,filename=test.raw \
> --export fuse,id=exp,node-name=test,mountpoint=test.raw,writable=true
>
> $ ./qemu-io --image-opts -c 'read -P 23 0 8k' \
> driver=file,filename=test.raw,cache.direct=on,aio=io_uring
> read 8192/8192 bytes at offset 0
> 8 KiB, 1 ops; 00.00 sec (58.481 MiB/sec and 7485.5342 ops/sec)
> $ ./qemu-io --image-opts -c 'write -P 23 0 8k' \
> driver=file,filename=test.raw,cache.direct=on,aio=io_uring
> write failed: No space left on device
> $ hexdump -C test.raw
> 00000000 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 17 |................|
> *
> 00001000 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a 2a |****************|
> *
> 00002000
>
> So short reads already work (because there is code for that), but short
> writes incorrectly produce ENOSPC. This patch fixes that by
> resubmitting not only the tail of short reads but short writes also.
>
> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> @@ -44,6 +44,10 @@ static void luring_prep_sqe(struct io_uring_sqe *sqe, void *opaque)
> uint64_t offset = req->offset;
> int fd = req->fd;
> BdrvRequestFlags flags = req->flags;
>
> + if (req->resubmit_qiov.iov != NULL) {
> + qiov = &req->resubmit_qiov;
> + }
> +
As I already commented on linux-aio.c, we could compute
offset = req->offset + req->total_done once here instead of adding the
two in each case below.
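To illustrate what I mean, here is a minimal standalone sketch. It is not the actual luring_prep_sqe() code: the SketchRequest and SketchSqe types, the SketchOp enum and the sketch_prep_sqe() name are made up for this example, and the field types are assumptions. Only offset and total_done are names taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the request state. Only the
 * names offset and total_done come from the patch; types are assumed. */
typedef struct SketchRequest {
    uint64_t offset;     /* start offset of the original request */
    size_t len;          /* total length of the original request */
    size_t total_done;   /* bytes completed by earlier submissions */
} SketchRequest;

/* Hypothetical stand-in for a submission queue entry. */
typedef struct SketchSqe {
    int opcode;
    uint64_t off;
    size_t len;
} SketchSqe;

enum SketchOp { SKETCH_READ, SKETCH_WRITE };

static void sketch_prep_sqe(SketchSqe *sqe, const SketchRequest *req,
                            enum SketchOp op)
{
    /* Fold the progress into the offset once, up front ... */
    uint64_t offset = req->offset + req->total_done;
    size_t remaining = req->len - req->total_done;

    /* ... so that no case below has to add total_done itself. */
    switch (op) {
    case SKETCH_READ:
        sqe->opcode = SKETCH_READ;
        sqe->off = offset;
        sqe->len = remaining;
        break;
    case SKETCH_WRITE:
        sqe->opcode = SKETCH_WRITE;
        sqe->off = offset;
        sqe->len = remaining;
        break;
    }
}
```

For a first submission total_done is 0, so offset is simply req->offset; for a resubmitted tail it points at the first byte that has not been transferred yet.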
Reviewed-by: Kevin Wolf <kwolf@redhat.com>