* [RFC v2 0/6] block/io: avoid failure caused by misaligned BLKZEROOUT ioctl
@ 2026-01-09 12:08 Fiona Ebner
2026-01-09 12:08 ` [PATCH 1/6] block/io: pass alignment to bdrv_init_padding() Fiona Ebner
` (7 more replies)
0 siblings, 8 replies; 18+ messages in thread
From: Fiona Ebner @ 2026-01-09 12:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, hreitz, kwolf, fam, stefanha
Previous discussion here:
https://lore.kernel.org/qemu-devel/20260105143416.737482-1-f.ebner@proxmox.com/
Commit 5634622bcb ("file-posix: allow BLKZEROOUT with -t writeback")
enables the BLKZEROOUT ioctl when using 'writeback' cache, regressing
certain 'qemu-img convert' invocations, because of a pre-existing
issue. Namely, the BLKZEROOUT ioctl might fail with errno EINVAL when
the request is shorter than the block size of the block device.
Stefan suggested prioritizing bl.pwrite_zeroes_alignment in
bdrv_co_do_zero_pwritev(). This RFC explores that approach and the
issues with qcow2 I encountered, where
bl.pwrite_zeroes_alignment = s->subcluster_size;
I would be happy to discuss potential solutions and whether we should
use this approach after all.
For example, in iotest 154 and 271, there are assertion failures,
because the padded request extends beyond the end of the image:
Assertion `offset + bytes <= bs->total_sectors * BDRV_SECTOR_SIZE ||
child->perm & BLK_PERM_RESIZE' failed.
The total image length is not necessarily aligned to the cluster size.
This could be solved by shortening the relevant requests in
bdrv_co_do_zero_pwritev() and submitting them without the
BDRV_REQ_ZERO_WRITE flag and with bl.request_alignment as the
alignment see patch 5/6.
For iotest 179, I would need to avoid clearing BDRV_REQ_ZERO_WRITE for
the head and tail parts as long as the buffer is fully zero.
Otherwise, we end up with more 'data' sectors in the target map. See
patch 6/6. With or without that, iotests 154 and 271 produces
different output (I think it might be expected, but haven't checked in
detail yet).
Another issue is exposed by iotest 177, where the (sub-)cluster size
is 1MiB, but max-transfer is only 64KiB leading to assertion failures,
because max_transfer =
QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX), align);
evaluates to 0 (because align > bs->bl.max_transfer). This could be
fixed by safeguarding doing the QEMU_ALIGN_DOWN only if the value is
bigger than align, see patch 4/6.
I'm also not sure what to do about iotest 204 and 177 which use
'opt-write-zero=15M' for the blkdebug driver (which assigns that value
to pwrite_zeroes_alignment) making an is_power_of_2(align) assertion
fail.
Yet another issue is the 'detect_zeroes' option. If the option is set,
bdrv_aligned_pwritev() might set the BDRV_REQ_ZERO_WRITE flag even if
the request is not aligned to pwrite_zeroes_alignment and the original
bug could resurface.
Best Regards,
Fiona
Fiona Ebner (6):
block/io: pass alignment to bdrv_init_padding()
block/io: add 'bytes' parameter to bdrv_padding_rmw_read()
block/io: honor pwrite_zeroes_alignment in bdrv_co_do_zero_pwritev()
block/io: safeguard max transfer calculation in bdrv_aligned_pwritev()
block/io: handle image length not aligned to write zeroes alignment in
bdrv_co_do_zero_pwritev()
block/io: keep zero flag for head/tail parts of misaligned zero write
when possible
block/io.c | 78 ++++++++++++++++++++++++++++++++++++++----------------
1 file changed, 55 insertions(+), 23 deletions(-)
--
2.47.3
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/6] block/io: pass alignment to bdrv_init_padding()
2026-01-09 12:08 [RFC v2 0/6] block/io: avoid failure caused by misaligned BLKZEROOUT ioctl Fiona Ebner
@ 2026-01-09 12:08 ` Fiona Ebner
2026-01-09 12:08 ` [PATCH 2/6] block/io: add 'bytes' parameter to bdrv_padding_rmw_read() Fiona Ebner
` (6 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Fiona Ebner @ 2026-01-09 12:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, hreitz, kwolf, fam, stefanha
Currently, all callers use bs->bl.request_alignment. This will
change in the next commit when making bdrv_co_do_zero_pwritev() honor
the block driver's pwrite_zeroes_alignment.
uint64_t is chosen as the type, because that is more common in other
functions already and in particular, in the bdrv_co_do_zero_pwritev()
caller.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/io.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/block/io.c b/block/io.c
index cace297f22..e80aecf194 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1474,10 +1474,9 @@ typedef struct BdrvRequestPadding {
static bool bdrv_init_padding(BlockDriverState *bs,
int64_t offset, int64_t bytes,
- bool write,
+ uint64_t align, bool write,
BdrvRequestPadding *pad)
{
- int64_t align = bs->bl.request_alignment;
int64_t sum;
bdrv_check_request(offset, bytes, &error_abort);
@@ -1716,6 +1715,7 @@ static int bdrv_pad_request(BlockDriverState *bs,
struct iovec *sliced_iov;
int sliced_niov;
size_t sliced_head, sliced_tail;
+ uint64_t align = bs->bl.request_alignment;
/* Should have been checked by the caller already */
ret = bdrv_check_request32(*offset, *bytes, *qiov, *qiov_offset);
@@ -1723,7 +1723,7 @@ static int bdrv_pad_request(BlockDriverState *bs,
return ret;
}
- if (!bdrv_init_padding(bs, *offset, *bytes, write, pad)) {
+ if (!bdrv_init_padding(bs, *offset, *bytes, align, write, pad)) {
if (padded) {
*padded = false;
}
@@ -2161,7 +2161,7 @@ bdrv_co_do_zero_pwritev(BdrvChild *child, int64_t offset, int64_t bytes,
/* This flag doesn't make sense for padding or zero writes */
flags &= ~BDRV_REQ_REGISTERED_BUF;
- padding = bdrv_init_padding(bs, offset, bytes, true, &pad);
+ padding = bdrv_init_padding(bs, offset, bytes, align, true, &pad);
if (padding) {
assert(!(flags & BDRV_REQ_NO_WAIT));
bdrv_make_request_serialising(req, align);
--
2.47.3
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 2/6] block/io: add 'bytes' parameter to bdrv_padding_rmw_read()
2026-01-09 12:08 [RFC v2 0/6] block/io: avoid failure caused by misaligned BLKZEROOUT ioctl Fiona Ebner
2026-01-09 12:08 ` [PATCH 1/6] block/io: pass alignment to bdrv_init_padding() Fiona Ebner
@ 2026-01-09 12:08 ` Fiona Ebner
2026-01-09 12:08 ` [PATCH 3/6] block/io: honor pwrite_zeroes_alignment in bdrv_co_do_zero_pwritev() Fiona Ebner
` (5 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Fiona Ebner @ 2026-01-09 12:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, hreitz, kwolf, fam, stefanha
Currently, all callers use bs->bl.request_alignment. This will change
when making bdrv_co_do_zero_pwritev() honor the block driver's
pwrite_zeroes_alignment. Note that the alignment for reading is not
necessarily the alignment for the request. In particular with qcow2
images, the total image length is not necessarily aligned to
pwrite_zeroes_alignment, and in such cases, a request with that bigger
alignment would cause an assertion failure.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/io.c | 16 +++++++++-------
1 file changed, 9 insertions(+), 7 deletions(-)
diff --git a/block/io.c b/block/io.c
index e80aecf194..403a45f91d 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1511,7 +1511,7 @@ static bool bdrv_init_padding(BlockDriverState *bs,
}
static int coroutine_fn GRAPH_RDLOCK
-bdrv_padding_rmw_read(BdrvChild *child, BdrvTrackedRequest *req,
+bdrv_padding_rmw_read(BdrvChild *child, BdrvTrackedRequest *req, int64_t bytes,
BdrvRequestPadding *pad, bool zero_middle)
{
QEMUIOVector local_qiov;
@@ -1522,7 +1522,9 @@ bdrv_padding_rmw_read(BdrvChild *child, BdrvTrackedRequest *req,
assert(req->serialising && pad->buf);
if (pad->head || pad->merge_reads) {
- int64_t bytes = pad->merge_reads ? pad->buf_len : align;
+ if (pad->merge_reads) {
+ bytes = pad->buf_len;
+ }
qemu_iovec_init_buf(&local_qiov, pad->buf, bytes);
@@ -1550,13 +1552,13 @@ bdrv_padding_rmw_read(BdrvChild *child, BdrvTrackedRequest *req,
}
if (pad->tail) {
- qemu_iovec_init_buf(&local_qiov, pad->tail_buf, align);
+ qemu_iovec_init_buf(&local_qiov, pad->tail_buf, bytes);
bdrv_co_debug_event(bs, BLKDBG_PWRITEV_RMW_TAIL);
ret = bdrv_aligned_preadv(
child, req,
- req->overlap_offset + req->overlap_bytes - align,
- align, align, &local_qiov, 0, 0);
+ req->overlap_offset + req->overlap_bytes - bytes,
+ bytes, align, &local_qiov, 0, 0);
if (ret < 0) {
return ret;
}
@@ -2166,7 +2168,7 @@ bdrv_co_do_zero_pwritev(BdrvChild *child, int64_t offset, int64_t bytes,
assert(!(flags & BDRV_REQ_NO_WAIT));
bdrv_make_request_serialising(req, align);
- bdrv_padding_rmw_read(child, req, &pad, true);
+ bdrv_padding_rmw_read(child, req, align, &pad, true);
if (pad.head || pad.merge_reads) {
int64_t aligned_offset = offset & ~(align - 1);
@@ -2302,7 +2304,7 @@ int coroutine_fn bdrv_co_pwritev_part(BdrvChild *child,
*/
assert(!(flags & BDRV_REQ_NO_WAIT));
bdrv_make_request_serialising(&req, align);
- bdrv_padding_rmw_read(child, &req, &pad, false);
+ bdrv_padding_rmw_read(child, &req, align, &pad, false);
}
ret = bdrv_aligned_pwritev(child, &req, offset, bytes, align,
--
2.47.3
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 3/6] block/io: honor pwrite_zeroes_alignment in bdrv_co_do_zero_pwritev()
2026-01-09 12:08 [RFC v2 0/6] block/io: avoid failure caused by misaligned BLKZEROOUT ioctl Fiona Ebner
2026-01-09 12:08 ` [PATCH 1/6] block/io: pass alignment to bdrv_init_padding() Fiona Ebner
2026-01-09 12:08 ` [PATCH 2/6] block/io: add 'bytes' parameter to bdrv_padding_rmw_read() Fiona Ebner
@ 2026-01-09 12:08 ` Fiona Ebner
2026-01-09 12:08 ` [PATCH 4/6] block/io: safeguard max transfer calculation in bdrv_aligned_pwritev() Fiona Ebner
` (4 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Fiona Ebner @ 2026-01-09 12:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, hreitz, kwolf, fam, stefanha
Commit 5634622bcb ("file-posix: allow BLKZEROOUT with -t writeback")
enables the BLKZEROOUT ioctl when using 'writeback' cache, regressing
certain 'qemu-img convert' invocations, because of a pre-existing
issue. Namely, the BLKZEROOUT ioctl might fail with errno EINVAL when
the request is shorter than the block size of the block device. Ensure
that bdrv_co_do_zero_pwritev() only issues requests respecting the
block driver's pwrite_zeroes_alignment. The file-posix host_device
block driver already initializes that value.
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/3257
Resolves: https://bugzilla.proxmox.com/show_bug.cgi?id=7197
Cc: qemu-stable@nongnu.org
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
This breaks the following iotests with qcow2:
154 177 179 204 271
See the following commits for details.
If we decide to go with the current approach, I will re-order in the
next version to avoid the temporary breakage, but I felt like it would
be easier to look at the issues with this patch order here.
block/io.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/block/io.c b/block/io.c
index 403a45f91d..12dc153573 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2155,7 +2155,8 @@ bdrv_co_do_zero_pwritev(BdrvChild *child, int64_t offset, int64_t bytes,
{
BlockDriverState *bs = child->bs;
QEMUIOVector local_qiov;
- uint64_t align = bs->bl.request_alignment;
+ uint64_t align = MAX(bs->bl.pwrite_zeroes_alignment,
+ bs->bl.request_alignment);
int ret = 0;
bool padding;
BdrvRequestPadding pad;
--
2.47.3
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 4/6] block/io: safeguard max transfer calculation in bdrv_aligned_pwritev()
2026-01-09 12:08 [RFC v2 0/6] block/io: avoid failure caused by misaligned BLKZEROOUT ioctl Fiona Ebner
` (2 preceding siblings ...)
2026-01-09 12:08 ` [PATCH 3/6] block/io: honor pwrite_zeroes_alignment in bdrv_co_do_zero_pwritev() Fiona Ebner
@ 2026-01-09 12:08 ` Fiona Ebner
2026-01-19 19:34 ` Stefan Hajnoczi
2026-01-09 12:08 ` [PATCH 5/6] block/io: handle image length not aligned to write zeroes alignment in bdrv_co_do_zero_pwritev() Fiona Ebner
` (3 subsequent siblings)
7 siblings, 1 reply; 18+ messages in thread
From: Fiona Ebner @ 2026-01-09 12:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, hreitz, kwolf, fam, stefanha
This partially fixes iotest 177 with qcow2, where max_transfer is
64KiB, but the cluster size and thus pwrite_zeroes_alignment is 1MiB.
Previously, max_transfer would be calculated as 0, triggering an
assertion later.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/io.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/block/io.c b/block/io.c
index 12dc153573..233b2617ea 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2087,8 +2087,10 @@ bdrv_aligned_pwritev(BdrvChild *child, BdrvTrackedRequest *req,
assert(is_power_of_2(align));
assert((offset & (align - 1)) == 0);
assert((bytes & (align - 1)) == 0);
- max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX),
- align);
+ max_transfer = MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX);
+ if (max_transfer > align) {
+ max_transfer = QEMU_ALIGN_DOWN(max_transfer, align);
+ }
ret = bdrv_co_write_req_prepare(child, offset, bytes, req, flags);
--
2.47.3
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 5/6] block/io: handle image length not aligned to write zeroes alignment in bdrv_co_do_zero_pwritev()
2026-01-09 12:08 [RFC v2 0/6] block/io: avoid failure caused by misaligned BLKZEROOUT ioctl Fiona Ebner
` (3 preceding siblings ...)
2026-01-09 12:08 ` [PATCH 4/6] block/io: safeguard max transfer calculation in bdrv_aligned_pwritev() Fiona Ebner
@ 2026-01-09 12:08 ` Fiona Ebner
2026-01-09 12:08 ` [PATCH 6/6] block/io: keep zero flag for head/tail parts of misaligned zero write when possible Fiona Ebner
` (2 subsequent siblings)
7 siblings, 0 replies; 18+ messages in thread
From: Fiona Ebner @ 2026-01-09 12:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, hreitz, kwolf, fam, stefanha
When making the request shorter, use bl.request_alignment as the
alignment. This partially fixes iotests 154 and 271 with qcow2.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/io.c | 38 +++++++++++++++++++++++++++++---------
1 file changed, 29 insertions(+), 9 deletions(-)
diff --git a/block/io.c b/block/io.c
index 233b2617ea..d92b30bce5 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2162,6 +2162,7 @@ bdrv_co_do_zero_pwritev(BdrvChild *child, int64_t offset, int64_t bytes,
int ret = 0;
bool padding;
BdrvRequestPadding pad;
+ uint64_t total_length = bs->total_sectors * BDRV_SECTOR_SIZE;
/* This flag doesn't make sense for padding or zero writes */
flags &= ~BDRV_REQ_REGISTERED_BUF;
@@ -2177,10 +2178,21 @@ bdrv_co_do_zero_pwritev(BdrvChild *child, int64_t offset, int64_t bytes,
int64_t aligned_offset = offset & ~(align - 1);
int64_t write_bytes = pad.merge_reads ? pad.buf_len : align;
- qemu_iovec_init_buf(&local_qiov, pad.buf, write_bytes);
- ret = bdrv_aligned_pwritev(child, req, aligned_offset, write_bytes,
- align, &local_qiov, 0,
- flags & ~BDRV_REQ_ZERO_WRITE);
+ if (total_length >= aligned_offset + write_bytes) {
+ qemu_iovec_init_buf(&local_qiov, pad.buf, write_bytes);
+ ret = bdrv_aligned_pwritev(child, req, aligned_offset,
+ write_bytes, align, &local_qiov, 0,
+ flags & ~BDRV_REQ_ZERO_WRITE);
+ } else {
+ write_bytes = total_length - aligned_offset;
+ qemu_iovec_init_buf(&local_qiov, pad.buf, write_bytes);
+ ret = bdrv_aligned_pwritev(child, req, aligned_offset,
+ write_bytes,
+ bs->bl.request_alignment,
+ &local_qiov, 0,
+ flags & ~BDRV_REQ_ZERO_WRITE);
+ }
+
if (ret < 0 || pad.merge_reads) {
/* Error or all work is done */
goto out;
@@ -2205,12 +2217,20 @@ bdrv_co_do_zero_pwritev(BdrvChild *child, int64_t offset, int64_t bytes,
assert(!bytes || (offset & (align - 1)) == 0);
if (bytes) {
- assert(align == pad.tail + bytes);
+ if (total_length >= offset + align) {
+ assert(align == pad.tail + bytes);
- qemu_iovec_init_buf(&local_qiov, pad.tail_buf, align);
- ret = bdrv_aligned_pwritev(child, req, offset, align, align,
- &local_qiov, 0,
- flags & ~BDRV_REQ_ZERO_WRITE);
+ qemu_iovec_init_buf(&local_qiov, pad.tail_buf, align);
+ ret = bdrv_aligned_pwritev(child, req, offset, align, align,
+ &local_qiov, 0,
+ flags & ~BDRV_REQ_ZERO_WRITE);
+ } else {
+ int64_t write_bytes = total_length - offset;
+ qemu_iovec_init_buf(&local_qiov, pad.tail_buf, write_bytes);
+ ret = bdrv_aligned_pwritev(child, req, offset, write_bytes,
+ bs->bl.request_alignment, &local_qiov, 0,
+ flags & ~BDRV_REQ_ZERO_WRITE);
+ }
}
out:
--
2.47.3
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH 6/6] block/io: keep zero flag for head/tail parts of misaligned zero write when possible
2026-01-09 12:08 [RFC v2 0/6] block/io: avoid failure caused by misaligned BLKZEROOUT ioctl Fiona Ebner
` (4 preceding siblings ...)
2026-01-09 12:08 ` [PATCH 5/6] block/io: handle image length not aligned to write zeroes alignment in bdrv_co_do_zero_pwritev() Fiona Ebner
@ 2026-01-09 12:08 ` Fiona Ebner
2026-02-02 22:10 ` Stefan Hajnoczi
2026-01-19 19:38 ` [RFC v2 0/6] block/io: avoid failure caused by misaligned BLKZEROOUT ioctl Stefan Hajnoczi
2026-02-02 22:16 ` Stefan Hajnoczi
7 siblings, 1 reply; 18+ messages in thread
From: Fiona Ebner @ 2026-01-09 12:08 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, hreitz, kwolf, fam, stefanha
This fixes io-test 179 with qcow2. Otherwise, there would be more 'data'
sectors.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
Not sure if we should even attempt this without considering
detect-zeroes. While it does keep 179 working, it is a change in
behavior, since it could turn previously data sectors (with zeroes)
into zero sectors when they are part of the head or tail of a zero
write.
Would it even be tolerable from a performance perspective?
Remaining test failures after this patch are:
154 177 204 271
177 and 204 use opt-write-zero=15M with blkdebug, making an
is_power_of_2(align) assertion fail.
Changes in the output of 154 and 271 might be expected, but I didn't
look into it in detail yet, as I wanted to discuss everything first.
block/io.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/block/io.c b/block/io.c
index d92b30bce5..34b523c951 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2179,10 +2179,14 @@ bdrv_co_do_zero_pwritev(BdrvChild *child, int64_t offset, int64_t bytes,
int64_t write_bytes = pad.merge_reads ? pad.buf_len : align;
if (total_length >= aligned_offset + write_bytes) {
+ BdrvRequestFlags head_flags =
+ buffer_is_zero(pad.buf, write_bytes) ?
+ flags : flags & ~BDRV_REQ_ZERO_WRITE;
+
qemu_iovec_init_buf(&local_qiov, pad.buf, write_bytes);
ret = bdrv_aligned_pwritev(child, req, aligned_offset,
write_bytes, align, &local_qiov, 0,
- flags & ~BDRV_REQ_ZERO_WRITE);
+ head_flags);
} else {
write_bytes = total_length - aligned_offset;
qemu_iovec_init_buf(&local_qiov, pad.buf, write_bytes);
@@ -2220,10 +2224,13 @@ bdrv_co_do_zero_pwritev(BdrvChild *child, int64_t offset, int64_t bytes,
if (total_length >= offset + align) {
assert(align == pad.tail + bytes);
+ BdrvRequestFlags tail_flags =
+ buffer_is_zero(pad.tail_buf + bytes, pad.tail) ?
+ flags : flags & ~BDRV_REQ_ZERO_WRITE;
+
qemu_iovec_init_buf(&local_qiov, pad.tail_buf, align);
ret = bdrv_aligned_pwritev(child, req, offset, align, align,
- &local_qiov, 0,
- flags & ~BDRV_REQ_ZERO_WRITE);
+ &local_qiov, 0, tail_flags);
} else {
int64_t write_bytes = total_length - offset;
qemu_iovec_init_buf(&local_qiov, pad.tail_buf, write_bytes);
--
2.47.3
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH 4/6] block/io: safeguard max transfer calculation in bdrv_aligned_pwritev()
2026-01-09 12:08 ` [PATCH 4/6] block/io: safeguard max transfer calculation in bdrv_aligned_pwritev() Fiona Ebner
@ 2026-01-19 19:34 ` Stefan Hajnoczi
2026-02-05 15:57 ` Kevin Wolf
0 siblings, 1 reply; 18+ messages in thread
From: Stefan Hajnoczi @ 2026-01-19 19:34 UTC (permalink / raw)
To: Fiona Ebner; +Cc: qemu-devel, qemu-block, hreitz, kwolf, fam
[-- Attachment #1: Type: text/plain, Size: 1854 bytes --]
On Fri, Jan 09, 2026 at 01:08:31PM +0100, Fiona Ebner wrote:
> This partially fixes iotest 177 with qcow2, where max_transfer is
> 64KiB, but the cluster size and thus pwrite_zeroes_alignment is 1MiB.
> Previously, max_transfer would be calculated as 0, triggering an
> assertion later.
>
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
> block/io.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/block/io.c b/block/io.c
> index 12dc153573..233b2617ea 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2087,8 +2087,10 @@ bdrv_aligned_pwritev(BdrvChild *child, BdrvTrackedRequest *req,
> assert(is_power_of_2(align));
> assert((offset & (align - 1)) == 0);
> assert((bytes & (align - 1)) == 0);
> - max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX),
> - align);
> + max_transfer = MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX);
> + if (max_transfer > align) {
> + max_transfer = QEMU_ALIGN_DOWN(max_transfer, align);
> + }
max_transfer < align seems paradoxical. It's a situation where the
largest allowed I/O request cannot meet alignment requirements.
Every other place that uses max_transfer in QEMU would also need to cope
with this.
block/blkdebug.c:blkdebug_open() fails if max_transfer is not aligned,
indicating that there are assumptions at least in some places that
max_transfer is aligned.
I hesitate to make this change because I fear it will break more things.
Why was max_transfer 64KB while pwrite_zeroes_alignment was 1MiB? Either
max_transfer should fit the alignment or pwrite_zeroes code needs to
distinguish between actual write zeroes operations and read-write-modify
I/O (which is plain read/write and not subject to
pwrite_zeroes_alignment).
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC v2 0/6] block/io: avoid failure caused by misaligned BLKZEROOUT ioctl
2026-01-09 12:08 [RFC v2 0/6] block/io: avoid failure caused by misaligned BLKZEROOUT ioctl Fiona Ebner
` (5 preceding siblings ...)
2026-01-09 12:08 ` [PATCH 6/6] block/io: keep zero flag for head/tail parts of misaligned zero write when possible Fiona Ebner
@ 2026-01-19 19:38 ` Stefan Hajnoczi
2026-02-02 22:16 ` Stefan Hajnoczi
7 siblings, 0 replies; 18+ messages in thread
From: Stefan Hajnoczi @ 2026-01-19 19:38 UTC (permalink / raw)
To: Fiona Ebner; +Cc: qemu-devel, qemu-block, hreitz, kwolf, fam
[-- Attachment #1: Type: text/plain, Size: 3844 bytes --]
On Fri, Jan 09, 2026 at 01:08:27PM +0100, Fiona Ebner wrote:
> Previous discussion here:
> https://lore.kernel.org/qemu-devel/20260105143416.737482-1-f.ebner@proxmox.com/
>
> Commit 5634622bcb ("file-posix: allow BLKZEROOUT with -t writeback")
> enables the BLKZEROOUT ioctl when using 'writeback' cache, regressing
> certain 'qemu-img convert' invocations, because of a pre-existing
> issue. Namely, the BLKZEROOUT ioctl might fail with errno EINVAL when
> the request is shorter than the block size of the block device.
>
> Stefan suggested prioritizing bl.pwrite_zeroes_alignment in
> bdrv_co_do_zero_pwritev(). This RFC explores that approach and the
> issues with qcow2 I encountered, where
> bl.pwrite_zeroes_alignment = s->subcluster_size;
> I would be happy to discuss potential solutions and whether we should
> use this approach after all.
These issues are a headache, but I think it's important for us to
consider them. They indicate that QEMU does not properly distinguish
between read/write and pwrite_zeroes constraints.
If we can agree on how the block layer should handle pwrite_zeroes
constraints in a consistent way that makes the tests pass, then that
should serve the QEMU block layer well in the future.
I will mention this patch series to Kevin as well so we can get his
opinion.
>
> For example, in iotest 154 and 271, there are assertion failures,
> because the padded request extends beyond the end of the image:
> Assertion `offset + bytes <= bs->total_sectors * BDRV_SECTOR_SIZE ||
> child->perm & BLK_PERM_RESIZE' failed.
> The total image length is not necessarily aligned to the cluster size.
> This could be solved by shortening the relevant requests in
> bdrv_co_do_zero_pwritev() and submitting them without the
> BDRV_REQ_ZERO_WRITE flag and with bl.request_alignment as the
> alignment see patch 5/6.
>
> For iotest 179, I would need to avoid clearing BDRV_REQ_ZERO_WRITE for
> the head and tail parts as long as the buffer is fully zero.
> Otherwise, we end up with more 'data' sectors in the target map. See
> patch 6/6. With or without that, iotests 154 and 271 produces
> different output (I think it might be expected, but haven't checked in
> detail yet).
>
> Another issue is exposed by iotest 177, where the (sub-)cluster size
> is 1MiB, but max-transfer is only 64KiB leading to assertion failures,
> because max_transfer =
> QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX), align);
> evaluates to 0 (because align > bs->bl.max_transfer). This could be
> fixed by safeguarding doing the QEMU_ALIGN_DOWN only if the value is
> bigger than align, see patch 4/6.
>
> I'm also not sure what to do about iotest 204 and 177 which use
> 'opt-write-zero=15M' for the blkdebug driver (which assigns that value
> to pwrite_zeroes_alignment) making an is_power_of_2(align) assertion
> fail.
>
> Yet another issue is the 'detect_zeroes' option. If the option is set,
> bdrv_aligned_pwritev() might set the BDRV_REQ_ZERO_WRITE flag even if
> the request is not aligned to pwrite_zeroes_alignment and the original
> bug could resurface.
>
> Best Regards,
> Fiona
>
>
> Fiona Ebner (6):
> block/io: pass alignment to bdrv_init_padding()
> block/io: add 'bytes' parameter to bdrv_padding_rmw_read()
> block/io: honor pwrite_zeroes_alignment in bdrv_co_do_zero_pwritev()
> block/io: safeguard max transfer calculation in bdrv_aligned_pwritev()
> block/io: handle image length not aligned to write zeroes alignment in
> bdrv_co_do_zero_pwritev()
> block/io: keep zero flag for head/tail parts of misaligned zero write
> when possible
>
> block/io.c | 78 ++++++++++++++++++++++++++++++++++++++----------------
> 1 file changed, 55 insertions(+), 23 deletions(-)
>
> --
> 2.47.3
>
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 6/6] block/io: keep zero flag for head/tail parts of misaligned zero write when possible
2026-01-09 12:08 ` [PATCH 6/6] block/io: keep zero flag for head/tail parts of misaligned zero write when possible Fiona Ebner
@ 2026-02-02 22:10 ` Stefan Hajnoczi
0 siblings, 0 replies; 18+ messages in thread
From: Stefan Hajnoczi @ 2026-02-02 22:10 UTC (permalink / raw)
To: Fiona Ebner; +Cc: qemu-devel, qemu-block, hreitz, kwolf, fam
[-- Attachment #1: Type: text/plain, Size: 934 bytes --]
On Fri, Jan 09, 2026 at 01:08:33PM +0100, Fiona Ebner wrote:
> This fixes io-test 179 with qcow2. Otherwise, there would be more 'data'
> sectors.
>
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
>
> Not sure if we should even attempt this without considering
> detect-zeroes. While it does keep 179 working, it is a change in
> behavior, since it could turn previously data sectors (with zeroes)
> into zero sectors when they are part of the head or tail of a zero
> write.
>
> Would it even be tolerable from a performance perspective?
179 compares qemu-img map output and this makes changes to allocation status
fragile. In a case like this I would either adjust the test case and
regenerate the output, or if the output variance is now unavoidable then
a more sophisticated filter function is needed.
I'd avoid buffer_is_zero() in bdrv_co_do_zero_pwritev() since it eats
CPU cycles.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC v2 0/6] block/io: avoid failure caused by misaligned BLKZEROOUT ioctl
2026-01-09 12:08 [RFC v2 0/6] block/io: avoid failure caused by misaligned BLKZEROOUT ioctl Fiona Ebner
` (6 preceding siblings ...)
2026-01-19 19:38 ` [RFC v2 0/6] block/io: avoid failure caused by misaligned BLKZEROOUT ioctl Stefan Hajnoczi
@ 2026-02-02 22:16 ` Stefan Hajnoczi
2026-02-05 12:13 ` Fiona Ebner
7 siblings, 1 reply; 18+ messages in thread
From: Stefan Hajnoczi @ 2026-02-02 22:16 UTC (permalink / raw)
To: Fiona Ebner; +Cc: qemu-devel, qemu-block, hreitz, kwolf, fam
[-- Attachment #1: Type: text/plain, Size: 1337 bytes --]
On Fri, Jan 09, 2026 at 01:08:27PM +0100, Fiona Ebner wrote:
> Previous discussion here:
> https://lore.kernel.org/qemu-devel/20260105143416.737482-1-f.ebner@proxmox.com/
>
> Commit 5634622bcb ("file-posix: allow BLKZEROOUT with -t writeback")
> enables the BLKZEROOUT ioctl when using 'writeback' cache, regressing
> certain 'qemu-img convert' invocations, because of a pre-existing
> issue. Namely, the BLKZEROOUT ioctl might fail with errno EINVAL when
> the request is shorter than the block size of the block device.
>
> Stefan suggested prioritizing bl.pwrite_zeroes_alignment in
> bdrv_co_do_zero_pwritev(). This RFC explores that approach and the
> issues with qcow2 I encountered, where
> bl.pwrite_zeroes_alignment = s->subcluster_size;
> I would be happy to discuss potential solutions and whether we should
> use this approach after all.
Hi Fiona,
I wanted to continue this discussion. My thoughts are that making
bdrv_co_do_zero_pwritev() use bl.pwrite_zeroes_alignment is the right
long-term solution to keep all the padding logic in one place.
On the other hand, your series shows it involves fixing a bunch of test
failures and that's not fun. The original bug that is being solved here
is my doing, so feel free to hand this over to me if you decide you
don't want to work on it.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC v2 0/6] block/io: avoid failure caused by misaligned BLKZEROOUT ioctl
2026-02-02 22:16 ` Stefan Hajnoczi
@ 2026-02-05 12:13 ` Fiona Ebner
2026-02-05 15:26 ` Stefan Hajnoczi
2026-02-05 16:02 ` Kevin Wolf
0 siblings, 2 replies; 18+ messages in thread
From: Fiona Ebner @ 2026-02-05 12:13 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: qemu-devel, qemu-block, hreitz, kwolf, fam
Hi Stefan,
Am 02.02.26 um 11:15 PM schrieb Stefan Hajnoczi:
> On Fri, Jan 09, 2026 at 01:08:27PM +0100, Fiona Ebner wrote:
>> Previous discussion here:
>> https://lore.kernel.org/qemu-devel/20260105143416.737482-1-f.ebner@proxmox.com/
>>
>> Commit 5634622bcb ("file-posix: allow BLKZEROOUT with -t writeback")
>> enables the BLKZEROOUT ioctl when using 'writeback' cache, regressing
>> certain 'qemu-img convert' invocations, because of a pre-existing
>> issue. Namely, the BLKZEROOUT ioctl might fail with errno EINVAL when
>> the request is shorter than the block size of the block device.
>>
>> Stefan suggested prioritizing bl.pwrite_zeroes_alignment in
>> bdrv_co_do_zero_pwritev(). This RFC explores that approach and the
>> issues with qcow2 I encountered, where
>> bl.pwrite_zeroes_alignment = s->subcluster_size;
>> I would be happy to discuss potential solutions and whether we should
>> use this approach after all.
>
> Hi Fiona,
> I wanted to continue this discussion. My thoughts are that making
> bdrv_co_do_zero_pwritev() use bl.pwrite_zeroes_alignment is the right
> long-term solution to keep all the padding logic in one place.
>
> On the other hand, your series shows it involves fixing a bunch of test
> failures and that's not fun. The original bug that is being solved here
> is my doing, so feel free to hand this over to me if you decide you
> don't want to work on it.
in your other mail, you mentioned you'll ask Kevin for his opinion. So
in part, I was waiting for that. But I also was side-tracked by other
things, and it will be 1-2 more weeks until I can really focus on this
again. If that is too long, please go ahead and pick it up.
Best Regards,
Fiona
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC v2 0/6] block/io: avoid failure caused by misaligned BLKZEROOUT ioctl
2026-02-05 12:13 ` Fiona Ebner
@ 2026-02-05 15:26 ` Stefan Hajnoczi
2026-02-05 16:02 ` Kevin Wolf
1 sibling, 0 replies; 18+ messages in thread
From: Stefan Hajnoczi @ 2026-02-05 15:26 UTC (permalink / raw)
To: Fiona Ebner; +Cc: qemu-devel, qemu-block, hreitz, kwolf, fam
[-- Attachment #1: Type: text/plain, Size: 1988 bytes --]
On Thu, Feb 05, 2026 at 01:13:57PM +0100, Fiona Ebner wrote:
> Hi Stefan,
>
> Am 02.02.26 um 11:15 PM schrieb Stefan Hajnoczi:
> > On Fri, Jan 09, 2026 at 01:08:27PM +0100, Fiona Ebner wrote:
> >> Previous discussion here:
> >> https://lore.kernel.org/qemu-devel/20260105143416.737482-1-f.ebner@proxmox.com/
> >>
> >> Commit 5634622bcb ("file-posix: allow BLKZEROOUT with -t writeback")
> >> enables the BLKZEROOUT ioctl when using 'writeback' cache, regressing
> >> certain 'qemu-img convert' invocations, because of a pre-existing
> >> issue. Namely, the BLKZEROOUT ioctl might fail with errno EINVAL when
> >> the request is shorter than the block size of the block device.
> >>
> >> Stefan suggested prioritizing bl.pwrite_zeroes_alignment in
> >> bdrv_co_do_zero_pwritev(). This RFC explores that approach and the
> >> issues with qcow2 I encountered, where
> >> bl.pwrite_zeroes_alignment = s->subcluster_size;
> >> I would be happy to discuss potential solutions and whether we should
> >> use this approach after all.
> >
> > Hi Fiona,
> > I wanted to continue this discussion. My thoughts are that making
> > bdrv_co_do_zero_pwritev() use bl.pwrite_zeroes_alignment is the right
> > long-term solution to keep all the padding logic in one place.
> >
> > On the other hand, your series shows it involves fixing a bunch of test
> > failures and that's not fun. The original bug that is being solved here
> > is my doing, so feel free to hand this over to me if you decide you
> > don't want to work on it.
>
> in your other mail, you mentioned you'll ask Kevin for his opinion. So
> in part, I was waiting for that. But I also was side-tracked by other
> things, and it will be 1-2 more weeks until I can really focus on this
> again. If that is too long, please go ahead and pick it up.
I have pinged him now.
My timeframe is similar. I can look into this as a background task and
if I make progress I'll share it with you.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH 4/6] block/io: safeguard max transfer calculation in bdrv_aligned_pwritev()
2026-01-19 19:34 ` Stefan Hajnoczi
@ 2026-02-05 15:57 ` Kevin Wolf
0 siblings, 0 replies; 18+ messages in thread
From: Kevin Wolf @ 2026-02-05 15:57 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: Fiona Ebner, qemu-devel, qemu-block, hreitz, fam
[-- Attachment #1: Type: text/plain, Size: 2879 bytes --]
Am 19.01.2026 um 20:34 hat Stefan Hajnoczi geschrieben:
> On Fri, Jan 09, 2026 at 01:08:31PM +0100, Fiona Ebner wrote:
> > This partially fixes iotest 177 with qcow2, where max_transfer is
> > 64KiB, but the cluster size and thus pwrite_zeroes_alignment is 1MiB.
> > Previously, max_transfer would be calculated as 0, triggering an
> > assertion later.
> >
> > Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> > ---
> > block/io.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/block/io.c b/block/io.c
> > index 12dc153573..233b2617ea 100644
> > --- a/block/io.c
> > +++ b/block/io.c
> > @@ -2087,8 +2087,10 @@ bdrv_aligned_pwritev(BdrvChild *child, BdrvTrackedRequest *req,
> > assert(is_power_of_2(align));
> > assert((offset & (align - 1)) == 0);
> > assert((bytes & (align - 1)) == 0);
> > - max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX),
> > - align);
> > + max_transfer = MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX);
> > + if (max_transfer > align) {
> > + max_transfer = QEMU_ALIGN_DOWN(max_transfer, align);
> > + }
>
> max_transfer < align seems paradoxical. It's a situation where the
> largest allowed I/O request cannot meet alignment requirements.
>
> Every other place that uses max_transfer in QEMU would also need to cope
> with this.
>
> block/blkdebug.c:blkdebug_open() fails if max_transfer is not aligned,
> indicating that there are assumptions at least in some places that
> max_transfer is aligned.
>
> I hesitate to make this change because I fear it will break more things.
> Why was max_transfer 64KB while pwrite_zeroes_alignment was 1MiB? Either
> max_transfer should fit the alignment or pwrite_zeroes code needs to
> distinguish between actual write zeroes operations and read-write-modify
> I/O (which is plain read/write and not subject to
> pwrite_zeroes_alignment).
I don't think haing a 64k max_transfer for normal I/O where data is
actually transferred and much larger write_zeroes requests should be
inherently incompatible.
In fact, in this specific case, I'm wondering how we even hit a
problematic case where align == pwrite_zeroes_alignment, but
max_transfer is still used. Shouldn't we take the BDRV_REQ_ZERO_WRITE
code path, which leaves max_transfer completely unused?
Do we somehow end up with normal writes using pwrite_zeroes_alignment?
Ah, yes. This seems to be a problem with patch 3. It doesn't consider
that bdrv_co_do_zero_pwritev() deals both with write_zeroes requests
(for the bulk in the middle) and with normal writes for the padding,
which require different alignments. So changing 'align' for all calls
seems wrong, it should probably be only for those requests that keep the
BDRV_REQ_ZERO_WRITE flag set.
Kevin
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC v2 0/6] block/io: avoid failure caused by misaligned BLKZEROOUT ioctl
2026-02-05 12:13 ` Fiona Ebner
2026-02-05 15:26 ` Stefan Hajnoczi
@ 2026-02-05 16:02 ` Kevin Wolf
2026-05-27 21:06 ` Stefan Hajnoczi
1 sibling, 1 reply; 18+ messages in thread
From: Kevin Wolf @ 2026-02-05 16:02 UTC (permalink / raw)
To: Fiona Ebner; +Cc: Stefan Hajnoczi, qemu-devel, qemu-block, hreitz, fam
Am 05.02.2026 um 13:13 hat Fiona Ebner geschrieben:
> Hi Stefan,
>
> Am 02.02.26 um 11:15 PM schrieb Stefan Hajnoczi:
> > On Fri, Jan 09, 2026 at 01:08:27PM +0100, Fiona Ebner wrote:
> >> Previous discussion here:
> >> https://lore.kernel.org/qemu-devel/20260105143416.737482-1-f.ebner@proxmox.com/
> >>
> >> Commit 5634622bcb ("file-posix: allow BLKZEROOUT with -t writeback")
> >> enables the BLKZEROOUT ioctl when using 'writeback' cache, regressing
> >> certain 'qemu-img convert' invocations, because of a pre-existing
> >> issue. Namely, the BLKZEROOUT ioctl might fail with errno EINVAL when
> >> the request is shorter than the block size of the block device.
> >>
> >> Stefan suggested prioritizing bl.pwrite_zeroes_alignment in
> >> bdrv_co_do_zero_pwritev(). This RFC explores that approach and the
> >> issues with qcow2 I encountered, where
> >> bl.pwrite_zeroes_alignment = s->subcluster_size;
> >> I would be happy to discuss potential solutions and whether we should
> >> use this approach after all.
> >
> > Hi Fiona,
> > I wanted to continue this discussion. My thoughts are that making
> > bdrv_co_do_zero_pwritev() use bl.pwrite_zeroes_alignment is the right
> > long-term solution to keep all the padding logic in one place.
> >
> > On the other hand, your series shows it involves fixing a bunch of test
> > failures and that's not fun. The original bug that is being solved here
> > is my doing, so feel free to hand this over to me if you decide you
> > don't want to work on it.
>
> in your other mail, you mentioned you'll ask Kevin for his opinion. So
> in part, I was waiting for that. But I also was side-tracked by other
> things, and it will be 1-2 more weeks until I can really focus on this
> again. If that is too long, please go ahead and pick it up.
I didn't review this thoroughly yet, but I agree that considering the
alignment from the start is the better solution and also more consistent
with what we're already doing for normal reads and writes.
We just need to make sure that we use the right alignments in the right
places, which can be a bit confusing with the fallbacks to buffered zero
writes here and there.
I assume that there is enough time left to do this before the 11.0
release and there is no need to take something like v1 as an
intermediate solution?
Kevin
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC v2 0/6] block/io: avoid failure caused by misaligned BLKZEROOUT ioctl
2026-02-05 16:02 ` Kevin Wolf
@ 2026-05-27 21:06 ` Stefan Hajnoczi
2026-05-28 8:32 ` Fiona Ebner
0 siblings, 1 reply; 18+ messages in thread
From: Stefan Hajnoczi @ 2026-05-27 21:06 UTC (permalink / raw)
To: Kevin Wolf
Cc: Fiona Ebner, Stefan Hajnoczi, qemu-devel, qemu-block, hreitz, fam
On Thu, Feb 5, 2026 at 11:03 AM Kevin Wolf <kwolf@redhat.com> wrote:
>
> Am 05.02.2026 um 13:13 hat Fiona Ebner geschrieben:
> > Hi Stefan,
> >
> > Am 02.02.26 um 11:15 PM schrieb Stefan Hajnoczi:
> > > On Fri, Jan 09, 2026 at 01:08:27PM +0100, Fiona Ebner wrote:
> > >> Previous discussion here:
> > >> https://lore.kernel.org/qemu-devel/20260105143416.737482-1-f.ebner@proxmox.com/
> > >>
> > >> Commit 5634622bcb ("file-posix: allow BLKZEROOUT with -t writeback")
> > >> enables the BLKZEROOUT ioctl when using 'writeback' cache, regressing
> > >> certain 'qemu-img convert' invocations, because of a pre-existing
> > >> issue. Namely, the BLKZEROOUT ioctl might fail with errno EINVAL when
> > >> the request is shorter than the block size of the block device.
> > >>
> > >> Stefan suggested prioritizing bl.pwrite_zeroes_alignment in
> > >> bdrv_co_do_zero_pwritev(). This RFC explores that approach and the
> > >> issues with qcow2 I encountered, where
> > >> bl.pwrite_zeroes_alignment = s->subcluster_size;
> > >> I would be happy to discuss potential solutions and whether we should
> > >> use this approach after all.
> > >
> > > Hi Fiona,
> > > I wanted to continue this discussion. My thoughts are that making
> > > bdrv_co_do_zero_pwritev() use bl.pwrite_zeroes_alignment is the right
> > > long-term solution to keep all the padding logic in one place.
> > >
> > > On the other hand, your series shows it involves fixing a bunch of test
> > > failures and that's not fun. The original bug that is being solved here
> > > is my doing, so feel free to hand this over to me if you decide you
> > > don't want to work on it.
> >
> > in your other mail, you mentioned you'll ask Kevin for his opinion. So
> > in part, I was waiting for that. But I also was side-tracked by other
> > things, and it will be 1-2 more weeks until I can really focus on this
> > again. If that is too long, please go ahead and pick it up.
>
> I didn't review this thoroughly yet, but I agree that considering the
> alignment from the start is the better solution and also more consistent
> with what we're already doing for normal reads and writes.
>
> We just need to make sure that we use the right alignments in the right
> places, which can be a bit confusing with the fallbacks to buffered zero
> writes here and there.
>
> I assume that there is enough time left to do this before the 11.0
> release and there is no need to take something like v1 as an
> intermediate solution?
No progress has been made and I'm unable to commit time at the moment.
Is everyone okay with merging Fiona's smaller original patch? While it
would be nice to implement a deeper fix, it's better to have a fix and
I think there is no great risk merging the original patch.
https://lore.kernel.org/qemu-devel/20260105143416.737482-1-f.ebner@proxmox.com/
Stefan
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC v2 0/6] block/io: avoid failure caused by misaligned BLKZEROOUT ioctl
2026-05-27 21:06 ` Stefan Hajnoczi
@ 2026-05-28 8:32 ` Fiona Ebner
2026-05-28 13:26 ` Stefan Hajnoczi
0 siblings, 1 reply; 18+ messages in thread
From: Fiona Ebner @ 2026-05-28 8:32 UTC (permalink / raw)
To: Stefan Hajnoczi, Kevin Wolf
Cc: Stefan Hajnoczi, qemu-devel, qemu-block, hreitz, fam
Am 27.05.26 um 11:06 PM schrieb Stefan Hajnoczi:
> On Thu, Feb 5, 2026 at 11:03 AM Kevin Wolf <kwolf@redhat.com> wrote:
>>
>> Am 05.02.2026 um 13:13 hat Fiona Ebner geschrieben:
>>> Hi Stefan,
>>>
>>> Am 02.02.26 um 11:15 PM schrieb Stefan Hajnoczi:
>>>> On Fri, Jan 09, 2026 at 01:08:27PM +0100, Fiona Ebner wrote:
>>>>> Previous discussion here:
>>>>> https://lore.kernel.org/qemu-devel/20260105143416.737482-1-f.ebner@proxmox.com/
>>>>>
>>>>> Commit 5634622bcb ("file-posix: allow BLKZEROOUT with -t writeback")
>>>>> enables the BLKZEROOUT ioctl when using 'writeback' cache, regressing
>>>>> certain 'qemu-img convert' invocations, because of a pre-existing
>>>>> issue. Namely, the BLKZEROOUT ioctl might fail with errno EINVAL when
>>>>> the request is shorter than the block size of the block device.
>>>>>
>>>>> Stefan suggested prioritizing bl.pwrite_zeroes_alignment in
>>>>> bdrv_co_do_zero_pwritev(). This RFC explores that approach and the
>>>>> issues with qcow2 I encountered, where
>>>>> bl.pwrite_zeroes_alignment = s->subcluster_size;
>>>>> I would be happy to discuss potential solutions and whether we should
>>>>> use this approach after all.
>>>>
>>>> Hi Fiona,
>>>> I wanted to continue this discussion. My thoughts are that making
>>>> bdrv_co_do_zero_pwritev() use bl.pwrite_zeroes_alignment is the right
>>>> long-term solution to keep all the padding logic in one place.
>>>>
>>>> On the other hand, your series shows it involves fixing a bunch of test
>>>> failures and that's not fun. The original bug that is being solved here
>>>> is my doing, so feel free to hand this over to me if you decide you
>>>> don't want to work on it.
>>>
>>> in your other mail, you mentioned you'll ask Kevin for his opinion. So
>>> in part, I was waiting for that. But I also was side-tracked by other
>>> things, and it will be 1-2 more weeks until I can really focus on this
>>> again. If that is too long, please go ahead and pick it up.
>>
>> I didn't review this thoroughly yet, but I agree that considering the
>> alignment from the start is the better solution and also more consistent
>> with what we're already doing for normal reads and writes.
>>
>> We just need to make sure that we use the right alignments in the right
>> places, which can be a bit confusing with the fallbacks to buffered zero
>> writes here and there.
>>
>> I assume that there is enough time left to do this before the 11.0
>> release and there is no need to take something like v1 as an
>> intermediate solution?
>
> No progress has been made and I'm unable to commit time at the moment.
Me neither unfortunately.
> Is everyone okay with merging Fiona's smaller original patch? While it
> would be nice to implement a deeper fix, it's better to have a fix and
> I think there is no great risk merging the original patch.
>
> https://lore.kernel.org/qemu-devel/20260105143416.737482-1-f.ebner@proxmox.com/
FWIW, we have been using that patch downstream since mid-February [0]
and nobody reported issues with it as far as I'm aware. I suppose it
could use a 'TODO' code comment that the proper fix looks different. If
you choose to go with that patch, should I send a v2 with such a comment
or can that be added when applying?
Best Regards,
Fiona
[0]:
https://git.proxmox.com/?p=pve-qemu.git;a=commitdiff;h=6960b5e033fa911f9882751950df28a193255683
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC v2 0/6] block/io: avoid failure caused by misaligned BLKZEROOUT ioctl
2026-05-28 8:32 ` Fiona Ebner
@ 2026-05-28 13:26 ` Stefan Hajnoczi
0 siblings, 0 replies; 18+ messages in thread
From: Stefan Hajnoczi @ 2026-05-28 13:26 UTC (permalink / raw)
To: Fiona Ebner
Cc: Stefan Hajnoczi, Kevin Wolf, qemu-devel, qemu-block, hreitz, fam
[-- Attachment #1: Type: text/plain, Size: 3788 bytes --]
On Thu, May 28, 2026 at 10:32:32AM +0200, Fiona Ebner wrote:
> Am 27.05.26 um 11:06 PM schrieb Stefan Hajnoczi:
> > On Thu, Feb 5, 2026 at 11:03 AM Kevin Wolf <kwolf@redhat.com> wrote:
> >>
> >> Am 05.02.2026 um 13:13 hat Fiona Ebner geschrieben:
> >>> Hi Stefan,
> >>>
> >>> Am 02.02.26 um 11:15 PM schrieb Stefan Hajnoczi:
> >>>> On Fri, Jan 09, 2026 at 01:08:27PM +0100, Fiona Ebner wrote:
> >>>>> Previous discussion here:
> >>>>> https://lore.kernel.org/qemu-devel/20260105143416.737482-1-f.ebner@proxmox.com/
> >>>>>
> >>>>> Commit 5634622bcb ("file-posix: allow BLKZEROOUT with -t writeback")
> >>>>> enables the BLKZEROOUT ioctl when using 'writeback' cache, regressing
> >>>>> certain 'qemu-img convert' invocations, because of a pre-existing
> >>>>> issue. Namely, the BLKZEROOUT ioctl might fail with errno EINVAL when
> >>>>> the request is shorter than the block size of the block device.
> >>>>>
> >>>>> Stefan suggested prioritizing bl.pwrite_zeroes_alignment in
> >>>>> bdrv_co_do_zero_pwritev(). This RFC explores that approach and the
> >>>>> issues with qcow2 I encountered, where
> >>>>> bl.pwrite_zeroes_alignment = s->subcluster_size;
> >>>>> I would be happy to discuss potential solutions and whether we should
> >>>>> use this approach after all.
> >>>>
> >>>> Hi Fiona,
> >>>> I wanted to continue this discussion. My thoughts are that making
> >>>> bdrv_co_do_zero_pwritev() use bl.pwrite_zeroes_alignment is the right
> >>>> long-term solution to keep all the padding logic in one place.
> >>>>
> >>>> On the other hand, your series shows it involves fixing a bunch of test
> >>>> failures and that's not fun. The original bug that is being solved here
> >>>> is my doing, so feel free to hand this over to me if you decide you
> >>>> don't want to work on it.
> >>>
> >>> in your other mail, you mentioned you'll ask Kevin for his opinion. So
> >>> in part, I was waiting for that. But I also was side-tracked by other
> >>> things, and it will be 1-2 more weeks until I can really focus on this
> >>> again. If that is too long, please go ahead and pick it up.
> >>
> >> I didn't review this thoroughly yet, but I agree that considering the
> >> alignment from the start is the better solution and also more consistent
> >> with what we're already doing for normal reads and writes.
> >>
> >> We just need to make sure that we use the right alignments in the right
> >> places, which can be a bit confusing with the fallbacks to buffered zero
> >> writes here and there.
> >>
> >> I assume that there is enough time left to do this before the 11.0
> >> release and there is no need to take something like v1 as an
> >> intermediate solution?
> >
> > No progress has been made and I'm unable to commit time at the moment.
>
> Me neither unfortunately.
>
> > Is everyone okay with merging Fiona's smaller original patch? While it
> > would be nice to implement a deeper fix, it's better to have a fix and
> > I think there is no great risk merging the original patch.
> >
> > https://lore.kernel.org/qemu-devel/20260105143416.737482-1-f.ebner@proxmox.com/
>
> FWIW, we have been using that patch downstream since mid-February [0]
> and nobody reported issues with it as far as I'm aware. I suppose it
> could use a 'TODO' code comment that the proper fix looks different. If
> you choose to go with that patch, should I send a v2 with such a comment
> or can that be added when applying?
I can add the comment when merging the patch. I will merge it on Monday
so Kevin still has some time to reply in case he has an opinion.
Stefan
>
> Best Regards,
> Fiona
>
> [0]:
> https://git.proxmox.com/?p=pve-qemu.git;a=commitdiff;h=6960b5e033fa911f9882751950df28a193255683
>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2026-05-28 13:27 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-01-09 12:08 [RFC v2 0/6] block/io: avoid failure caused by misaligned BLKZEROOUT ioctl Fiona Ebner
2026-01-09 12:08 ` [PATCH 1/6] block/io: pass alignment to bdrv_init_padding() Fiona Ebner
2026-01-09 12:08 ` [PATCH 2/6] block/io: add 'bytes' parameter to bdrv_padding_rmw_read() Fiona Ebner
2026-01-09 12:08 ` [PATCH 3/6] block/io: honor pwrite_zeroes_alignment in bdrv_co_do_zero_pwritev() Fiona Ebner
2026-01-09 12:08 ` [PATCH 4/6] block/io: safeguard max transfer calculation in bdrv_aligned_pwritev() Fiona Ebner
2026-01-19 19:34 ` Stefan Hajnoczi
2026-02-05 15:57 ` Kevin Wolf
2026-01-09 12:08 ` [PATCH 5/6] block/io: handle image length not aligned to write zeroes alignment in bdrv_co_do_zero_pwritev() Fiona Ebner
2026-01-09 12:08 ` [PATCH 6/6] block/io: keep zero flag for head/tail parts of misaligned zero write when possible Fiona Ebner
2026-02-02 22:10 ` Stefan Hajnoczi
2026-01-19 19:38 ` [RFC v2 0/6] block/io: avoid failure caused by misaligned BLKZEROOUT ioctl Stefan Hajnoczi
2026-02-02 22:16 ` Stefan Hajnoczi
2026-02-05 12:13 ` Fiona Ebner
2026-02-05 15:26 ` Stefan Hajnoczi
2026-02-05 16:02 ` Kevin Wolf
2026-05-27 21:06 ` Stefan Hajnoczi
2026-05-28 8:32 ` Fiona Ebner
2026-05-28 13:26 ` Stefan Hajnoczi
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.