From: Stefan Hajnoczi <stefanha@redhat.com>
To: Fiona Ebner <f.ebner@proxmox.com>
Cc: qemu-devel@nongnu.org, qemu-block@nongnu.org, hreitz@redhat.com,
	kwolf@redhat.com, fam@euphon.net
Subject: Re: [RFC v2 0/6] block/io: avoid failure caused by misaligned BLKZEROOUT ioctl
Date: Mon, 19 Jan 2026 14:38:44 -0500
Message-ID: <20260119193844.GD834718@fedora>
In-Reply-To: <20260109120837.2772961-1-f.ebner@proxmox.com>

On Fri, Jan 09, 2026 at 01:08:27PM +0100, Fiona Ebner wrote:
> Previous discussion here:
> https://lore.kernel.org/qemu-devel/20260105143416.737482-1-f.ebner@proxmox.com/
> 
> Commit 5634622bcb ("file-posix: allow BLKZEROOUT with -t writeback")
> enables the BLKZEROOUT ioctl when using 'writeback' cache, regressing
> certain 'qemu-img convert' invocations, because of a pre-existing
> issue. Namely, the BLKZEROOUT ioctl might fail with errno EINVAL when
> the request is shorter than the block size of the block device.
> 
> Stefan suggested prioritizing bl.pwrite_zeroes_alignment in
> bdrv_co_do_zero_pwritev(). This RFC explores that approach and the
> issues with qcow2 I encountered, where
> bl.pwrite_zeroes_alignment = s->subcluster_size;
> I would be happy to discuss potential solutions and whether we should
> use this approach after all.

These issues are a headache, but I think it's important for us to
consider them. They indicate that QEMU does not properly distinguish
between read/write and pwrite_zeroes constraints.

If we can agree on how the block layer should handle pwrite_zeroes
constraints in a consistent way that makes the tests pass, then that
should serve the QEMU block layer well in the future.

I will mention this patch series to Kevin as well so we can get his
opinion.

> 
> For example, in iotest 154 and 271, there are assertion failures,
> because the padded request extends beyond the end of the image:
> Assertion `offset + bytes <= bs->total_sectors * BDRV_SECTOR_SIZE ||
> child->perm & BLK_PERM_RESIZE' failed.
> The total image length is not necessarily aligned to the cluster size.
> This could be solved by shortening the relevant requests in
> bdrv_co_do_zero_pwritev() and submitting them without the
> BDRV_REQ_ZERO_WRITE flag and with bl.request_alignment as the
> alignment; see patch 5/6.
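
To make the failure mode concrete, here is a minimal sketch in C. It is
not the code from patch 5/6. It pads a zero-write request out to an
alignment and then shortens it when the padded range would run past the
end of the image. The Range type and the helpers pad_to_alignment() and
clamp_to_image() are hypothetical names introduced only for
illustration; the real logic lives in bdrv_co_do_zero_pwritev().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type for illustration: a byte range [offset, offset + bytes). */
typedef struct {
    int64_t offset;
    int64_t bytes;
} Range;

/* Widen a request so that both its start and its end are multiples of align. */
static inline Range pad_to_alignment(int64_t offset, int64_t bytes,
                                     int64_t align)
{
    assert(align > 0);
    int64_t start = offset / align * align;
    int64_t end = (offset + bytes + align - 1) / align * align;
    return (Range){ start, end - start };
}

/*
 * Shorten a padded request that extends beyond the end of the image.
 * Assumes the request starts inside the image.
 */
static inline Range clamp_to_image(Range r, int64_t image_len)
{
    assert(r.offset <= image_len);
    if (r.offset + r.bytes > image_len) {
        r.bytes = image_len - r.offset;
    }
    return r;
}
```

With a 64 KiB alignment and an image that is 128 KiB + 512 bytes long, a
512-byte request at offset 128 KiB pads to a full 64 KiB, which
overshoots the image end. Clamping brings it back to 512 bytes, which is
no longer aligned to 64 KiB and so would have to be submitted under the
smaller request alignment.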
> 
> For iotest 179, I would need to avoid clearing BDRV_REQ_ZERO_WRITE for
> the head and tail parts as long as the buffer is fully zero.
> Otherwise, we end up with more 'data' sectors in the target map. See
> patch 6/6. With or without that, iotests 154 and 271 produce
> different output (I think it might be expected, but haven't checked in
> detail yet).
> 
> Another issue is exposed by iotest 177, where the (sub-)cluster size
> is 1MiB, but max-transfer is only 64KiB, leading to assertion failures,
> because max_transfer =
> QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX), align);
> evaluates to 0 (because align > bs->bl.max_transfer). This could be
> fixed by doing the QEMU_ALIGN_DOWN only if the value is bigger than
> align; see patch 4/6.
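
The arithmetic can be checked with a small standalone sketch. The
helpers align_down() and min_non_zero() are simplified stand-ins for
QEMU_ALIGN_DOWN() and MIN_NON_ZERO(), not the QEMU macros themselves,
and max_transfer_guarded() is only one possible reading of "align down
only if the value is bigger than align"; it is not taken from patch 4/6.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Stand-in for QEMU_ALIGN_DOWN(): round n down to a multiple of m. */
static inline int64_t align_down(int64_t n, int64_t m)
{
    assert(m > 0);
    return n / m * m;
}

/* Stand-in for MIN_NON_ZERO(): minimum of a and b, where 0 means "no limit". */
static inline int64_t min_non_zero(int64_t a, int64_t b)
{
    if (a == 0) {
        return b;
    }
    if (b == 0) {
        return a;
    }
    return a < b ? a : b;
}

/* The calculation as quoted above. */
static inline int64_t max_transfer_unguarded(int64_t bl_max_transfer,
                                             int64_t align)
{
    return align_down(min_non_zero(bl_max_transfer, INT_MAX), align);
}

/* Hypothetical guard: skip the align-down when the limit is not above align. */
static inline int64_t max_transfer_guarded(int64_t bl_max_transfer,
                                           int64_t align)
{
    int64_t limit = min_non_zero(bl_max_transfer, INT_MAX);
    return limit > align ? align_down(limit, align) : limit;
}
```

With bl.max_transfer = 64 KiB and align = 1 MiB, the unguarded form
yields 0, while the guarded form keeps 64 KiB. The guarded result is
then smaller than the alignment, so whether the rest of the write path
can cope with that is a separate question.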
> 
> I'm also not sure what to do about iotests 204 and 177, which use
> 'opt-write-zero=15M' for the blkdebug driver (which assigns that value
> to pwrite_zeroes_alignment), making an is_power_of_2(align) assertion
> fail.
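
As a generic illustration (not QEMU code) of why 15M trips such an
assertion, and why alignment code often insists on a power of two in the
first place: mask-based arithmetic only agrees with modulo arithmetic
when the alignment is a power of two. The helper names below are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if v has exactly one bit set. */
static inline bool is_pow2(uint64_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Misalignment computed with a bit mask; only valid for power-of-two align. */
static inline uint64_t misalign_by_mask(uint64_t offset, uint64_t align)
{
    return offset & (align - 1);
}

/* Misalignment computed with modulo; valid for any non-zero align. */
static inline uint64_t misalign_by_mod(uint64_t offset, uint64_t align)
{
    assert(align > 0);
    return offset % align;
}
```

15 MiB is 15728640 bytes (0xF00000), which has four bits set, so
is_pow2() rejects it. An offset of exactly 15 MiB is aligned according
to the modulo form, but the mask form reports a non-zero remainder.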
> 
> Yet another issue is the 'detect_zeroes' option. If the option is set,
> bdrv_aligned_pwritev() might set the BDRV_REQ_ZERO_WRITE flag even if
> the request is not aligned to pwrite_zeroes_alignment, so the original
> bug could resurface.
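
One conceivable guard for the detect_zeroes case, shown purely as a
sketch: only promote an all-zero buffer to a zero write when the request
is aligned to pwrite_zeroes_alignment. The series does not propose this,
and both function names are hypothetical. Padding the request instead of
refusing the promotion would be another option.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the request starts on, and spans, a multiple of pwz_align. */
static inline bool zero_write_is_aligned(int64_t offset, int64_t bytes,
                                         int64_t pwz_align)
{
    assert(pwz_align > 0);
    return offset % pwz_align == 0 && bytes % pwz_align == 0;
}

/*
 * Hypothetical decision helper: an all-zero buffer is only turned into a
 * zero write if the request satisfies the zero-write alignment.
 */
static inline bool may_promote_to_zero_write(bool buffer_is_zero,
                                             int64_t offset, int64_t bytes,
                                             int64_t pwz_align)
{
    return buffer_is_zero &&
           zero_write_is_aligned(offset, bytes, pwz_align);
}
```

A 512-byte all-zero write against a 4096-byte zero-write alignment would
stay a regular write under this rule, which avoids sending a short
request down the BLKZEROOUT path.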
> 
> Best Regards,
> Fiona
> 
> 
> Fiona Ebner (6):
>   block/io: pass alignment to bdrv_init_padding()
>   block/io: add 'bytes' parameter to bdrv_padding_rmw_read()
>   block/io: honor pwrite_zeroes_alignment in bdrv_co_do_zero_pwritev()
>   block/io: safeguard max transfer calculation in bdrv_aligned_pwritev()
>   block/io: handle image length not aligned to write zeroes alignment in
>     bdrv_co_do_zero_pwritev()
>   block/io: keep zero flag for head/tail parts of misaligned zero write
>     when possible
> 
>  block/io.c | 78 ++++++++++++++++++++++++++++++++++++++----------------
>  1 file changed, 55 insertions(+), 23 deletions(-)
> 
> -- 
> 2.47.3
> 
> 
