qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] block: use pwrite_zeroes_alignment when writing first sector
@ 2025-10-02 18:39 Stefan Hajnoczi
  2025-10-02 18:39 ` [PATCH 1/3] file-posix: populate pwrite_zeroes_alignment Stefan Hajnoczi
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2025-10-02 18:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Jean-Louis Dupond, Hanna Reitz,
	Stefan Hajnoczi

This series fixes a bug I introduced in commit 5634622bcb33 ("file-posix: allow
BLKZEROOUT with -t writeback"). The Linux fallocate(2) and ioctl(BLKZEROOUT)
syscalls require logical block size alignment of the offset and length, even
when the file is opened in buffered I/O mode where read/write operations do not
require alignment.

The fix is to populate the pwrite_zeroes_alignment block limits field and to
use that limit in create_file_fallback_zero_first_sector().

One issue I want to raise is that pwrite_zeroes_alignment is an "optimal
alignment" hint. Hence create_file_fallback_zero_first_sector() had to be
modified to honor the limit explicitly. The block layer doesn't automatically
apply padding in order to align requests. This is different from how QEMU's
block layer pwrite/pread works, where it does automatically apply padding and
read/modify/write as necessary. If you want consistency, please let me know.

Stefan Hajnoczi (3):
  file-posix: populate pwrite_zeroes_alignment
  block: use pwrite_zeroes_alignment when writing first sector
  iotests: add Linux loop device image creation test

 include/system/block-backend-io.h             |  1 +
 block.c                                       |  3 +-
 block/block-backend.c                         | 11 ++++
 block/file-posix.c                            | 17 ++++++
 tests/qemu-iotests/tests/loop-create-file     | 59 +++++++++++++++++++
 tests/qemu-iotests/tests/loop-create-file.out |  8 +++
 6 files changed, 98 insertions(+), 1 deletion(-)
 create mode 100755 tests/qemu-iotests/tests/loop-create-file
 create mode 100644 tests/qemu-iotests/tests/loop-create-file.out

-- 
2.51.0



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/3] file-posix: populate pwrite_zeroes_alignment
  2025-10-02 18:39 [PATCH 0/3] block: use pwrite_zeroes_alignment when writing first sector Stefan Hajnoczi
@ 2025-10-02 18:39 ` Stefan Hajnoczi
  2025-10-03  7:55   ` Vladimir Sementsov-Ogievskiy
  2025-10-03  8:04   ` Vladimir Sementsov-Ogievskiy
  2025-10-02 18:39 ` [PATCH 2/3] block: use pwrite_zeroes_alignment when writing first sector Stefan Hajnoczi
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2025-10-02 18:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Jean-Louis Dupond, Hanna Reitz,
	Stefan Hajnoczi

Linux block devices require write zeroes alignment whereas files do not.

It may come as a surprise that block devices opened in buffered I/O mode
require the alignment although regular read/write requests do not.

Therefore it is necessary to populate the pwrite_zeroes_alignment field.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/file-posix.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/block/file-posix.c b/block/file-posix.c
index 8c738674ce..05c92c824d 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1602,6 +1602,23 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
 
             bs->bl.pdiscard_alignment = dalign;
         }
+
+#ifdef __linux__
+        /*
+         * When request_alignment > 1, pwrite_zeroes_alignment does not need to
+         * be set explicitly. When request_alignment == 1, it must be set
+         * explicitly because Linux requires logical block size alignment.
+         */
+        if (bs->bl.request_alignment == 1) {
+            ret = probe_logical_blocksize(s->fd,
+                                          &bs->bl.pwrite_zeroes_alignment);
+            if (ret < 0) {
+                error_setg_errno(errp, -ret,
+                                 "Failed to probe logical block size");
+                return;
+            }
+        }
+#endif /* __linux__ */
     }
 
     raw_refresh_zoned_limits(bs, &st, errp);
-- 
2.51.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/3] block: use pwrite_zeroes_alignment when writing first sector
  2025-10-02 18:39 [PATCH 0/3] block: use pwrite_zeroes_alignment when writing first sector Stefan Hajnoczi
  2025-10-02 18:39 ` [PATCH 1/3] file-posix: populate pwrite_zeroes_alignment Stefan Hajnoczi
@ 2025-10-02 18:39 ` Stefan Hajnoczi
  2025-10-03  7:56   ` Vladimir Sementsov-Ogievskiy
  2025-10-02 18:40 ` [PATCH 3/3] iotests: add Linux loop device image creation test Stefan Hajnoczi
  2025-10-03  7:52 ` [PATCH 0/3] block: use pwrite_zeroes_alignment when writing first sector Vladimir Sementsov-Ogievskiy
  3 siblings, 1 reply; 11+ messages in thread
From: Stefan Hajnoczi @ 2025-10-02 18:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Jean-Louis Dupond, Hanna Reitz,
	Stefan Hajnoczi

Since commit 5634622bcb33 ("file-posix: allow BLKZEROOUT with -t
writeback"), qemu-img create errors out on a Linux loop block device
with a 4 KB sector size:

  # dd if=/dev/zero of=blockfile bs=1M count=1024
  # losetup --sector-size 4096 /dev/loop0 blockfile
  # qemu-img create -f raw /dev/loop0 1G
  Formatting '/dev/loop0', fmt=raw size=1073741824
  qemu-img: /dev/loop0: Failed to clear the new image's first sector: Invalid argument

Use the pwrite_zeroes_alignment block limit to avoid misaligned
fallocate(2) or ioctl(BLKZEROOUT) in the block/file-posix.c block
driver.

Fixes: 5634622bcb33 ("file-posix: allow BLKZEROOUT with -t writeback")
Reported-by: Jean-Louis Dupond <jean-louis@dupond.be>
Buglink: https://gitlab.com/qemu-project/qemu/-/issues/3127
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/system/block-backend-io.h |  1 +
 block.c                           |  3 ++-
 block/block-backend.c             | 11 +++++++++++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/include/system/block-backend-io.h b/include/system/block-backend-io.h
index ba8dfcc7d0..6d5ac476fc 100644
--- a/include/system/block-backend-io.h
+++ b/include/system/block-backend-io.h
@@ -116,6 +116,7 @@ BlockAIOCB *blk_abort_aio_request(BlockBackend *blk,
                                   void *opaque, int ret);
 
 uint32_t blk_get_request_alignment(BlockBackend *blk);
+uint32_t blk_get_pwrite_zeroes_alignment(BlockBackend *blk);
 uint32_t blk_get_max_transfer(BlockBackend *blk);
 uint64_t blk_get_max_hw_transfer(BlockBackend *blk);
 
diff --git a/block.c b/block.c
index 8848e9a7ed..be77e03904 100644
--- a/block.c
+++ b/block.c
@@ -606,12 +606,13 @@ create_file_fallback_zero_first_sector(BlockBackend *blk,
                                        int64_t current_size,
                                        Error **errp)
 {
+    uint32_t alignment = blk_get_pwrite_zeroes_alignment(blk);
     int64_t bytes_to_clear;
     int ret;
 
     GLOBAL_STATE_CODE();
 
-    bytes_to_clear = MIN(current_size, BDRV_SECTOR_SIZE);
+    bytes_to_clear = MIN(current_size, MAX(BDRV_SECTOR_SIZE, alignment));
     if (bytes_to_clear) {
         ret = blk_co_pwrite_zeroes(blk, 0, bytes_to_clear, BDRV_REQ_MAY_UNMAP);
         if (ret < 0) {
diff --git a/block/block-backend.c b/block/block-backend.c
index f8d6ba65c1..239d6eca37 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2305,6 +2305,17 @@ uint32_t blk_get_request_alignment(BlockBackend *blk)
     return bs ? bs->bl.request_alignment : BDRV_SECTOR_SIZE;
 }
 
+/* Returns the optimal write zeroes alignment, in bytes; guaranteed nonzero */
+uint32_t blk_get_pwrite_zeroes_alignment(BlockBackend *blk)
+{
+    BlockDriverState *bs = blk_bs(blk);
+    IO_CODE();
+    if (!bs) {
+        return BDRV_SECTOR_SIZE;
+    }
+    return bs->bl.pwrite_zeroes_alignment ?: bs->bl.request_alignment;
+}
+
 /* Returns the maximum hardware transfer length, in bytes; guaranteed nonzero */
 uint64_t blk_get_max_hw_transfer(BlockBackend *blk)
 {
-- 
2.51.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/3] iotests: add Linux loop device image creation test
  2025-10-02 18:39 [PATCH 0/3] block: use pwrite_zeroes_alignment when writing first sector Stefan Hajnoczi
  2025-10-02 18:39 ` [PATCH 1/3] file-posix: populate pwrite_zeroes_alignment Stefan Hajnoczi
  2025-10-02 18:39 ` [PATCH 2/3] block: use pwrite_zeroes_alignment when writing first sector Stefan Hajnoczi
@ 2025-10-02 18:40 ` Stefan Hajnoczi
  2025-10-03  7:52 ` [PATCH 0/3] block: use pwrite_zeroes_alignment when writing first sector Vladimir Sementsov-Ogievskiy
  3 siblings, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2025-10-02 18:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, qemu-block, Jean-Louis Dupond, Hanna Reitz,
	Stefan Hajnoczi

This qemu-iotests test case is based on the reproducer that Jean-Louis
Dupond <jean-louis@dupond.be> shared in
https://gitlab.com/qemu-project/qemu/-/issues/3127.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tests/qemu-iotests/tests/loop-create-file     | 59 +++++++++++++++++++
 tests/qemu-iotests/tests/loop-create-file.out |  8 +++
 2 files changed, 67 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/loop-create-file
 create mode 100644 tests/qemu-iotests/tests/loop-create-file.out

diff --git a/tests/qemu-iotests/tests/loop-create-file b/tests/qemu-iotests/tests/loop-create-file
new file mode 100755
index 0000000000..5ec75b046b
--- /dev/null
+++ b/tests/qemu-iotests/tests/loop-create-file
@@ -0,0 +1,59 @@
+#!/usr/bin/env bash
+# group: quick
+#
+# SPDX-License-Identifier: GPL-2.0-or-later
+#
+# Copyright Red Hat, Inc.
+#
+# Test Linux loop device image creation
+#
+# This test verifies #3127 "qemu-img create fails on loop device with sector size 4096"
+# https://gitlab.com/qemu-project/qemu/-/issues/3127
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+status=1	# failure is the default!
+
+_cleanup() {
+    if [ -n "$loopdev" ]; then
+        sudo losetup --detach "$loopdev"
+    fi
+
+    _cleanup_test_img
+}
+
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+cd ..
+. ./common.rc
+. ./common.filter
+
+_supported_fmt raw
+_supported_proto file
+_supported_os Linux
+
+if ! sudo -n losetup &>/dev/null; then
+    _notrun "sudo losetup not available"
+fi
+
+echo
+echo "=== Create image on a 4 KB sector size loop device ==="
+echo
+
+_make_test_img -f $IMGFMT 1M
+
+loopdev=$(sudo losetup --sector-size 4096 --find --show "$TEST_IMG")
+if [ -z "$loopdev" ]; then
+    _fail
+fi
+
+sudo $QEMU_IMG_PROG create -f raw "$loopdev" 1M | \
+    sed -e "s#/dev/loop[0-9]\\+#LOOPDEV#g"
+
+# success, all done
+echo
+echo '*** done'
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/tests/loop-create-file.out b/tests/qemu-iotests/tests/loop-create-file.out
new file mode 100644
index 0000000000..32d4155695
--- /dev/null
+++ b/tests/qemu-iotests/tests/loop-create-file.out
@@ -0,0 +1,8 @@
+QA output created by loop-create-file
+
+=== Create image on a 4 KB sector size loop device ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1048576
+Formatting 'LOOPDEV', fmt=raw size=1048576
+
+*** done
-- 
2.51.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 0/3] block: use pwrite_zeroes_alignment when writing first sector
  2025-10-02 18:39 [PATCH 0/3] block: use pwrite_zeroes_alignment when writing first sector Stefan Hajnoczi
                   ` (2 preceding siblings ...)
  2025-10-02 18:40 ` [PATCH 3/3] iotests: add Linux loop device image creation test Stefan Hajnoczi
@ 2025-10-03  7:52 ` Vladimir Sementsov-Ogievskiy
  3 siblings, 0 replies; 11+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-03  7:52 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel
  Cc: Kevin Wolf, qemu-block, Jean-Louis Dupond, Hanna Reitz

On 02.10.25 21:39, Stefan Hajnoczi wrote:
> This series fixes a bug I introduced in commit 5634622bcb33 ("file-posix: allow
> BLKZEROOUT with -t writeback"). The Linux fallocate(2) and ioctl(BLKZEROOUT)
> syscalls require logical block size alignment of the offset and length, even
> when the file is opened in buffered I/O mode where read/write operations do not
> require alignment.
> 
> The fix is to populate the pwrite_zeroes_alignment block limits field and to
> use that limit in create_file_fallback_zero_first_sector().
> 
> One issue I want to raise is that pwrite_zeroes_alignment is an "optimal
> alignment" hint. Hence create_file_fallback_zero_first_sector() had to be
> modified to honor the limit explicitly. 

Probably, this place had to be modified anyway, even if we support "required
write zeroes alignment" generically, it seems better to do write-zeroes
on first smallest "write-zero-able" sector than fallback to normal write of zero
data (or even to read/modify/write).


> The block layer doesn't automatically
> apply padding in order to align requests. This is different from how QEMU's
> block layer pwrite/pread works, where it does automatically apply padding and
> read/modify/write as necessary. If you want consistency, please let me know.
> 
> Stefan Hajnoczi (3):
>    file-posix: populate pwrite_zeroes_alignment
>    block: use pwrite_zeroes_alignment when writing first sector
>    iotests: add Linux loop device image creation test
> 
>   include/system/block-backend-io.h             |  1 +
>   block.c                                       |  3 +-
>   block/block-backend.c                         | 11 ++++
>   block/file-posix.c                            | 17 ++++++
>   tests/qemu-iotests/tests/loop-create-file     | 59 +++++++++++++++++++
>   tests/qemu-iotests/tests/loop-create-file.out |  8 +++
>   6 files changed, 98 insertions(+), 1 deletion(-)
>   create mode 100755 tests/qemu-iotests/tests/loop-create-file
>   create mode 100644 tests/qemu-iotests/tests/loop-create-file.out
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] file-posix: populate pwrite_zeroes_alignment
  2025-10-02 18:39 ` [PATCH 1/3] file-posix: populate pwrite_zeroes_alignment Stefan Hajnoczi
@ 2025-10-03  7:55   ` Vladimir Sementsov-Ogievskiy
  2025-10-06 14:57     ` Stefan Hajnoczi
  2025-10-03  8:04   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 11+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-03  7:55 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel
  Cc: Kevin Wolf, qemu-block, Jean-Louis Dupond, Hanna Reitz

On 02.10.25 21:39, Stefan Hajnoczi wrote:
> Linux block devices require write zeroes alignment whereas files do not.
> 
> It may come as a surprise that block devices opened in buffered I/O mode
> require the alignment although regular read/write requests do not.
> 
> Therefore it is necessary to populate the pwrite_zeroes_alignment field.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   block/file-posix.c | 17 +++++++++++++++++
>   1 file changed, 17 insertions(+)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 8c738674ce..05c92c824d 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1602,6 +1602,23 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>   
>               bs->bl.pdiscard_alignment = dalign;
>           }
> +
> +#ifdef __linux__
> +        /*
> +         * When request_alignment > 1, pwrite_zeroes_alignment does not need to
> +         * be set explicitly. When request_alignment == 1, it must be set
> +         * explicitly because Linux requires logical block size alignment.
> +         */
> +        if (bs->bl.request_alignment == 1) {
> +            ret = probe_logical_blocksize(s->fd,
> +                                          &bs->bl.pwrite_zeroes_alignment);
> +            if (ret < 0) {
> +                error_setg_errno(errp, -ret,
> +                                 "Failed to probe logical block size");

Isn't it too restrictive? Could we consider failed attempt to probe as permission
to proceed without write-zeroes alignment? In raw_probe_alignment, we fallback
to guessing request_alignment from memalign.

> +                return;
> +            }
> +        }
> +#endif /* __linux__ */
>       }
>   
>       raw_refresh_zoned_limits(bs, &st, errp);


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] block: use pwrite_zeroes_alignment when writing first sector
  2025-10-02 18:39 ` [PATCH 2/3] block: use pwrite_zeroes_alignment when writing first sector Stefan Hajnoczi
@ 2025-10-03  7:56   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 11+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-03  7:56 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel
  Cc: Kevin Wolf, qemu-block, Jean-Louis Dupond, Hanna Reitz

On 02.10.25 21:39, Stefan Hajnoczi wrote:
> Since commit 5634622bcb33 ("file-posix: allow BLKZEROOUT with -t
> writeback"), qemu-img create errors out on a Linux loop block device
> with a 4 KB sector size:
> 
>    # dd if=/dev/zero of=blockfile bs=1M count=1024
>    # losetup --sector-size 4096 /dev/loop0 blockfile
>    # qemu-img create -f raw /dev/loop0 1G
>    Formatting '/dev/loop0', fmt=raw size=1073741824
>    qemu-img: /dev/loop0: Failed to clear the new image's first sector: Invalid argument
> 
> Use the pwrite_zeroes_alignment block limit to avoid misaligned
> fallocate(2) or ioctl(BLKZEROOUT) in the block/file-posix.c block
> driver.
> 
> Fixes: 5634622bcb33 ("file-posix: allow BLKZEROOUT with -t writeback")
> Reported-by: Jean-Louis Dupond<jean-louis@dupond.be>
> Buglink:https://gitlab.com/qemu-project/qemu/-/issues/3127
> Signed-off-by: Stefan Hajnoczi<stefanha@redhat.com>


Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] file-posix: populate pwrite_zeroes_alignment
  2025-10-02 18:39 ` [PATCH 1/3] file-posix: populate pwrite_zeroes_alignment Stefan Hajnoczi
  2025-10-03  7:55   ` Vladimir Sementsov-Ogievskiy
@ 2025-10-03  8:04   ` Vladimir Sementsov-Ogievskiy
  2025-10-06 14:46     ` Stefan Hajnoczi
  1 sibling, 1 reply; 11+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-03  8:04 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel
  Cc: Kevin Wolf, qemu-block, Jean-Louis Dupond, Hanna Reitz

On 02.10.25 21:39, Stefan Hajnoczi wrote:
> Linux block devices require write zeroes alignment whereas files do not.
> 
> It may come as a surprise that block devices opened in buffered I/O mode
> require the alignment although regular read/write requests do not.
> 
> Therefore it is necessary to populate the pwrite_zeroes_alignment field.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>   block/file-posix.c | 17 +++++++++++++++++
>   1 file changed, 17 insertions(+)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 8c738674ce..05c92c824d 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -1602,6 +1602,23 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>   
>               bs->bl.pdiscard_alignment = dalign;
>           }
> +
> +#ifdef __linux__
> +        /*
> +         * When request_alignment > 1, pwrite_zeroes_alignment does not need to
> +         * be set explicitly. When request_alignment == 1, it must be set
> +         * explicitly because Linux requires logical block size alignment.
> +         */
> +        if (bs->bl.request_alignment == 1) {

would "if (!s->needs_alignment) {" be a more visual check? This way reader will not have to analyze
raw_probe_alignment, and understand that needs_alignment=false is the only path for block device
to have request_alignment==1.

> +            ret = probe_logical_blocksize(s->fd,
> +                                          &bs->bl.pwrite_zeroes_alignment);
> +            if (ret < 0) {
> +                error_setg_errno(errp, -ret,
> +                                 "Failed to probe logical block size");
> +                return;
> +            }
> +        }
> +#endif /* __linux__ */
>       }
>   
>       raw_refresh_zoned_limits(bs, &st, errp);


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] file-posix: populate pwrite_zeroes_alignment
  2025-10-03  8:04   ` Vladimir Sementsov-Ogievskiy
@ 2025-10-06 14:46     ` Stefan Hajnoczi
  0 siblings, 0 replies; 11+ messages in thread
From: Stefan Hajnoczi @ 2025-10-06 14:46 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, Kevin Wolf, qemu-block, Jean-Louis Dupond,
	Hanna Reitz

[-- Attachment #1: Type: text/plain, Size: 2109 bytes --]

On Fri, Oct 03, 2025 at 11:04:38AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 02.10.25 21:39, Stefan Hajnoczi wrote:
> > Linux block devices require write zeroes alignment whereas files do not.
> > 
> > It may come as a surprise that block devices opened in buffered I/O mode
> > require the alignment although regular read/write requests do not.
> > 
> > Therefore it is necessary to populate the pwrite_zeroes_alignment field.
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >   block/file-posix.c | 17 +++++++++++++++++
> >   1 file changed, 17 insertions(+)
> > 
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index 8c738674ce..05c92c824d 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -1602,6 +1602,23 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
> >               bs->bl.pdiscard_alignment = dalign;
> >           }
> > +
> > +#ifdef __linux__
> > +        /*
> > +         * When request_alignment > 1, pwrite_zeroes_alignment does not need to
> > +         * be set explicitly. When request_alignment == 1, it must be set
> > +         * explicitly because Linux requires logical block size alignment.
> > +         */
> > +        if (bs->bl.request_alignment == 1) {
> 
> would "if (!s->needs_alignment) {" be a more visual check? This way reader will not have to analyze
> raw_probe_alignment, and understand that needs_alignment=false is the only path for block device
> to have request_alignment==1.

Yes, that would make it easier to understand. Thanks!

> 
> > +            ret = probe_logical_blocksize(s->fd,
> > +                                          &bs->bl.pwrite_zeroes_alignment);
> > +            if (ret < 0) {
> > +                error_setg_errno(errp, -ret,
> > +                                 "Failed to probe logical block size");
> > +                return;
> > +            }
> > +        }
> > +#endif /* __linux__ */
> >       }
> >       raw_refresh_zoned_limits(bs, &st, errp);
> 
> 
> -- 
> Best regards,
> Vladimir
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] file-posix: populate pwrite_zeroes_alignment
  2025-10-03  7:55   ` Vladimir Sementsov-Ogievskiy
@ 2025-10-06 14:57     ` Stefan Hajnoczi
  2025-10-07  9:08       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 11+ messages in thread
From: Stefan Hajnoczi @ 2025-10-06 14:57 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, Kevin Wolf, qemu-block, Jean-Louis Dupond,
	Hanna Reitz

[-- Attachment #1: Type: text/plain, Size: 2457 bytes --]

On Fri, Oct 03, 2025 at 10:55:09AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 02.10.25 21:39, Stefan Hajnoczi wrote:
> > Linux block devices require write zeroes alignment whereas files do not.
> > 
> > It may come as a surprise that block devices opened in buffered I/O mode
> > require the alignment although regular read/write requests do not.
> > 
> > Therefore it is necessary to populate the pwrite_zeroes_alignment field.
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >   block/file-posix.c | 17 +++++++++++++++++
> >   1 file changed, 17 insertions(+)
> > 
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index 8c738674ce..05c92c824d 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -1602,6 +1602,23 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
> >               bs->bl.pdiscard_alignment = dalign;
> >           }
> > +
> > +#ifdef __linux__
> > +        /*
> > +         * When request_alignment > 1, pwrite_zeroes_alignment does not need to
> > +         * be set explicitly. When request_alignment == 1, it must be set
> > +         * explicitly because Linux requires logical block size alignment.
> > +         */
> > +        if (bs->bl.request_alignment == 1) {
> > +            ret = probe_logical_blocksize(s->fd,
> > +                                          &bs->bl.pwrite_zeroes_alignment);
> > +            if (ret < 0) {
> > +                error_setg_errno(errp, -ret,
> > +                                 "Failed to probe logical block size");
> 
> Isn't it too restrictive? Could we consider failed attempt to probe as permission
> to proceed without write-zeroes alignment? In raw_probe_alignment, we fallback
> to guessing request_alignment from memalign.

The logical block size alignment is required for write zeroes, otherwise
write zeroes will fail with EINVAL (not ENOTSUP).

There is no way to probe in the !needs_alignment case since read
requests don't require alignment and write zeroes would be destructive.

I think it's preferrable to fail here. This should never happen on a
Linux kernel because BLKSSZGET has been there since the initial git
import in 2005.

> 
> > +                return;
> > +            }
> > +        }
> > +#endif /* __linux__ */
> >       }
> >       raw_refresh_zoned_limits(bs, &st, errp);
> 
> 
> -- 
> Best regards,
> Vladimir
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] file-posix: populate pwrite_zeroes_alignment
  2025-10-06 14:57     ` Stefan Hajnoczi
@ 2025-10-07  9:08       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 11+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-07  9:08 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Kevin Wolf, qemu-block, Jean-Louis Dupond,
	Hanna Reitz

On 06.10.25 17:57, Stefan Hajnoczi wrote:
> On Fri, Oct 03, 2025 at 10:55:09AM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> On 02.10.25 21:39, Stefan Hajnoczi wrote:
>>> Linux block devices require write zeroes alignment whereas files do not.
>>>
>>> It may come as a surprise that block devices opened in buffered I/O mode
>>> require the alignment although regular read/write requests do not.
>>>
>>> Therefore it is necessary to populate the pwrite_zeroes_alignment field.
>>>
>>> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>>> ---
>>>    block/file-posix.c | 17 +++++++++++++++++
>>>    1 file changed, 17 insertions(+)
>>>
>>> diff --git a/block/file-posix.c b/block/file-posix.c
>>> index 8c738674ce..05c92c824d 100644
>>> --- a/block/file-posix.c
>>> +++ b/block/file-posix.c
>>> @@ -1602,6 +1602,23 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
>>>                bs->bl.pdiscard_alignment = dalign;
>>>            }
>>> +
>>> +#ifdef __linux__
>>> +        /*
>>> +         * When request_alignment > 1, pwrite_zeroes_alignment does not need to
>>> +         * be set explicitly. When request_alignment == 1, it must be set
>>> +         * explicitly because Linux requires logical block size alignment.
>>> +         */
>>> +        if (bs->bl.request_alignment == 1) {
>>> +            ret = probe_logical_blocksize(s->fd,
>>> +                                          &bs->bl.pwrite_zeroes_alignment);
>>> +            if (ret < 0) {
>>> +                error_setg_errno(errp, -ret,
>>> +                                 "Failed to probe logical block size");
>>
>> Isn't it too restrictive? Could we consider failed attempt to probe as permission
>> to proceed without write-zeroes alignment? In raw_probe_alignment, we fallback
>> to guessing request_alignment from memalign.
> 
> The logical block size alignment is required for write zeroes, otherwise
> write zeroes will fail with EINVAL (not ENOTSUP).
> 
> There is no way to probe in the !needs_alignment case since read
> requests don't require alignment and write zeroes would be destructive.

Theoretically, if we also implement some kind of automation for unaligned tails
(like for read/write request_alignment), to support "required write-zeroes alignment",
we could postpone probing up to the first write-zeroes operation.. But seems, that
would be too much work (and complex logic to support in future) for nothing.

> 
> I think it's preferrable to fail here. This should never happen on a
> Linux kernel because BLKSSZGET has been there since the initial git
> import in 2005.
> 

Agreed.

>>
>>> +                return;
>>> +            }
>>> +        }
>>> +#endif /* __linux__ */
>>>        }
>>>        raw_refresh_zoned_limits(bs, &st, errp);
>>
>>
>> -- 
>> Best regards,
>> Vladimir
>>


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2025-10-07  9:09 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-02 18:39 [PATCH 0/3] block: use pwrite_zeroes_alignment when writing first sector Stefan Hajnoczi
2025-10-02 18:39 ` [PATCH 1/3] file-posix: populate pwrite_zeroes_alignment Stefan Hajnoczi
2025-10-03  7:55   ` Vladimir Sementsov-Ogievskiy
2025-10-06 14:57     ` Stefan Hajnoczi
2025-10-07  9:08       ` Vladimir Sementsov-Ogievskiy
2025-10-03  8:04   ` Vladimir Sementsov-Ogievskiy
2025-10-06 14:46     ` Stefan Hajnoczi
2025-10-02 18:39 ` [PATCH 2/3] block: use pwrite_zeroes_alignment when writing first sector Stefan Hajnoczi
2025-10-03  7:56   ` Vladimir Sementsov-Ogievskiy
2025-10-02 18:40 ` [PATCH 3/3] iotests: add Linux loop device image creation test Stefan Hajnoczi
2025-10-03  7:52 ` [PATCH 0/3] block: use pwrite_zeroes_alignment when writing first sector Vladimir Sementsov-Ogievskiy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).