qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v3 00/11] Make blockdev-mirror dest sparse in more cases
@ 2025-04-25  0:52 Eric Blake
  2025-04-25  0:52 ` [PATCH v3 01/11] block: Expand block status mode from bool to flags Eric Blake
                   ` (10 more replies)
  0 siblings, 11 replies; 24+ messages in thread
From: Eric Blake @ 2025-04-25  0:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, vsementsov, stefanha

v2 was here:
https://lists.gnu.org/archive/html/qemu-devel/2025-04/msg02940.html

In v3:
 - use flags instead of enum at start of series [Stefan]
 - Don't throttle for skipped zeroes [Sunny]
 - Try harder to punch holes for "detect-zeroes":"unmap"
 - More cases in mirror-sparse iotest
 - R-b added on patches that didn't drastically change

Andrey Drobyshev (1):
  iotests/common.rc: add disk_usage function

Eric Blake (10):
  block: Expand block status mode from bool to flags
  file-posix, gluster: Handle zero block status hint better
  block: Let bdrv_co_is_zero_fast consolidate adjacent extents
  block: Add new bdrv_co_is_all_zeroes() function
  iotests: Improve iotest 194 to mirror data
  mirror: Minor refactoring
  mirror: Skip pre-zeroing destination if it is already zero
  mirror: Skip writing zeroes when target is already zero
  tests: Add iotest mirror-sparse for recent patches
  mirror: Allow QMP override to declare target already zero

 qapi/block-core.json                       |   8 +-
 block/coroutines.h                         |   4 +-
 include/block/block-common.h               |  11 +
 include/block/block-io.h                   |   2 +
 include/block/block_int-common.h           |  27 +-
 include/block/block_int-global-state.h     |   3 +-
 include/block/block_int-io.h               |   4 +-
 block/io.c                                 | 126 +++++--
 block/blkdebug.c                           |   6 +-
 block/copy-before-write.c                  |   4 +-
 block/file-posix.c                         |   5 +-
 block/gluster.c                            |   4 +-
 block/iscsi.c                              |   6 +-
 block/mirror.c                             | 150 +++++++--
 block/nbd.c                                |   4 +-
 block/null.c                               |   6 +-
 block/parallels.c                          |   6 +-
 block/qcow.c                               |   2 +-
 block/qcow2.c                              |   6 +-
 block/qed.c                                |   6 +-
 block/quorum.c                             |   4 +-
 block/raw-format.c                         |   4 +-
 block/rbd.c                                |   6 +-
 block/snapshot-access.c                    |   4 +-
 block/vdi.c                                |   4 +-
 block/vmdk.c                               |   2 +-
 block/vpc.c                                |   2 +-
 block/vvfat.c                              |   6 +-
 blockdev.c                                 |  18 +-
 tests/unit/test-block-iothread.c           |   4 +-
 tests/qemu-iotests/common.rc               |   6 +
 tests/qemu-iotests/194                     |   1 +
 tests/qemu-iotests/250                     |   5 -
 tests/qemu-iotests/tests/mirror-sparse     | 128 ++++++++
 tests/qemu-iotests/tests/mirror-sparse.out | 365 +++++++++++++++++++++
 35 files changed, 815 insertions(+), 134 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/mirror-sparse
 create mode 100644 tests/qemu-iotests/tests/mirror-sparse.out

-- 
2.49.0



^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v3 01/11] block: Expand block status mode from bool to flags
  2025-04-25  0:52 [PATCH v3 00/11] Make blockdev-mirror dest sparse in more cases Eric Blake
@ 2025-04-25  0:52 ` Eric Blake
  2025-04-25  0:52 ` [PATCH v3 02/11] file-posix, gluster: Handle zero block status hint better Eric Blake
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Eric Blake @ 2025-04-25  0:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, vsementsov, stefanha, Kevin Wolf, Hanna Reitz,
	John Snow, Fam Zheng, Ronnie Sahlberg, Paolo Bonzini,
	Peter Lieven, Denis V. Lunev, Alberto Garcia, Ilya Dryomov,
	Stefan Weil, open list:GLUSTER

This patch is purely mechanical, changing bool want_zero into an
unsigned int for bitwise-or of flags.  As of this patch, all
implementations are unchanged (the old want_zero==true is now
mode==BDRV_WANT_PRECISE which is a superset of BDRV_WANT_ZERO); but
the callers in io.c that used to pass want_zero==false are now
prepared for future driver changes that can now distinguish bewteen
BDRV_WANT_ZERO vs. BDRV_WANT_ALLOCATED.  The next patch will actually
change the file-posix driver along those lines, now that we have
more-specific hints.

As for the background why this patch is useful: right now, the
file-posix driver recognizes that if allocation is being queried, the
entire image can be reported as allocated (there is no backing file to
refer to) - but this throws away information on whether the entire
image reads as zero (trivially true if lseek(SEEK_HOLE) at offset 0
returns -ENXIO, a bit more complicated to prove if the raw file was
created with 'qemu-img create' since we intentionally allocate a small
chunk of all-zero data to help with alignment probing).  Later patches
will add a generic algorithm for seeing if an entire file reads as
zeroes.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/coroutines.h               |  4 +--
 include/block/block-common.h     | 11 +++++++
 include/block/block_int-common.h | 27 +++++++++--------
 include/block/block_int-io.h     |  4 +--
 block/io.c                       | 51 ++++++++++++++++----------------
 block/blkdebug.c                 |  6 ++--
 block/copy-before-write.c        |  4 +--
 block/file-posix.c               |  4 +--
 block/gluster.c                  |  4 +--
 block/iscsi.c                    |  6 ++--
 block/nbd.c                      |  4 +--
 block/null.c                     |  6 ++--
 block/parallels.c                |  6 ++--
 block/qcow.c                     |  2 +-
 block/qcow2.c                    |  6 ++--
 block/qed.c                      |  6 ++--
 block/quorum.c                   |  4 +--
 block/raw-format.c               |  4 +--
 block/rbd.c                      |  6 ++--
 block/snapshot-access.c          |  4 +--
 block/vdi.c                      |  4 +--
 block/vmdk.c                     |  2 +-
 block/vpc.c                      |  2 +-
 block/vvfat.c                    |  6 ++--
 tests/unit/test-block-iothread.c |  2 +-
 25 files changed, 99 insertions(+), 86 deletions(-)

diff --git a/block/coroutines.h b/block/coroutines.h
index 79e5efbf752..892646bb7aa 100644
--- a/block/coroutines.h
+++ b/block/coroutines.h
@@ -47,7 +47,7 @@ int coroutine_fn GRAPH_RDLOCK
 bdrv_co_common_block_status_above(BlockDriverState *bs,
                                   BlockDriverState *base,
                                   bool include_base,
-                                  bool want_zero,
+                                  unsigned int mode,
                                   int64_t offset,
                                   int64_t bytes,
                                   int64_t *pnum,
@@ -78,7 +78,7 @@ int co_wrapper_mixed_bdrv_rdlock
 bdrv_common_block_status_above(BlockDriverState *bs,
                                BlockDriverState *base,
                                bool include_base,
-                               bool want_zero,
+                               unsigned int mode,
                                int64_t offset,
                                int64_t bytes,
                                int64_t *pnum,
diff --git a/include/block/block-common.h b/include/block/block-common.h
index 0b831ef87b1..c8c626daeaa 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -333,6 +333,17 @@ typedef enum {
 #define BDRV_BLOCK_RECURSE      0x40
 #define BDRV_BLOCK_COMPRESSED   0x80

+/*
+ * Block status hints: the bitwise-or of these flags emphasize what
+ * the caller hopes to learn, and some drivers may be able to give
+ * faster answers by doing less work when the hint permits.
+ */
+#define BDRV_WANT_ZERO          BDRV_BLOCK_ZERO
+#define BDRV_WANT_OFFSET_VALID  BDRV_BLOCK_OFFSET_VALID
+#define BDRV_WANT_ALLOCATED     BDRV_BLOCK_ALLOCATED
+#define BDRV_WANT_PRECISE       (BDRV_WANT_ZERO | BDRV_WANT_OFFSET_VALID | \
+                                 BDRV_WANT_OFFSET_VALID)
+
 typedef QTAILQ_HEAD(BlockReopenQueue, BlockReopenQueueEntry) BlockReopenQueue;

 typedef struct BDRVReopenState {
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index ebb4e56a503..a9c0daa2a4d 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -608,15 +608,16 @@ struct BlockDriver {
      * according to the current layer, and should only need to set
      * BDRV_BLOCK_DATA, BDRV_BLOCK_ZERO, BDRV_BLOCK_OFFSET_VALID,
      * and/or BDRV_BLOCK_RAW; if the current layer defers to a backing
-     * layer, the result should be 0 (and not BDRV_BLOCK_ZERO).  See
-     * block.h for the overall meaning of the bits.  As a hint, the
-     * flag want_zero is true if the caller cares more about precise
-     * mappings (favor accurate _OFFSET_VALID/_ZERO) or false for
-     * overall allocation (favor larger *pnum, perhaps by reporting
-     * _DATA instead of _ZERO).  The block layer guarantees input
-     * clamped to bdrv_getlength() and aligned to request_alignment,
-     * as well as non-NULL pnum, map, and file; in turn, the driver
-     * must return an error or set pnum to an aligned non-zero value.
+     * layer, the result should be 0 (and not BDRV_BLOCK_ZERO).  The
+     * caller will synthesize BDRV_BLOCK_ALLOCATED based on the
+     * non-zero results.  See block.h for the overall meaning of the
+     * bits.  As a hint, the flags in @mode may include a bitwise-or
+     * of BDRV_WANT_ALLOCATED, BDRV_WANT_OFFSET_VALID, or
+     * BDRV_WANT_ZERO based on what the caller is looking for in the
+     * results.  The block layer guarantees input clamped to
+     * bdrv_getlength() and aligned to request_alignment, as well as
+     * non-NULL pnum, map, and file; in turn, the driver must return
+     * an error or set pnum to an aligned non-zero value.
      *
      * Note that @bytes is just a hint on how big of a region the
      * caller wants to inspect.  It is not a limit on *pnum.
@@ -628,8 +629,8 @@ struct BlockDriver {
      * to clamping *pnum for return to its caller.
      */
     int coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_block_status)(
-        BlockDriverState *bs,
-        bool want_zero, int64_t offset, int64_t bytes, int64_t *pnum,
+        BlockDriverState *bs, unsigned int mode,
+        int64_t offset, int64_t bytes, int64_t *pnum,
         int64_t *map, BlockDriverState **file);

     /*
@@ -653,8 +654,8 @@ struct BlockDriver {
         QEMUIOVector *qiov, size_t qiov_offset);

     int coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_snapshot_block_status)(
-        BlockDriverState *bs, bool want_zero, int64_t offset, int64_t bytes,
-        int64_t *pnum, int64_t *map, BlockDriverState **file);
+        BlockDriverState *bs, unsigned int mode, int64_t offset,
+        int64_t bytes, int64_t *pnum, int64_t *map, BlockDriverState **file);

     int coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_pdiscard_snapshot)(
         BlockDriverState *bs, int64_t offset, int64_t bytes);
diff --git a/include/block/block_int-io.h b/include/block/block_int-io.h
index 4a7cf2b4fdc..4f94eb3c5a2 100644
--- a/include/block/block_int-io.h
+++ b/include/block/block_int-io.h
@@ -38,8 +38,8 @@
 int coroutine_fn GRAPH_RDLOCK bdrv_co_preadv_snapshot(BdrvChild *child,
     int64_t offset, int64_t bytes, QEMUIOVector *qiov, size_t qiov_offset);
 int coroutine_fn GRAPH_RDLOCK bdrv_co_snapshot_block_status(
-    BlockDriverState *bs, bool want_zero, int64_t offset, int64_t bytes,
-    int64_t *pnum, int64_t *map, BlockDriverState **file);
+    BlockDriverState *bs, unsigned int mode, int64_t offset,
+    int64_t bytes, int64_t *pnum, int64_t *map, BlockDriverState **file);
 int coroutine_fn GRAPH_RDLOCK bdrv_co_pdiscard_snapshot(BlockDriverState *bs,
     int64_t offset, int64_t bytes);

diff --git a/block/io.c b/block/io.c
index 1ba8d1aeea1..056af8198bc 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2364,10 +2364,8 @@ int bdrv_flush_all(void)
  * Drivers not implementing the functionality are assumed to not support
  * backing files, hence all their sectors are reported as allocated.
  *
- * If 'want_zero' is true, the caller is querying for mapping
- * purposes, with a focus on valid BDRV_BLOCK_OFFSET_VALID, _DATA, and
- * _ZERO where possible; otherwise, the result favors larger 'pnum',
- * with a focus on accurate BDRV_BLOCK_ALLOCATED.
+ * 'mode' serves as a hint as to which results are favored; see the
+ * BDRV_WANT_* macros for details.
  *
  * If 'offset' is beyond the end of the disk image the return value is
  * BDRV_BLOCK_EOF and 'pnum' is set to 0.
@@ -2387,7 +2385,7 @@ int bdrv_flush_all(void)
  * set to the host mapping and BDS corresponding to the guest offset.
  */
 static int coroutine_fn GRAPH_RDLOCK
-bdrv_co_do_block_status(BlockDriverState *bs, bool want_zero,
+bdrv_co_do_block_status(BlockDriverState *bs, unsigned int mode,
                         int64_t offset, int64_t bytes,
                         int64_t *pnum, int64_t *map, BlockDriverState **file)
 {
@@ -2476,7 +2474,7 @@ bdrv_co_do_block_status(BlockDriverState *bs, bool want_zero,
             local_file = bs;
             local_map = aligned_offset;
         } else {
-            ret = bs->drv->bdrv_co_block_status(bs, want_zero, aligned_offset,
+            ret = bs->drv->bdrv_co_block_status(bs, mode, aligned_offset,
                                                 aligned_bytes, pnum, &local_map,
                                                 &local_file);

@@ -2488,10 +2486,10 @@ bdrv_co_do_block_status(BlockDriverState *bs, bool want_zero,
              * the cache requires an RCU update, so double check here to avoid
              * such an update if possible.
              *
-             * Check want_zero, because we only want to update the cache when we
+             * Check mode, because we only want to update the cache when we
              * have accurate information about what is zero and what is data.
              */
-            if (want_zero &&
+            if (mode == BDRV_WANT_PRECISE &&
                 ret == (BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID) &&
                 QLIST_EMPTY(&bs->children))
             {
@@ -2548,7 +2546,7 @@ bdrv_co_do_block_status(BlockDriverState *bs, bool want_zero,

     if (ret & BDRV_BLOCK_RAW) {
         assert(ret & BDRV_BLOCK_OFFSET_VALID && local_file);
-        ret = bdrv_co_do_block_status(local_file, want_zero, local_map,
+        ret = bdrv_co_do_block_status(local_file, mode, local_map,
                                       *pnum, pnum, &local_map, &local_file);
         goto out;
     }
@@ -2560,7 +2558,7 @@ bdrv_co_do_block_status(BlockDriverState *bs, bool want_zero,

         if (!cow_bs) {
             ret |= BDRV_BLOCK_ZERO;
-        } else if (want_zero) {
+        } else if (mode == BDRV_WANT_PRECISE) {
             int64_t size2 = bdrv_co_getlength(cow_bs);

             if (size2 >= 0 && offset >= size2) {
@@ -2569,14 +2567,14 @@ bdrv_co_do_block_status(BlockDriverState *bs, bool want_zero,
         }
     }

-    if (want_zero && ret & BDRV_BLOCK_RECURSE &&
+    if (mode == BDRV_WANT_PRECISE && ret & BDRV_BLOCK_RECURSE &&
         local_file && local_file != bs &&
         (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
         (ret & BDRV_BLOCK_OFFSET_VALID)) {
         int64_t file_pnum;
         int ret2;

-        ret2 = bdrv_co_do_block_status(local_file, want_zero, local_map,
+        ret2 = bdrv_co_do_block_status(local_file, mode, local_map,
                                        *pnum, &file_pnum, NULL, NULL);
         if (ret2 >= 0) {
             /* Ignore errors.  This is just providing extra information, it
@@ -2627,7 +2625,7 @@ int coroutine_fn
 bdrv_co_common_block_status_above(BlockDriverState *bs,
                                   BlockDriverState *base,
                                   bool include_base,
-                                  bool want_zero,
+                                  unsigned int mode,
                                   int64_t offset,
                                   int64_t bytes,
                                   int64_t *pnum,
@@ -2654,7 +2652,7 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
         return 0;
     }

-    ret = bdrv_co_do_block_status(bs, want_zero, offset, bytes, pnum,
+    ret = bdrv_co_do_block_status(bs, mode, offset, bytes, pnum,
                                   map, file);
     ++*depth;
     if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED || bs == base) {
@@ -2671,7 +2669,7 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
     for (p = bdrv_filter_or_cow_bs(bs); include_base || p != base;
          p = bdrv_filter_or_cow_bs(p))
     {
-        ret = bdrv_co_do_block_status(p, want_zero, offset, bytes, pnum,
+        ret = bdrv_co_do_block_status(p, mode, offset, bytes, pnum,
                                       map, file);
         ++*depth;
         if (ret < 0) {
@@ -2734,7 +2732,8 @@ int coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
                                             BlockDriverState **file)
 {
     IO_CODE();
-    return bdrv_co_common_block_status_above(bs, base, false, true, offset,
+    return bdrv_co_common_block_status_above(bs, base, false,
+                                             BDRV_WANT_PRECISE, offset,
                                              bytes, pnum, map, file, NULL);
 }

@@ -2765,8 +2764,9 @@ int coroutine_fn bdrv_co_is_zero_fast(BlockDriverState *bs, int64_t offset,
         return 1;
     }

-    ret = bdrv_co_common_block_status_above(bs, NULL, false, false, offset,
-                                            bytes, &pnum, NULL, NULL, NULL);
+    ret = bdrv_co_common_block_status_above(bs, NULL, false, BDRV_WANT_ZERO,
+                                            offset, bytes, &pnum, NULL, NULL,
+                                            NULL);

     if (ret < 0) {
         return ret;
@@ -2782,9 +2782,9 @@ int coroutine_fn bdrv_co_is_allocated(BlockDriverState *bs, int64_t offset,
     int64_t dummy;
     IO_CODE();

-    ret = bdrv_co_common_block_status_above(bs, bs, true, false, offset,
-                                            bytes, pnum ? pnum : &dummy, NULL,
-                                            NULL, NULL);
+    ret = bdrv_co_common_block_status_above(bs, bs, true, BDRV_WANT_ALLOCATED,
+                                            offset, bytes, pnum ? pnum : &dummy,
+                                            NULL, NULL, NULL);
     if (ret < 0) {
         return ret;
     }
@@ -2817,7 +2817,8 @@ int coroutine_fn bdrv_co_is_allocated_above(BlockDriverState *bs,
     int ret;
     IO_CODE();

-    ret = bdrv_co_common_block_status_above(bs, base, include_base, false,
+    ret = bdrv_co_common_block_status_above(bs, base, include_base,
+                                            BDRV_WANT_ALLOCATED,
                                             offset, bytes, pnum, NULL, NULL,
                                             &depth);
     if (ret < 0) {
@@ -3709,8 +3710,8 @@ bdrv_co_preadv_snapshot(BdrvChild *child, int64_t offset, int64_t bytes,
 }

 int coroutine_fn
-bdrv_co_snapshot_block_status(BlockDriverState *bs,
-                              bool want_zero, int64_t offset, int64_t bytes,
+bdrv_co_snapshot_block_status(BlockDriverState *bs, unsigned int mode,
+                              int64_t offset, int64_t bytes,
                               int64_t *pnum, int64_t *map,
                               BlockDriverState **file)
 {
@@ -3728,7 +3729,7 @@ bdrv_co_snapshot_block_status(BlockDriverState *bs,
     }

     bdrv_inc_in_flight(bs);
-    ret = drv->bdrv_co_snapshot_block_status(bs, want_zero, offset, bytes,
+    ret = drv->bdrv_co_snapshot_block_status(bs, mode, offset, bytes,
                                              pnum, map, file);
     bdrv_dec_in_flight(bs);

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 1c1967f8e0a..c54aee0c84b 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -751,9 +751,9 @@ blkdebug_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
 }

 static int coroutine_fn GRAPH_RDLOCK
-blkdebug_co_block_status(BlockDriverState *bs, bool want_zero, int64_t offset,
-                         int64_t bytes, int64_t *pnum, int64_t *map,
-                         BlockDriverState **file)
+blkdebug_co_block_status(BlockDriverState *bs, unsigned int mode,
+                         int64_t offset, int64_t bytes, int64_t *pnum,
+                         int64_t *map, BlockDriverState **file)
 {
     int err;

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index fd470f5f926..2badb3a8856 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -291,8 +291,8 @@ cbw_co_preadv_snapshot(BlockDriverState *bs, int64_t offset, int64_t bytes,
 }

 static int coroutine_fn GRAPH_RDLOCK
-cbw_co_snapshot_block_status(BlockDriverState *bs,
-                             bool want_zero, int64_t offset, int64_t bytes,
+cbw_co_snapshot_block_status(BlockDriverState *bs, unsigned int mode,
+                             int64_t offset, int64_t bytes,
                              int64_t *pnum, int64_t *map,
                              BlockDriverState **file)
 {
diff --git a/block/file-posix.c b/block/file-posix.c
index 56d1972d156..91deb5bf5af 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -3201,7 +3201,7 @@ static int find_allocation(BlockDriverState *bs, off_t start,
  * well exceed it.
  */
 static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
-                                            bool want_zero,
+                                            unsigned int mode,
                                             int64_t offset,
                                             int64_t bytes, int64_t *pnum,
                                             int64_t *map,
@@ -3217,7 +3217,7 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
         return ret;
     }

-    if (!want_zero) {
+    if (mode != BDRV_WANT_PRECISE) {
         *pnum = bytes;
         *map = offset;
         *file = bs;
diff --git a/block/gluster.c b/block/gluster.c
index c6d25ae7335..8197b0ecefa 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -1465,7 +1465,7 @@ exit:
  * (Based on raw_co_block_status() from file-posix.c.)
  */
 static int coroutine_fn qemu_gluster_co_block_status(BlockDriverState *bs,
-                                                     bool want_zero,
+                                                     unsigned int mode,
                                                      int64_t offset,
                                                      int64_t bytes,
                                                      int64_t *pnum,
@@ -1482,7 +1482,7 @@ static int coroutine_fn qemu_gluster_co_block_status(BlockDriverState *bs,
         return ret;
     }

-    if (!want_zero) {
+    if (mode != BDRV_WANT_PRECISE) {
         *pnum = bytes;
         *map = offset;
         *file = bs;
diff --git a/block/iscsi.c b/block/iscsi.c
index 2f0f4dac097..15b96ee8800 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -694,9 +694,9 @@ out_unlock:


 static int coroutine_fn iscsi_co_block_status(BlockDriverState *bs,
-                                              bool want_zero, int64_t offset,
-                                              int64_t bytes, int64_t *pnum,
-                                              int64_t *map,
+                                              unsigned int mode,
+                                              int64_t offset, int64_t bytes,
+                                              int64_t *pnum, int64_t *map,
                                               BlockDriverState **file)
 {
     IscsiLun *iscsilun = bs->opaque;
diff --git a/block/nbd.c b/block/nbd.c
index 887841bc813..d5a2b21c6d1 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1397,8 +1397,8 @@ nbd_client_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
 }

 static int coroutine_fn GRAPH_RDLOCK nbd_client_co_block_status(
-        BlockDriverState *bs, bool want_zero, int64_t offset, int64_t bytes,
-        int64_t *pnum, int64_t *map, BlockDriverState **file)
+        BlockDriverState *bs, unsigned int mode, int64_t offset,
+        int64_t bytes, int64_t *pnum, int64_t *map, BlockDriverState **file)
 {
     int ret, request_ret;
     NBDExtent64 extent = { 0 };
diff --git a/block/null.c b/block/null.c
index dc0b1fdbd9b..4e448d593d7 100644
--- a/block/null.c
+++ b/block/null.c
@@ -227,9 +227,9 @@ static int null_reopen_prepare(BDRVReopenState *reopen_state,
 }

 static int coroutine_fn null_co_block_status(BlockDriverState *bs,
-                                             bool want_zero, int64_t offset,
-                                             int64_t bytes, int64_t *pnum,
-                                             int64_t *map,
+                                             unsigned int mode,
+                                             int64_t offset, int64_t bytes,
+                                             int64_t *pnum, int64_t *map,
                                              BlockDriverState **file)
 {
     BDRVNullState *s = bs->opaque;
diff --git a/block/parallels.c b/block/parallels.c
index 347ca127f34..3a375e2a8ab 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -416,9 +416,9 @@ parallels_co_flush_to_os(BlockDriverState *bs)
 }

 static int coroutine_fn GRAPH_RDLOCK
-parallels_co_block_status(BlockDriverState *bs, bool want_zero, int64_t offset,
-                          int64_t bytes, int64_t *pnum, int64_t *map,
-                          BlockDriverState **file)
+parallels_co_block_status(BlockDriverState *bs, unsigned int mode,
+                          int64_t offset, int64_t bytes, int64_t *pnum,
+                          int64_t *map, BlockDriverState **file)
 {
     BDRVParallelsState *s = bs->opaque;
     int count;
diff --git a/block/qcow.c b/block/qcow.c
index da8ad4d2430..8a3e7591a92 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -530,7 +530,7 @@ get_cluster_offset(BlockDriverState *bs, uint64_t offset, int allocate,
 }

 static int coroutine_fn GRAPH_RDLOCK
-qcow_co_block_status(BlockDriverState *bs, bool want_zero,
+qcow_co_block_status(BlockDriverState *bs, unsigned int mode,
                      int64_t offset, int64_t bytes, int64_t *pnum,
                      int64_t *map, BlockDriverState **file)
 {
diff --git a/block/qcow2.c b/block/qcow2.c
index 7774e7f0909..66fba89b414 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2141,9 +2141,9 @@ static void qcow2_join_options(QDict *options, QDict *old_options)
 }

 static int coroutine_fn GRAPH_RDLOCK
-qcow2_co_block_status(BlockDriverState *bs, bool want_zero, int64_t offset,
-                      int64_t count, int64_t *pnum, int64_t *map,
-                      BlockDriverState **file)
+qcow2_co_block_status(BlockDriverState *bs, unsigned int mode,
+                      int64_t offset, int64_t count, int64_t *pnum,
+                      int64_t *map, BlockDriverState **file)
 {
     BDRVQcow2State *s = bs->opaque;
     uint64_t host_offset;
diff --git a/block/qed.c b/block/qed.c
index ac24449ffb3..4a36fb39294 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -833,9 +833,9 @@ fail:
 }

 static int coroutine_fn GRAPH_RDLOCK
-bdrv_qed_co_block_status(BlockDriverState *bs, bool want_zero, int64_t pos,
-                         int64_t bytes, int64_t *pnum, int64_t *map,
-                         BlockDriverState **file)
+bdrv_qed_co_block_status(BlockDriverState *bs, unsigned int mode,
+                         int64_t pos, int64_t bytes, int64_t *pnum,
+                         int64_t *map, BlockDriverState **file)
 {
     BDRVQEDState *s = bs->opaque;
     size_t len = MIN(bytes, SIZE_MAX);
diff --git a/block/quorum.c b/block/quorum.c
index 30747a6df93..ed8ce801ee3 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1226,7 +1226,7 @@ static void quorum_child_perm(BlockDriverState *bs, BdrvChild *c,
  * region contains zeroes, and BDRV_BLOCK_DATA otherwise.
  */
 static int coroutine_fn GRAPH_RDLOCK
-quorum_co_block_status(BlockDriverState *bs, bool want_zero,
+quorum_co_block_status(BlockDriverState *bs, unsigned int mode,
                        int64_t offset, int64_t count,
                        int64_t *pnum, int64_t *map, BlockDriverState **file)
 {
@@ -1238,7 +1238,7 @@ quorum_co_block_status(BlockDriverState *bs, bool want_zero,
     for (i = 0; i < s->num_children; i++) {
         int64_t bytes;
         ret = bdrv_co_common_block_status_above(s->children[i]->bs, NULL, false,
-                                                want_zero, offset, count,
+                                                mode, offset, count,
                                                 &bytes, NULL, NULL, NULL);
         if (ret < 0) {
             quorum_report_bad(QUORUM_OP_TYPE_READ, offset, count,
diff --git a/block/raw-format.c b/block/raw-format.c
index e08526e2eca..df16ac1ea25 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -283,8 +283,8 @@ fail:
 }

 static int coroutine_fn GRAPH_RDLOCK
-raw_co_block_status(BlockDriverState *bs, bool want_zero, int64_t offset,
-                    int64_t bytes, int64_t *pnum, int64_t *map,
+raw_co_block_status(BlockDriverState *bs, unsigned int mode,
+                    int64_t offset, int64_t bytes, int64_t *pnum, int64_t *map,
                     BlockDriverState **file)
 {
     BDRVRawState *s = bs->opaque;
diff --git a/block/rbd.c b/block/rbd.c
index af984fb7db4..4f3d42a8e7f 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1504,9 +1504,9 @@ static int qemu_rbd_diff_iterate_cb(uint64_t offs, size_t len,
 }

 static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
-                                                 bool want_zero, int64_t offset,
-                                                 int64_t bytes, int64_t *pnum,
-                                                 int64_t *map,
+                                                 unsigned int mode,
+                                                 int64_t offset, int64_t bytes,
+                                                 int64_t *pnum, int64_t *map,
                                                  BlockDriverState **file)
 {
     BDRVRBDState *s = bs->opaque;
diff --git a/block/snapshot-access.c b/block/snapshot-access.c
index 71ac83c01f0..17ed2402db8 100644
--- a/block/snapshot-access.c
+++ b/block/snapshot-access.c
@@ -41,11 +41,11 @@ snapshot_access_co_preadv_part(BlockDriverState *bs,

 static int coroutine_fn GRAPH_RDLOCK
 snapshot_access_co_block_status(BlockDriverState *bs,
-                                bool want_zero, int64_t offset,
+                                unsigned int mode, int64_t offset,
                                 int64_t bytes, int64_t *pnum,
                                 int64_t *map, BlockDriverState **file)
 {
-    return bdrv_co_snapshot_block_status(bs->file->bs, want_zero, offset,
+    return bdrv_co_snapshot_block_status(bs->file->bs, mode, offset,
                                          bytes, pnum, map, file);
 }

diff --git a/block/vdi.c b/block/vdi.c
index a2da6ecab01..3ddc62a5690 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -523,8 +523,8 @@ static int vdi_reopen_prepare(BDRVReopenState *state,
 }

 static int coroutine_fn GRAPH_RDLOCK
-vdi_co_block_status(BlockDriverState *bs, bool want_zero, int64_t offset,
-                    int64_t bytes, int64_t *pnum, int64_t *map,
+vdi_co_block_status(BlockDriverState *bs, unsigned int mode,
+                    int64_t offset, int64_t bytes, int64_t *pnum, int64_t *map,
                     BlockDriverState **file)
 {
     BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
diff --git a/block/vmdk.c b/block/vmdk.c
index 2adec499122..9c7ab037e14 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1777,7 +1777,7 @@ static inline uint64_t vmdk_find_offset_in_cluster(VmdkExtent *extent,
 }

 static int coroutine_fn GRAPH_RDLOCK
-vmdk_co_block_status(BlockDriverState *bs, bool want_zero,
+vmdk_co_block_status(BlockDriverState *bs, unsigned int mode,
                      int64_t offset, int64_t bytes, int64_t *pnum,
                      int64_t *map, BlockDriverState **file)
 {
diff --git a/block/vpc.c b/block/vpc.c
index 0309e319f60..801ff5793f8 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -726,7 +726,7 @@ fail:
 }

 static int coroutine_fn GRAPH_RDLOCK
-vpc_co_block_status(BlockDriverState *bs, bool want_zero,
+vpc_co_block_status(BlockDriverState *bs, unsigned int mode,
                     int64_t offset, int64_t bytes,
                     int64_t *pnum, int64_t *map,
                     BlockDriverState **file)
diff --git a/block/vvfat.c b/block/vvfat.c
index 91d69b3cc83..814796d9185 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -3134,9 +3134,9 @@ vvfat_co_pwritev(BlockDriverState *bs, int64_t offset, int64_t bytes,
 }

 static int coroutine_fn vvfat_co_block_status(BlockDriverState *bs,
-                                              bool want_zero, int64_t offset,
-                                              int64_t bytes, int64_t *n,
-                                              int64_t *map,
+                                              unsigned int mode,
+                                              int64_t offset, int64_t bytes,
+                                              int64_t *n, int64_t *map,
                                               BlockDriverState **file)
 {
     *n = bytes;
diff --git a/tests/unit/test-block-iothread.c b/tests/unit/test-block-iothread.c
index 2b358eaaa82..e26b3be5939 100644
--- a/tests/unit/test-block-iothread.c
+++ b/tests/unit/test-block-iothread.c
@@ -63,7 +63,7 @@ bdrv_test_co_truncate(BlockDriverState *bs, int64_t offset, bool exact,
 }

 static int coroutine_fn bdrv_test_co_block_status(BlockDriverState *bs,
-                                                  bool want_zero,
+                                                  unsigned int mode,
                                                   int64_t offset, int64_t count,
                                                   int64_t *pnum, int64_t *map,
                                                   BlockDriverState **file)
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 02/11] file-posix, gluster: Handle zero block status hint better
  2025-04-25  0:52 [PATCH v3 00/11] Make blockdev-mirror dest sparse in more cases Eric Blake
  2025-04-25  0:52 ` [PATCH v3 01/11] block: Expand block status mode from bool to flags Eric Blake
@ 2025-04-25  0:52 ` Eric Blake
  2025-04-25  0:52 ` [PATCH v3 03/11] block: Let bdrv_co_is_zero_fast consolidate adjacent extents Eric Blake
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Eric Blake @ 2025-04-25  0:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, vsementsov, stefanha, Kevin Wolf, Hanna Reitz,
	open list:GLUSTER

Although the previous patch to change 'bool want_zero' into a bitmask
made no semantic change, it is now time to differentiate.  When the
caller specifically wants to know what parts of the file read as zero,
we need to use lseek and actually reporting holes, rather than
short-circuiting and advertising full allocation.

This change will be utilized in later patches to let mirroring
optimize for the case when the destination already reads as zeroes.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/file-posix.c | 3 ++-
 block/gluster.c    | 2 +-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 91deb5bf5af..575cbfba07d 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -3217,7 +3217,8 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
         return ret;
     }

-    if (mode != BDRV_WANT_PRECISE) {
+    if (!(mode & BDRV_WANT_ZERO)) {
+        /* There is no backing file - all bytes are allocated in this file.  */
         *pnum = bytes;
         *map = offset;
         *file = bs;
diff --git a/block/gluster.c b/block/gluster.c
index 8197b0ecefa..e702666cbce 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -1482,7 +1482,7 @@ static int coroutine_fn qemu_gluster_co_block_status(BlockDriverState *bs,
         return ret;
     }

-    if (mode != BDRV_WANT_PRECISE) {
+    if (!(mode & BDRV_WANT_ZERO)) {
         *pnum = bytes;
         *map = offset;
         *file = bs;
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 03/11] block: Let bdrv_co_is_zero_fast consolidate adjacent extents
  2025-04-25  0:52 [PATCH v3 00/11] Make blockdev-mirror dest sparse in more cases Eric Blake
  2025-04-25  0:52 ` [PATCH v3 01/11] block: Expand block status mode from bool to flags Eric Blake
  2025-04-25  0:52 ` [PATCH v3 02/11] file-posix, gluster: Handle zero block status hint better Eric Blake
@ 2025-04-25  0:52 ` Eric Blake
  2025-05-01 17:20   ` Stefan Hajnoczi
  2025-04-25  0:52 ` [PATCH v3 04/11] block: Add new bdrv_co_is_all_zeroes() function Eric Blake
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: Eric Blake @ 2025-04-25  0:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, vsementsov, stefanha, Fam Zheng, Kevin Wolf,
	Hanna Reitz

Some BDS drivers have a cap on how much block status they can supply
in one query (for example, NBD talking to an older server cannot
inspect more than 4G per query; and qcow2 tends to cap its answers
rather than cross a cluster boundary of an L1 table).  Although the
existing callers of bdrv_co_is_zero_fast are not passing in that large
of a 'bytes' parameter, an upcoming caller wants to query the entire
image at once, and will thus benefit from being able to treat adjacent
zero regions in a coalesced manner, rather than claiming the region is
non-zero merely because pnum was truncated and didn't match the
incoming bytes.

While refactoring this into a loop, note that there is no need to
assign pnum prior to calling bdrv_co_common_block_status_above() (it
is guaranteed to be assigned deeper in the callstack).

Signed-off-by: Eric Blake <eblake@redhat.com>

---

v3: also tweak function comment
---
 block/io.c | 29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/block/io.c b/block/io.c
index 056af8198bc..d3bd1211acf 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2751,28 +2751,31 @@ int coroutine_fn bdrv_co_block_status(BlockDriverState *bs, int64_t offset,
  * by @offset and @bytes is known to read as zeroes.
  * Return 1 if that is the case, 0 otherwise and -errno on error.
  * This test is meant to be fast rather than accurate so returning 0
- * does not guarantee non-zero data.
+ * does not guarantee non-zero data; but a return of 1 is reliable.
  */
 int coroutine_fn bdrv_co_is_zero_fast(BlockDriverState *bs, int64_t offset,
                                       int64_t bytes)
 {
     int ret;
-    int64_t pnum = bytes;
+    int64_t pnum;
     IO_CODE();

-    if (!bytes) {
-        return 1;
+    while (bytes) {
+        ret = bdrv_co_common_block_status_above(bs, NULL, false,
+                                                BDRV_WANT_ZERO, offset, bytes,
+                                                &pnum, NULL, NULL, NULL);
+
+        if (ret < 0) {
+            return ret;
+        }
+        if (!(ret & BDRV_BLOCK_ZERO)) {
+            return 0;
+        }
+        offset += pnum;
+        bytes -= pnum;
     }

-    ret = bdrv_co_common_block_status_above(bs, NULL, false, BDRV_WANT_ZERO,
-                                            offset, bytes, &pnum, NULL, NULL,
-                                            NULL);
-
-    if (ret < 0) {
-        return ret;
-    }
-
-    return (pnum == bytes) && (ret & BDRV_BLOCK_ZERO);
+    return 1;
 }

 int coroutine_fn bdrv_co_is_allocated(BlockDriverState *bs, int64_t offset,
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 04/11] block: Add new bdrv_co_is_all_zeroes() function
  2025-04-25  0:52 [PATCH v3 00/11] Make blockdev-mirror dest sparse in more cases Eric Blake
                   ` (2 preceding siblings ...)
  2025-04-25  0:52 ` [PATCH v3 03/11] block: Let bdrv_co_is_zero_fast consolidate adjacent extents Eric Blake
@ 2025-04-25  0:52 ` Eric Blake
  2025-05-01 17:24   ` Stefan Hajnoczi
  2025-04-25  0:52 ` [PATCH v3 05/11] iotests: Improve iotest 194 to mirror data Eric Blake
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: Eric Blake @ 2025-04-25  0:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, vsementsov, stefanha, Fam Zheng, Kevin Wolf,
	Hanna Reitz

There are some optimizations that require knowing if an image starts
out as reading all zeroes, such as making blockdev-mirror faster by
skipping the copying of source zeroes to the destination.  The
existing bdrv_co_is_zero_fast() is a good building block for answering
this question, but it tends to give an answer of 0 for a file we just
created via QMP 'blockdev-create' or similar (such as 'qemu-img create
-f raw').  Why?  Because file-posix.c insists on allocating a tiny
header to any file rather than leaving it 100% sparse, due to some
filesystems that are unable to answer alignment probes on a hole.  But
teaching file-posix.c to read the tiny header doesn't scale - the
problem of a small header is also visible when libvirt sets up an NBD
client to a just-created file on a migration destination host.

So, we need a wrapper function that handles a bit more complexity in a
common manner for all block devices - when the BDS is mostly a hole,
but has a small non-hole header, it is still worth the time to read
that header and check if it reads as all zeroes before giving up and
returning a pessimistic answer.

Signed-off-by: Eric Blake <eblake@redhat.com>

---

v3: Use constant 128k as maximum data header size to read [Stefan]
---
 include/block/block-io.h |  2 ++
 block/io.c               | 62 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 64 insertions(+)

diff --git a/include/block/block-io.h b/include/block/block-io.h
index b49e0537dd4..b99cc98d265 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -161,6 +161,8 @@ bdrv_is_allocated_above(BlockDriverState *bs, BlockDriverState *base,

 int coroutine_fn GRAPH_RDLOCK
 bdrv_co_is_zero_fast(BlockDriverState *bs, int64_t offset, int64_t bytes);
+int coroutine_fn GRAPH_RDLOCK
+bdrv_co_is_all_zeroes(BlockDriverState *bs);

 int GRAPH_RDLOCK
 bdrv_apply_auto_read_only(BlockDriverState *bs, const char *errmsg,
diff --git a/block/io.c b/block/io.c
index d3bd1211acf..69da64c5915 100644
--- a/block/io.c
+++ b/block/io.c
@@ -38,10 +38,14 @@
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
 #include "system/replay.h"
+#include "qemu/units.h"

 /* Maximum bounce buffer for copy-on-read and write zeroes, in bytes */
 #define MAX_BOUNCE_BUFFER (32768 << BDRV_SECTOR_BITS)

+/* Maximum read size for checking if data reads as zero, in bytes */
+#define MAX_ZERO_CHECK_BUFFER (128 * KiB)
+
 static void coroutine_fn GRAPH_RDLOCK
 bdrv_parent_cb_resize(BlockDriverState *bs);

@@ -2778,6 +2782,64 @@ int coroutine_fn bdrv_co_is_zero_fast(BlockDriverState *bs, int64_t offset,
     return 1;
 }

+/*
+ * Check @bs (and its backing chain) to see if the entire image is known
+ * to read as zeroes.
+ * Return 1 if that is the case, 0 otherwise and -errno on error.
+ * This test is meant to be fast rather than accurate so returning 0
+ * does not guarantee non-zero data; however, a return of 1 is reliable,
+ * and this function can report 1 in more cases than bdrv_co_is_zero_fast.
+ */
+int coroutine_fn bdrv_co_is_all_zeroes(BlockDriverState *bs)
+{
+    int ret;
+    int64_t pnum, bytes;
+    char *buf;
+    QEMUIOVector local_qiov;
+    IO_CODE();
+
+    bytes = bdrv_co_getlength(bs);
+    if (bytes < 0) {
+        return bytes;
+    }
+
+    /* First probe - see if the entire image reads as zero */
+    ret = bdrv_co_common_block_status_above(bs, NULL, false, BDRV_WANT_ZERO,
+                                            0, bytes, &pnum, NULL, NULL,
+                                            NULL);
+    if (ret < 0) {
+        return ret;
+    }
+    if (ret & BDRV_BLOCK_ZERO) {
+        return bdrv_co_is_zero_fast(bs, pnum, bytes - pnum);
+    }
+
+    /*
+     * Because of the way 'blockdev-create' works, raw files tend to
+     * be created with a non-sparse region at the front to make
+     * alignment probing easier.  If the block starts with only a
+     * small allocated region, it is still worth the effort to see if
+     * the rest of the image is still sparse, coupled with manually
+     * reading the first region to see if it reads zero after all.
+     */
+    if (pnum > MAX_ZERO_CHECK_BUFFER) {
+        return 0;
+    }
+    ret = bdrv_co_is_zero_fast(bs, pnum, bytes - pnum);
+    if (ret <= 0) {
+        return ret;
+    }
+    /* Only the head of the image is unknown, and it's small.  Read it.  */
+    buf = qemu_blockalign(bs, pnum);
+    qemu_iovec_init_buf(&local_qiov, buf, pnum);
+    ret = bdrv_driver_preadv(bs, 0, pnum, &local_qiov, 0, 0);
+    if (ret >= 0) {
+        ret = buffer_is_zero(buf, pnum);
+    }
+    qemu_vfree(buf);
+    return ret;
+}
+
 int coroutine_fn bdrv_co_is_allocated(BlockDriverState *bs, int64_t offset,
                                       int64_t bytes, int64_t *pnum)
 {
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 05/11] iotests: Improve iotest 194 to mirror data
  2025-04-25  0:52 [PATCH v3 00/11] Make blockdev-mirror dest sparse in more cases Eric Blake
                   ` (3 preceding siblings ...)
  2025-04-25  0:52 ` [PATCH v3 04/11] block: Add new bdrv_co_is_all_zeroes() function Eric Blake
@ 2025-04-25  0:52 ` Eric Blake
  2025-04-25  0:52 ` [PATCH v3 06/11] mirror: Minor refactoring Eric Blake
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Eric Blake @ 2025-04-25  0:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, vsementsov, stefanha, Kevin Wolf, Hanna Reitz

Mirroring a completely sparse image to a sparse destination should be
practically instantaneous.  It isn't yet, but the test will be more
realistic if it has some non-zero to mirror as well as the holes.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tests/qemu-iotests/194 | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/qemu-iotests/194 b/tests/qemu-iotests/194
index c0ce82dd257..d0b9c084f5f 100755
--- a/tests/qemu-iotests/194
+++ b/tests/qemu-iotests/194
@@ -34,6 +34,7 @@ with iotests.FilePath('source.img') as source_img_path, \

     img_size = '1G'
     iotests.qemu_img_create('-f', iotests.imgfmt, source_img_path, img_size)
+    iotests.qemu_io('-f', iotests.imgfmt, '-c', 'write 512M 1M', source_img_path)
     iotests.qemu_img_create('-f', iotests.imgfmt, dest_img_path, img_size)

     iotests.log('Launching VMs...')
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 06/11] mirror: Minor refactoring
  2025-04-25  0:52 [PATCH v3 00/11] Make blockdev-mirror dest sparse in more cases Eric Blake
                   ` (4 preceding siblings ...)
  2025-04-25  0:52 ` [PATCH v3 05/11] iotests: Improve iotest 194 to mirror data Eric Blake
@ 2025-04-25  0:52 ` Eric Blake
  2025-04-25  0:52 ` [PATCH v3 07/11] mirror: Skip pre-zeroing destination if it is already zero Eric Blake
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Eric Blake @ 2025-04-25  0:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, vsementsov, stefanha, John Snow, Kevin Wolf,
	Hanna Reitz

Commit 5791ba52 (v9.2) pre-initialized ret in mirror_dirty_init to
silence a false positive compiler warning, even though in all code
paths where ret is used, it was guaranteed to be reassigned
beforehand.  But since the function returns -errno, and -1 is not
always the right errno, it's better to initialize to -EIO.

An upcoming patch wants to track two bitmaps in
do_sync_target_write(); this will be easier if the current variables
related to the dirty bitmap are renamed.

Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/mirror.c | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index a53582f17bb..34c6c5252e1 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -841,7 +841,7 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
     int64_t offset;
     BlockDriverState *bs;
     BlockDriverState *target_bs = blk_bs(s->target);
-    int ret = -1;
+    int ret = -EIO;
     int64_t count;

     bdrv_graph_co_rdlock();
@@ -1341,7 +1341,7 @@ do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
 {
     int ret;
     size_t qiov_offset = 0;
-    int64_t bitmap_offset, bitmap_end;
+    int64_t dirty_bitmap_offset, dirty_bitmap_end;

     if (!QEMU_IS_ALIGNED(offset, job->granularity) &&
         bdrv_dirty_bitmap_get(job->dirty_bitmap, offset))
@@ -1388,11 +1388,11 @@ do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
      * Tails are either clean or shrunk, so for bitmap resetting
      * we safely align the range down.
      */
-    bitmap_offset = QEMU_ALIGN_UP(offset, job->granularity);
-    bitmap_end = QEMU_ALIGN_DOWN(offset + bytes, job->granularity);
-    if (bitmap_offset < bitmap_end) {
-        bdrv_reset_dirty_bitmap(job->dirty_bitmap, bitmap_offset,
-                                bitmap_end - bitmap_offset);
+    dirty_bitmap_offset = QEMU_ALIGN_UP(offset, job->granularity);
+    dirty_bitmap_end = QEMU_ALIGN_DOWN(offset + bytes, job->granularity);
+    if (dirty_bitmap_offset < dirty_bitmap_end) {
+        bdrv_reset_dirty_bitmap(job->dirty_bitmap, dirty_bitmap_offset,
+                                dirty_bitmap_end - dirty_bitmap_offset);
     }

     job_progress_increase_remaining(&job->common.job, bytes);
@@ -1430,10 +1430,10 @@ do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
          * at function start, and they must be still dirty, as we've locked
          * the region for in-flight op.
          */
-        bitmap_offset = QEMU_ALIGN_DOWN(offset, job->granularity);
-        bitmap_end = QEMU_ALIGN_UP(offset + bytes, job->granularity);
-        bdrv_set_dirty_bitmap(job->dirty_bitmap, bitmap_offset,
-                              bitmap_end - bitmap_offset);
+        dirty_bitmap_offset = QEMU_ALIGN_DOWN(offset, job->granularity);
+        dirty_bitmap_end = QEMU_ALIGN_UP(offset + bytes, job->granularity);
+        bdrv_set_dirty_bitmap(job->dirty_bitmap, dirty_bitmap_offset,
+                              dirty_bitmap_end - dirty_bitmap_offset);
         qatomic_set(&job->actively_synced, false);

         action = mirror_error_action(job, false, -ret);
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 07/11] mirror: Skip pre-zeroing destination if it is already zero
  2025-04-25  0:52 [PATCH v3 00/11] Make blockdev-mirror dest sparse in more cases Eric Blake
                   ` (5 preceding siblings ...)
  2025-04-25  0:52 ` [PATCH v3 06/11] mirror: Minor refactoring Eric Blake
@ 2025-04-25  0:52 ` Eric Blake
  2025-04-30 16:09   ` Sunny Zhu
  2025-04-25  0:52 ` [PATCH v3 08/11] mirror: Skip writing zeroes when target " Eric Blake
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: Eric Blake @ 2025-04-25  0:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, vsementsov, stefanha, John Snow, Kevin Wolf,
	Hanna Reitz

When doing a sync=full mirroring, QMP drive-mirror requests full
zeroing if it did not just create the destination, and blockdev-mirror
requests full zeroing unconditionally.  This is because during a full
sync, we must ensure that the portions of the disk that are not
otherwise touched by the source still read as zero upon completion.

However, in mirror_dirty_init(), we were blindly assuming that if the
destination allows punching holes, we should pre-zero the entire
image; and if it does not allow punching holes, then treat the entire
source as dirty rather than mirroring just the allocated portions of
the source.  Without the ability to punch holes, this results in the
destination file being fully allocated; and even when punching holes
is supported, it causes duplicate I/O to the portions of the
destination corresponding to chunks of the source that are allocated
but read as zero.

Smarter is to avoid the pre-zeroing pass over the destination if it
can be proved the destination already reads as zero.  Note that a
later patch will then further improve things to skip writing to the
destination for parts of the image where the source is zero; but even
with just this patch, it is possible to see a difference for any BDS
that can quickly report that it already reads as zero.

Note, however, that if the destination was opened with
"detect-zeroes": "unmap", then the user wants us to punch holes where
possible for any zeroes in the source; in that case, we are better off
doing unmap up front in bulk.

Signed-off-by: Eric Blake <eblake@redhat.com>

---

v3: add exemption for "detect-zeroes":"unmap" on destination
---
 block/mirror.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/block/mirror.c b/block/mirror.c
index 34c6c5252e1..4059bf96854 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -849,13 +849,30 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
     bdrv_graph_co_rdunlock();

     if (s->zero_target) {
+        offset = 0;
+        bdrv_graph_co_rdlock();
+        ret = bdrv_co_is_all_zeroes(target_bs);
+        bdrv_graph_co_rdunlock();
+        if (ret < 0) {
+            return ret;
+        }
+        /*
+         * If the destination already reads as zero, and we are not
+         * requested to punch holes into existing zeroes, then we can
+         * skip pre-zeroing the destination.
+         */
+        if (ret > 0 &&
+            (target_bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP ||
+             !bdrv_can_write_zeroes_with_unmap(target_bs))) {
+            offset = s->bdev_length;
+        }
         if (!bdrv_can_write_zeroes_with_unmap(target_bs)) {
             bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, s->bdev_length);
             return 0;
         }

         s->initial_zeroing_ongoing = true;
-        for (offset = 0; offset < s->bdev_length; ) {
+        while (offset < s->bdev_length) {
             int bytes = MIN(s->bdev_length - offset,
                             QEMU_ALIGN_DOWN(INT_MAX, s->granularity));

-- 
2.49.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 08/11] mirror: Skip writing zeroes when target is already zero
  2025-04-25  0:52 [PATCH v3 00/11] Make blockdev-mirror dest sparse in more cases Eric Blake
                   ` (6 preceding siblings ...)
  2025-04-25  0:52 ` [PATCH v3 07/11] mirror: Skip pre-zeroing destination if it is already zero Eric Blake
@ 2025-04-25  0:52 ` Eric Blake
  2025-04-30 16:38   ` Re " Sunny Zhu
  2025-04-25  0:52 ` [PATCH v3 09/11] iotests/common.rc: add disk_usage function Eric Blake
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: Eric Blake @ 2025-04-25  0:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, vsementsov, stefanha, John Snow, Kevin Wolf,
	Hanna Reitz

When mirroring, the goal is to ensure that the destination reads the
same as the source; this goal is met whether the destination is sparse
or fully-allocated.  However, if the destination cannot efficiently
write zeroes, then any time the mirror operation wants to copy zeroes
from the source to the destination (either during the background over
sparse regions when doing a full mirror, or in the foreground when the
guest actively writes zeroes), we were causing the destination to
fully allocate that portion of the disk, even if it already read as
zeroes.

The effect is especially pronounced when the source is a raw file.
That's because when the source is a qcow2 file, the dirty bitmap only
visits the portions of the source that are allocated, which tend to be
non-zero.  But when the source is a raw file,
bdrv_co_is_allocated_above() reports the entire file as allocated so
mirror_dirty_init sets the entire dirty bitmap, and it is only later
during mirror_iteration that we change to consulting the more precise
bdrv_co_block_status_above() to learn where the source reads as zero.

Remember that since a mirror operation can write a cluster more than
once (every time the guest changes the source, the destination is also
changed to keep up), we can't take the shortcut of relying on
s->zero_target (which is static for the life of the job) in
mirror_co_zero() to see if the destination is already zero, because
that information may be stale.  Any solution we use must be dynamic in
the face of the guest writing or discarding a cluster while the mirror
has been ongoing.

We could just teach mirror_co_zero() to do a block_status() probe of
the destination, and skip the zeroes if the destination already reads
as zero, but we know from past experience that extra block_status()
calls are not always cheap (tmpfs, anyone?), especially when they are
random access rather than linear.  Use of block_status() of the source
by the background task in a linear fashion is not our bottleneck (it's
a background task, after all); but since mirroring can be done while
the source is actively being changed, we don't want a slow
block_status() of the destination to occur on the hot path of the
guest trying to do random-access writes to the source.

So this patch takes a slightly different approach: any time we have to
transfer the full image, we know that mirror_dirty_init() is _already_
doing a pre-zero pass over the entire destination.  Therefore, if we
track which clusters of the destination are zero at any given moment,
we don't have to do a block_status() call on the destination, but can
instead just refer to the zero bitmap associated with the job.

With this patch, if I create a raw sparse destination file, connect it
with QMP 'blockdev-add' while leaving it at the default "discard":
"ignore", then run QMP 'blockdev-mirror' with "sync": "full", the
destination remains sparse rather than fully allocated.  Meanwhile, a
destination image that is already fully allocated remains so unless it
was opened with "detect-zeroes": "unmap".  If writing zeroes is
skipped, the job counters are not incremented.

Signed-off-by: Eric Blake <eblake@redhat.com>

---

v3: also skip counters when skipping I/O [Sunny]
---
 block/mirror.c | 92 ++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 81 insertions(+), 11 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 4059bf96854..69a02dfc2b7 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -73,6 +73,7 @@ typedef struct MirrorBlockJob {
     size_t buf_size;
     int64_t bdev_length;
     unsigned long *cow_bitmap;
+    unsigned long *zero_bitmap;
     BdrvDirtyBitmap *dirty_bitmap;
     BdrvDirtyBitmapIter *dbi;
     uint8_t *buf;
@@ -108,9 +109,12 @@ struct MirrorOp {
     int64_t offset;
     uint64_t bytes;

-    /* The pointee is set by mirror_co_read(), mirror_co_zero(), and
-     * mirror_co_discard() before yielding for the first time */
+    /*
+     * These pointers are set by mirror_co_read(), mirror_co_zero(), and
+     * mirror_co_discard() before yielding for the first time
+     */
     int64_t *bytes_handled;
+    bool *io_skipped;

     bool is_pseudo_op;
     bool is_active_write;
@@ -408,15 +412,34 @@ static void coroutine_fn mirror_co_read(void *opaque)
 static void coroutine_fn mirror_co_zero(void *opaque)
 {
     MirrorOp *op = opaque;
-    int ret;
+    bool write_needed = true;
+    int ret = 0;

     op->s->in_flight++;
     op->s->bytes_in_flight += op->bytes;
     *op->bytes_handled = op->bytes;
     op->is_in_flight = true;

-    ret = blk_co_pwrite_zeroes(op->s->target, op->offset, op->bytes,
-                               op->s->unmap ? BDRV_REQ_MAY_UNMAP : 0);
+    if (op->s->zero_bitmap) {
+        unsigned long end = DIV_ROUND_UP(op->offset + op->bytes,
+                                         op->s->granularity);
+        assert(QEMU_IS_ALIGNED(op->offset, op->s->granularity));
+        assert(QEMU_IS_ALIGNED(op->bytes, op->s->granularity) ||
+               op->offset + op->bytes == op->s->bdev_length);
+        if (find_next_zero_bit(op->s->zero_bitmap, end,
+                               op->offset / op->s->granularity) == end) {
+            write_needed = false;
+            *op->io_skipped = true;
+        }
+    }
+    if (write_needed) {
+        ret = blk_co_pwrite_zeroes(op->s->target, op->offset, op->bytes,
+                                   op->s->unmap ? BDRV_REQ_MAY_UNMAP : 0);
+    }
+    if (ret >= 0 && op->s->zero_bitmap) {
+        bitmap_set(op->s->zero_bitmap, op->offset / op->s->granularity,
+                   DIV_ROUND_UP(op->bytes, op->s->granularity));
+    }
     mirror_write_complete(op, ret);
 }

@@ -435,29 +458,43 @@ static void coroutine_fn mirror_co_discard(void *opaque)
 }

 static unsigned mirror_perform(MirrorBlockJob *s, int64_t offset,
-                               unsigned bytes, MirrorMethod mirror_method)
+                               unsigned bytes, MirrorMethod mirror_method,
+                               bool *io_skipped)
 {
     MirrorOp *op;
     Coroutine *co;
     int64_t bytes_handled = -1;

+    assert(QEMU_IS_ALIGNED(offset, s->granularity));
+    assert(QEMU_IS_ALIGNED(bytes, s->granularity) ||
+           offset + bytes == s->bdev_length);
     op = g_new(MirrorOp, 1);
     *op = (MirrorOp){
         .s              = s,
         .offset         = offset,
         .bytes          = bytes,
         .bytes_handled  = &bytes_handled,
+        .io_skipped     = io_skipped,
     };
     qemu_co_queue_init(&op->waiting_requests);

     switch (mirror_method) {
     case MIRROR_METHOD_COPY:
+        if (s->zero_bitmap) {
+            bitmap_clear(s->zero_bitmap, offset / s->granularity,
+                         DIV_ROUND_UP(bytes, s->granularity));
+        }
         co = qemu_coroutine_create(mirror_co_read, op);
         break;
     case MIRROR_METHOD_ZERO:
+        /* s->zero_bitmap handled in mirror_co_zero */
         co = qemu_coroutine_create(mirror_co_zero, op);
         break;
     case MIRROR_METHOD_DISCARD:
+        if (s->zero_bitmap) {
+            bitmap_clear(s->zero_bitmap, offset / s->granularity,
+                         DIV_ROUND_UP(bytes, s->granularity));
+        }
         co = qemu_coroutine_create(mirror_co_discard, op);
         break;
     default:
@@ -568,6 +605,7 @@ static void coroutine_fn GRAPH_UNLOCKED mirror_iteration(MirrorBlockJob *s)
         int ret = -1;
         int64_t io_bytes;
         int64_t io_bytes_acct;
+        bool io_skipped = false;
         MirrorMethod mirror_method = MIRROR_METHOD_COPY;

         assert(!(offset % s->granularity));
@@ -611,8 +649,10 @@ static void coroutine_fn GRAPH_UNLOCKED mirror_iteration(MirrorBlockJob *s)
         }

         io_bytes = mirror_clip_bytes(s, offset, io_bytes);
-        io_bytes = mirror_perform(s, offset, io_bytes, mirror_method);
-        if (mirror_method != MIRROR_METHOD_COPY && write_zeroes_ok) {
+        io_bytes = mirror_perform(s, offset, io_bytes, mirror_method,
+                                  &io_skipped);
+        if (io_skipped ||
+            (mirror_method != MIRROR_METHOD_COPY && write_zeroes_ok)) {
             io_bytes_acct = 0;
         } else {
             io_bytes_acct = io_bytes;
@@ -849,6 +889,8 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
     bdrv_graph_co_rdunlock();

     if (s->zero_target) {
+        int64_t bitmap_length = DIV_ROUND_UP(s->bdev_length, s->granularity);
+
         offset = 0;
         bdrv_graph_co_rdlock();
         ret = bdrv_co_is_all_zeroes(target_bs);
@@ -856,6 +898,7 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
         if (ret < 0) {
             return ret;
         }
+        s->zero_bitmap = bitmap_new(bitmap_length);
         /*
          * If the destination already reads as zero, and we are not
          * requested to punch holes into existing zeroes, then we can
@@ -864,6 +907,7 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
         if (ret > 0 &&
             (target_bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP ||
              !bdrv_can_write_zeroes_with_unmap(target_bs))) {
+            bitmap_set(s->zero_bitmap, 0, bitmap_length);
             offset = s->bdev_length;
         }
         if (!bdrv_can_write_zeroes_with_unmap(target_bs)) {
@@ -875,6 +919,7 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
         while (offset < s->bdev_length) {
             int bytes = MIN(s->bdev_length - offset,
                             QEMU_ALIGN_DOWN(INT_MAX, s->granularity));
+            bool ignored;

             mirror_throttle(s);

@@ -890,7 +935,7 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
                 continue;
             }

-            mirror_perform(s, offset, bytes, MIRROR_METHOD_ZERO);
+            mirror_perform(s, offset, bytes, MIRROR_METHOD_ZERO, &ignored);
             offset += bytes;
         }

@@ -1180,6 +1225,7 @@ immediate_exit:
     assert(s->in_flight == 0);
     qemu_vfree(s->buf);
     g_free(s->cow_bitmap);
+    g_free(s->zero_bitmap);
     g_free(s->in_flight_bitmap);
     bdrv_dirty_iter_free(s->dbi);

@@ -1359,6 +1405,7 @@ do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
     int ret;
     size_t qiov_offset = 0;
     int64_t dirty_bitmap_offset, dirty_bitmap_end;
+    int64_t zero_bitmap_offset, zero_bitmap_end;

     if (!QEMU_IS_ALIGNED(offset, job->granularity) &&
         bdrv_dirty_bitmap_get(job->dirty_bitmap, offset))
@@ -1402,8 +1449,9 @@ do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
     }

     /*
-     * Tails are either clean or shrunk, so for bitmap resetting
-     * we safely align the range down.
+     * Tails are either clean or shrunk, so for dirty bitmap resetting
+     * we safely align the range narrower.  But for zero bitmap, round
+     * range wider for checking or clearing, and narrower for setting.
      */
     dirty_bitmap_offset = QEMU_ALIGN_UP(offset, job->granularity);
     dirty_bitmap_end = QEMU_ALIGN_DOWN(offset + bytes, job->granularity);
@@ -1411,22 +1459,44 @@ do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
         bdrv_reset_dirty_bitmap(job->dirty_bitmap, dirty_bitmap_offset,
                                 dirty_bitmap_end - dirty_bitmap_offset);
     }
+    zero_bitmap_offset = offset / job->granularity;
+    zero_bitmap_end = DIV_ROUND_UP(offset + bytes, job->granularity);

     job_progress_increase_remaining(&job->common.job, bytes);
     job->active_write_bytes_in_flight += bytes;

     switch (method) {
     case MIRROR_METHOD_COPY:
+        if (job->zero_bitmap) {
+            bitmap_clear(job->zero_bitmap, zero_bitmap_offset,
+                         zero_bitmap_end - zero_bitmap_offset);
+        }
         ret = blk_co_pwritev_part(job->target, offset, bytes,
                                   qiov, qiov_offset, flags);
         break;

     case MIRROR_METHOD_ZERO:
+        if (job->zero_bitmap) {
+            if (find_next_zero_bit(job->zero_bitmap, zero_bitmap_end,
+                                   zero_bitmap_offset) == zero_bitmap_end) {
+                ret = 0;
+                break;
+            }
+        }
         assert(!qiov);
         ret = blk_co_pwrite_zeroes(job->target, offset, bytes, flags);
+        if (job->zero_bitmap && ret >= 0) {
+            bitmap_set(job->zero_bitmap, dirty_bitmap_offset / job->granularity,
+                       (dirty_bitmap_end - dirty_bitmap_offset) /
+                       job->granularity);
+        }
         break;

     case MIRROR_METHOD_DISCARD:
+        if (job->zero_bitmap) {
+            bitmap_clear(job->zero_bitmap, zero_bitmap_offset,
+                         zero_bitmap_end - zero_bitmap_offset);
+        }
         assert(!qiov);
         ret = blk_co_pdiscard(job->target, offset, bytes);
         break;
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 09/11] iotests/common.rc: add disk_usage function
  2025-04-25  0:52 [PATCH v3 00/11] Make blockdev-mirror dest sparse in more cases Eric Blake
                   ` (7 preceding siblings ...)
  2025-04-25  0:52 ` [PATCH v3 08/11] mirror: Skip writing zeroes when target " Eric Blake
@ 2025-04-25  0:52 ` Eric Blake
  2025-04-25  0:52 ` [PATCH v3 10/11] tests: Add iotest mirror-sparse for recent patches Eric Blake
  2025-04-25  0:52 ` [PATCH v3 11/11] mirror: Allow QMP override to declare target already zero Eric Blake
  10 siblings, 0 replies; 24+ messages in thread
From: Eric Blake @ 2025-04-25  0:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, vsementsov, stefanha, Andrey Drobyshev,
	Alexander Ivanov, Alberto Garcia, Kevin Wolf, Hanna Reitz

From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>

Move the definition from iotests/250 to common.rc.  This is used to
detect real disk usage of sparse files.  In particular, we want to use
it for checking subclusters-based discards.

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-ID: <20240913163942.423050-6-andrey.drobyshev@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tests/qemu-iotests/common.rc | 6 ++++++
 tests/qemu-iotests/250       | 5 -----
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 95c12577dd4..237f746af88 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -140,6 +140,12 @@ _optstr_add()
     fi
 }

+# report real disk usage for sparse files
+disk_usage()
+{
+    du --block-size=1 "$1" | awk '{print $1}'
+}
+
 # Set the variables to the empty string to turn Valgrind off
 # for specific processes, e.g.
 # $ VALGRIND_QEMU_IO= ./check -qcow2 -valgrind 015
diff --git a/tests/qemu-iotests/250 b/tests/qemu-iotests/250
index af48f83abac..c0a0dbc0ff1 100755
--- a/tests/qemu-iotests/250
+++ b/tests/qemu-iotests/250
@@ -52,11 +52,6 @@ _unsupported_imgopts data_file
 # bdrv_co_truncate(bs->file) call in qcow2_co_truncate(), which might succeed
 # anyway.

-disk_usage()
-{
-    du --block-size=1 $1 | awk '{print $1}'
-}
-
 size=2100M

 _make_test_img -o "cluster_size=1M,preallocation=metadata" $size
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 10/11] tests: Add iotest mirror-sparse for recent patches
  2025-04-25  0:52 [PATCH v3 00/11] Make blockdev-mirror dest sparse in more cases Eric Blake
                   ` (8 preceding siblings ...)
  2025-04-25  0:52 ` [PATCH v3 09/11] iotests/common.rc: add disk_usage function Eric Blake
@ 2025-04-25  0:52 ` Eric Blake
  2025-05-01 17:34   ` Stefan Hajnoczi
                     ` (2 more replies)
  2025-04-25  0:52 ` [PATCH v3 11/11] mirror: Allow QMP override to declare target already zero Eric Blake
  10 siblings, 3 replies; 24+ messages in thread
From: Eric Blake @ 2025-04-25  0:52 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, vsementsov, stefanha, Kevin Wolf, Hanna Reitz

Prove that blockdev-mirror can now result in sparse raw destination
files, regardless of whether the source is raw or qcow2.  By making
this a separate test, it was possible to test effects of individual
patches for the various pieces that all have to work together for a
sparse mirror to be successful.

Note that ./check -file produces different job lengths than ./check
-qcow2 (the test uses a filter to normalize); that's because when
deciding how much of the image to be mirrored, the code looks at how
much of the source image was allocated (for qcow2, this is only the
written clusters; for raw, it is the entire file).  But the important
part is that the destination file ends up smaller than 3M, rather than
the 20M it used to be before this patch series.

Signed-off-by: Eric Blake <eblake@redhat.com>

---

Improve test with more cases
---
 tests/qemu-iotests/tests/mirror-sparse     | 128 ++++++++
 tests/qemu-iotests/tests/mirror-sparse.out | 365 +++++++++++++++++++++
 2 files changed, 493 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/mirror-sparse
 create mode 100644 tests/qemu-iotests/tests/mirror-sparse.out

diff --git a/tests/qemu-iotests/tests/mirror-sparse b/tests/qemu-iotests/tests/mirror-sparse
new file mode 100755
index 00000000000..33cc7a1e5f3
--- /dev/null
+++ b/tests/qemu-iotests/tests/mirror-sparse
@@ -0,0 +1,128 @@
+#!/usr/bin/env bash
+# group: rw auto quick
+#
+# Test blockdev-mirror with raw sparse destination
+#
+# Copyright (C) 2025 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+status=1 # failure is the default!
+
+_cleanup()
+{
+    _cleanup_test_img
+    _cleanup_qemu
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+cd ..
+. ./common.rc
+. ./common.filter
+. ./common.qemu
+
+_supported_fmt qcow2 raw  # Format of the source. dst is always raw file
+_supported_proto file
+_supported_os Linux
+
+filter_len() {
+    sed -e 's/"len": [0-9]*/"len": LEN/g' \
+        -e 's/"offset": [0-9]*/"offset": OFFSET/g'
+}
+
+echo
+echo "=== Initial image setup ==="
+echo
+
+TEST_IMG="$TEST_IMG.base" _make_test_img 20M
+$QEMU_IO -c 'w 8M 2M' -f $IMGFMT "$TEST_IMG.base" | _filter_qemu_io
+
+_launch_qemu -machine q35 \
+    -blockdev '{"driver":"file", "cache":{"direct":true, "no-flush":false},
+                "filename":"'"$TEST_IMG.base"'", "node-name":"src-file"}' \
+    -blockdev '{"driver":"'$IMGFMT'", "node-name":"src", "file":"src-file"}'
+h1=$QEMU_HANDLE
+_send_qemu_cmd $h1 '{"execute": "qmp_capabilities"}' 'return'
+
+# Check several combinations; most should result in a sparse destination;
+# the destination should only be fully allocated if pre-allocated
+# and not punching holes due to detect-zeroes
+# do_test creation discard zeroes result
+do_test() {
+    creation=$1
+    discard=$2
+    zeroes=$3
+    expected=$4
+
+echo
+echo "=== Testing creation=$creation discard=$discard zeroes=$zeroes ==="
+echo
+
+rm -f $TEST_IMG
+if test $creation = external; then
+    truncate --size=20M $TEST_IMG
+else
+    _send_qemu_cmd $h1 '{"execute": "blockdev-create", "arguments":
+          {"options": {"driver":"file", "filename":"'$TEST_IMG'",
+            "size":'$((20*1024*1024))', "preallocation":"'$creation'"},
+           "job-id":"job1"}}' 'concluded'
+    _send_qemu_cmd $h1 '{"execute": "job-dismiss", "arguments":
+          {"id": "job1"}}' 'return'
+fi
+_send_qemu_cmd $h1 '{"execute": "blockdev-add", "arguments":
+                     {"node-name": "dst", "driver":"file",
+                      "filename":"'$TEST_IMG'", "aio":"threads",
+                      "auto-read-only":true, "discard":"'$discard'",
+                      "detect-zeroes":"'$zeroes'"}}' 'return'
+_send_qemu_cmd $h1 '{"execute":"blockdev-mirror", "arguments":
+                     {"sync":"full", "device":"src", "target":"dst",
+                      "job-id":"job2"}}' 'return'
+_timed_wait_for $h1 '"ready"'
+_send_qemu_cmd $h1 '{"execute": "job-complete", "arguments":
+               {"id":"job2"}}' 'return' | filter_len
+_send_qemu_cmd $h1 '{"execute": "blockdev-del", "arguments":
+                {"node-name": "dst"}}' 'return' | filter_len
+$QEMU_IMG compare -U -f $IMGFMT -F raw $TEST_IMG.base $TEST_IMG
+result=$(disk_usage $TEST_IMG)
+if test $result -lt $((3*1024*1024)); then
+    actual=sparse
+elif test $result = $((20*1024*1024)); then
+    actual=full
+else
+    actual=unknown
+fi
+echo "Destination is $actual; expected $expected"
+}
+
+do_test external ignore off sparse
+do_test external unmap off sparse
+do_test external unmap unmap sparse
+do_test off ignore off sparse
+do_test off unmap off sparse
+do_test off unmap unmap sparse
+do_test full ignore off full
+do_test full unmap off sparse
+do_test full unmap unmap sparse
+
+_send_qemu_cmd $h1 '{"execute":"quit"}' ''
+
+# success, all done
+echo '*** done'
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/tests/mirror-sparse.out b/tests/qemu-iotests/tests/mirror-sparse.out
new file mode 100644
index 00000000000..2103b891c3f
--- /dev/null
+++ b/tests/qemu-iotests/tests/mirror-sparse.out
@@ -0,0 +1,365 @@
+QA output created by mirror-sparse
+
+=== Initial image setup ===
+
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=20971520
+wrote 2097152/2097152 bytes at offset 8388608
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+{"execute": "qmp_capabilities"}
+{"return": {}}
+
+=== Testing creation=external discard=ignore zeroes=off ===
+
+{"execute": "blockdev-add", "arguments":
+                     {"node-name": "dst", "driver":"file",
+                      "filename":"TEST_DIR/t.IMGFMT", "aio":"threads",
+                      "auto-read-only":true, "discard":"ignore",
+                      "detect-zeroes":"off"}}
+{"return": {}}
+{"execute":"blockdev-mirror", "arguments":
+                     {"sync":"full", "device":"src", "target":"dst",
+                      "job-id":"job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job2"}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job2"}}
+{"execute": "job-complete", "arguments":
+               {"id":"job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job2", "len": LEN, "offset": OFFSET, "speed": 0, "type": "mirror"}}
+{"return": {}}
+{"execute": "blockdev-del", "arguments":
+                {"node-name": "dst"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job2", "len": LEN, "offset": OFFSET, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job2"}}
+{"return": {}}
+Images are identical.
+Destination is sparse; expected sparse
+
+=== Testing creation=external discard=unmap zeroes=off ===
+
+{"execute": "blockdev-add", "arguments":
+                     {"node-name": "dst", "driver":"file",
+                      "filename":"TEST_DIR/t.IMGFMT", "aio":"threads",
+                      "auto-read-only":true, "discard":"unmap",
+                      "detect-zeroes":"off"}}
+{"return": {}}
+{"execute":"blockdev-mirror", "arguments":
+                     {"sync":"full", "device":"src", "target":"dst",
+                      "job-id":"job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job2"}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job2"}}
+{"execute": "job-complete", "arguments":
+               {"id":"job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job2", "len": LEN, "offset": OFFSET, "speed": 0, "type": "mirror"}}
+{"return": {}}
+{"execute": "blockdev-del", "arguments":
+                {"node-name": "dst"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job2", "len": LEN, "offset": OFFSET, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job2"}}
+{"return": {}}
+Images are identical.
+Destination is sparse; expected sparse
+
+=== Testing creation=external discard=unmap zeroes=unmap ===
+
+{"execute": "blockdev-add", "arguments":
+                     {"node-name": "dst", "driver":"file",
+                      "filename":"TEST_DIR/t.IMGFMT", "aio":"threads",
+                      "auto-read-only":true, "discard":"unmap",
+                      "detect-zeroes":"unmap"}}
+{"return": {}}
+{"execute":"blockdev-mirror", "arguments":
+                     {"sync":"full", "device":"src", "target":"dst",
+                      "job-id":"job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job2"}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job2"}}
+{"execute": "job-complete", "arguments":
+               {"id":"job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job2", "len": LEN, "offset": OFFSET, "speed": 0, "type": "mirror"}}
+{"return": {}}
+{"execute": "blockdev-del", "arguments":
+                {"node-name": "dst"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job2", "len": LEN, "offset": OFFSET, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job2"}}
+{"return": {}}
+Images are identical.
+Destination is sparse; expected sparse
+
+=== Testing creation=off discard=ignore zeroes=off ===
+
+{"execute": "blockdev-create", "arguments":
+          {"options": {"driver":"file", "filename":"TEST_DIR/t.IMGFMT",
+            "size":20971520, "preallocation":"off"},
+           "job-id":"job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job1"}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job1"}}
+{"execute": "job-dismiss", "arguments":
+          {"id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job1"}}
+{"return": {}}
+{"execute": "blockdev-add", "arguments":
+                     {"node-name": "dst", "driver":"file",
+                      "filename":"TEST_DIR/t.IMGFMT", "aio":"threads",
+                      "auto-read-only":true, "discard":"ignore",
+                      "detect-zeroes":"off"}}
+{"return": {}}
+{"execute":"blockdev-mirror", "arguments":
+                     {"sync":"full", "device":"src", "target":"dst",
+                      "job-id":"job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job2"}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job2"}}
+{"execute": "job-complete", "arguments":
+               {"id":"job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job2", "len": LEN, "offset": OFFSET, "speed": 0, "type": "mirror"}}
+{"return": {}}
+{"execute": "blockdev-del", "arguments":
+                {"node-name": "dst"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job2", "len": LEN, "offset": OFFSET, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job2"}}
+{"return": {}}
+Images are identical.
+Destination is sparse; expected sparse
+
+=== Testing creation=off discard=unmap zeroes=off ===
+
+{"execute": "blockdev-create", "arguments":
+          {"options": {"driver":"file", "filename":"TEST_DIR/t.IMGFMT",
+            "size":20971520, "preallocation":"off"},
+           "job-id":"job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job1"}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job1"}}
+{"execute": "job-dismiss", "arguments":
+          {"id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job1"}}
+{"return": {}}
+{"execute": "blockdev-add", "arguments":
+                     {"node-name": "dst", "driver":"file",
+                      "filename":"TEST_DIR/t.IMGFMT", "aio":"threads",
+                      "auto-read-only":true, "discard":"unmap",
+                      "detect-zeroes":"off"}}
+{"return": {}}
+{"execute":"blockdev-mirror", "arguments":
+                     {"sync":"full", "device":"src", "target":"dst",
+                      "job-id":"job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job2"}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job2"}}
+{"execute": "job-complete", "arguments":
+               {"id":"job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job2", "len": LEN, "offset": OFFSET, "speed": 0, "type": "mirror"}}
+{"return": {}}
+{"execute": "blockdev-del", "arguments":
+                {"node-name": "dst"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job2", "len": LEN, "offset": OFFSET, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job2"}}
+{"return": {}}
+Images are identical.
+Destination is sparse; expected sparse
+
+=== Testing creation=off discard=unmap zeroes=unmap ===
+
+{"execute": "blockdev-create", "arguments":
+          {"options": {"driver":"file", "filename":"TEST_DIR/t.IMGFMT",
+            "size":20971520, "preallocation":"off"},
+           "job-id":"job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job1"}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job1"}}
+{"execute": "job-dismiss", "arguments":
+          {"id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job1"}}
+{"return": {}}
+{"execute": "blockdev-add", "arguments":
+                     {"node-name": "dst", "driver":"file",
+                      "filename":"TEST_DIR/t.IMGFMT", "aio":"threads",
+                      "auto-read-only":true, "discard":"unmap",
+                      "detect-zeroes":"unmap"}}
+{"return": {}}
+{"execute":"blockdev-mirror", "arguments":
+                     {"sync":"full", "device":"src", "target":"dst",
+                      "job-id":"job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job2"}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job2"}}
+{"execute": "job-complete", "arguments":
+               {"id":"job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job2", "len": LEN, "offset": OFFSET, "speed": 0, "type": "mirror"}}
+{"return": {}}
+{"execute": "blockdev-del", "arguments":
+                {"node-name": "dst"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job2", "len": LEN, "offset": OFFSET, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job2"}}
+{"return": {}}
+Images are identical.
+Destination is sparse; expected sparse
+
+=== Testing creation=full discard=ignore zeroes=off ===
+
+{"execute": "blockdev-create", "arguments":
+          {"options": {"driver":"file", "filename":"TEST_DIR/t.IMGFMT",
+            "size":20971520, "preallocation":"full"},
+           "job-id":"job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job1"}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job1"}}
+{"execute": "job-dismiss", "arguments":
+          {"id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job1"}}
+{"return": {}}
+{"execute": "blockdev-add", "arguments":
+                     {"node-name": "dst", "driver":"file",
+                      "filename":"TEST_DIR/t.IMGFMT", "aio":"threads",
+                      "auto-read-only":true, "discard":"ignore",
+                      "detect-zeroes":"off"}}
+{"return": {}}
+{"execute":"blockdev-mirror", "arguments":
+                     {"sync":"full", "device":"src", "target":"dst",
+                      "job-id":"job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job2"}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job2"}}
+{"execute": "job-complete", "arguments":
+               {"id":"job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job2", "len": LEN, "offset": OFFSET, "speed": 0, "type": "mirror"}}
+{"return": {}}
+{"execute": "blockdev-del", "arguments":
+                {"node-name": "dst"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job2", "len": LEN, "offset": OFFSET, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job2"}}
+{"return": {}}
+Images are identical.
+Destination is full; expected full
+
+=== Testing creation=full discard=unmap zeroes=off ===
+
+{"execute": "blockdev-create", "arguments":
+          {"options": {"driver":"file", "filename":"TEST_DIR/t.IMGFMT",
+            "size":20971520, "preallocation":"full"},
+           "job-id":"job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job1"}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job1"}}
+{"execute": "job-dismiss", "arguments":
+          {"id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job1"}}
+{"return": {}}
+{"execute": "blockdev-add", "arguments":
+                     {"node-name": "dst", "driver":"file",
+                      "filename":"TEST_DIR/t.IMGFMT", "aio":"threads",
+                      "auto-read-only":true, "discard":"unmap",
+                      "detect-zeroes":"off"}}
+{"return": {}}
+{"execute":"blockdev-mirror", "arguments":
+                     {"sync":"full", "device":"src", "target":"dst",
+                      "job-id":"job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job2"}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job2"}}
+{"execute": "job-complete", "arguments":
+               {"id":"job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job2", "len": LEN, "offset": OFFSET, "speed": 0, "type": "mirror"}}
+{"return": {}}
+{"execute": "blockdev-del", "arguments":
+                {"node-name": "dst"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job2", "len": LEN, "offset": OFFSET, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job2"}}
+{"return": {}}
+Images are identical.
+Destination is sparse; expected sparse
+
+=== Testing creation=full discard=unmap zeroes=unmap ===
+
+{"execute": "blockdev-create", "arguments":
+          {"options": {"driver":"file", "filename":"TEST_DIR/t.IMGFMT",
+            "size":20971520, "preallocation":"full"},
+           "job-id":"job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job1"}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job1"}}
+{"execute": "job-dismiss", "arguments":
+          {"id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job1"}}
+{"return": {}}
+{"execute": "blockdev-add", "arguments":
+                     {"node-name": "dst", "driver":"file",
+                      "filename":"TEST_DIR/t.IMGFMT", "aio":"threads",
+                      "auto-read-only":true, "discard":"unmap",
+                      "detect-zeroes":"unmap"}}
+{"return": {}}
+{"execute":"blockdev-mirror", "arguments":
+                     {"sync":"full", "device":"src", "target":"dst",
+                      "job-id":"job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job2"}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job2"}}
+{"execute": "job-complete", "arguments":
+               {"id":"job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job2", "len": LEN, "offset": OFFSET, "speed": 0, "type": "mirror"}}
+{"return": {}}
+{"execute": "blockdev-del", "arguments":
+                {"node-name": "dst"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job2", "len": LEN, "offset": OFFSET, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job2"}}
+{"return": {}}
+Images are identical.
+Destination is sparse; expected sparse
+{"execute":"quit"}
+*** done
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v3 11/11] mirror: Allow QMP override to declare target already zero
  2025-04-25  0:52 [PATCH v3 00/11] Make blockdev-mirror dest sparse in more cases Eric Blake
                   ` (9 preceding siblings ...)
  2025-04-25  0:52 ` [PATCH v3 10/11] tests: Add iotest mirror-sparse for recent patches Eric Blake
@ 2025-04-25  0:52 ` Eric Blake
  10 siblings, 0 replies; 24+ messages in thread
From: Eric Blake @ 2025-04-25  0:52 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, vsementsov, stefanha, Markus Armbruster, John Snow,
	Kevin Wolf, Hanna Reitz

QEMU's attempts to learn whether a destination file starts life with
all zero contents are just a heuristic.  There may be cases where the
caller is aware of information that QEMU cannot learn quickly, in
which case telling QEMU what to assume about the destination can make
the mirror operation faster.  Given our existing example of "qemu-img
convert --target-is-zero", it is time to expose this override in QMP
for blockdev-mirror as well.

Signed-off-by: Eric Blake <eblake@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 qapi/block-core.json                   |  8 +++++++-
 include/block/block_int-global-state.h |  3 ++-
 block/mirror.c                         | 23 +++++++++++++++--------
 blockdev.c                             | 18 +++++++++++-------
 tests/unit/test-block-iothread.c       |  2 +-
 5 files changed, 36 insertions(+), 18 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index b1937780e19..7f70ec6d3cb 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2538,6 +2538,11 @@
 #     disappear from the query list without user intervention.
 #     Defaults to true.  (Since 3.1)
 #
+# @target-is-zero: Assume the destination reads as all zeroes before
+#     the mirror started.  Setting this to true can speed up the
+#     mirror.  Setting this to true when the destination is not
+#     actually all zero can corrupt the destination.  (Since 10.1)
+#
 # Since: 2.6
 #
 # .. qmp-example::
@@ -2557,7 +2562,8 @@
             '*on-target-error': 'BlockdevOnError',
             '*filter-node-name': 'str',
             '*copy-mode': 'MirrorCopyMode',
-            '*auto-finalize': 'bool', '*auto-dismiss': 'bool' },
+            '*auto-finalize': 'bool', '*auto-dismiss': 'bool',
+            '*target-is-zero': 'bool'},
   'allow-preconfig': true }

 ##
diff --git a/include/block/block_int-global-state.h b/include/block/block_int-global-state.h
index eb2d92a2261..a2b96f90d44 100644
--- a/include/block/block_int-global-state.h
+++ b/include/block/block_int-global-state.h
@@ -140,6 +140,7 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
  * @mode: Whether to collapse all images in the chain to the target.
  * @backing_mode: How to establish the target's backing chain after completion.
  * @zero_target: Whether the target should be explicitly zero-initialized
+ * @target_is_zero: Whether the target already is zero-initialized
  * @on_source_error: The action to take upon error reading from the source.
  * @on_target_error: The action to take upon error writing to the target.
  * @unmap: Whether to unmap target where source sectors only contain zeroes.
@@ -159,7 +160,7 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
                   int creation_flags, int64_t speed,
                   uint32_t granularity, int64_t buf_size,
                   MirrorSyncMode mode, BlockMirrorBackingMode backing_mode,
-                  bool zero_target,
+                  bool zero_target, bool target_is_zero,
                   BlockdevOnError on_source_error,
                   BlockdevOnError on_target_error,
                   bool unmap, const char *filter_node_name,
diff --git a/block/mirror.c b/block/mirror.c
index 69a02dfc2b7..69467db0f79 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -55,6 +55,8 @@ typedef struct MirrorBlockJob {
     BlockMirrorBackingMode backing_mode;
     /* Whether the target image requires explicit zero-initialization */
     bool zero_target;
+    /* Whether the target should be assumed to be already zero initialized */
+    bool target_is_zero;
     /*
      * To be accesssed with atomics. Written only under the BQL (required by the
      * current implementation of mirror_change()).
@@ -892,9 +894,13 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
         int64_t bitmap_length = DIV_ROUND_UP(s->bdev_length, s->granularity);

         offset = 0;
-        bdrv_graph_co_rdlock();
-        ret = bdrv_co_is_all_zeroes(target_bs);
-        bdrv_graph_co_rdunlock();
+        if (s->target_is_zero) {
+            ret = 1;
+        } else {
+            bdrv_graph_co_rdlock();
+            ret = bdrv_co_is_all_zeroes(target_bs);
+            bdrv_graph_co_rdunlock();
+        }
         if (ret < 0) {
             return ret;
         }
@@ -1799,7 +1805,7 @@ static BlockJob *mirror_start_job(
                              const char *replaces, int64_t speed,
                              uint32_t granularity, int64_t buf_size,
                              BlockMirrorBackingMode backing_mode,
-                             bool zero_target,
+                             bool zero_target, bool target_is_zero,
                              BlockdevOnError on_source_error,
                              BlockdevOnError on_target_error,
                              bool unmap,
@@ -1968,6 +1974,7 @@ static BlockJob *mirror_start_job(
     s->is_none_mode = is_none_mode;
     s->backing_mode = backing_mode;
     s->zero_target = zero_target;
+    s->target_is_zero = target_is_zero;
     qatomic_set(&s->copy_mode, copy_mode);
     s->base = base;
     s->base_overlay = bdrv_find_overlay(bs, base);
@@ -2096,7 +2103,7 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
                   int creation_flags, int64_t speed,
                   uint32_t granularity, int64_t buf_size,
                   MirrorSyncMode mode, BlockMirrorBackingMode backing_mode,
-                  bool zero_target,
+                  bool zero_target, bool target_is_zero,
                   BlockdevOnError on_source_error,
                   BlockdevOnError on_target_error,
                   bool unmap, const char *filter_node_name,
@@ -2121,8 +2128,8 @@ void mirror_start(const char *job_id, BlockDriverState *bs,

     mirror_start_job(job_id, bs, creation_flags, target, replaces,
                      speed, granularity, buf_size, backing_mode, zero_target,
-                     on_source_error, on_target_error, unmap, NULL, NULL,
-                     &mirror_job_driver, is_none_mode, base, false,
+                     target_is_zero, on_source_error, on_target_error, unmap,
+                     NULL, NULL, &mirror_job_driver, is_none_mode, base, false,
                      filter_node_name, true, copy_mode, false, errp);
 }

@@ -2148,7 +2155,7 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,

     job = mirror_start_job(
                      job_id, bs, creation_flags, base, NULL, speed, 0, 0,
-                     MIRROR_LEAVE_BACKING_CHAIN, false,
+                     MIRROR_LEAVE_BACKING_CHAIN, false, false,
                      on_error, on_error, true, cb, opaque,
                      &commit_active_job_driver, false, base, auto_complete,
                      filter_node_name, false, MIRROR_COPY_MODE_BACKGROUND,
diff --git a/blockdev.c b/blockdev.c
index 1d1f27cfff6..6f5373991c8 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2798,7 +2798,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
                                    const char *replaces,
                                    enum MirrorSyncMode sync,
                                    BlockMirrorBackingMode backing_mode,
-                                   bool zero_target,
+                                   bool zero_target, bool target_is_zero,
                                    bool has_speed, int64_t speed,
                                    bool has_granularity, uint32_t granularity,
                                    bool has_buf_size, int64_t buf_size,
@@ -2909,11 +2909,10 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
     /* pass the node name to replace to mirror start since it's loose coupling
      * and will allow to check whether the node still exist at mirror completion
      */
-    mirror_start(job_id, bs, target,
-                 replaces, job_flags,
+    mirror_start(job_id, bs, target, replaces, job_flags,
                  speed, granularity, buf_size, sync, backing_mode, zero_target,
-                 on_source_error, on_target_error, unmap, filter_node_name,
-                 copy_mode, errp);
+                 target_is_zero, on_source_error, on_target_error, unmap,
+                 filter_node_name, copy_mode, errp);
 }

 void qmp_drive_mirror(DriveMirror *arg, Error **errp)
@@ -2928,6 +2927,7 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
     int64_t size;
     const char *format = arg->format;
     bool zero_target;
+    bool target_is_zero;
     int ret;

     bs = qmp_get_root_bs(arg->device, errp);
@@ -3044,6 +3044,8 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
     zero_target = (arg->sync == MIRROR_SYNC_MODE_FULL &&
                    (arg->mode == NEW_IMAGE_MODE_EXISTING ||
                     !bdrv_has_zero_init(target_bs)));
+    target_is_zero = (arg->mode == NEW_IMAGE_MODE_ABSOLUTE_PATHS &&
+                      bdrv_has_zero_init(target_bs));
     bdrv_graph_rdunlock_main_loop();


@@ -3055,7 +3057,7 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)

     blockdev_mirror_common(arg->job_id, bs, target_bs,
                            arg->replaces, arg->sync,
-                           backing_mode, zero_target,
+                           backing_mode, zero_target, target_is_zero,
                            arg->has_speed, arg->speed,
                            arg->has_granularity, arg->granularity,
                            arg->has_buf_size, arg->buf_size,
@@ -3085,6 +3087,7 @@ void qmp_blockdev_mirror(const char *job_id,
                          bool has_copy_mode, MirrorCopyMode copy_mode,
                          bool has_auto_finalize, bool auto_finalize,
                          bool has_auto_dismiss, bool auto_dismiss,
+                         bool has_target_is_zero, bool target_is_zero,
                          Error **errp)
 {
     BlockDriverState *bs;
@@ -3115,7 +3118,8 @@ void qmp_blockdev_mirror(const char *job_id,

     blockdev_mirror_common(job_id, bs, target_bs,
                            replaces, sync, backing_mode,
-                           zero_target, has_speed, speed,
+                           zero_target, has_target_is_zero && target_is_zero,
+                           has_speed, speed,
                            has_granularity, granularity,
                            has_buf_size, buf_size,
                            has_on_source_error, on_source_error,
diff --git a/tests/unit/test-block-iothread.c b/tests/unit/test-block-iothread.c
index e26b3be5939..54aed8252c0 100644
--- a/tests/unit/test-block-iothread.c
+++ b/tests/unit/test-block-iothread.c
@@ -755,7 +755,7 @@ static void test_propagate_mirror(void)

     /* Start a mirror job */
     mirror_start("job0", src, target, NULL, JOB_DEFAULT, 0, 0, 0,
-                 MIRROR_SYNC_MODE_NONE, MIRROR_OPEN_BACKING_CHAIN, false,
+                 MIRROR_SYNC_MODE_NONE, MIRROR_OPEN_BACKING_CHAIN, false, false,
                  BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
                  false, "filter_node", MIRROR_COPY_MODE_BACKGROUND,
                  &error_abort);
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 07/11] mirror: Skip pre-zeroing destination if it is already zero
  2025-04-25  0:52 ` [PATCH v3 07/11] mirror: Skip pre-zeroing destination if it is already zero Eric Blake
@ 2025-04-30 16:09   ` Sunny Zhu
  2025-05-01 17:33     ` Eric Blake
  0 siblings, 1 reply; 24+ messages in thread
From: Sunny Zhu @ 2025-04-30 16:09 UTC (permalink / raw)
  To: eblake; +Cc: hreitz, jsnow, kwolf, qemu-block, qemu-devel, stefanha,
	vsementsov

> When doing a sync=full mirroring, QMP drive-mirror requests full
> zeroing if it did not just create the destination, and blockdev-mirror
> requests full zeroing unconditionally.  This is because during a full
> sync, we must ensure that the portions of the disk that are not
> otherwise touched by the source still read as zero upon completion.
> 
> However, in mirror_dirty_init(), we were blindly assuming that if the
> destination allows punching holes, we should pre-zero the entire
> image; and if it does not allow punching holes, then treat the entire
> source as dirty rather than mirroring just the allocated portions of
> the source.  Without the ability to punch holes, this results in the
> destination file being fully allocated; and even when punching holes
> is supported, it causes duplicate I/O to the portions of the
> destination corresponding to chunks of the source that are allocated
> but read as zero.
> 
> Smarter is to avoid the pre-zeroing pass over the destination if it
> can be proved the destination already reads as zero.  Note that a
> later patch will then further improve things to skip writing to the
> destination for parts of the image where the source is zero; but even
> with just this patch, it is possible to see a difference for any BDS
> that can quickly report that it already reads as zero.
> 
> Note, however, that if the destination was opened with
> "detect-zeroes": "unmap", then the user wants us to punch holes where
> possible for any zeroes in the source; in that case, we are better off
> doing unmap up front in bulk.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> 
> v3: add exemption for "detect-zeroes":"unmap" on destination
> ---
>  block/mirror.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 34c6c5252e1..4059bf96854 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -849,13 +849,30 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
>      bdrv_graph_co_rdunlock();
> 
>      if (s->zero_target) {
> +        offset = 0;
> +        bdrv_graph_co_rdlock();
> +        ret = bdrv_co_is_all_zeroes(target_bs);
> +        bdrv_graph_co_rdunlock();
> +        if (ret < 0) {
> +            return ret;
> +        }
> +        /*
> +         * If the destination already reads as zero, and we are not
> +         * requested to punch holes into existing zeroes, then we can
> +         * skip pre-zeroing the destination.
> +         */
> +        if (ret > 0 &&
> +            (target_bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP ||
> +             !bdrv_can_write_zeroes_with_unmap(target_bs))) {
> +            offset = s->bdev_length;

If when bdrv_can_write_zeroes_with_unmap(target_bs) == true, we prefer to
punch holes regardless of whether target_bs already reads as zero, then
execute bdrv_co_is_all_zeroes() in advance might be wasteful.

    if (bdrv_can_write_zeroes_with_unmap(target_bs)) {
        initial_zeroing();
    } else {
        ret = bdrv_co_is_all_zeroes(target_bs);
        ...
    }

> +        }
>          if (!bdrv_can_write_zeroes_with_unmap(target_bs)) {
>              bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, s->bdev_length);
>              return 0;

When ret > 0, We should not return directly here.

>          }
> 
>          s->initial_zeroing_ongoing = true;
> -        for (offset = 0; offset < s->bdev_length; ) {
> +        while (offset < s->bdev_length) {
>              int bytes = MIN(s->bdev_length - offset,
>                              QEMU_ALIGN_DOWN(INT_MAX, s->granularity));



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re [PATCH v3 08/11] mirror: Skip writing zeroes when target is already zero
  2025-04-25  0:52 ` [PATCH v3 08/11] mirror: Skip writing zeroes when target " Eric Blake
@ 2025-04-30 16:38   ` Sunny Zhu
  2025-05-01 17:58     ` Eric Blake
  0 siblings, 1 reply; 24+ messages in thread
From: Sunny Zhu @ 2025-04-30 16:38 UTC (permalink / raw)
  To: eblake; +Cc: hreitz, jsnow, kwolf, qemu-block, qemu-devel, stefanha,
	vsementsov

on Thu 24 Apr 2025 19:52:08 -0500, Eric wrote:
>      if (s->zero_target) {
> +        int64_t bitmap_length = DIV_ROUND_UP(s->bdev_length, s->granularity);
> +
>          offset = 0;
>          bdrv_graph_co_rdlock();
>          ret = bdrv_co_is_all_zeroes(target_bs);
> @@ -856,6 +898,7 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
>          if (ret < 0) {
>              return ret;
>          }
> +        s->zero_bitmap = bitmap_new(bitmap_length);
>          /*
>           * If the destination already reads as zero, and we are not
>           * requested to punch holes into existing zeroes, then we can
> @@ -864,6 +907,7 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
>          if (ret > 0 &&
>              (target_bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP ||
>               !bdrv_can_write_zeroes_with_unmap(target_bs))) {
> +            bitmap_set(s->zero_bitmap, 0, bitmap_length);

when arg->mode != NEW_IMAGE_MODE_EXISTING && bdrv_has_zero_init(target_bs) is true
in drive_mirror (This means the target image is newly created), in which case
s->zero_target == false, we still need to execute bitmap_set(s->zero_bitmap, 0, bitmap_length) 

>              offset = s->bdev_length;
>          }
>          if (!bdrv_can_write_zeroes_with_unmap(target_bs)) {
> @@ -875,6 +919,7 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
>          while (offset < s->bdev_length) {
>              int bytes = MIN(s->bdev_length - offset,
>                              QEMU_ALIGN_DOWN(INT_MAX, s->granularity));



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 03/11] block: Let bdrv_co_is_zero_fast consolidate adjacent extents
  2025-04-25  0:52 ` [PATCH v3 03/11] block: Let bdrv_co_is_zero_fast consolidate adjacent extents Eric Blake
@ 2025-05-01 17:20   ` Stefan Hajnoczi
  0 siblings, 0 replies; 24+ messages in thread
From: Stefan Hajnoczi @ 2025-05-01 17:20 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, qemu-block, vsementsov, Fam Zheng, Kevin Wolf,
	Hanna Reitz

[-- Attachment #1: Type: text/plain, Size: 1193 bytes --]

On Thu, Apr 24, 2025 at 07:52:03PM -0500, Eric Blake wrote:
> Some BDS drivers have a cap on how much block status they can supply
> in one query (for example, NBD talking to an older server cannot
> inspect more than 4G per query; and qcow2 tends to cap its answers
> rather than cross a cluster boundary of an L1 table).  Although the
> existing callers of bdrv_co_is_zero_fast are not passing in that large
> of a 'bytes' parameter, an upcoming caller wants to query the entire
> image at once, and will thus benefit from being able to treat adjacent
> zero regions in a coalesced manner, rather than claiming the region is
> non-zero merely because pnum was truncated and didn't match the
> incoming bytes.
> 
> While refactoring this into a loop, note that there is no need to
> assign pnum prior to calling bdrv_co_common_block_status_above() (it
> is guaranteed to be assigned deeper in the callstack).
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> 
> v3: also tweak function comment
> ---
>  block/io.c | 29 ++++++++++++++++-------------
>  1 file changed, 16 insertions(+), 13 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 04/11] block: Add new bdrv_co_is_all_zeroes() function
  2025-04-25  0:52 ` [PATCH v3 04/11] block: Add new bdrv_co_is_all_zeroes() function Eric Blake
@ 2025-05-01 17:24   ` Stefan Hajnoczi
  0 siblings, 0 replies; 24+ messages in thread
From: Stefan Hajnoczi @ 2025-05-01 17:24 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, qemu-block, vsementsov, Fam Zheng, Kevin Wolf,
	Hanna Reitz

[-- Attachment #1: Type: text/plain, Size: 1590 bytes --]

On Thu, Apr 24, 2025 at 07:52:04PM -0500, Eric Blake wrote:
> There are some optimizations that require knowing if an image starts
> out as reading all zeroes, such as making blockdev-mirror faster by
> skipping the copying of source zeroes to the destination.  The
> existing bdrv_co_is_zero_fast() is a good building block for answering
> this question, but it tends to give an answer of 0 for a file we just
> created via QMP 'blockdev-create' or similar (such as 'qemu-img create
> -f raw').  Why?  Because file-posix.c insists on allocating a tiny
> header to any file rather than leaving it 100% sparse, due to some
> filesystems that are unable to answer alignment probes on a hole.  But
> teaching file-posix.c to read the tiny header doesn't scale - the
> problem of a small header is also visible when libvirt sets up an NBD
> client to a just-created file on a migration destination host.
> 
> So, we need a wrapper function that handles a bit more complexity in a
> common manner for all block devices - when the BDS is mostly a hole,
> but has a small non-hole header, it is still worth the time to read
> that header and check if it reads as all zeroes before giving up and
> returning a pessimistic answer.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> 
> ---
> 
> v3: Use constant 128k as maximum data header size to read [Stefan]
> ---
>  include/block/block-io.h |  2 ++
>  block/io.c               | 62 ++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 64 insertions(+)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 07/11] mirror: Skip pre-zeroing destination if it is already zero
  2025-04-30 16:09   ` Sunny Zhu
@ 2025-05-01 17:33     ` Eric Blake
  2025-05-02  3:26       ` Sunny Zhu
  0 siblings, 1 reply; 24+ messages in thread
From: Eric Blake @ 2025-05-01 17:33 UTC (permalink / raw)
  To: Sunny Zhu
  Cc: hreitz, jsnow, kwolf, qemu-block, qemu-devel, stefanha,
	vsementsov

On Thu, May 01, 2025 at 12:09:26AM +0800, Sunny Zhu wrote:
> > When doing a sync=full mirroring, QMP drive-mirror requests full
> > zeroing if it did not just create the destination, and blockdev-mirror
> > requests full zeroing unconditionally.  This is because during a full
> > sync, we must ensure that the portions of the disk that are not
> > otherwise touched by the source still read as zero upon completion.
> > 
> > However, in mirror_dirty_init(), we were blindly assuming that if the
> > destination allows punching holes, we should pre-zero the entire
> > image; and if it does not allow punching holes, then treat the entire
> > source as dirty rather than mirroring just the allocated portions of
> > the source.  Without the ability to punch holes, this results in the
> > destination file being fully allocated; and even when punching holes
> > is supported, it causes duplicate I/O to the portions of the
> > destination corresponding to chunks of the source that are allocated
> > but read as zero.
> > 
> > Smarter is to avoid the pre-zeroing pass over the destination if it
> > can be proved the destination already reads as zero.  Note that a
> > later patch will then further improve things to skip writing to the
> > destination for parts of the image where the source is zero; but even
> > with just this patch, it is possible to see a difference for any BDS
> > that can quickly report that it already reads as zero.
> > 
> > Note, however, that if the destination was opened with
> > "detect-zeroes": "unmap", then the user wants us to punch holes where
> > possible for any zeroes in the source; in that case, we are better off
> > doing unmap up front in bulk.
> > 
> > Signed-off-by: Eric Blake <eblake@redhat.com>
> > 
> > ---
> > 
> > v3: add exemption for "detect-zeroes":"unmap" on destination
> > ---
> >  block/mirror.c | 19 ++++++++++++++++++-
> >  1 file changed, 18 insertions(+), 1 deletion(-)
> > 
> > diff --git a/block/mirror.c b/block/mirror.c
> > index 34c6c5252e1..4059bf96854 100644
> > --- a/block/mirror.c
> > +++ b/block/mirror.c
> > @@ -849,13 +849,30 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
> >      bdrv_graph_co_rdunlock();
> > 
> >      if (s->zero_target) {
> > +        offset = 0;
> > +        bdrv_graph_co_rdlock();
> > +        ret = bdrv_co_is_all_zeroes(target_bs);
> > +        bdrv_graph_co_rdunlock();
> > +        if (ret < 0) {
> > +            return ret;
> > +        }
> > +        /*
> > +         * If the destination already reads as zero, and we are not
> > +         * requested to punch holes into existing zeroes, then we can
> > +         * skip pre-zeroing the destination.
> > +         */
> > +        if (ret > 0 &&
> > +            (target_bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP ||
> > +             !bdrv_can_write_zeroes_with_unmap(target_bs))) {
> > +            offset = s->bdev_length;
> 
> If when bdrv_can_write_zeroes_with_unmap(target_bs) == true, we prefer to
> punch holes regardless of whether target_bs already reads as zero, then
> execute bdrv_co_is_all_zeroes() in advance might be wasteful.

Hmm.  bdrv_co_is_all_zeroes() is supposed to be fast, but you're right
that even faster than a syscall or two is no syscalls at all.

> 
>     if (bdrv_can_write_zeroes_with_unmap(target_bs)) {
>         initial_zeroing();
>     } else {
>         ret = bdrv_co_is_all_zeroes(target_bs);
>         ...
>     }

That's a bigger refactoring; probably worth doing in a separate patch.
It looks like I should probably do a v4 along those lines, to see how
it compares.

> 
> > +        }
> >          if (!bdrv_can_write_zeroes_with_unmap(target_bs)) {
> >              bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, s->bdev_length);
> >              return 0;
> 
> When ret > 0, We should not return directly here.

That's pre-existing, but correct.  If we can't write zeroes with
unmap, then we mark the entire image dirty (every zero in the source
will result in zeroes in the dest - as it was before this series); and
we have also marked the zero bitmap (writing zeroes will be a no-op if
the zero bitmap says that is safe).  The rest of this function has two
purposes: finish pre-zeroing (well, there's nothing to pre-zero if we
can't punch holes, and especially nothing to pre-zero if we already
know the image reads as all zero), and populate the dirty bitmap (we
just populated the entire map here, so it's not worth trying to
populate the map with finer granularity later), so returning here is
the right thing to do.

> 
> >          }
> > 
> >          s->initial_zeroing_ongoing = true;
> > -        for (offset = 0; offset < s->bdev_length; ) {
> > +        while (offset < s->bdev_length) {
> >              int bytes = MIN(s->bdev_length - offset,
> >                              QEMU_ALIGN_DOWN(INT_MAX, s->granularity));
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 10/11] tests: Add iotest mirror-sparse for recent patches
  2025-04-25  0:52 ` [PATCH v3 10/11] tests: Add iotest mirror-sparse for recent patches Eric Blake
@ 2025-05-01 17:34   ` Stefan Hajnoczi
  2025-05-01 18:00     ` Eric Blake
  2025-05-09 15:33   ` Eric Blake
  2025-08-01  7:54   ` Michael Tokarev
  2 siblings, 1 reply; 24+ messages in thread
From: Stefan Hajnoczi @ 2025-05-01 17:34 UTC (permalink / raw)
  To: Eric Blake; +Cc: qemu-devel, qemu-block, vsementsov, Kevin Wolf, Hanna Reitz

[-- Attachment #1: Type: text/plain, Size: 291 bytes --]

On Thu, Apr 24, 2025 at 07:52:10PM -0500, Eric Blake wrote:
> +TEST_IMG="$TEST_IMG.base" _make_test_img 20M
> +$QEMU_IO -c 'w 8M 2M' -f $IMGFMT "$TEST_IMG.base" | _filter_qemu_io
> +
> +_launch_qemu -machine q35 \

Does the machine type matter? This test would be more portable without
q35.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: Re [PATCH v3 08/11] mirror: Skip writing zeroes when target is already zero
  2025-04-30 16:38   ` Re " Sunny Zhu
@ 2025-05-01 17:58     ` Eric Blake
  2025-05-02  5:43       ` Sunny Zhu
  0 siblings, 1 reply; 24+ messages in thread
From: Eric Blake @ 2025-05-01 17:58 UTC (permalink / raw)
  To: Sunny Zhu
  Cc: hreitz, jsnow, kwolf, qemu-block, qemu-devel, stefanha,
	vsementsov

On Thu, May 01, 2025 at 12:38:30AM +0800, Sunny Zhu wrote:
> on Thu 24 Apr 2025 19:52:08 -0500, Eric wrote:
> >      if (s->zero_target) {
> > +        int64_t bitmap_length = DIV_ROUND_UP(s->bdev_length, s->granularity);
> > +
> >          offset = 0;
> >          bdrv_graph_co_rdlock();
> >          ret = bdrv_co_is_all_zeroes(target_bs);
> > @@ -856,6 +898,7 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
> >          if (ret < 0) {
> >              return ret;
> >          }
> > +        s->zero_bitmap = bitmap_new(bitmap_length);
> >          /*
> >           * If the destination already reads as zero, and we are not
> >           * requested to punch holes into existing zeroes, then we can
> > @@ -864,6 +907,7 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
> >          if (ret > 0 &&
> >              (target_bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP ||
> >               !bdrv_can_write_zeroes_with_unmap(target_bs))) {
> > +            bitmap_set(s->zero_bitmap, 0, bitmap_length);
> 
> when arg->mode != NEW_IMAGE_MODE_EXISTING && bdrv_has_zero_init(target_bs) is true
> in drive_mirror (This means the target image is newly created), in which case
> s->zero_target == false, we still need to execute bitmap_set(s->zero_bitmap, 0, bitmap_length)

Good catch.  I will fix that in v4.

Now that I'm thinking a bit more about it, I wonder if s->zero_target
is the right name.  We have several pieces of information feeding the
potential optimizations: are we mirroring the full image or just a
portion of it (if just a portion, pre-zeroing is wrong because we
shouldn't touch the part of the image not being mirrored), did we just
create the image (in which case it reads as zero and we can skip
pre-zeroing), did the user pass target-is-zero (later in this series,
same effect as if we just created the image), is punching zeroes
allowed (not all setups allow it; and even when it is allowed, there
are tradeoffs on whether to punch one big hole and then fill back up
or to only punch small holes as zeroes are encountered).

It's a big mess of conditionals, so I'm still trying to figure out if
there is a more sensible way to arrange the various logic bits.  With
the addition of target-is-zero, maybe it makes more sense to rename
s->zero_target to s->full_image, and to set s->full_image=true
s->target_is_zero in the arg->mode != NEW_IMAGE_MODE_EXISTING &&
bdrv_has_zero_init(target) case.

> 
> >              offset = s->bdev_length;
> >          }
> >          if (!bdrv_can_write_zeroes_with_unmap(target_bs)) {
> > @@ -875,6 +919,7 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
> >          while (offset < s->bdev_length) {
> >              int bytes = MIN(s->bdev_length - offset,
> >                              QEMU_ALIGN_DOWN(INT_MAX, s->granularity));
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 10/11] tests: Add iotest mirror-sparse for recent patches
  2025-05-01 17:34   ` Stefan Hajnoczi
@ 2025-05-01 18:00     ` Eric Blake
  0 siblings, 0 replies; 24+ messages in thread
From: Eric Blake @ 2025-05-01 18:00 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, qemu-block, vsementsov, Kevin Wolf, Hanna Reitz

On Thu, May 01, 2025 at 01:34:19PM -0400, Stefan Hajnoczi wrote:
> On Thu, Apr 24, 2025 at 07:52:10PM -0500, Eric Blake wrote:
> > +TEST_IMG="$TEST_IMG.base" _make_test_img 20M
> > +$QEMU_IO -c 'w 8M 2M' -f $IMGFMT "$TEST_IMG.base" | _filter_qemu_io
> > +
> > +_launch_qemu -machine q35 \
> 
> Does the machine type matter? This test would be more portable without
> q35.

Probably not. Copy-paste from other tests.  I'll drop it if the test
works without it.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 07/11] mirror: Skip pre-zeroing destination if it is already zero
  2025-05-01 17:33     ` Eric Blake
@ 2025-05-02  3:26       ` Sunny Zhu
  0 siblings, 0 replies; 24+ messages in thread
From: Sunny Zhu @ 2025-05-02  3:26 UTC (permalink / raw)
  To: eblake
  Cc: hreitz, jsnow, kwolf, qemu-block, qemu-devel, stefanha, sunnyzhyy,
	vsementsov

On Thu, 1 May 2025 12:33:14 -0500, Eric wrote:
> > > +         * If the destination already reads as zero, and we are not
> > > +         * requested to punch holes into existing zeroes, then we can
> > > +         * skip pre-zeroing the destination.
> > > +         */
> > > +        if (ret > 0 &&
> > > +            (target_bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP ||
> > > +             !bdrv_can_write_zeroes_with_unmap(target_bs))) {
> > > +            offset = s->bdev_length;
> > 
> > If when bdrv_can_write_zeroes_with_unmap(target_bs) == true, we prefer to
> > punch holes regardless of whether target_bs already reads as zero, then
> > execute bdrv_co_is_all_zeroes() in advance might be wasteful.
> 
> Hmm.  bdrv_co_is_all_zeroes() is supposed to be fast, but you're right
> that even faster than a syscall or two is no syscalls at all.

Indeed, bdrv_co_is_all_zeroes() is supposed to be fast, and since we are on the
management plane rather than the data plane, the impact here should be negligible.

> 
> > 
> >     if (bdrv_can_write_zeroes_with_unmap(target_bs)) {
> >         initial_zeroing();
> >     } else {
> >         ret = bdrv_co_is_all_zeroes(target_bs);
> >         ...
> >     }
> 
> That's a bigger refactoring; probably worth doing in a separate patch.
> It looks like I should probably do a v4 along those lines, to see how
> it compares.
> 
> > 
> > > +        }
> > >          if (!bdrv_can_write_zeroes_with_unmap(target_bs)) {
> > >              bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, s->bdev_length);
> > >              return 0;
> > 
> > When ret > 0, We should not return directly here.
> 
> That's pre-existing, but correct.  If we can't write zeroes with
> unmap, then we mark the entire image dirty (every zero in the source
> will result in zeroes in the dest - as it was before this series); and
> we have also marked the zero bitmap (writing zeroes will be a no-op if
> the zero bitmap says that is safe).  The rest of this function has two
> purposes: finish pre-zeroing (well, there's nothing to pre-zero if we
> can't punch holes, and especially nothing to pre-zero if we already
> know the image reads as all zero), and populate the dirty bitmap (we
> just populated the entire map here, so it's not worth trying to
> populate the map with finer granularity later), so returning here is
> the right thing to do.

I apologize that my previous explanation missed the key point. What I meant is:
when bdrv_co_is_all_zeroes() > 0 && !bdrv_can_write_zeroes_with_unmap(target_bs),
we set offset = s->bdev_length;

However here, when !bdrv_can_write_zeroes_with_unmap(target_bs) is true, we directly
return, which means that offset = s->bdev_length (i.e., our logic to skip pre-zeroing
the destination) will never take effect.

The appropriate modification here should be:

    if (!ret && !bdrv_can_write_zeroes_with_unmap(target_bs)) {
        bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, s->bdev_length);
        return 0;

> 
> > 
> > >          }
> > > 
> > >          s->initial_zeroing_ongoing = true;
> > > -        for (offset = 0; offset < s->bdev_length; ) {
> > > +        while (offset < s->bdev_length) {
> > >              int bytes = MIN(s->bdev_length - offset,
> > >                              QEMU_ALIGN_DOWN(INT_MAX, s->granularity));
> > 
> 



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re [PATCH v3 08/11] mirror: Skip writing zeroes when target is already zero
  2025-05-01 17:58     ` Eric Blake
@ 2025-05-02  5:43       ` Sunny Zhu
  0 siblings, 0 replies; 24+ messages in thread
From: Sunny Zhu @ 2025-05-02  5:43 UTC (permalink / raw)
  To: eblake
  Cc: hreitz, jsnow, kwolf, qemu-block, qemu-devel, stefanha, sunnyzhyy,
	vsementsov

On Thu, 1 May 2025 12:58:42 -0500, Eric wrote:
> On Thu, May 01, 2025 at 12:38:30AM +0800, Sunny Zhu wrote:
> > on Thu 24 Apr 2025 19:52:08 -0500, Eric wrote:
> > >      if (s->zero_target) {
> > > +        int64_t bitmap_length = DIV_ROUND_UP(s->bdev_length, s->granularity);
> > > +
> > >          offset = 0;
> > >          bdrv_graph_co_rdlock();
> > >          ret = bdrv_co_is_all_zeroes(target_bs);
> > > @@ -856,6 +898,7 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
> > >          if (ret < 0) {
> > >              return ret;
> > >          }
> > > +        s->zero_bitmap = bitmap_new(bitmap_length);
> > >          /*
> > >           * If the destination already reads as zero, and we are not
> > >           * requested to punch holes into existing zeroes, then we can
> > > @@ -864,6 +907,7 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
> > >          if (ret > 0 &&
> > >              (target_bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_UNMAP ||
> > >               !bdrv_can_write_zeroes_with_unmap(target_bs))) {
> > > +            bitmap_set(s->zero_bitmap, 0, bitmap_length);
> > 
> > when arg->mode != NEW_IMAGE_MODE_EXISTING && bdrv_has_zero_init(target_bs) is true
> > in drive_mirror (This means the target image is newly created), in which case
> > s->zero_target == false, we still need to execute bitmap_set(s->zero_bitmap, 0, bitmap_length)
> 
> Good catch.  I will fix that in v4.
> 
> Now that I'm thinking a bit more about it, I wonder if s->zero_target
> is the right name.  We have several pieces of information feeding the
> potential optimizations: are we mirroring the full image or just a
> portion of it (if just a portion, pre-zeroing is wrong because we
> shouldn't touch the part of the image not being mirrored), did we just
> create the image (in which case it reads as zero and we can skip
> pre-zeroing), did the user pass target-is-zero (later in this series,
> same effect as if we just created the image), is punching zeroes
> allowed (not all setups allow it; and even when it is allowed, there
> are tradeoffs on whether to punch one big hole and then fill back up
> or to only punch small holes as zeroes are encountered).
> 
> It's a big mess of conditionals, so I'm still trying to figure out if
> there is a more sensible way to arrange the various logic bits.  With
> the addition of target-is-zero, maybe it makes more sense to rename
> s->zero_target to s->full_image, and to set s->full_image=true
> s->target_is_zero in the arg->mode != NEW_IMAGE_MODE_EXISTING &&
> bdrv_has_zero_init(target) case.

Correct. The pre-zero and skip-zero optimizations are only applicable
when sync == MIRROR_SYNC_MODE_FULL. In MIRROR_SYNC_MODE_TOP scenarios,
where the source snapshot chain (excluding the top snapshot) is pre-copied
to the target, the destination image is not an all-zero image. Consequently,
we can neither perform pre-zeroing nor skip-zero operations in this case.

    if (sync == MIRROR_SYNC_MODE_TOP) {
        /* No optimization is needed. */
    } else {
        /* sync == MIRROR_SYNC_MODE_FULL */
        if (mode == NEW_IMAGE_MODE_EXISTING) {
            if (target_is_zero) {
                bitmap_set(zero_bitmap)
            } else {
                pre_zeroing;
                bitmap_set(zero_bitmap);
            }
        } else {
            if (bdrv_has_zero_init(target_bs)) {
                bitmap_set(zero_bitmap);
            } else {
                pre_zeroing;
                bitmap_set(zero_bitmap);
            }
        }
    }

We list all possible scenarios, where some code branches can be merged
during implementation.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 10/11] tests: Add iotest mirror-sparse for recent patches
  2025-04-25  0:52 ` [PATCH v3 10/11] tests: Add iotest mirror-sparse for recent patches Eric Blake
  2025-05-01 17:34   ` Stefan Hajnoczi
@ 2025-05-09 15:33   ` Eric Blake
  2025-08-01  7:54   ` Michael Tokarev
  2 siblings, 0 replies; 24+ messages in thread
From: Eric Blake @ 2025-05-09 15:33 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-block, vsementsov, stefanha, Kevin Wolf, Hanna Reitz

On Thu, Apr 24, 2025 at 07:52:10PM -0500, Eric Blake wrote:
> Prove that blockdev-mirror can now result in sparse raw destination
> files, regardless of whether the source is raw or qcow2.  By making
> this a separate test, it was possible to test effects of individual
> patches for the various pieces that all have to work together for a
> sparse mirror to be successful.
> 
> Note that ./check -file produces different job lengths than ./check
> -qcow2 (the test uses a filter to normalize); that's because when
> deciding how much of the image to be mirrored, the code looks at how
> much of the source image was allocated (for qcow2, this is only the
> written clusters; for raw, it is the entire file).  But the important
> part is that the destination file ends up smaller than 3M, rather than
> the 20M it used to be before this patch series.
> 

> +
> +filter_len() {
> +    sed -e 's/"len": [0-9]*/"len": LEN/g' \
> +        -e 's/"offset": [0-9]*/"offset": OFFSET/g'
> +}

This duplicates _filter_block_job_offset and _filter_block_job_len in
common.filter.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v3 10/11] tests: Add iotest mirror-sparse for recent patches
  2025-04-25  0:52 ` [PATCH v3 10/11] tests: Add iotest mirror-sparse for recent patches Eric Blake
  2025-05-01 17:34   ` Stefan Hajnoczi
  2025-05-09 15:33   ` Eric Blake
@ 2025-08-01  7:54   ` Michael Tokarev
  2 siblings, 0 replies; 24+ messages in thread
From: Michael Tokarev @ 2025-08-01  7:54 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: qemu-block, vsementsov, stefanha, Kevin Wolf, Hanna Reitz,
	Daniel P. Berrangé

On 25.04.2025 03:52, Eric Blake wrote:
> Prove that blockdev-mirror can now result in sparse raw destination
> files, regardless of whether the source is raw or qcow2.  By making
> this a separate test, it was possible to test effects of individual
> patches for the various pieces that all have to work together for a
> sparse mirror to be successful.
> 
> Note that ./check -file produces different job lengths than ./check
> -qcow2 (the test uses a filter to normalize); that's because when
> deciding how much of the image to be mirrored, the code looks at how
> much of the source image was allocated (for qcow2, this is only the
> written clusters; for raw, it is the entire file).  But the important
> part is that the destination file ends up smaller than 3M, rather than
> the 20M it used to be before this patch series.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>

Hi!

This test requires O_DIRECT to be available on the underlying filesystem

Without O_DIRECT support, the test fails with:

Listing only the last 100 lines from a long log.
-{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, 
"event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job2"}}
...
-*** done
+Timeout waiting for capabilities on handle 0

which is not helping at all.

This hit me on debian, where automatic buildds run everything on
a tmpfs which does not support O_DIRECT, and the test failed this
way, without a way to tell what's wrong.

After adding save_stderr=y to the test, I finally were able to see
what's going on:

+qemu-system-x86_64: -blockdev {"driver":"file", "cache":{"direct":true, 
"no-flush":false},
+ 
"filename":"/build/reproducible-path/qemu-10.1.0~rc1+ds/scratch/qcow2-file-mirror-sparse/t.qcow2.base", 
"node-name":"src-file"}: Could not open 
'/build/reproducible-path/qemu-10.1.0~rc1+ds/scratch/qcow2-file-mirror-sparse/t.qcow2.base': 
filesystem does not support O_DIRECT
+Timeout waiting for capabilities on handle 0
make: *** [debian/rules:188: b/qemu/tested] Error 1


but it took me multiple rather painful iterations because each time
I had to upload new version of the source package to debian for the
builddds to pick it up (similar to what we have in the CI but without
saving the build artifacts).

This test should somehow depend on O_DIRECT availability - we already
had something similar with 9pfs which also required O_DIRECT and
failed when /tmp was on a tmpfs.

And I think generally stderr should always be visible, at least when
the test failed, - where it is especially needed to diagnose the issue.

In debian I disabled mirror-sparse test entirely (also had to do a few
iterations, because it turned out removing the test files has to be
done before ./configure run).

Thanks,

/mjt


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2025-08-01  7:56 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-25  0:52 [PATCH v3 00/11] Make blockdev-mirror dest sparse in more cases Eric Blake
2025-04-25  0:52 ` [PATCH v3 01/11] block: Expand block status mode from bool to flags Eric Blake
2025-04-25  0:52 ` [PATCH v3 02/11] file-posix, gluster: Handle zero block status hint better Eric Blake
2025-04-25  0:52 ` [PATCH v3 03/11] block: Let bdrv_co_is_zero_fast consolidate adjacent extents Eric Blake
2025-05-01 17:20   ` Stefan Hajnoczi
2025-04-25  0:52 ` [PATCH v3 04/11] block: Add new bdrv_co_is_all_zeroes() function Eric Blake
2025-05-01 17:24   ` Stefan Hajnoczi
2025-04-25  0:52 ` [PATCH v3 05/11] iotests: Improve iotest 194 to mirror data Eric Blake
2025-04-25  0:52 ` [PATCH v3 06/11] mirror: Minor refactoring Eric Blake
2025-04-25  0:52 ` [PATCH v3 07/11] mirror: Skip pre-zeroing destination if it is already zero Eric Blake
2025-04-30 16:09   ` Sunny Zhu
2025-05-01 17:33     ` Eric Blake
2025-05-02  3:26       ` Sunny Zhu
2025-04-25  0:52 ` [PATCH v3 08/11] mirror: Skip writing zeroes when target " Eric Blake
2025-04-30 16:38   ` Re " Sunny Zhu
2025-05-01 17:58     ` Eric Blake
2025-05-02  5:43       ` Sunny Zhu
2025-04-25  0:52 ` [PATCH v3 09/11] iotests/common.rc: add disk_usage function Eric Blake
2025-04-25  0:52 ` [PATCH v3 10/11] tests: Add iotest mirror-sparse for recent patches Eric Blake
2025-05-01 17:34   ` Stefan Hajnoczi
2025-05-01 18:00     ` Eric Blake
2025-05-09 15:33   ` Eric Blake
2025-08-01  7:54   ` Michael Tokarev
2025-04-25  0:52 ` [PATCH v3 11/11] mirror: Allow QMP override to declare target already zero Eric Blake

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).