qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/6] Make blockdev-mirror dest sparse in more cases
@ 2025-04-11  1:04 Eric Blake
  2025-04-11  1:04 ` [PATCH 1/6] mirror: Skip pre-zeroing destination if it is already zero Eric Blake
                   ` (7 more replies)
  0 siblings, 8 replies; 23+ messages in thread
From: Eric Blake @ 2025-04-11  1:04 UTC (permalink / raw)
  To: qemu-devel

When mirroring images, it makes sense for the destination to be sparse
even if it was not connected with "discard":"unmap"; the only time the
destination should be fully allocated is if the user pre-allocated it,
or if the source was not sparse.

Eric Blake (6):
  mirror: Skip pre-zeroing destination if it is already zero
  file-posix: Allow lseek at offset 0 when !want_zero
  mirror: Skip writing zeroes when target is already zero
  block: Expand block status mode from bool to enum
  file-posix: Recognize blockdev-create file as starting all zero
  tests: Add iotest mirror-sparse for recent patches

 block/coroutines.h                         |   4 +-
 include/block/block-common.h               |  26 ++++
 include/block/block_int-common.h           |  25 ++--
 include/block/block_int-io.h               |   4 +-
 block/io.c                                 |  51 +++----
 block/blkdebug.c                           |   6 +-
 block/copy-before-write.c                  |   4 +-
 block/file-posix.c                         |  94 +++++++++++--
 block/gluster.c                            |   4 +-
 block/iscsi.c                              |   6 +-
 block/mirror.c                             | 100 +++++++++++---
 block/nbd.c                                |   4 +-
 block/null.c                               |   6 +-
 block/parallels.c                          |   6 +-
 block/qcow.c                               |   2 +-
 block/qcow2.c                              |   6 +-
 block/qed.c                                |   6 +-
 block/quorum.c                             |   4 +-
 block/raw-format.c                         |   4 +-
 block/rbd.c                                |   6 +-
 block/snapshot-access.c                    |   4 +-
 block/vdi.c                                |   4 +-
 block/vmdk.c                               |   2 +-
 block/vpc.c                                |   2 +-
 block/vvfat.c                              |   6 +-
 tests/unit/test-block-iothread.c           |   2 +-
 tests/qemu-iotests/194                     |  15 +-
 tests/qemu-iotests/194.out                 |   4 +-
 tests/qemu-iotests/tests/mirror-sparse     | 100 ++++++++++++++
 tests/qemu-iotests/tests/mirror-sparse.out | 153 +++++++++++++++++++++
 30 files changed, 547 insertions(+), 113 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/mirror-sparse
 create mode 100644 tests/qemu-iotests/tests/mirror-sparse.out

-- 
2.49.0



^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/6] mirror: Skip pre-zeroing destination if it is already zero
  2025-04-11  1:04 [PATCH 0/6] Make blockdev-mirror dest sparse in more cases Eric Blake
@ 2025-04-11  1:04 ` Eric Blake
  2025-04-11 19:54   ` Vladimir Sementsov-Ogievskiy
  2025-04-14 16:29   ` Vladimir Sementsov-Ogievskiy
  2025-04-11  1:04 ` [PATCH 2/6] file-posix: Allow lseek at offset 0 when !want_zero Eric Blake
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 23+ messages in thread
From: Eric Blake @ 2025-04-11  1:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: John Snow, Vladimir Sementsov-Ogievskiy, Kevin Wolf, Hanna Reitz,
	open list:Block Jobs

When doing a sync=full mirroring, QMP drive-mirror requests full
zeroing if it did not just create the destination, and blockdev-mirror
requests full zeroing unconditionally.  This is because during a full
sync, we must ensure that the portions of the disk that are not
otherwise touched by the source still read as zero upon completion.

However, in mirror_dirty_init(), we were blindly assuming that if the
destination allows punching holes, we should pre-zero the entire
image; and if it does not allow punching holes, then treat the entire
source as dirty rather than mirroring just the allocated portions of
the source.  Without the ability to punch holes, this results in the
destination file being fully allocated; and even when punching holes
is supported, it causes duplicate I/O to the portions of the
destination corresponding to chunks of the source that are allocated
but read as zero.

Smarter is to avoid the pre-zeroing pass over the destination if it
can be proved the destination already reads as zero.  Note that a
later patch will then further improve things to skip writing to the
destination for parts of the image where the source is zero; but even
with just this patch, it is possible to see a difference for any BDS
that can quickly report that it already reads as zero.  Iotest 194 is
proof of this: instead of mirroring a completely sparse file, change
it to pre-populate some data.  When run with './check -file 194', the
full 1G is still allocated, but with './check -qcow2 194', only the 1M
of pre-populated data is now mirrored; this in turn requires an
additional log filter.

Note that there are still BDS layers that do not quickly report
reading as all zero; for example, the file-posix code implementation
for fast block status currently blindly reports the entire image as
allocated and non-zero without even consulting lseek(SEEK_DATA)); that
will be addressed in later patches.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/mirror.c             | 10 ++++++++--
 tests/qemu-iotests/194     | 15 +++++++++++++--
 tests/qemu-iotests/194.out |  4 ++--
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index a53582f17bb..2e1e14c8e7e 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -841,14 +841,20 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
     int64_t offset;
     BlockDriverState *bs;
     BlockDriverState *target_bs = blk_bs(s->target);
-    int ret = -1;
+    int ret;
     int64_t count;

     bdrv_graph_co_rdlock();
     bs = s->mirror_top_bs->backing->bs;
+    if (s->zero_target) {
+        ret = bdrv_co_is_zero_fast(target_bs, 0, s->bdev_length);
+    }
     bdrv_graph_co_rdunlock();

-    if (s->zero_target) {
+    if (s->zero_target && ret <= 0) {
+        if (ret < 0) {
+            return ret;
+        }
         if (!bdrv_can_write_zeroes_with_unmap(target_bs)) {
             bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, s->bdev_length);
             return 0;
diff --git a/tests/qemu-iotests/194 b/tests/qemu-iotests/194
index c0ce82dd257..814c15dfe3b 100755
--- a/tests/qemu-iotests/194
+++ b/tests/qemu-iotests/194
@@ -22,6 +22,14 @@

 import iotests

+def filter_job_event(event):
+    '''Filter len and offset in a job event'''
+    event = dict(event)
+    if event['data'].get('len', 0) == event['data'].get('offset', 1):
+        event['data']['len'] = 'LEN'
+        event['data']['offset'] = 'LEN'
+    return event
+
 iotests.script_initialize(supported_fmts=['qcow2', 'qed', 'raw'],
                           supported_platforms=['linux'])

@@ -34,6 +42,7 @@ with iotests.FilePath('source.img') as source_img_path, \

     img_size = '1G'
     iotests.qemu_img_create('-f', iotests.imgfmt, source_img_path, img_size)
+    iotests.qemu_io('-f', iotests.imgfmt, '-c', 'write 512M 1M', source_img_path)
     iotests.qemu_img_create('-f', iotests.imgfmt, dest_img_path, img_size)

     iotests.log('Launching VMs...')
@@ -61,7 +70,8 @@ with iotests.FilePath('source.img') as source_img_path, \

     iotests.log('Waiting for `drive-mirror` to complete...')
     iotests.log(source_vm.event_wait('BLOCK_JOB_READY'),
-                filters=[iotests.filter_qmp_event])
+                filters=[iotests.filter_qmp_event,
+                         filter_job_event])

     iotests.log('Starting migration...')
     capabilities = [{'capability': 'events', 'state': True},
@@ -87,7 +97,8 @@ with iotests.FilePath('source.img') as source_img_path, \

     while True:
         event2 = source_vm.event_wait('BLOCK_JOB_COMPLETED')
-        iotests.log(event2, filters=[iotests.filter_qmp_event])
+        iotests.log(event2, filters=[iotests.filter_qmp_event,
+                                     filter_job_event])
         if event2['event'] == 'BLOCK_JOB_COMPLETED':
             iotests.log('Stopping the NBD server on destination...')
             iotests.log(dest_vm.qmp('nbd-server-stop'))
diff --git a/tests/qemu-iotests/194.out b/tests/qemu-iotests/194.out
index 6940e809cde..79b961723d8 100644
--- a/tests/qemu-iotests/194.out
+++ b/tests/qemu-iotests/194.out
@@ -7,7 +7,7 @@ Launching NBD server on destination...
 Starting `drive-mirror` on source...
 {"return": {}}
 Waiting for `drive-mirror` to complete...
-{"data": {"device": "mirror-job0", "len": 1073741824, "offset": 1073741824, "speed": 0, "type": "mirror"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"device": "mirror-job0", "len": "LEN", "offset": "LEN", "speed": 0, "type": "mirror"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
 Starting migration...
 {"return": {}}
 {"execute": "migrate-start-postcopy", "arguments": {}}
@@ -18,7 +18,7 @@ Starting migration...
 {"data": {"status": "completed"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
 Gracefully ending the `drive-mirror` job on source...
 {"return": {}}
-{"data": {"device": "mirror-job0", "len": 1073741824, "offset": 1073741824, "speed": 0, "type": "mirror"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"device": "mirror-job0", "len": "LEN", "offset": "LEN", "speed": 0, "type": "mirror"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
 Stopping the NBD server on destination...
 {"return": {}}
 Wait for migration completion on target...
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/6] file-posix: Allow lseek at offset 0 when !want_zero
  2025-04-11  1:04 [PATCH 0/6] Make blockdev-mirror dest sparse in more cases Eric Blake
  2025-04-11  1:04 ` [PATCH 1/6] mirror: Skip pre-zeroing destination if it is already zero Eric Blake
@ 2025-04-11  1:04 ` Eric Blake
  2025-04-14 17:05   ` Vladimir Sementsov-Ogievskiy
  2025-04-11  1:04 ` [PATCH 3/6] mirror: Skip writing zeroes when target is already zero Eric Blake
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Eric Blake @ 2025-04-11  1:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Hanna Reitz, open list:raw

The 'want_zero' parameter to raw_co_block_status() was added so that
we can avoid potentially time-consuming lseek(SEEK_DATA) calls
throughout the file (working around poor filesystems that have O(n)
rather than O(1) extent probing).  But when it comes to learning if a
file is completely sparse (for example, it was just created), always
claiming that a file is all data without even checking offset 0 breaks
what would otherwise be attempts at useful optimizations for a
known-zero mirror destination.

Note that this allows file-posix to report a file as completely zero
if it was externally created (such as via 'truncate --size=$n file')
as entirely sparse; however, it does NOT work for files created
internally by blockdev-create.  That's because blockdev-create
intentionally does a sequence of truncate(0), truncate(size),
allocate_first_block(), in order to make it possible for gluster on
XFS to probe the sector size for direct I/O (which doesn't work if the
first block is sparse).  That will be addressed in a later patch.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/file-posix.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 56d1972d156..67e83528cf5 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -3217,7 +3217,14 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
         return ret;
     }

-    if (!want_zero) {
+    /*
+     * If want_zero is clear, then the caller wants speed over
+     * accuracy, and the only place where SEEK_DATA should be
+     * attempted is at the start of the file to learn if the file has
+     * any data at all (anywhere else, just blindly claim the entire
+     * file is data).
+     */
+    if (!want_zero && offset) {
         *pnum = bytes;
         *map = offset;
         *file = bs;
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/6] mirror: Skip writing zeroes when target is already zero
  2025-04-11  1:04 [PATCH 0/6] Make blockdev-mirror dest sparse in more cases Eric Blake
  2025-04-11  1:04 ` [PATCH 1/6] mirror: Skip pre-zeroing destination if it is already zero Eric Blake
  2025-04-11  1:04 ` [PATCH 2/6] file-posix: Allow lseek at offset 0 when !want_zero Eric Blake
@ 2025-04-11  1:04 ` Eric Blake
  2025-04-15 15:59   ` Vladimir Sementsov-Ogievskiy
  2025-04-11  1:04 ` [PATCH 4/6] block: Expand block status mode from bool to enum Eric Blake
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 23+ messages in thread
From: Eric Blake @ 2025-04-11  1:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: John Snow, Vladimir Sementsov-Ogievskiy, Kevin Wolf, Hanna Reitz,
	open list:Block Jobs

When mirroring, the goal is to ensure that the destination reads the
same as the source; this goal is met whether the destination is sparse
or fully-allocated.  However, if the destination cannot efficiently
write zeroes, then any time the mirror operation wants to copy zeroes
from the source to the destination (either during the background over
sparse regions when doing a full mirror, or in the foreground when the
guest actively writes zeroes), we were causing the destination to
fully allocate that portion of the disk, even if it already read as
zeroes.

We could just teach mirror_co_zero() to do a block_status() probe of
the destination, and skip the zeroes if the destination already reads
as zero, but we know from past experience that block_status() calls
are not always cheap (tmpfs, anyone?).  So this patch takes a slightly
different approach: any time we have to transfer the full image,
mirror_dirty_init() is _already_ doing a pre-zero pass over the entire
destination.  Therefore, if we track which clusters of the destination
are zero at any given moment, we don't have to do a block_status()
call on the destination, but can instead just refer to the zero bitmap
associated with the job.

With this patch, if I externally create a raw sparse destination file
('truncate --size=$N dst.raw'), connect it with QMP 'blockdev-add'
while leaving it at the default "discard":"ignore", then run QMP
'blockdev-mirror' with "sync":"full", the destination remains sparse
rather than fully allocated.

However, a raw destination file created with 'blockdev-create' still
gets fully allocated, because more work is needed in file-posix to
still identify reads-as-zeroes even when the first 4k has to be
allocated to make alignment probing work.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/mirror.c | 94 +++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 78 insertions(+), 16 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 2e1e14c8e7e..98da5a6dc27 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -73,6 +73,7 @@ typedef struct MirrorBlockJob {
     size_t buf_size;
     int64_t bdev_length;
     unsigned long *cow_bitmap;
+    unsigned long *zero_bitmap;
     BdrvDirtyBitmap *dirty_bitmap;
     BdrvDirtyBitmapIter *dbi;
     uint8_t *buf;
@@ -408,15 +409,32 @@ static void coroutine_fn mirror_co_read(void *opaque)
 static void coroutine_fn mirror_co_zero(void *opaque)
 {
     MirrorOp *op = opaque;
-    int ret;
+    bool write_needed = true;
+    int ret = 0;

     op->s->in_flight++;
     op->s->bytes_in_flight += op->bytes;
     *op->bytes_handled = op->bytes;
     op->is_in_flight = true;

-    ret = blk_co_pwrite_zeroes(op->s->target, op->offset, op->bytes,
-                               op->s->unmap ? BDRV_REQ_MAY_UNMAP : 0);
+    if (op->s->zero_bitmap) {
+        unsigned long last = (op->offset + op->bytes) / op->s->granularity;
+        assert(QEMU_IS_ALIGNED(op->offset, op->s->granularity));
+        assert(QEMU_IS_ALIGNED(op->bytes, op->s->granularity) ||
+               op->offset + op->bytes == op->s->bdev_length);
+        if (find_next_zero_bit(op->s->zero_bitmap, last,
+                               op->offset / op->s->granularity) == last) {
+            write_needed = false;
+        }
+    }
+    if (write_needed) {
+        ret = blk_co_pwrite_zeroes(op->s->target, op->offset, op->bytes,
+                                   op->s->unmap ? BDRV_REQ_MAY_UNMAP : 0);
+    }
+    if (ret >= 0 && op->s->zero_bitmap) {
+        bitmap_set(op->s->zero_bitmap, op->offset / op->s->granularity,
+                   op->bytes / op->s->granularity);
+    }
     mirror_write_complete(op, ret);
 }

@@ -441,6 +459,9 @@ static unsigned mirror_perform(MirrorBlockJob *s, int64_t offset,
     Coroutine *co;
     int64_t bytes_handled = -1;

+    assert(QEMU_IS_ALIGNED(offset, s->granularity));
+    assert(QEMU_IS_ALIGNED(bytes, s->granularity) ||
+           offset + bytes == s->bdev_length);
     op = g_new(MirrorOp, 1);
     *op = (MirrorOp){
         .s              = s,
@@ -452,12 +473,21 @@ static unsigned mirror_perform(MirrorBlockJob *s, int64_t offset,

     switch (mirror_method) {
     case MIRROR_METHOD_COPY:
+        if (s->zero_bitmap) {
+            bitmap_clear(s->zero_bitmap, offset / s->granularity,
+                         bytes / s->granularity);
+        }
         co = qemu_coroutine_create(mirror_co_read, op);
         break;
     case MIRROR_METHOD_ZERO:
+        /* s->zero_bitmap handled in mirror_co_zero */
         co = qemu_coroutine_create(mirror_co_zero, op);
         break;
     case MIRROR_METHOD_DISCARD:
+        if (s->zero_bitmap) {
+            bitmap_clear(s->zero_bitmap, offset / s->granularity,
+                         bytes / s->granularity);
+        }
         co = qemu_coroutine_create(mirror_co_discard, op);
         break;
     default:
@@ -851,10 +881,17 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
     }
     bdrv_graph_co_rdunlock();

-    if (s->zero_target && ret <= 0) {
+    if (s->zero_target) {
+        int64_t length;
+
         if (ret < 0) {
             return ret;
         }
+        length = DIV_ROUND_UP(s->bdev_length, s->granularity);
+        s->zero_bitmap = bitmap_new(length);
+        if (ret > 0) {
+            bitmap_set(s->zero_bitmap, 0, length);
+        }
         if (!bdrv_can_write_zeroes_with_unmap(target_bs)) {
             bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, s->bdev_length);
             return 0;
@@ -1169,6 +1206,7 @@ immediate_exit:
     assert(s->in_flight == 0);
     qemu_vfree(s->buf);
     g_free(s->cow_bitmap);
+    g_free(s->zero_bitmap);
     g_free(s->in_flight_bitmap);
     bdrv_dirty_iter_free(s->dbi);

@@ -1347,7 +1385,8 @@ do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
 {
     int ret;
     size_t qiov_offset = 0;
-    int64_t bitmap_offset, bitmap_end;
+    int64_t dirty_bitmap_offset, dirty_bitmap_end;
+    int64_t zero_bitmap_offset, zero_bitmap_end;

     if (!QEMU_IS_ALIGNED(offset, job->granularity) &&
         bdrv_dirty_bitmap_get(job->dirty_bitmap, offset))
@@ -1391,31 +1430,54 @@ do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
     }

     /*
-     * Tails are either clean or shrunk, so for bitmap resetting
-     * we safely align the range down.
+     * Tails are either clean or shrunk, so for dirty bitmap resetting
+     * we safely align the range down.  But for zero bitmap, round range
+     * up for checking or clearing, and down for setting.
      */
-    bitmap_offset = QEMU_ALIGN_UP(offset, job->granularity);
-    bitmap_end = QEMU_ALIGN_DOWN(offset + bytes, job->granularity);
-    if (bitmap_offset < bitmap_end) {
-        bdrv_reset_dirty_bitmap(job->dirty_bitmap, bitmap_offset,
-                                bitmap_end - bitmap_offset);
+    dirty_bitmap_offset = QEMU_ALIGN_UP(offset, job->granularity);
+    dirty_bitmap_end = QEMU_ALIGN_DOWN(offset + bytes, job->granularity);
+    if (dirty_bitmap_offset < dirty_bitmap_end) {
+        bdrv_reset_dirty_bitmap(job->dirty_bitmap, dirty_bitmap_offset,
+                                dirty_bitmap_end - dirty_bitmap_offset);
     }
+    zero_bitmap_offset = offset / job->granularity;
+    zero_bitmap_end = DIV_ROUND_UP(offset + bytes, job->granularity);

     job_progress_increase_remaining(&job->common.job, bytes);
     job->active_write_bytes_in_flight += bytes;

     switch (method) {
     case MIRROR_METHOD_COPY:
+        if (job->zero_bitmap) {
+            bitmap_clear(job->zero_bitmap, zero_bitmap_offset,
+                         zero_bitmap_end - zero_bitmap_offset);
+        }
         ret = blk_co_pwritev_part(job->target, offset, bytes,
                                   qiov, qiov_offset, flags);
         break;

     case MIRROR_METHOD_ZERO:
+        if (job->zero_bitmap) {
+            if (find_next_zero_bit(job->zero_bitmap, zero_bitmap_end,
+                                   zero_bitmap_offset) == zero_bitmap_end) {
+                ret = 0;
+                break;
+            }
+        }
         assert(!qiov);
         ret = blk_co_pwrite_zeroes(job->target, offset, bytes, flags);
+        if (job->zero_bitmap && ret >= 0) {
+            bitmap_set(job->zero_bitmap, dirty_bitmap_offset / job->granularity,
+                       (dirty_bitmap_end - dirty_bitmap_offset) /
+                       job->granularity);
+        }
         break;

     case MIRROR_METHOD_DISCARD:
+        if (job->zero_bitmap) {
+            bitmap_clear(job->zero_bitmap, zero_bitmap_offset,
+                         zero_bitmap_end - zero_bitmap_offset);
+        }
         assert(!qiov);
         ret = blk_co_pdiscard(job->target, offset, bytes);
         break;
@@ -1436,10 +1498,10 @@ do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
          * at function start, and they must be still dirty, as we've locked
          * the region for in-flight op.
          */
-        bitmap_offset = QEMU_ALIGN_DOWN(offset, job->granularity);
-        bitmap_end = QEMU_ALIGN_UP(offset + bytes, job->granularity);
-        bdrv_set_dirty_bitmap(job->dirty_bitmap, bitmap_offset,
-                              bitmap_end - bitmap_offset);
+        dirty_bitmap_offset = QEMU_ALIGN_DOWN(offset, job->granularity);
+        dirty_bitmap_end = QEMU_ALIGN_UP(offset + bytes, job->granularity);
+        bdrv_set_dirty_bitmap(job->dirty_bitmap, dirty_bitmap_offset,
+                              dirty_bitmap_end - dirty_bitmap_offset);
         qatomic_set(&job->actively_synced, false);

         action = mirror_error_action(job, false, -ret);
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 4/6] block: Expand block status mode from bool to enum
  2025-04-11  1:04 [PATCH 0/6] Make blockdev-mirror dest sparse in more cases Eric Blake
                   ` (2 preceding siblings ...)
  2025-04-11  1:04 ` [PATCH 3/6] mirror: Skip writing zeroes when target is already zero Eric Blake
@ 2025-04-11  1:04 ` Eric Blake
  2025-04-11  1:04 ` [PATCH 5/6] file-posix: Recognize blockdev-create file as starting all zero Eric Blake
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Eric Blake @ 2025-04-11  1:04 UTC (permalink / raw)
  To: qemu-devel
  Cc: Kevin Wolf, Hanna Reitz, John Snow, Vladimir Sementsov-Ogievskiy,
	Stefan Hajnoczi, Fam Zheng, Ronnie Sahlberg, Paolo Bonzini,
	Peter Lieven, Denis V. Lunev, Alberto Garcia, Ilya Dryomov,
	Stefan Weil, open list:blkdebug, open list:GLUSTER

This patch is purely mechanical, changing bool want_zero into the new
enum BlockStatusMode.  As of this patch, all implementations are
unchanged (the old want_zero==true is now mode==BDRV_BSTAT_PRECISE),
but the callers in io.c are set up so that future patches will be able
to differente between allocation and zero in implementations that care
about the more-specific hint.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/coroutines.h               |  4 +--
 include/block/block-common.h     | 26 ++++++++++++++++
 include/block/block_int-common.h | 25 +++++++++-------
 include/block/block_int-io.h     |  4 +--
 block/io.c                       | 51 ++++++++++++++++----------------
 block/blkdebug.c                 |  6 ++--
 block/copy-before-write.c        |  4 +--
 block/file-posix.c               |  6 ++--
 block/gluster.c                  |  4 +--
 block/iscsi.c                    |  6 ++--
 block/nbd.c                      |  4 +--
 block/null.c                     |  6 ++--
 block/parallels.c                |  6 ++--
 block/qcow.c                     |  2 +-
 block/qcow2.c                    |  6 ++--
 block/qed.c                      |  6 ++--
 block/quorum.c                   |  4 +--
 block/raw-format.c               |  4 +--
 block/rbd.c                      |  6 ++--
 block/snapshot-access.c          |  4 +--
 block/vdi.c                      |  4 +--
 block/vmdk.c                     |  2 +-
 block/vpc.c                      |  2 +-
 block/vvfat.c                    |  6 ++--
 tests/unit/test-block-iothread.c |  2 +-
 25 files changed, 115 insertions(+), 85 deletions(-)

diff --git a/block/coroutines.h b/block/coroutines.h
index 79e5efbf752..c8323aa67e6 100644
--- a/block/coroutines.h
+++ b/block/coroutines.h
@@ -47,7 +47,7 @@ int coroutine_fn GRAPH_RDLOCK
 bdrv_co_common_block_status_above(BlockDriverState *bs,
                                   BlockDriverState *base,
                                   bool include_base,
-                                  bool want_zero,
+                                  enum BlockStatusMode mode,
                                   int64_t offset,
                                   int64_t bytes,
                                   int64_t *pnum,
@@ -78,7 +78,7 @@ int co_wrapper_mixed_bdrv_rdlock
 bdrv_common_block_status_above(BlockDriverState *bs,
                                BlockDriverState *base,
                                bool include_base,
-                               bool want_zero,
+                               enum BlockStatusMode mode,
                                int64_t offset,
                                int64_t bytes,
                                int64_t *pnum,
diff --git a/include/block/block-common.h b/include/block/block-common.h
index 0b831ef87b1..619e75b9c8d 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -508,6 +508,32 @@ enum BdrvChildRoleBits {
                               | BDRV_CHILD_PRIMARY,
 };

+/* Modes for block status calls */
+enum BlockStatusMode {
+    /*
+     * Status should be as accurate as possible: _OFFSET_VALID
+     * and_OFFSET_ZERO should each be set where efficiently possible,
+     * extents may be smaller, and iteration through the entire block
+     * device may take more calls.
+     */
+    BDRV_BSTAT_PRECISE,
+
+    /*
+     * The caller is primarily concerned about overall allocation:
+     * favor larger *pnum, perhaps by coalescing extents and reporting
+     * _DATA instead of _ZERO, and without needing to read data or
+     * bothering with _OFFSET_VALID.
+     */
+    BDRV_BSTAT_ALLOCATED,
+
+    /*
+     * The caller is primarily concerned about whether the device
+     * reads as zero: favor a result of _ZERO, even if it requires
+     * reading a few sectors to verify, without needing _OFFSET_VALID.
+     */
+    BDRV_BSTAT_ZERO,
+};
+
 /* Mask of BdrvChildRoleBits values */
 typedef unsigned int BdrvChildRole;

diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index ebb4e56a503..1fd94b2b568 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -610,13 +610,16 @@ struct BlockDriver {
      * and/or BDRV_BLOCK_RAW; if the current layer defers to a backing
      * layer, the result should be 0 (and not BDRV_BLOCK_ZERO).  See
      * block.h for the overall meaning of the bits.  As a hint, the
-     * flag want_zero is true if the caller cares more about precise
-     * mappings (favor accurate _OFFSET_VALID/_ZERO) or false for
-     * overall allocation (favor larger *pnum, perhaps by reporting
-     * _DATA instead of _ZERO).  The block layer guarantees input
-     * clamped to bdrv_getlength() and aligned to request_alignment,
-     * as well as non-NULL pnum, map, and file; in turn, the driver
-     * must return an error or set pnum to an aligned non-zero value.
+     * flag @mode is BDRV_BSTAT_PRECISE if the caller cares more about
+     * precise mappings (favor accurate _OFFSET_VALID/_ZERO),
+     * BDRV_BSTAT_ALLOCATED for overall allocation (favor larger
+     * *pnum, perhaps by reporting _DATA instead of _ZERO), or
+     * BDRV_BSTAT_ZERO for overall reads-as-zero (favor _ZERO, even if
+     * it requires reading a few sectors to verify).  The block layer
+     * guarantees input clamped to bdrv_getlength() and aligned to
+     * request_alignment, as well as non-NULL pnum, map, and file; in
+     * turn, the driver must return an error or set pnum to an aligned
+     * non-zero value.
      *
      * Note that @bytes is just a hint on how big of a region the
      * caller wants to inspect.  It is not a limit on *pnum.
@@ -628,8 +631,8 @@ struct BlockDriver {
      * to clamping *pnum for return to its caller.
      */
     int coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_block_status)(
-        BlockDriverState *bs,
-        bool want_zero, int64_t offset, int64_t bytes, int64_t *pnum,
+        BlockDriverState *bs, enum BlockStatusMode mode,
+        int64_t offset, int64_t bytes, int64_t *pnum,
         int64_t *map, BlockDriverState **file);

     /*
@@ -653,8 +656,8 @@ struct BlockDriver {
         QEMUIOVector *qiov, size_t qiov_offset);

     int coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_snapshot_block_status)(
-        BlockDriverState *bs, bool want_zero, int64_t offset, int64_t bytes,
-        int64_t *pnum, int64_t *map, BlockDriverState **file);
+        BlockDriverState *bs, enum BlockStatusMode mode, int64_t offset,
+        int64_t bytes, int64_t *pnum, int64_t *map, BlockDriverState **file);

     int coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_pdiscard_snapshot)(
         BlockDriverState *bs, int64_t offset, int64_t bytes);
diff --git a/include/block/block_int-io.h b/include/block/block_int-io.h
index 4a7cf2b4fdc..e019e81fa0c 100644
--- a/include/block/block_int-io.h
+++ b/include/block/block_int-io.h
@@ -38,8 +38,8 @@
 int coroutine_fn GRAPH_RDLOCK bdrv_co_preadv_snapshot(BdrvChild *child,
     int64_t offset, int64_t bytes, QEMUIOVector *qiov, size_t qiov_offset);
 int coroutine_fn GRAPH_RDLOCK bdrv_co_snapshot_block_status(
-    BlockDriverState *bs, bool want_zero, int64_t offset, int64_t bytes,
-    int64_t *pnum, int64_t *map, BlockDriverState **file);
+    BlockDriverState *bs, enum BlockStatusMode mode, int64_t offset,
+    int64_t bytes, int64_t *pnum, int64_t *map, BlockDriverState **file);
 int coroutine_fn GRAPH_RDLOCK bdrv_co_pdiscard_snapshot(BlockDriverState *bs,
     int64_t offset, int64_t bytes);

diff --git a/block/io.c b/block/io.c
index 1ba8d1aeea1..73c96084e62 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2364,10 +2364,8 @@ int bdrv_flush_all(void)
  * Drivers not implementing the functionality are assumed to not support
  * backing files, hence all their sectors are reported as allocated.
  *
- * If 'want_zero' is true, the caller is querying for mapping
- * purposes, with a focus on valid BDRV_BLOCK_OFFSET_VALID, _DATA, and
- * _ZERO where possible; otherwise, the result favors larger 'pnum',
- * with a focus on accurate BDRV_BLOCK_ALLOCATED.
+ * 'mode' serves as a hint as to which results are favored; see enum
+ * BlockStatusMode for details of the supported modes.
  *
  * If 'offset' is beyond the end of the disk image the return value is
  * BDRV_BLOCK_EOF and 'pnum' is set to 0.
@@ -2387,7 +2385,7 @@ int bdrv_flush_all(void)
  * set to the host mapping and BDS corresponding to the guest offset.
  */
 static int coroutine_fn GRAPH_RDLOCK
-bdrv_co_do_block_status(BlockDriverState *bs, bool want_zero,
+bdrv_co_do_block_status(BlockDriverState *bs, enum BlockStatusMode mode,
                         int64_t offset, int64_t bytes,
                         int64_t *pnum, int64_t *map, BlockDriverState **file)
 {
@@ -2476,7 +2474,7 @@ bdrv_co_do_block_status(BlockDriverState *bs, bool want_zero,
             local_file = bs;
             local_map = aligned_offset;
         } else {
-            ret = bs->drv->bdrv_co_block_status(bs, want_zero, aligned_offset,
+            ret = bs->drv->bdrv_co_block_status(bs, mode, aligned_offset,
                                                 aligned_bytes, pnum, &local_map,
                                                 &local_file);

@@ -2488,10 +2486,10 @@ bdrv_co_do_block_status(BlockDriverState *bs, bool want_zero,
              * the cache requires an RCU update, so double check here to avoid
              * such an update if possible.
              *
-             * Check want_zero, because we only want to update the cache when we
+             * Check mode, because we only want to update the cache when we
              * have accurate information about what is zero and what is data.
              */
-            if (want_zero &&
+            if (mode == BDRV_BSTAT_PRECISE &&
                 ret == (BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID) &&
                 QLIST_EMPTY(&bs->children))
             {
@@ -2548,7 +2546,7 @@ bdrv_co_do_block_status(BlockDriverState *bs, bool want_zero,

     if (ret & BDRV_BLOCK_RAW) {
         assert(ret & BDRV_BLOCK_OFFSET_VALID && local_file);
-        ret = bdrv_co_do_block_status(local_file, want_zero, local_map,
+        ret = bdrv_co_do_block_status(local_file, mode, local_map,
                                       *pnum, pnum, &local_map, &local_file);
         goto out;
     }
@@ -2560,7 +2558,7 @@ bdrv_co_do_block_status(BlockDriverState *bs, bool want_zero,

         if (!cow_bs) {
             ret |= BDRV_BLOCK_ZERO;
-        } else if (want_zero) {
+        } else if (mode == BDRV_BSTAT_PRECISE) {
             int64_t size2 = bdrv_co_getlength(cow_bs);

             if (size2 >= 0 && offset >= size2) {
@@ -2569,14 +2567,14 @@ bdrv_co_do_block_status(BlockDriverState *bs, bool want_zero,
         }
     }

-    if (want_zero && ret & BDRV_BLOCK_RECURSE &&
+    if (mode == BDRV_BSTAT_PRECISE && ret & BDRV_BLOCK_RECURSE &&
         local_file && local_file != bs &&
         (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
         (ret & BDRV_BLOCK_OFFSET_VALID)) {
         int64_t file_pnum;
         int ret2;

-        ret2 = bdrv_co_do_block_status(local_file, want_zero, local_map,
+        ret2 = bdrv_co_do_block_status(local_file, mode, local_map,
                                        *pnum, &file_pnum, NULL, NULL);
         if (ret2 >= 0) {
             /* Ignore errors.  This is just providing extra information, it
@@ -2627,7 +2625,7 @@ int coroutine_fn
 bdrv_co_common_block_status_above(BlockDriverState *bs,
                                   BlockDriverState *base,
                                   bool include_base,
-                                  bool want_zero,
+                                  enum BlockStatusMode mode,
                                   int64_t offset,
                                   int64_t bytes,
                                   int64_t *pnum,
@@ -2654,7 +2652,7 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
         return 0;
     }

-    ret = bdrv_co_do_block_status(bs, want_zero, offset, bytes, pnum,
+    ret = bdrv_co_do_block_status(bs, mode, offset, bytes, pnum,
                                   map, file);
     ++*depth;
     if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED || bs == base) {
@@ -2671,7 +2669,7 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
     for (p = bdrv_filter_or_cow_bs(bs); include_base || p != base;
          p = bdrv_filter_or_cow_bs(p))
     {
-        ret = bdrv_co_do_block_status(p, want_zero, offset, bytes, pnum,
+        ret = bdrv_co_do_block_status(p, mode, offset, bytes, pnum,
                                       map, file);
         ++*depth;
         if (ret < 0) {
@@ -2734,7 +2732,8 @@ int coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
                                             BlockDriverState **file)
 {
     IO_CODE();
-    return bdrv_co_common_block_status_above(bs, base, false, true, offset,
+    return bdrv_co_common_block_status_above(bs, base, false,
+                                             BDRV_BSTAT_PRECISE, offset,
                                              bytes, pnum, map, file, NULL);
 }

@@ -2765,8 +2764,9 @@ int coroutine_fn bdrv_co_is_zero_fast(BlockDriverState *bs, int64_t offset,
         return 1;
     }

-    ret = bdrv_co_common_block_status_above(bs, NULL, false, false, offset,
-                                            bytes, &pnum, NULL, NULL, NULL);
+    ret = bdrv_co_common_block_status_above(bs, NULL, false, BDRV_BSTAT_ZERO,
+                                            offset, bytes, &pnum, NULL, NULL,
+                                            NULL);

     if (ret < 0) {
         return ret;
@@ -2782,9 +2782,9 @@ int coroutine_fn bdrv_co_is_allocated(BlockDriverState *bs, int64_t offset,
     int64_t dummy;
     IO_CODE();

-    ret = bdrv_co_common_block_status_above(bs, bs, true, false, offset,
-                                            bytes, pnum ? pnum : &dummy, NULL,
-                                            NULL, NULL);
+    ret = bdrv_co_common_block_status_above(bs, bs, true, BDRV_BSTAT_ALLOCATED,
+                                            offset, bytes, pnum ? pnum : &dummy,
+                                            NULL, NULL, NULL);
     if (ret < 0) {
         return ret;
     }
@@ -2817,7 +2817,8 @@ int coroutine_fn bdrv_co_is_allocated_above(BlockDriverState *bs,
     int ret;
     IO_CODE();

-    ret = bdrv_co_common_block_status_above(bs, base, include_base, false,
+    ret = bdrv_co_common_block_status_above(bs, base, include_base,
+                                            BDRV_BSTAT_ALLOCATED,
                                             offset, bytes, pnum, NULL, NULL,
                                             &depth);
     if (ret < 0) {
@@ -3709,8 +3710,8 @@ bdrv_co_preadv_snapshot(BdrvChild *child, int64_t offset, int64_t bytes,
 }

 int coroutine_fn
-bdrv_co_snapshot_block_status(BlockDriverState *bs,
-                              bool want_zero, int64_t offset, int64_t bytes,
+bdrv_co_snapshot_block_status(BlockDriverState *bs, enum BlockStatusMode mode,
+                              int64_t offset, int64_t bytes,
                               int64_t *pnum, int64_t *map,
                               BlockDriverState **file)
 {
@@ -3728,7 +3729,7 @@ bdrv_co_snapshot_block_status(BlockDriverState *bs,
     }

     bdrv_inc_in_flight(bs);
-    ret = drv->bdrv_co_snapshot_block_status(bs, want_zero, offset, bytes,
+    ret = drv->bdrv_co_snapshot_block_status(bs, mode, offset, bytes,
                                              pnum, map, file);
     bdrv_dec_in_flight(bs);

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 1c1967f8e0a..f3eba9e6f27 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -751,9 +751,9 @@ blkdebug_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
 }

 static int coroutine_fn GRAPH_RDLOCK
-blkdebug_co_block_status(BlockDriverState *bs, bool want_zero, int64_t offset,
-                         int64_t bytes, int64_t *pnum, int64_t *map,
-                         BlockDriverState **file)
+blkdebug_co_block_status(BlockDriverState *bs, enum BlockStatusMode mode,
+                         int64_t offset, int64_t bytes, int64_t *pnum,
+                         int64_t *map, BlockDriverState **file)
 {
     int err;

diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index fd470f5f926..e75e6925e50 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -291,8 +291,8 @@ cbw_co_preadv_snapshot(BlockDriverState *bs, int64_t offset, int64_t bytes,
 }

 static int coroutine_fn GRAPH_RDLOCK
-cbw_co_snapshot_block_status(BlockDriverState *bs,
-                             bool want_zero, int64_t offset, int64_t bytes,
+cbw_co_snapshot_block_status(BlockDriverState *bs, enum BlockStatusMode mode,
+                             int64_t offset, int64_t bytes,
                              int64_t *pnum, int64_t *map,
                              BlockDriverState **file)
 {
diff --git a/block/file-posix.c b/block/file-posix.c
index 67e83528cf5..dcf906a6a7c 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -3201,7 +3201,7 @@ static int find_allocation(BlockDriverState *bs, off_t start,
  * well exceed it.
  */
 static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
-                                            bool want_zero,
+                                            enum BlockStatusMode mode,
                                             int64_t offset,
                                             int64_t bytes, int64_t *pnum,
                                             int64_t *map,
@@ -3218,13 +3218,13 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
     }

     /*
-     * If want_zero is clear, then the caller wants speed over
+     * If mode != PRECISE, then the caller wants speed over
      * accuracy, and the only place where SEEK_DATA should be
      * attempted is at the start of the file to learn if the file has
      * any data at all (anywhere else, just blindly claim the entire
      * file is data).
      */
-    if (!want_zero && offset) {
+    if (mode != BDRV_BSTAT_PRECISE && offset) {
         *pnum = bytes;
         *map = offset;
         *file = bs;
diff --git a/block/gluster.c b/block/gluster.c
index c6d25ae7335..f5ee3cdcc1f 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -1465,7 +1465,7 @@ exit:
  * (Based on raw_co_block_status() from file-posix.c.)
  */
 static int coroutine_fn qemu_gluster_co_block_status(BlockDriverState *bs,
-                                                     bool want_zero,
+                                                     enum BlockStatusMode mode,
                                                      int64_t offset,
                                                      int64_t bytes,
                                                      int64_t *pnum,
@@ -1482,7 +1482,7 @@ static int coroutine_fn qemu_gluster_co_block_status(BlockDriverState *bs,
         return ret;
     }

-    if (!want_zero) {
+    if (mode != BDRV_BSTAT_PRECISE) {
         *pnum = bytes;
         *map = offset;
         *file = bs;
diff --git a/block/iscsi.c b/block/iscsi.c
index 2f0f4dac097..c7b425597f0 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -694,9 +694,9 @@ out_unlock:


 static int coroutine_fn iscsi_co_block_status(BlockDriverState *bs,
-                                              bool want_zero, int64_t offset,
-                                              int64_t bytes, int64_t *pnum,
-                                              int64_t *map,
+                                              enum BlockStatusMode mode,
+                                              int64_t offset, int64_t bytes,
+                                              int64_t *pnum, int64_t *map,
                                               BlockDriverState **file)
 {
     IscsiLun *iscsilun = bs->opaque;
diff --git a/block/nbd.c b/block/nbd.c
index 887841bc813..591dedde62e 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1397,8 +1397,8 @@ nbd_client_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
 }

 static int coroutine_fn GRAPH_RDLOCK nbd_client_co_block_status(
-        BlockDriverState *bs, bool want_zero, int64_t offset, int64_t bytes,
-        int64_t *pnum, int64_t *map, BlockDriverState **file)
+        BlockDriverState *bs, enum BlockStatusMode mode, int64_t offset,
+        int64_t bytes, int64_t *pnum, int64_t *map, BlockDriverState **file)
 {
     int ret, request_ret;
     NBDExtent64 extent = { 0 };
diff --git a/block/null.c b/block/null.c
index dc0b1fdbd9b..66470787cfd 100644
--- a/block/null.c
+++ b/block/null.c
@@ -227,9 +227,9 @@ static int null_reopen_prepare(BDRVReopenState *reopen_state,
 }

 static int coroutine_fn null_co_block_status(BlockDriverState *bs,
-                                             bool want_zero, int64_t offset,
-                                             int64_t bytes, int64_t *pnum,
-                                             int64_t *map,
+                                             enum BlockStatusMode mode,
+                                             int64_t offset, int64_t bytes,
+                                             int64_t *pnum, int64_t *map,
                                              BlockDriverState **file)
 {
     BDRVNullState *s = bs->opaque;
diff --git a/block/parallels.c b/block/parallels.c
index 347ca127f34..93b42b47239 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -416,9 +416,9 @@ parallels_co_flush_to_os(BlockDriverState *bs)
 }

 static int coroutine_fn GRAPH_RDLOCK
-parallels_co_block_status(BlockDriverState *bs, bool want_zero, int64_t offset,
-                          int64_t bytes, int64_t *pnum, int64_t *map,
-                          BlockDriverState **file)
+parallels_co_block_status(BlockDriverState *bs, enum BlockStatusMode mode,
+                          int64_t offset, int64_t bytes, int64_t *pnum,
+                          int64_t *map, BlockDriverState **file)
 {
     BDRVParallelsState *s = bs->opaque;
     int count;
diff --git a/block/qcow.c b/block/qcow.c
index da8ad4d2430..de7fb42c51c 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -530,7 +530,7 @@ get_cluster_offset(BlockDriverState *bs, uint64_t offset, int allocate,
 }

 static int coroutine_fn GRAPH_RDLOCK
-qcow_co_block_status(BlockDriverState *bs, bool want_zero,
+qcow_co_block_status(BlockDriverState *bs, enum BlockStatusMode mode,
                      int64_t offset, int64_t bytes, int64_t *pnum,
                      int64_t *map, BlockDriverState **file)
 {
diff --git a/block/qcow2.c b/block/qcow2.c
index 7774e7f0909..14fa1c00df1 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2141,9 +2141,9 @@ static void qcow2_join_options(QDict *options, QDict *old_options)
 }

 static int coroutine_fn GRAPH_RDLOCK
-qcow2_co_block_status(BlockDriverState *bs, bool want_zero, int64_t offset,
-                      int64_t count, int64_t *pnum, int64_t *map,
-                      BlockDriverState **file)
+qcow2_co_block_status(BlockDriverState *bs, enum BlockStatusMode mode,
+                      int64_t offset, int64_t count, int64_t *pnum,
+                      int64_t *map, BlockDriverState **file)
 {
     BDRVQcow2State *s = bs->opaque;
     uint64_t host_offset;
diff --git a/block/qed.c b/block/qed.c
index ac24449ffb3..6e57859d05f 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -833,9 +833,9 @@ fail:
 }

 static int coroutine_fn GRAPH_RDLOCK
-bdrv_qed_co_block_status(BlockDriverState *bs, bool want_zero, int64_t pos,
-                         int64_t bytes, int64_t *pnum, int64_t *map,
-                         BlockDriverState **file)
+bdrv_qed_co_block_status(BlockDriverState *bs, enum BlockStatusMode mode,
+                         int64_t pos, int64_t bytes, int64_t *pnum,
+                         int64_t *map, BlockDriverState **file)
 {
     BDRVQEDState *s = bs->opaque;
     size_t len = MIN(bytes, SIZE_MAX);
diff --git a/block/quorum.c b/block/quorum.c
index 30747a6df93..97091136fcb 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1226,7 +1226,7 @@ static void quorum_child_perm(BlockDriverState *bs, BdrvChild *c,
  * region contains zeroes, and BDRV_BLOCK_DATA otherwise.
  */
 static int coroutine_fn GRAPH_RDLOCK
-quorum_co_block_status(BlockDriverState *bs, bool want_zero,
+quorum_co_block_status(BlockDriverState *bs, enum BlockStatusMode mode,
                        int64_t offset, int64_t count,
                        int64_t *pnum, int64_t *map, BlockDriverState **file)
 {
@@ -1238,7 +1238,7 @@ quorum_co_block_status(BlockDriverState *bs, bool want_zero,
     for (i = 0; i < s->num_children; i++) {
         int64_t bytes;
         ret = bdrv_co_common_block_status_above(s->children[i]->bs, NULL, false,
-                                                want_zero, offset, count,
+                                                mode, offset, count,
                                                 &bytes, NULL, NULL, NULL);
         if (ret < 0) {
             quorum_report_bad(QUORUM_OP_TYPE_READ, offset, count,
diff --git a/block/raw-format.c b/block/raw-format.c
index e08526e2eca..0ff5367123b 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -283,8 +283,8 @@ fail:
 }

 static int coroutine_fn GRAPH_RDLOCK
-raw_co_block_status(BlockDriverState *bs, bool want_zero, int64_t offset,
-                    int64_t bytes, int64_t *pnum, int64_t *map,
+raw_co_block_status(BlockDriverState *bs, enum BlockStatusMode mode,
+                    int64_t offset, int64_t bytes, int64_t *pnum, int64_t *map,
                     BlockDriverState **file)
 {
     BDRVRawState *s = bs->opaque;
diff --git a/block/rbd.c b/block/rbd.c
index af984fb7db4..abcdd5e4e76 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1504,9 +1504,9 @@ static int qemu_rbd_diff_iterate_cb(uint64_t offs, size_t len,
 }

 static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
-                                                 bool want_zero, int64_t offset,
-                                                 int64_t bytes, int64_t *pnum,
-                                                 int64_t *map,
+                                                 enum BlockStatusMode mode,
+                                                 int64_t offset, int64_t bytes,
+                                                 int64_t *pnum, int64_t *map,
                                                  BlockDriverState **file)
 {
     BDRVRBDState *s = bs->opaque;
diff --git a/block/snapshot-access.c b/block/snapshot-access.c
index 71ac83c01f0..1d91b6df5d7 100644
--- a/block/snapshot-access.c
+++ b/block/snapshot-access.c
@@ -41,11 +41,11 @@ snapshot_access_co_preadv_part(BlockDriverState *bs,

 static int coroutine_fn GRAPH_RDLOCK
 snapshot_access_co_block_status(BlockDriverState *bs,
-                                bool want_zero, int64_t offset,
+                                enum BlockStatusMode mode, int64_t offset,
                                 int64_t bytes, int64_t *pnum,
                                 int64_t *map, BlockDriverState **file)
 {
-    return bdrv_co_snapshot_block_status(bs->file->bs, want_zero, offset,
+    return bdrv_co_snapshot_block_status(bs->file->bs, mode, offset,
                                          bytes, pnum, map, file);
 }

diff --git a/block/vdi.c b/block/vdi.c
index a2da6ecab01..9a9d402c946 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -523,8 +523,8 @@ static int vdi_reopen_prepare(BDRVReopenState *state,
 }

 static int coroutine_fn GRAPH_RDLOCK
-vdi_co_block_status(BlockDriverState *bs, bool want_zero, int64_t offset,
-                    int64_t bytes, int64_t *pnum, int64_t *map,
+vdi_co_block_status(BlockDriverState *bs, enum BlockStatusMode mode,
+                    int64_t offset, int64_t bytes, int64_t *pnum, int64_t *map,
                     BlockDriverState **file)
 {
     BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
diff --git a/block/vmdk.c b/block/vmdk.c
index 2adec499122..6e2fd8d16ef 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1777,7 +1777,7 @@ static inline uint64_t vmdk_find_offset_in_cluster(VmdkExtent *extent,
 }

 static int coroutine_fn GRAPH_RDLOCK
-vmdk_co_block_status(BlockDriverState *bs, bool want_zero,
+vmdk_co_block_status(BlockDriverState *bs, enum BlockStatusMode mode,
                      int64_t offset, int64_t bytes, int64_t *pnum,
                      int64_t *map, BlockDriverState **file)
 {
diff --git a/block/vpc.c b/block/vpc.c
index 0309e319f60..4eac32fd1c4 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -726,7 +726,7 @@ fail:
 }

 static int coroutine_fn GRAPH_RDLOCK
-vpc_co_block_status(BlockDriverState *bs, bool want_zero,
+vpc_co_block_status(BlockDriverState *bs, enum BlockStatusMode mode,
                     int64_t offset, int64_t bytes,
                     int64_t *pnum, int64_t *map,
                     BlockDriverState **file)
diff --git a/block/vvfat.c b/block/vvfat.c
index 91d69b3cc83..336679cac12 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -3134,9 +3134,9 @@ vvfat_co_pwritev(BlockDriverState *bs, int64_t offset, int64_t bytes,
 }

 static int coroutine_fn vvfat_co_block_status(BlockDriverState *bs,
-                                              bool want_zero, int64_t offset,
-                                              int64_t bytes, int64_t *n,
-                                              int64_t *map,
+                                              enum BlockStatusMode mode,
+                                              int64_t offset, int64_t bytes,
+                                              int64_t *n, int64_t *map,
                                               BlockDriverState **file)
 {
     *n = bytes;
diff --git a/tests/unit/test-block-iothread.c b/tests/unit/test-block-iothread.c
index 2b358eaaa82..8189b32fd52 100644
--- a/tests/unit/test-block-iothread.c
+++ b/tests/unit/test-block-iothread.c
@@ -63,7 +63,7 @@ bdrv_test_co_truncate(BlockDriverState *bs, int64_t offset, bool exact,
 }

 static int coroutine_fn bdrv_test_co_block_status(BlockDriverState *bs,
-                                                  bool want_zero,
+                                                  enum BlockStatusMode mode,
                                                   int64_t offset, int64_t count,
                                                   int64_t *pnum, int64_t *map,
                                                   BlockDriverState **file)
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 5/6] file-posix: Recognize blockdev-create file as starting all zero
  2025-04-11  1:04 [PATCH 0/6] Make blockdev-mirror dest sparse in more cases Eric Blake
                   ` (3 preceding siblings ...)
  2025-04-11  1:04 ` [PATCH 4/6] block: Expand block status mode from bool to enum Eric Blake
@ 2025-04-11  1:04 ` Eric Blake
  2025-04-11  1:04 ` [PATCH 6/6] tests: Add iotest mirror-sparse for recent patches Eric Blake
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 23+ messages in thread
From: Eric Blake @ 2025-04-11  1:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Hanna Reitz, open list:raw

There are enough optimizations possible when a file is known to read
as all zero that it is worth taking the extra time during
raw_open_common() to catch more than just 100% sparse files.  In
particular, since our implementation of blockdev-create intentionally
allocates a small all-zero block at the front of a file to make
alignment probing easier, it is well worth the time spent checking
whether that early block reads as all zero if the rest of the file is
still sparse.

I decided to stick the special case lseek and read in open, rather
than in block status; which means the BDRVRawState struct gains
another bool that must be updated anywhere an action can change the
file's contents.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/file-posix.c | 89 +++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 77 insertions(+), 12 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index dcf906a6a7c..342ade828d7 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -167,6 +167,7 @@ typedef struct BDRVRawState {
     bool force_alignment;
     bool drop_cache;
     bool check_cache_dropped;
+    bool all_zero;
     struct {
         uint64_t discard_nb_ok;
         uint64_t discard_nb_failed;
@@ -543,6 +544,43 @@ static void raw_parse_filename(const char *filename, QDict *options,
     bdrv_parse_filename_strip_prefix(filename, "file:", options);
 }

+static int find_allocation(int fd, off_t start, off_t *data, off_t *hole);
+
+/*
+ * Hueristic to determine if the file reads as all zeroes.  Accounts
+ * for files resulting from raw_co_create, where we intentionally
+ * allocated a small block of zeroes up front to make alignment
+ * probing easier.
+ */
+static bool raw_all_zero(int fd)
+{
+    off_t data, hole;
+    int ret;
+    char *buf;
+    size_t max_align = MAX(MAX_BLOCKSIZE, qemu_real_host_page_size());
+    bool result;
+
+    ret = find_allocation(fd, 0, &data, &hole);
+    if (ret == -ENXIO) {
+        return true;
+    } else if (ret < 0) {
+        return false;
+    }
+    /*
+     * If data is reasonably small, AND if the rest of the file is a
+     * hole, then it is worth reading the data portion to see if it
+     * reads as zero.
+     */
+    if (data != 0 || hole > max_align ||
+        find_allocation(fd, hole, &data, &hole) != -ENXIO) {
+        return false;
+    }
+    buf = qemu_memalign(max_align, max_align);
+    result = pread(fd, buf, hole, 0) == hole && buffer_is_zero(buf, hole);
+    qemu_vfree(buf);
+    return result;
+}
+
 static QemuOptsList raw_runtime_opts = {
     .name = "raw",
     .head = QTAILQ_HEAD_INITIALIZER(raw_runtime_opts.head),
@@ -756,9 +794,9 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
                        bs->drv->format_name, bs->filename);
             ret = -EINVAL;
             goto fail;
-        } else {
-            s->has_fallocate = true;
         }
+        s->has_fallocate = true;
+        s->all_zero = raw_all_zero(s->fd);
     } else {
         if (!(S_ISCHR(st.st_mode) || S_ISBLK(st.st_mode))) {
             error_setg(errp, "'%s' driver requires '%s' to be either "
@@ -1695,9 +1733,13 @@ qemu_pwritev(int fd, const struct iovec *iov, int nr_iov, off_t offset)
 static ssize_t handle_aiocb_rw_vector(RawPosixAIOData *aiocb)
 {
     ssize_t len;
+    BDRVRawState *s = aiocb->bs->opaque;
+    bool do_write = aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND);

-    len = RETRY_ON_EINTR(
-        (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) ?
+    if (do_write) {
+        s->all_zero = false;
+    }
+    len = RETRY_ON_EINTR(do_write ?
             qemu_pwritev(aiocb->aio_fildes,
                            aiocb->io.iov,
                            aiocb->io.niov,
@@ -1724,9 +1766,11 @@ static ssize_t handle_aiocb_rw_linear(RawPosixAIOData *aiocb, char *buf)
 {
     ssize_t offset = 0;
     ssize_t len;
+    BDRVRawState *s = aiocb->bs->opaque;

     while (offset < aiocb->aio_nbytes) {
         if (aiocb->aio_type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
+            s->all_zero = false;
             len = pwrite(aiocb->aio_fildes,
                          (const char *)buf + offset,
                          aiocb->aio_nbytes - offset,
@@ -2171,7 +2215,9 @@ static int handle_aiocb_copy_range(void *opaque)
     uint64_t bytes = aiocb->aio_nbytes;
     off_t in_off = aiocb->aio_offset;
     off_t out_off = aiocb->copy_range.aio_offset2;
+    BDRVRawState *s = aiocb->bs->opaque;

+    s->all_zero = false;
     while (bytes) {
         ssize_t ret = copy_file_range(aiocb->aio_fildes, &in_off,
                                       aiocb->copy_range.aio_fd2, &out_off,
@@ -2208,6 +2254,9 @@ static int handle_aiocb_discard(void *opaque)
     if (!s->has_discard) {
         return -ENOTSUP;
     }
+    if (!s->has_write_zeroes) {
+        s->all_zero = false;
+    }

     if (aiocb->aio_type & QEMU_AIO_BLKDEV) {
 #ifdef BLKDISCARD
@@ -2493,6 +2542,9 @@ static int coroutine_fn raw_co_prw(BlockDriverState *bs, int64_t *offset_ptr,
     int ret;
     uint64_t offset = *offset_ptr;

+    if (type & (QEMU_AIO_WRITE | QEMU_AIO_ZONE_APPEND)) {
+        s->all_zero = false;
+    }
     if (fd_open(bs) < 0)
         return -EIO;
 #if defined(CONFIG_BLKZONED)
@@ -3103,11 +3155,10 @@ static int coroutine_fn raw_co_delete_file(BlockDriverState *bs,
  * If @start is in a trailing hole or beyond EOF, return -ENXIO.
  * If we can't find out, return a negative errno other than -ENXIO.
  */
-static int find_allocation(BlockDriverState *bs, off_t start,
+static int find_allocation(int fd, off_t start,
                            off_t *data, off_t *hole)
 {
 #if defined SEEK_HOLE && defined SEEK_DATA
-    BDRVRawState *s = bs->opaque;
     off_t offs;

     /*
@@ -3121,7 +3172,7 @@ static int find_allocation(BlockDriverState *bs, off_t start,
      *     Treating like a trailing hole is simplest.
      * D4. offs < 0, errno != ENXIO: we learned nothing
      */
-    offs = lseek(s->fd, start, SEEK_DATA);
+    offs = lseek(fd, start, SEEK_DATA);
     if (offs < 0) {
         return -errno;          /* D3 or D4 */
     }
@@ -3158,7 +3209,7 @@ static int find_allocation(BlockDriverState *bs, off_t start,
      * H4. offs < 0, errno != ENXIO: we learned nothing
      *     Pretend we know nothing at all, i.e. "forget" about D1.
      */
-    offs = lseek(s->fd, start, SEEK_HOLE);
+    offs = lseek(fd, start, SEEK_HOLE);
     if (offs < 0) {
         return -errno;          /* D1 and (H3 or H4) */
     }
@@ -3207,6 +3258,7 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
                                             int64_t *map,
                                             BlockDriverState **file)
 {
+    BDRVRawState *s = bs->opaque;
     off_t data = 0, hole = 0;
     int ret;

@@ -3218,20 +3270,29 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
     }

     /*
-     * If mode != PRECISE, then the caller wants speed over
+     * If mode == ALLOCATED, then the caller wants speed over
      * accuracy, and the only place where SEEK_DATA should be
      * attempted is at the start of the file to learn if the file has
      * any data at all (anywhere else, just blindly claim the entire
-     * file is data).
+     * file is data).  If mode == ZERO, use internal status tracking
+     * to work around thet case where raw_co_block_status calls
+     * allocate_first_block to ease alignment probing but which breaks
+     * lseek hole detection.
      */
-    if (mode != BDRV_BSTAT_PRECISE && offset) {
+    if (mode == BDRV_BSTAT_ALLOCATED && offset) {
         *pnum = bytes;
         *map = offset;
         *file = bs;
         return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
     }
+    if (mode == BDRV_BSTAT_ZERO && s->all_zero) {
+        *pnum = bytes;
+        *map = offset;
+        *file = bs;
+        return BDRV_BLOCK_ZERO;
+    }

-    ret = find_allocation(bs, offset, &data, &hole);
+    ret = find_allocation(s->fd, offset, &data, &hole);
     if (ret == -ENXIO) {
         /* Trailing hole */
         *pnum = bytes;
@@ -3572,6 +3633,9 @@ raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
     RawPosixAIOData acb;
     int ret;

+    if (!s->has_write_zeroes) {
+        s->all_zero = false;
+    }
     acb = (RawPosixAIOData) {
         .bs             = bs,
         .aio_fildes     = s->fd,
@@ -3874,6 +3938,7 @@ raw_co_copy_range_to(BlockDriverState *bs,
     BDRVRawState *s = bs->opaque;
     BDRVRawState *src_s;

+    s->all_zero = false;
     assert(dst->bs == bs);
     if (src->bs->drv->bdrv_co_copy_range_to != raw_co_copy_range_to) {
         return -ENOTSUP;
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 6/6] tests: Add iotest mirror-sparse for recent patches
  2025-04-11  1:04 [PATCH 0/6] Make blockdev-mirror dest sparse in more cases Eric Blake
                   ` (4 preceding siblings ...)
  2025-04-11  1:04 ` [PATCH 5/6] file-posix: Recognize blockdev-create file as starting all zero Eric Blake
@ 2025-04-11  1:04 ` Eric Blake
  2025-04-11 14:48 ` [PATCH 7/6] mirror: Allow QMP override to declare target already zero Eric Blake
  2025-04-11 20:15 ` [PATCH 0/6] Make blockdev-mirror dest sparse in more cases Eric Blake
  7 siblings, 0 replies; 23+ messages in thread
From: Eric Blake @ 2025-04-11  1:04 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Hanna Reitz, open list:Block layer core

Prove that blockdev-mirror can now result in sparse destination files.
By making this a separate test, it was possible to test effects of
individual patches for the various pieces that all have to work
together for a sparse mirror to be successful.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 tests/qemu-iotests/tests/mirror-sparse     | 100 ++++++++++++++
 tests/qemu-iotests/tests/mirror-sparse.out | 153 +++++++++++++++++++++
 2 files changed, 253 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/mirror-sparse
 create mode 100644 tests/qemu-iotests/tests/mirror-sparse.out

diff --git a/tests/qemu-iotests/tests/mirror-sparse b/tests/qemu-iotests/tests/mirror-sparse
new file mode 100755
index 00000000000..4251dea7531
--- /dev/null
+++ b/tests/qemu-iotests/tests/mirror-sparse
@@ -0,0 +1,100 @@
+#!/usr/bin/env bash
+# group: rw auto quick
+#
+# Test blockdev-mirror with raw sparse destination
+#
+# Copyright (C) 2025 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+status=1 # failure is the default!
+
+_cleanup()
+{
+    _cleanup_test_img
+    _cleanup_qemu
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+cd ..
+. ./common.rc
+. ./common.filter
+. ./common.qemu
+
+_supported_fmt qcow2 raw  # Format of the source. dst is always raw file
+_supported_proto file
+_supported_os Linux
+
+echo
+echo "=== Initial image setup ==="
+echo
+
+TEST_IMG="$TEST_IMG.base" _make_test_img 20M
+$QEMU_IO -c 'w 8M 2M' -f $IMGFMT "$TEST_IMG.base" | _filter_qemu_io
+
+_launch_qemu -machine q35 \
+    -blockdev '{"driver":"file", "cache":{"direct":true, "no-flush":false},
+                "filename":"'"$TEST_IMG.base"'", "node-name":"src-file"}' \
+    -blockdev '{"driver":"'$IMGFMT'", "node-name":"src", "file":"src-file"}'
+h1=$QEMU_HANDLE
+_send_qemu_cmd $h1 '{"execute": "qmp_capabilities"}' 'return'
+
+# Each of these combinations should result in a sparse destination;
+# the destination should only be fully allocated if pre-allocated
+for creation in external blockdev-create; do
+for discard in ignore unmap; do
+
+echo
+echo "=== Destination with $creation creation and discard=$discard ==="
+echo
+
+rm -f $TEST_IMG
+if test $creation = external; then
+    truncate --size=20M $TEST_IMG
+else
+    _send_qemu_cmd $h1 '{"execute": "blockdev-create", "arguments":
+          {"options": {"driver":"file", "filename":"'$TEST_IMG'",
+          "size":'$((20*1024*1024))'}, "job-id":"job1"}}' 'concluded'
+    _send_qemu_cmd $h1 '{"execute": "job-dismiss", "arguments":
+          {"id": "job1"}}' 'return'
+fi
+_send_qemu_cmd $h1 '{"execute": "blockdev-add", "arguments":
+                     {"node-name": "dst", "driver":"file",
+                      "filename":"'$TEST_IMG'", "aio":"threads",
+                      "auto-read-only":true, "discard":"'$discard'"}}' 'return'
+_send_qemu_cmd $h1 '{"execute":"blockdev-mirror", "arguments":
+                     {"sync":"full", "device":"src", "target":"dst",
+                      "job-id":"job2"}}' 'return'
+_timed_wait_for $h1 '"ready"'
+_send_qemu_cmd $h1 '{"execute": "job-complete", "arguments":
+               {"id":"job2"}}' 'return'
+_send_qemu_cmd $h1 '{"execute": "blockdev-del", "arguments":
+                {"node-name": "dst"}}' 'return'
+$QEMU_IMG compare -f $IMGFMT -F raw $TEST_IMG.base $TEST_IMG
+$QEMU_IMG info -f raw $TEST_IMG | grep '^disk size:'
+
+done
+done
+
+_send_qemu_cmd $h1 '{"execute":"quit"}' ''
+
+# success, all done
+echo '*** done'
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/tests/mirror-sparse.out b/tests/qemu-iotests/tests/mirror-sparse.out
new file mode 100644
index 00000000000..16eb041841f
--- /dev/null
+++ b/tests/qemu-iotests/tests/mirror-sparse.out
@@ -0,0 +1,153 @@
+QA output created by mirror-sparse
+
+=== Initial image setup ===
+
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=20971520
+wrote 2097152/2097152 bytes at offset 8388608
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+{"execute": "qmp_capabilities"}
+{"return": {}}
+
+=== Destination with external creation and discard=ignore ===
+
+{"execute": "blockdev-add", "arguments":
+                     {"node-name": "dst", "driver":"file",
+                      "filename":"TEST_DIR/t.IMGFMT", "aio":"threads",
+                      "auto-read-only":true, "discard":"ignore"}}
+{"return": {}}
+{"execute":"blockdev-mirror", "arguments":
+                     {"sync":"full", "device":"src", "target":"dst",
+                      "job-id":"job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job2"}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job2"}}
+{"execute": "job-complete", "arguments":
+               {"id":"job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job2", "len": 20971520, "offset": 20971520, "speed": 0, "type": "mirror"}}
+{"return": {}}
+{"execute": "blockdev-del", "arguments":
+                {"node-name": "dst"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job2", "len": 20971520, "offset": 20971520, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job2"}}
+{"return": {}}
+Images are identical.
+disk size: 2.06 MiB
+
+=== Destination with external creation and discard=unmap ===
+
+{"execute": "blockdev-add", "arguments":
+                     {"node-name": "dst", "driver":"file",
+                      "filename":"TEST_DIR/t.IMGFMT", "aio":"threads",
+                      "auto-read-only":true, "discard":"unmap"}}
+{"return": {}}
+{"execute":"blockdev-mirror", "arguments":
+                     {"sync":"full", "device":"src", "target":"dst",
+                      "job-id":"job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job2"}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job2"}}
+{"execute": "job-complete", "arguments":
+               {"id":"job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job2", "len": 20971520, "offset": 20971520, "speed": 0, "type": "mirror"}}
+{"return": {}}
+{"execute": "blockdev-del", "arguments":
+                {"node-name": "dst"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job2", "len": 20971520, "offset": 20971520, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job2"}}
+{"return": {}}
+Images are identical.
+disk size: 2.06 MiB
+
+=== Destination with blockdev-create creation and discard=ignore ===
+
+{"execute": "blockdev-create", "arguments":
+          {"options": {"driver":"file", "filename":"TEST_DIR/t.IMGFMT",
+          "size":20971520}, "job-id":"job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job1"}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job1"}}
+{"execute": "job-dismiss", "arguments":
+          {"id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job1"}}
+{"return": {}}
+{"execute": "blockdev-add", "arguments":
+                     {"node-name": "dst", "driver":"file",
+                      "filename":"TEST_DIR/t.IMGFMT", "aio":"threads",
+                      "auto-read-only":true, "discard":"ignore"}}
+{"return": {}}
+{"execute":"blockdev-mirror", "arguments":
+                     {"sync":"full", "device":"src", "target":"dst",
+                      "job-id":"job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job2"}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job2"}}
+{"execute": "job-complete", "arguments":
+               {"id":"job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job2", "len": 20971520, "offset": 20971520, "speed": 0, "type": "mirror"}}
+{"return": {}}
+{"execute": "blockdev-del", "arguments":
+                {"node-name": "dst"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job2", "len": 20971520, "offset": 20971520, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job2"}}
+{"return": {}}
+Images are identical.
+disk size: 2.06 MiB
+
+=== Destination with blockdev-create creation and discard=unmap ===
+
+{"execute": "blockdev-create", "arguments":
+          {"options": {"driver":"file", "filename":"TEST_DIR/t.IMGFMT",
+          "size":20971520}, "job-id":"job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job1"}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job1"}}
+{"execute": "job-dismiss", "arguments":
+          {"id": "job1"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job1"}}
+{"return": {}}
+{"execute": "blockdev-add", "arguments":
+                     {"node-name": "dst", "driver":"file",
+                      "filename":"TEST_DIR/t.IMGFMT", "aio":"threads",
+                      "auto-read-only":true, "discard":"unmap"}}
+{"return": {}}
+{"execute":"blockdev-mirror", "arguments":
+                     {"sync":"full", "device":"src", "target":"dst",
+                      "job-id":"job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job2"}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job2"}}
+{"execute": "job-complete", "arguments":
+               {"id":"job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job2", "len": 20971520, "offset": 20971520, "speed": 0, "type": "mirror"}}
+{"return": {}}
+{"execute": "blockdev-del", "arguments":
+                {"node-name": "dst"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job2", "len": 20971520, "offset": 20971520, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job2"}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job2"}}
+{"return": {}}
+Images are identical.
+disk size: 2.06 MiB
+{"execute":"quit"}
+*** done
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 7/6] mirror: Allow QMP override to declare target already zero
  2025-04-11  1:04 [PATCH 0/6] Make blockdev-mirror dest sparse in more cases Eric Blake
                   ` (5 preceding siblings ...)
  2025-04-11  1:04 ` [PATCH 6/6] tests: Add iotest mirror-sparse for recent patches Eric Blake
@ 2025-04-11 14:48 ` Eric Blake
  2025-04-12  4:56   ` Markus Armbruster
  2025-04-11 20:15 ` [PATCH 0/6] Make blockdev-mirror dest sparse in more cases Eric Blake
  7 siblings, 1 reply; 23+ messages in thread
From: Eric Blake @ 2025-04-11 14:48 UTC (permalink / raw)
  To: qemu-devel
  Cc: John Snow, Vladimir Sementsov-Ogievskiy, Kevin Wolf, Hanna Reitz,
	Markus Armbruster, open list:Block Jobs

Qemu's attempts to learn whether a destination file starts life with
all zero contents are just a hueristic.  There may be cases where the
caller is aware of information that qemu cannot learn quickly, in
which case telling qemu what to assume about the destination can make
the mirror operation faster.  Given our existing example of "qemu-img
convert --target-is-zero", it is time to expose this override in QMP
for blockdev-mirror as well.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 qapi/block-core.json                   |  9 ++++++++-
 include/block/block_int-global-state.h |  3 ++-
 block/mirror.c                         | 19 +++++++++++++------
 blockdev.c                             | 18 +++++++++++-------
 tests/unit/test-block-iothread.c       |  2 +-
 5 files changed, 35 insertions(+), 16 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index b1937780e19..6d6185a336a 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2538,6 +2538,12 @@
 #     disappear from the query list without user intervention.
 #     Defaults to true.  (Since 3.1)
 #
+# @target-is-zero: Assume the destination read as all zeroes before
+#     the mirror started, even if qemu is unable to quickly learn that
+#     from the destination.  Default false, since setting this to true
+#     when the destination is not already zero can lead to a corrupt
+#     destination.  (Since 9.1)
+#
 # Since: 2.6
 #
 # .. qmp-example::
@@ -2557,7 +2563,8 @@
             '*on-target-error': 'BlockdevOnError',
             '*filter-node-name': 'str',
             '*copy-mode': 'MirrorCopyMode',
-            '*auto-finalize': 'bool', '*auto-dismiss': 'bool' },
+            '*auto-finalize': 'bool', '*auto-dismiss': 'bool',
+            '*target-is-zero': 'bool'},
   'allow-preconfig': true }

 ##
diff --git a/include/block/block_int-global-state.h b/include/block/block_int-global-state.h
index eb2d92a2261..a2b96f90d44 100644
--- a/include/block/block_int-global-state.h
+++ b/include/block/block_int-global-state.h
@@ -140,6 +140,7 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
  * @mode: Whether to collapse all images in the chain to the target.
  * @backing_mode: How to establish the target's backing chain after completion.
  * @zero_target: Whether the target should be explicitly zero-initialized
+ * @target_is_zero: Whether the target already is zero-initialized
  * @on_source_error: The action to take upon error reading from the source.
  * @on_target_error: The action to take upon error writing to the target.
  * @unmap: Whether to unmap target where source sectors only contain zeroes.
@@ -159,7 +160,7 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
                   int creation_flags, int64_t speed,
                   uint32_t granularity, int64_t buf_size,
                   MirrorSyncMode mode, BlockMirrorBackingMode backing_mode,
-                  bool zero_target,
+                  bool zero_target, bool target_is_zero,
                   BlockdevOnError on_source_error,
                   BlockdevOnError on_target_error,
                   bool unmap, const char *filter_node_name,
diff --git a/block/mirror.c b/block/mirror.c
index 98da5a6dc27..c0936a31028 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -55,6 +55,8 @@ typedef struct MirrorBlockJob {
     BlockMirrorBackingMode backing_mode;
     /* Whether the target image requires explicit zero-initialization */
     bool zero_target;
+    /* Whether the target should be assumed to be already zero initialized */
+    bool target_is_zero;
     /*
      * To be accesssed with atomics. Written only under the BQL (required by the
      * current implementation of mirror_change()).
@@ -877,7 +879,11 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
     bdrv_graph_co_rdlock();
     bs = s->mirror_top_bs->backing->bs;
     if (s->zero_target) {
-        ret = bdrv_co_is_zero_fast(target_bs, 0, s->bdev_length);
+        if (s->target_is_zero) {
+            ret = 1;
+        } else {
+            ret = bdrv_co_is_zero_fast(target_bs, 0, s->bdev_length);
+        }
     }
     bdrv_graph_co_rdunlock();

@@ -1780,7 +1786,7 @@ static BlockJob *mirror_start_job(
                              const char *replaces, int64_t speed,
                              uint32_t granularity, int64_t buf_size,
                              BlockMirrorBackingMode backing_mode,
-                             bool zero_target,
+                             bool zero_target, bool target_is_zero,
                              BlockdevOnError on_source_error,
                              BlockdevOnError on_target_error,
                              bool unmap,
@@ -1949,6 +1955,7 @@ static BlockJob *mirror_start_job(
     s->is_none_mode = is_none_mode;
     s->backing_mode = backing_mode;
     s->zero_target = zero_target;
+    s->target_is_zero = target_is_zero;
     qatomic_set(&s->copy_mode, copy_mode);
     s->base = base;
     s->base_overlay = bdrv_find_overlay(bs, base);
@@ -2077,7 +2084,7 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
                   int creation_flags, int64_t speed,
                   uint32_t granularity, int64_t buf_size,
                   MirrorSyncMode mode, BlockMirrorBackingMode backing_mode,
-                  bool zero_target,
+                  bool zero_target, bool target_is_zero,
                   BlockdevOnError on_source_error,
                   BlockdevOnError on_target_error,
                   bool unmap, const char *filter_node_name,
@@ -2102,8 +2109,8 @@ void mirror_start(const char *job_id, BlockDriverState *bs,

     mirror_start_job(job_id, bs, creation_flags, target, replaces,
                      speed, granularity, buf_size, backing_mode, zero_target,
-                     on_source_error, on_target_error, unmap, NULL, NULL,
-                     &mirror_job_driver, is_none_mode, base, false,
+                     target_is_zero, on_source_error, on_target_error, unmap,
+                     NULL, NULL, &mirror_job_driver, is_none_mode, base, false,
                      filter_node_name, true, copy_mode, false, errp);
 }

@@ -2129,7 +2136,7 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,

     job = mirror_start_job(
                      job_id, bs, creation_flags, base, NULL, speed, 0, 0,
-                     MIRROR_LEAVE_BACKING_CHAIN, false,
+                     MIRROR_LEAVE_BACKING_CHAIN, false, false,
                      on_error, on_error, true, cb, opaque,
                      &commit_active_job_driver, false, base, auto_complete,
                      filter_node_name, false, MIRROR_COPY_MODE_BACKGROUND,
diff --git a/blockdev.c b/blockdev.c
index 1d1f27cfff6..6f5373991c8 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2798,7 +2798,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
                                    const char *replaces,
                                    enum MirrorSyncMode sync,
                                    BlockMirrorBackingMode backing_mode,
-                                   bool zero_target,
+                                   bool zero_target, bool target_is_zero,
                                    bool has_speed, int64_t speed,
                                    bool has_granularity, uint32_t granularity,
                                    bool has_buf_size, int64_t buf_size,
@@ -2909,11 +2909,10 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
     /* pass the node name to replace to mirror start since it's loose coupling
      * and will allow to check whether the node still exist at mirror completion
      */
-    mirror_start(job_id, bs, target,
-                 replaces, job_flags,
+    mirror_start(job_id, bs, target, replaces, job_flags,
                  speed, granularity, buf_size, sync, backing_mode, zero_target,
-                 on_source_error, on_target_error, unmap, filter_node_name,
-                 copy_mode, errp);
+                 target_is_zero, on_source_error, on_target_error, unmap,
+                 filter_node_name, copy_mode, errp);
 }

 void qmp_drive_mirror(DriveMirror *arg, Error **errp)
@@ -2928,6 +2927,7 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
     int64_t size;
     const char *format = arg->format;
     bool zero_target;
+    bool target_is_zero;
     int ret;

     bs = qmp_get_root_bs(arg->device, errp);
@@ -3044,6 +3044,8 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
     zero_target = (arg->sync == MIRROR_SYNC_MODE_FULL &&
                    (arg->mode == NEW_IMAGE_MODE_EXISTING ||
                     !bdrv_has_zero_init(target_bs)));
+    target_is_zero = (arg->mode == NEW_IMAGE_MODE_ABSOLUTE_PATHS &&
+                      bdrv_has_zero_init(target_bs));
     bdrv_graph_rdunlock_main_loop();


@@ -3055,7 +3057,7 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)

     blockdev_mirror_common(arg->job_id, bs, target_bs,
                            arg->replaces, arg->sync,
-                           backing_mode, zero_target,
+                           backing_mode, zero_target, target_is_zero,
                            arg->has_speed, arg->speed,
                            arg->has_granularity, arg->granularity,
                            arg->has_buf_size, arg->buf_size,
@@ -3085,6 +3087,7 @@ void qmp_blockdev_mirror(const char *job_id,
                          bool has_copy_mode, MirrorCopyMode copy_mode,
                          bool has_auto_finalize, bool auto_finalize,
                          bool has_auto_dismiss, bool auto_dismiss,
+                         bool has_target_is_zero, bool target_is_zero,
                          Error **errp)
 {
     BlockDriverState *bs;
@@ -3115,7 +3118,8 @@ void qmp_blockdev_mirror(const char *job_id,

     blockdev_mirror_common(job_id, bs, target_bs,
                            replaces, sync, backing_mode,
-                           zero_target, has_speed, speed,
+                           zero_target, has_target_is_zero && target_is_zero,
+                           has_speed, speed,
                            has_granularity, granularity,
                            has_buf_size, buf_size,
                            has_on_source_error, on_source_error,
diff --git a/tests/unit/test-block-iothread.c b/tests/unit/test-block-iothread.c
index 8189b32fd52..ffc878d401e 100644
--- a/tests/unit/test-block-iothread.c
+++ b/tests/unit/test-block-iothread.c
@@ -755,7 +755,7 @@ static void test_propagate_mirror(void)

     /* Start a mirror job */
     mirror_start("job0", src, target, NULL, JOB_DEFAULT, 0, 0, 0,
-                 MIRROR_SYNC_MODE_NONE, MIRROR_OPEN_BACKING_CHAIN, false,
+                 MIRROR_SYNC_MODE_NONE, MIRROR_OPEN_BACKING_CHAIN, false, false,
                  BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
                  false, "filter_node", MIRROR_COPY_MODE_BACKGROUND,
                  &error_abort);
-- 
2.49.0



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/6] mirror: Skip pre-zeroing destination if it is already zero
  2025-04-11  1:04 ` [PATCH 1/6] mirror: Skip pre-zeroing destination if it is already zero Eric Blake
@ 2025-04-11 19:54   ` Vladimir Sementsov-Ogievskiy
  2025-04-16 21:42     ` Eric Blake
  2025-04-14 16:29   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 23+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-04-11 19:54 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: John Snow, Kevin Wolf, Hanna Reitz, open list:Block Jobs

On 11.04.25 04:04, Eric Blake wrote:
> When doing a sync=full mirroring, QMP drive-mirror requests full
> zeroing if it did not just create the destination, and blockdev-mirror
> requests full zeroing unconditionally.  This is because during a full
> sync, we must ensure that the portions of the disk that are not
> otherwise touched by the source still read as zero upon completion.
> 
> However, in mirror_dirty_init(), we were blindly assuming that if the
> destination allows punching holes, we should pre-zero the entire
> image; and if it does not allow punching holes, then treat the entire
> source as dirty rather than mirroring just the allocated portions of
> the source.  Without the ability to punch holes, this results in the
> destination file being fully allocated; and even when punching holes
> is supported, it causes duplicate I/O to the portions of the
> destination corresponding to chunks of the source that are allocated
> but read as zero.
> 
> Smarter is to avoid the pre-zeroing pass over the destination if it
> can be proved the destination already reads as zero.  Note that a
> later patch will then further improve things to skip writing to the
> destination for parts of the image where the source is zero; but even
> with just this patch, it is possible to see a difference for any BDS
> that can quickly report that it already reads as zero.  Iotest 194 is
> proof of this: instead of mirroring a completely sparse file, change
> it to pre-populate some data.  When run with './check -file 194', the
> full 1G is still allocated, but with './check -qcow2 194', only the 1M
> of pre-populated data is now mirrored; this in turn requires an
> additional log filter.
> 
> Note that there are still BDS layers that do not quickly report
> reading as all zero; for example, the file-posix code implementation
> for fast block status currently blindly reports the entire image as
> allocated and non-zero without even consulting lseek(SEEK_DATA)); that
> will be addressed in later patches.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>   block/mirror.c             | 10 ++++++++--
>   tests/qemu-iotests/194     | 15 +++++++++++++--
>   tests/qemu-iotests/194.out |  4 ++--
>   3 files changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index a53582f17bb..2e1e14c8e7e 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -841,14 +841,20 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
>       int64_t offset;
>       BlockDriverState *bs;
>       BlockDriverState *target_bs = blk_bs(s->target);
> -    int ret = -1;
> +    int ret;

I think, it was to avoid Coverity false-positive around further code

         WITH_GRAPH_RDLOCK_GUARD() {
             ret = bdrv_co_is_allocated_above(bs, s->base_overlay, true, offset,
                                              bytes, &count);
         }
         if (ret < 0) {
             return ret;
         }

which you don't touch here. I think "= -1;" should be kept. Or I missed static analyzes revolution (if so, it should be mentioned in commit message).

>       int64_t count;
> 
>       bdrv_graph_co_rdlock();
>       bs = s->mirror_top_bs->backing->bs;
> +    if (s->zero_target) {
> +        ret = bdrv_co_is_zero_fast(target_bs, 0, s->bdev_length);
> +    }
>       bdrv_graph_co_rdunlock();
> 
> -    if (s->zero_target) {
> +    if (s->zero_target && ret <= 0) {
> +        if (ret < 0) {
> +            return ret;
> +        }
>           if (!bdrv_can_write_zeroes_with_unmap(target_bs)) {
>               bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, s->bdev_length);
>               return 0;
> diff --git a/tests/qemu-iotests/194 b/tests/qemu-iotests/194
> index c0ce82dd257..814c15dfe3b 100755



-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 0/6] Make blockdev-mirror dest sparse in more cases
  2025-04-11  1:04 [PATCH 0/6] Make blockdev-mirror dest sparse in more cases Eric Blake
                   ` (6 preceding siblings ...)
  2025-04-11 14:48 ` [PATCH 7/6] mirror: Allow QMP override to declare target already zero Eric Blake
@ 2025-04-11 20:15 ` Eric Blake
  7 siblings, 0 replies; 23+ messages in thread
From: Eric Blake @ 2025-04-11 20:15 UTC (permalink / raw)
  To: qemu-devel, qemu-block; +Cc: nsoffer, pkrempa

On Thu, Apr 10, 2025 at 08:04:50PM -0500, Eric Blake wrote:
> When mirroring images, it makes sense for the destination to be sparse
> even if it was not connected with "discard":"unmap"; the only time the
> destination should be fully allocated is if the user pre-allocated it,
> or if the source was not sparse.
> 
> Eric Blake (6):
>   mirror: Skip pre-zeroing destination if it is already zero
>   file-posix: Allow lseek at offset 0 when !want_zero
>   mirror: Skip writing zeroes when target is already zero
>   block: Expand block status mode from bool to enum
>   file-posix: Recognize blockdev-create file as starting all zero
>   tests: Add iotest mirror-sparse for recent patches

Responding here to point out that in
https://issues.redhat.com/browse/RHEL-82906, Peter Krempa identified
that it was Nir's earlier patch:

> commit d05ae948cc887054495977855b0859d0d4ab2613
> Author: Nir Soffer <nsoffer@redhat.com>
> Date:   Fri Jun 28 23:20:58 2024 +0300
> 
>     Consider discard option when writing zeros
>     
>     When opening an image with discard=off, we punch hole in the image when
>     writing zeroes, making the image sparse. This breaks users that want to
>     ensure that writes cannot fail with ENOSPACE by using fully allocated
>     images[1].
>     
>     bdrv_co_pwrite_zeroes() correctly disables BDRV_REQ_MAY_UNMAP if we
>     opened the child without discard=unmap or discard=on. But we don't go
>     through this function when accessing the top node. Move the check down
>     to bdrv_co_do_pwrite_zeroes() which seems to be used in all code paths.
>     
>     This change implements the documented behavior, punching holes only when
>     opening the image with discard=on or discard=unmap. This may not be the
>     best default but can improve it later.
> ...

that changed qemu's behavior last year to be closer to its documentation.

It raises the question: should we change the default for "discard"
from "ignore" (what we currently have, where write zeroes defaults to
full allocation if you didn't request otherwise - but where this patch
series demonstrates that we can still be careful to avoid writing
zeroes to something that already reads as zeroes as a way to preserve
sparseness) to "unmap" (ie. we would favor sparse files by default,
but where you can still opt-in to full allocation)?

What are the policies on changing defaults?  Do we have to issue a
deprecation notice for any block device opened without specifying a
"discard" policy, and go through a couple of release cycles, so that
two releases down the road we can change the "discard" parameter from
optional to mandatory, and then even later switch it back to optional
but with a new default?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 7/6] mirror: Allow QMP override to declare target already zero
  2025-04-11 14:48 ` [PATCH 7/6] mirror: Allow QMP override to declare target already zero Eric Blake
@ 2025-04-12  4:56   ` Markus Armbruster
  0 siblings, 0 replies; 23+ messages in thread
From: Markus Armbruster @ 2025-04-12  4:56 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, John Snow, Vladimir Sementsov-Ogievskiy, Kevin Wolf,
	Hanna Reitz, open list:Block Jobs

Eric Blake <eblake@redhat.com> writes:

> Qemu's attempts to learn whether a destination file starts life with

QEMU (multiple times).

> all zero contents are just a hueristic.  There may be cases where the

heuristic

> caller is aware of information that qemu cannot learn quickly, in
> which case telling qemu what to assume about the destination can make
> the mirror operation faster.  Given our existing example of "qemu-img
> convert --target-is-zero", it is time to expose this override in QMP
> for blockdev-mirror as well.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  qapi/block-core.json                   |  9 ++++++++-
>  include/block/block_int-global-state.h |  3 ++-
>  block/mirror.c                         | 19 +++++++++++++------
>  blockdev.c                             | 18 +++++++++++-------
>  tests/unit/test-block-iothread.c       |  2 +-
>  5 files changed, 35 insertions(+), 16 deletions(-)
>
> diff --git a/qapi/block-core.json b/qapi/block-core.json
> index b1937780e19..6d6185a336a 100644
> --- a/qapi/block-core.json
> +++ b/qapi/block-core.json
> @@ -2538,6 +2538,12 @@
>  #     disappear from the query list without user intervention.
>  #     Defaults to true.  (Since 3.1)
>  #
> +# @target-is-zero: Assume the destination read as all zeroes before

reads

> +#     the mirror started, even if qemu is unable to quickly learn that

QEMU

> +#     from the destination.  Default false, since setting this to true
> +#     when the destination is not already zero can lead to a corrupt
> +#     destination.  (Since 9.1)

10.1

The "even if" clause is confusing.  How would I decide that "QEMU is
unable to learn"?

According to the commit message, the purpose of the flag is to maybe
speed up things when we know the destination is all zeroes, but that's
less than obvious from the doc string.

> +#

Here's my attempt:

   # @target-is-zero: Assume the destination read as all zeroes before
   #     the mirror started.  Setting this to true can speed up the
   #     mirror.  Setting this to true when the destination is not
   #     actually all zero can corrupt the destination.  (Since 10.1)

>  # Since: 2.6
>  #
>  # .. qmp-example::
> @@ -2557,7 +2563,8 @@
>              '*on-target-error': 'BlockdevOnError',
>              '*filter-node-name': 'str',
>              '*copy-mode': 'MirrorCopyMode',
> -            '*auto-finalize': 'bool', '*auto-dismiss': 'bool' },
> +            '*auto-finalize': 'bool', '*auto-dismiss': 'bool',
> +            '*target-is-zero': 'bool'},
>    'allow-preconfig': true }
>
>  ##

[...]



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/6] mirror: Skip pre-zeroing destination if it is already zero
  2025-04-11  1:04 ` [PATCH 1/6] mirror: Skip pre-zeroing destination if it is already zero Eric Blake
  2025-04-11 19:54   ` Vladimir Sementsov-Ogievskiy
@ 2025-04-14 16:29   ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 23+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-04-14 16:29 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: John Snow, Kevin Wolf, Hanna Reitz, open list:Block Jobs

On 11.04.25 04:04, Eric Blake wrote:
> When doing a sync=full mirroring, QMP drive-mirror requests full
> zeroing if it did not just create the destination, and blockdev-mirror
> requests full zeroing unconditionally.  This is because during a full
> sync, we must ensure that the portions of the disk that are not
> otherwise touched by the source still read as zero upon completion.
> 
> However, in mirror_dirty_init(), we were blindly assuming that if the
> destination allows punching holes, we should pre-zero the entire
> image; and if it does not allow punching holes, then treat the entire
> source as dirty rather than mirroring just the allocated portions of
> the source.  Without the ability to punch holes, this results in the
> destination file being fully allocated; and even when punching holes
> is supported, it causes duplicate I/O to the portions of the
> destination corresponding to chunks of the source that are allocated
> but read as zero.
> 
> Smarter is to avoid the pre-zeroing pass over the destination if it
> can be proved the destination already reads as zero.  Note that a
> later patch will then further improve things to skip writing to the
> destination for parts of the image where the source is zero; but even
> with just this patch, it is possible to see a difference for any BDS
> that can quickly report that it already reads as zero.  Iotest 194 is
> proof of this: instead of mirroring a completely sparse file, change
> it to pre-populate some data.  When run with './check -file 194', the
> full 1G is still allocated, but with './check -qcow2 194', only the 1M
> of pre-populated data is now mirrored; this in turn requires an
> additional log filter.

This make me doubt.

Actually, what is the different for the user between "fast zeroing empty qcow2 (actually no-op)" and "skip zeroing empty qcow2 (no-op)"? And why visible interface effect should change after some internal optimization?

> 
> Note that there are still BDS layers that do not quickly report
> reading as all zero; for example, the file-posix code implementation
> for fast block status currently blindly reports the entire image as
> allocated and non-zero without even consulting lseek(SEEK_DATA)); that
> will be addressed in later patches.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>   block/mirror.c             | 10 ++++++++--
>   tests/qemu-iotests/194     | 15 +++++++++++++--
>   tests/qemu-iotests/194.out |  4 ++--
>   3 files changed, 23 insertions(+), 6 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index a53582f17bb..2e1e14c8e7e 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -841,14 +841,20 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
>       int64_t offset;
>       BlockDriverState *bs;
>       BlockDriverState *target_bs = blk_bs(s->target);
> -    int ret = -1;
> +    int ret;
>       int64_t count;
> 
>       bdrv_graph_co_rdlock();
>       bs = s->mirror_top_bs->backing->bs;
> +    if (s->zero_target) {
> +        ret = bdrv_co_is_zero_fast(target_bs, 0, s->bdev_length);
> +    }
>       bdrv_graph_co_rdunlock();
> 
> -    if (s->zero_target) {
> +    if (s->zero_target && ret <= 0) {
> +        if (ret < 0) {
> +            return ret;
> +        }
>           if (!bdrv_can_write_zeroes_with_unmap(target_bs)) {
>               bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, s->bdev_length);

So, without your patch we go into this "if" and set the whole dirty bitmap even with qcow2?

That looks like a preexisting bug to me. And this is because we don't have discard=unmap option on qcow2 node..

Probably, we'd better have something like bdrv_can_write_zeroes_fast(target_bs) here, which of course should be true for qcow2.

Probably in real world people always set discard=unmap for qcow2 nodes.

>               return 0;
> diff --git a/tests/qemu-iotests/194 b/tests/qemu-iotests/194
> index c0ce82dd257..814c15dfe3b 100755
> --- a/tests/qemu-iotests/194
> +++ b/tests/qemu-iotests/194
> @@ -22,6 +22,14 @@
> 
>   import iotests
> 
> +def filter_job_event(event):
> +    '''Filter len and offset in a job event'''
> +    event = dict(event)
> +    if event['data'].get('len', 0) == event['data'].get('offset', 1):

I'd prefer explicit check that both fields exist instead of magic 0 and 1. But I don't care too much.

> +        event['data']['len'] = 'LEN'
> +        event['data']['offset'] = 'LEN'
> +    return event
> +
>   iotests.script_initialize(supported_fmts=['qcow2', 'qed', 'raw'],
>                             supported_platforms=['linux'])
> 
> @@ -34,6 +42,7 @@ with iotests.FilePath('source.img') as source_img_path, \
> 
>       img_size = '1G'
>       iotests.qemu_img_create('-f', iotests.imgfmt, source_img_path, img_size)
> +    iotests.qemu_io('-f', iotests.imgfmt, '-c', 'write 512M 1M', source_img_path)
>       iotests.qemu_img_create('-f', iotests.imgfmt, dest_img_path, img_size)
> 
>       iotests.log('Launching VMs...')
> @@ -61,7 +70,8 @@ with iotests.FilePath('source.img') as source_img_path, \
> 
>       iotests.log('Waiting for `drive-mirror` to complete...')
>       iotests.log(source_vm.event_wait('BLOCK_JOB_READY'),
> -                filters=[iotests.filter_qmp_event])
> +                filters=[iotests.filter_qmp_event,
> +                         filter_job_event])
> 
>       iotests.log('Starting migration...')
>       capabilities = [{'capability': 'events', 'state': True},
> @@ -87,7 +97,8 @@ with iotests.FilePath('source.img') as source_img_path, \
> 
>       while True:
>           event2 = source_vm.event_wait('BLOCK_JOB_COMPLETED')
> -        iotests.log(event2, filters=[iotests.filter_qmp_event])
> +        iotests.log(event2, filters=[iotests.filter_qmp_event,
> +                                     filter_job_event])
>           if event2['event'] == 'BLOCK_JOB_COMPLETED':
>               iotests.log('Stopping the NBD server on destination...')
>               iotests.log(dest_vm.qmp('nbd-server-stop'))
> diff --git a/tests/qemu-iotests/194.out b/tests/qemu-iotests/194.out
> index 6940e809cde..79b961723d8 100644
> --- a/tests/qemu-iotests/194.out
> +++ b/tests/qemu-iotests/194.out
> @@ -7,7 +7,7 @@ Launching NBD server on destination...
>   Starting `drive-mirror` on source...
>   {"return": {}}
>   Waiting for `drive-mirror` to complete...
> -{"data": {"device": "mirror-job0", "len": 1073741824, "offset": 1073741824, "speed": 0, "type": "mirror"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
> +{"data": {"device": "mirror-job0", "len": "LEN", "offset": "LEN", "speed": 0, "type": "mirror"}, "event": "BLOCK_JOB_READY", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
>   Starting migration...
>   {"return": {}}
>   {"execute": "migrate-start-postcopy", "arguments": {}}
> @@ -18,7 +18,7 @@ Starting migration...
>   {"data": {"status": "completed"}, "event": "MIGRATION", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
>   Gracefully ending the `drive-mirror` job on source...
>   {"return": {}}
> -{"data": {"device": "mirror-job0", "len": 1073741824, "offset": 1073741824, "speed": 0, "type": "mirror"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
> +{"data": {"device": "mirror-job0", "len": "LEN", "offset": "LEN", "speed": 0, "type": "mirror"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
>   Stopping the NBD server on destination...
>   {"return": {}}
>   Wait for migration completion on target...


With "int ret = -1;" kept as is:

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/6] file-posix: Allow lseek at offset 0 when !want_zero
  2025-04-11  1:04 ` [PATCH 2/6] file-posix: Allow lseek at offset 0 when !want_zero Eric Blake
@ 2025-04-14 17:05   ` Vladimir Sementsov-Ogievskiy
  2025-04-14 18:12     ` Eric Blake
  0 siblings, 1 reply; 23+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-04-14 17:05 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: Kevin Wolf, Hanna Reitz, open list:raw, Denis V. Lunev

On 11.04.25 04:04, Eric Blake wrote:
> The 'want_zero' parameter to raw_co_block_status() was added so that
> we can avoid potentially time-consuming lseek(SEEK_DATA) calls
> throughout the file (working around poor filesystems that have O(n)
> rather than O(1) extent probing).  But when it comes to learning if a
> file is completely sparse (for example, it was just created), always
> claiming that a file is all data without even checking offset 0 breaks
> what would otherwise be attempts at useful optimizations for a
> known-zero mirror destination.
> 
> Note that this allows file-posix to report a file as completely zero
> if it was externally created (such as via 'truncate --size=$n file')
> as entirely sparse; however, it does NOT work for files created
> internally by blockdev-create.  That's because blockdev-create
> intentionally does a sequence of truncate(0), truncate(size),
> allocate_first_block(), in order to make it possible for gluster on
> XFS to probe the sector size for direct I/O (which doesn't work if the
> first block is sparse).  That will be addressed in a later patch.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>   block/file-posix.c | 9 ++++++++-
>   1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/block/file-posix.c b/block/file-posix.c
> index 56d1972d156..67e83528cf5 100644
> --- a/block/file-posix.c
> +++ b/block/file-posix.c
> @@ -3217,7 +3217,14 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
>           return ret;
>       }
> 
> -    if (!want_zero) {
> +    /*
> +     * If want_zero is clear, then the caller wants speed over
> +     * accuracy, and the only place where SEEK_DATA should be
> +     * attempted is at the start of the file to learn if the file has
> +     * any data at all (anywhere else, just blindly claim the entire
> +     * file is data).
> +     */
> +    if (!want_zero && offset) {
>           *pnum = bytes;
>           *map = offset;
>           *file = bs;

Looks like a hack. So we have bdrv_co_is_zero_fast() which do pass want_zero=false to block-status. But in case of mirror, which want to check the whole disk, we actually want want_zero=true, and detect it by offset=0..

Isn't it better to add a kind of bdrv_is_zero_middle_speed() (which means, don't try to read the data to check, but be free to use suboptimal lseek call or something like this), which will pass want_zero=true, and use it from mirror? Mirror case differs from usage in qcow2 exactly by the fact that we call it only once.


Another doubt (really weak): can this one extra lseek be so slow, that mirror becomes worse?
Den, is it right, that problems about slow lseek (that we experienced several years ago) were about qcow2-internals, and nothing related to mirror itself? May one lseek call on mirror target break something?


-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/6] file-posix: Allow lseek at offset 0 when !want_zero
  2025-04-14 17:05   ` Vladimir Sementsov-Ogievskiy
@ 2025-04-14 18:12     ` Eric Blake
  2025-04-15 12:37       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Blake @ 2025-04-14 18:12 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, Kevin Wolf, Hanna Reitz, open list:raw,
	Denis V. Lunev

On Mon, Apr 14, 2025 at 08:05:21PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 11.04.25 04:04, Eric Blake wrote:
> > The 'want_zero' parameter to raw_co_block_status() was added so that
> > we can avoid potentially time-consuming lseek(SEEK_DATA) calls
> > throughout the file (working around poor filesystems that have O(n)
> > rather than O(1) extent probing).  But when it comes to learning if a
> > file is completely sparse (for example, it was just created), always
> > claiming that a file is all data without even checking offset 0 breaks
> > what would otherwise be attempts at useful optimizations for a
> > known-zero mirror destination.
> > 
> > Note that this allows file-posix to report a file as completely zero
> > if it was externally created (such as via 'truncate --size=$n file')
> > as entirely sparse; however, it does NOT work for files created
> > internally by blockdev-create.  That's because blockdev-create
> > intentionally does a sequence of truncate(0), truncate(size),
> > allocate_first_block(), in order to make it possible for gluster on
> > XFS to probe the sector size for direct I/O (which doesn't work if the
> > first block is sparse).  That will be addressed in a later patch.
> > 
> > Signed-off-by: Eric Blake <eblake@redhat.com>
> > ---
> >   block/file-posix.c | 9 ++++++++-
> >   1 file changed, 8 insertions(+), 1 deletion(-)
> > 
> > diff --git a/block/file-posix.c b/block/file-posix.c
> > index 56d1972d156..67e83528cf5 100644
> > --- a/block/file-posix.c
> > +++ b/block/file-posix.c
> > @@ -3217,7 +3217,14 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
> >           return ret;
> >       }
> > 
> > -    if (!want_zero) {
> > +    /*
> > +     * If want_zero is clear, then the caller wants speed over
> > +     * accuracy, and the only place where SEEK_DATA should be
> > +     * attempted is at the start of the file to learn if the file has
> > +     * any data at all (anywhere else, just blindly claim the entire
> > +     * file is data).
> > +     */
> > +    if (!want_zero && offset) {
> >           *pnum = bytes;
> >           *map = offset;
> >           *file = bs;
> 
> Looks like a hack. So we have bdrv_co_is_zero_fast() which do pass want_zero=false to block-status. But in case of mirror, which want to check the whole disk, we actually want want_zero=true, and detect it by offset=0..
> 
> Isn't it better to add a kind of bdrv_is_zero_middle_speed() (which means, don't try to read the data to check, but be free to use suboptimal lseek call or something like this), which will pass want_zero=true, and use it from mirror? Mirror case differs from usage in qcow2 exactly by the fact that we call it only once.

Which is exactly why I wrote patch 4/6 turning the want_zero bool into
an enum so that we are being more explicit in WHY block status is
being requested.

> 
> 
> Another doubt (really weak): can this one extra lseek be so slow, that mirror becomes worse?
> Den, is it right, that problems about slow lseek (that we experienced several years ago) were about qcow2-internals, and nothing related to mirror itself? May one lseek call on mirror target break something?

I'm not worried about one or even two lseek on the destination; what
I'm trying to avoid is an lseek on the destination for every chunk
where the source is doing a write-zeroes request (we're already doing
an lseek on the source as part of the background task of the mirror,
but foreground writes being split into the original and the mirror are
where it really matters that we aren't going slower just because of
the mirror).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/6] file-posix: Allow lseek at offset 0 when !want_zero
  2025-04-14 18:12     ` Eric Blake
@ 2025-04-15 12:37       ` Vladimir Sementsov-Ogievskiy
  2025-04-15 15:22         ` Eric Blake
  0 siblings, 1 reply; 23+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-04-15 12:37 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, Kevin Wolf, Hanna Reitz, open list:raw,
	Denis V. Lunev

On 14.04.25 21:12, Eric Blake wrote:
> On Mon, Apr 14, 2025 at 08:05:21PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> On 11.04.25 04:04, Eric Blake wrote:
>>> The 'want_zero' parameter to raw_co_block_status() was added so that
>>> we can avoid potentially time-consuming lseek(SEEK_DATA) calls
>>> throughout the file (working around poor filesystems that have O(n)
>>> rather than O(1) extent probing).  But when it comes to learning if a
>>> file is completely sparse (for example, it was just created), always
>>> claiming that a file is all data without even checking offset 0 breaks
>>> what would otherwise be attempts at useful optimizations for a
>>> known-zero mirror destination.
>>>
>>> Note that this allows file-posix to report a file as completely zero
>>> if it was externally created (such as via 'truncate --size=$n file')
>>> as entirely sparse; however, it does NOT work for files created
>>> internally by blockdev-create.  That's because blockdev-create
>>> intentionally does a sequence of truncate(0), truncate(size),
>>> allocate_first_block(), in order to make it possible for gluster on
>>> XFS to probe the sector size for direct I/O (which doesn't work if the
>>> first block is sparse).  That will be addressed in a later patch.
>>>
>>> Signed-off-by: Eric Blake<eblake@redhat.com>
>>> ---
>>>    block/file-posix.c | 9 ++++++++-
>>>    1 file changed, 8 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/block/file-posix.c b/block/file-posix.c
>>> index 56d1972d156..67e83528cf5 100644
>>> --- a/block/file-posix.c
>>> +++ b/block/file-posix.c
>>> @@ -3217,7 +3217,14 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
>>>            return ret;
>>>        }
>>>
>>> -    if (!want_zero) {
>>> +    /*
>>> +     * If want_zero is clear, then the caller wants speed over
>>> +     * accuracy, and the only place where SEEK_DATA should be
>>> +     * attempted is at the start of the file to learn if the file has
>>> +     * any data at all (anywhere else, just blindly claim the entire
>>> +     * file is data).
>>> +     */
>>> +    if (!want_zero && offset) {
>>>            *pnum = bytes;
>>>            *map = offset;
>>>            *file = bs;
>> Looks like a hack. So we have bdrv_co_is_zero_fast() which do pass want_zero=false to block-status. But in case of mirror, which want to check the whole disk, we actually want want_zero=true, and detect it by offset=0..
>>
>> Isn't it better to add a kind of bdrv_is_zero_middle_speed() (which means, don't try to read the data to check, but be free to use suboptimal lseek call or something like this), which will pass want_zero=true, and use it from mirror? Mirror case differs from usage in qcow2 exactly by the fact that we call it only once.
> Which is exactly why I wrote patch 4/6 turning the want_zero bool into
> an enum so that we are being more explicit in WHY block status is
> being requested.

Hmm, and this makes more strange that this hack for file-posix is kept after it. Don't we have other block drivers, where we should behave similarly in block_status for offset=0? Or I mean, could we just use different block-status modes in qcow2 and mirror, when call bdrv_co_is_zero_fast(), and don't handle offset==0 specially in file-posix?

-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/6] file-posix: Allow lseek at offset 0 when !want_zero
  2025-04-15 12:37       ` Vladimir Sementsov-Ogievskiy
@ 2025-04-15 15:22         ` Eric Blake
  2025-04-15 16:11           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Blake @ 2025-04-15 15:22 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, Kevin Wolf, Hanna Reitz, open list:raw,
	Denis V. Lunev

On Tue, Apr 15, 2025 at 03:37:39PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 14.04.25 21:12, Eric Blake wrote:
> > On Mon, Apr 14, 2025 at 08:05:21PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > On 11.04.25 04:04, Eric Blake wrote:
> > > > The 'want_zero' parameter to raw_co_block_status() was added so that
> > > > we can avoid potentially time-consuming lseek(SEEK_DATA) calls
> > > > throughout the file (working around poor filesystems that have O(n)
> > > > rather than O(1) extent probing).  But when it comes to learning if a
> > > > file is completely sparse (for example, it was just created), always
> > > > claiming that a file is all data without even checking offset 0 breaks
> > > > what would otherwise be attempts at useful optimizations for a
> > > > known-zero mirror destination.
> > > > 
> > > > Note that this allows file-posix to report a file as completely zero
> > > > if it was externally created (such as via 'truncate --size=$n file')
> > > > as entirely sparse; however, it does NOT work for files created
> > > > internally by blockdev-create.  That's because blockdev-create
> > > > intentionally does a sequence of truncate(0), truncate(size),
> > > > allocate_first_block(), in order to make it possible for gluster on
> > > > XFS to probe the sector size for direct I/O (which doesn't work if the
> > > > first block is sparse).  That will be addressed in a later patch.
> > > > 
> > > > Signed-off-by: Eric Blake<eblake@redhat.com>
> > > > ---
> > > >    block/file-posix.c | 9 ++++++++-
> > > >    1 file changed, 8 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/block/file-posix.c b/block/file-posix.c
> > > > index 56d1972d156..67e83528cf5 100644
> > > > --- a/block/file-posix.c
> > > > +++ b/block/file-posix.c
> > > > @@ -3217,7 +3217,14 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
> > > >            return ret;
> > > >        }
> > > > 
> > > > -    if (!want_zero) {
> > > > +    /*
> > > > +     * If want_zero is clear, then the caller wants speed over
> > > > +     * accuracy, and the only place where SEEK_DATA should be
> > > > +     * attempted is at the start of the file to learn if the file has
> > > > +     * any data at all (anywhere else, just blindly claim the entire
> > > > +     * file is data).
> > > > +     */
> > > > +    if (!want_zero && offset) {
> > > >            *pnum = bytes;
> > > >            *map = offset;
> > > >            *file = bs;
> > > Looks like a hack. So we have bdrv_co_is_zero_fast() which do pass want_zero=false to block-status. But in case of mirror, which want to check the whole disk, we actually want want_zero=true, and detect it by offset=0..
> > > 
> > > Isn't it better to add a kind of bdrv_is_zero_middle_speed() (which means, don't try to read the data to check, but be free to use suboptimal lseek call or something like this), which will pass want_zero=true, and use it from mirror? Mirror case differs from usage in qcow2 exactly by the fact that we call it only once.
> > Which is exactly why I wrote patch 4/6 turning the want_zero bool into
> > an enum so that we are being more explicit in WHY block status is
> > being requested.
> 
> Hmm, and this makes more strange that this hack for file-posix is kept after it. Don't we have other block drivers, where we should behave similarly in block_status for offset=0? Or I mean, could we just use different block-status modes in qcow2 and mirror, when call bdrv_co_is_zero_fast(), and don't handle offset==0 specially in file-posix?

I'll need to ajust this in v2 of my series.

The problem I'm trying to resolve: libvirt migration with
--migrate-disks-detect-zeroes is causing the destination to become
fully allocated if the destination does not use "discard":"unmap",
whereas with QEMU 9.0 it did not.  The change was commit d05ae948c,
where we fixed a problem with punching holes into the mirror
destination even when the user had requested the file to not shrink;
but it means that a file that is not allowed to shrink is now fully
allocated rather than left sparse even when leaving it sparse would
still be an accurate mirror result with less I/O.

And it turns out that when libvirt is driving the mirror into a raw
destination, the mirroring is done over NBD, rather than directly to a
file-posix destination.  Coupled with the oddity that 'qemu-img create
-f raw' ends up pre-allocating up to 64k of the image with all zero
contents to make alignment probing easier, it is no longer possible to
quickly see that the entire NBD image reads as zero, just as it was
not possible to quickly see that for file-posix.  So, in v2 of my
patch series, I think I need to hoist the logic that I added to
file-posix.c that reads an initial small sector in order to determine
if the entire image is zero to instead live in io.c to be used across
all protocols, NBD included.

So if you don't think offset=0 should be given any special treatment
in file-posix, but instead have generic code in io.c that checks if an
entire image has block_status of sparse and/or just a small initial
non-sparse section that can be manually checked for zero contents,
then that should still fix the issue of mirroring to a sparse
destination with "discard":"ignore" but keeping the destination
sparse.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/6] mirror: Skip writing zeroes when target is already zero
  2025-04-11  1:04 ` [PATCH 3/6] mirror: Skip writing zeroes when target is already zero Eric Blake
@ 2025-04-15 15:59   ` Vladimir Sementsov-Ogievskiy
  2025-04-16 21:51     ` Eric Blake
  0 siblings, 1 reply; 23+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-04-15 15:59 UTC (permalink / raw)
  To: Eric Blake, qemu-devel
  Cc: John Snow, Kevin Wolf, Hanna Reitz, open list:Block Jobs

On 11.04.25 04:04, Eric Blake wrote:
> When mirroring, the goal is to ensure that the destination reads the
> same as the source; this goal is met whether the destination is sparse
> or fully-allocated.  However, if the destination cannot efficiently
> write zeroes, then any time the mirror operation wants to copy zeroes
> from the source to the destination (either during the background over
> sparse regions when doing a full mirror, or in the foreground when the
> guest actively writes zeroes), we were causing the destination to
> fully allocate that portion of the disk, even if it already read as
> zeroes.
> 
> We could just teach mirror_co_zero() to do a block_status() probe of
> the destination, and skip the zeroes if the destination already reads
> as zero, but we know from past experience that block_status() calls
> are not always cheap (tmpfs, anyone?).  So this patch takes a slightly
> different approach: any time we have to transfer the full image,
> mirror_dirty_init() is _already_ doing a pre-zero pass over the entire
> destination.  Therefore, if we track which clusters of the destination
> are zero at any given moment, we don't have to do a block_status()
> call on the destination, but can instead just refer to the zero bitmap
> associated with the job.
> 
> With this patch, if I externally create a raw sparse destination file
> ('truncate --size=$N dst.raw'), connect it with QMP 'blockdev-add'
> while leaving it at the default "discard":"ignore", then run QMP
> 'blockdev-mirror' with "sync":"full", the destination remains sparse
> rather than fully allocated.
> 
> However, a raw destination file created with 'blockdev-create' still
> gets fully allocated, because more work is needed in file-posix to
> still identify reads-as-zeroes even when the first 4k has to be
> allocated to make alignment probing work.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>   block/mirror.c | 94 +++++++++++++++++++++++++++++++++++++++++---------
>   1 file changed, 78 insertions(+), 16 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 2e1e14c8e7e..98da5a6dc27 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -73,6 +73,7 @@ typedef struct MirrorBlockJob {
>       size_t buf_size;
>       int64_t bdev_length;
>       unsigned long *cow_bitmap;
> +    unsigned long *zero_bitmap;
>       BdrvDirtyBitmap *dirty_bitmap;
>       BdrvDirtyBitmapIter *dbi;
>       uint8_t *buf;
> @@ -408,15 +409,32 @@ static void coroutine_fn mirror_co_read(void *opaque)
>   static void coroutine_fn jk(void *opaque)
>   {
>       MirrorOp *op = opaque;
> -    int ret;
> +    bool write_needed = true;
> +    int ret = 0;
> 
>       op->s->in_flight++;
>       op->s->bytes_in_flight += op->bytes;
>       *op->bytes_handled = op->bytes;
>       op->is_in_flight = true;
> 
> -    ret = blk_co_pwrite_zeroes(op->s->target, op->offset, op->bytes,
> -                               op->s->unmap ? BDRV_REQ_MAY_UNMAP : 0);
> +    if (op->s->zero_bitmap) {
> +        unsigned long last = (op->offset + op->bytes) / op->s->granularity;

Maybe, call it "end", not "last, as it's not last element of the range, but first after the range.

Also, seems we need still do DIV_ROUND_UP, for ..

> +        assert(QEMU_IS_ALIGNED(op->offset, op->s->granularity));
> +        assert(QEMU_IS_ALIGNED(op->bytes, op->s->granularity) ||
> +               op->offset + op->bytes == op->s->bdev_length);

.. ^ this case, when bytes is unaligned to granularity but aligned to bdev_length.

> +        if (find_next_zero_bit(op->s->zero_bitmap, last,
> +                               op->offset / op->s->granularity) == last) {
> +            write_needed = false;
> +        }
> +    }
> +    if (write_needed) {
> +        ret = blk_co_pwrite_zeroes(op->s->target, op->offset, op->bytes,
> +                                   op->s->unmap ? BDRV_REQ_MAY_UNMAP : 0);
> +    }
> +    if (ret >= 0 && op->s->zero_bitmap) {
> +        bitmap_set(op->s->zero_bitmap, op->offset / op->s->granularity,
> +                   op->bytes / op->s->granularity);

and here we want to align up bytes, for the corner case

> +    }

Also, I'm not sure, what guarantees we have in case of write-zeroes failure. Should we clear the bitmap in this case, like we do MIRROR_METHOD_COPY and MIRROR_METHOD_DISCARD below

>       mirror_write_complete(op, ret);
>   }
> 
> @@ -441,6 +459,9 @@ static unsigned mirror_perform(MirrorBlockJob *s, int64_t offset,
>       Coroutine *co;
>       int64_t bytes_handled = -1;
> 
> +    assert(QEMU_IS_ALIGNED(offset, s->granularity));
> +    assert(QEMU_IS_ALIGNED(bytes, s->granularity) ||
> +           offset + bytes == s->bdev_length);
>       op = g_new(MirrorOp, 1);
>       *op = (MirrorOp){
>           .s              = s,
> @@ -452,12 +473,21 @@ static unsigned mirror_perform(MirrorBlockJob *s, int64_t offset,
> 
>       switch (mirror_method) {
>       case MIRROR_METHOD_COPY:
> +        if (s->zero_bitmap) {
> +            bitmap_clear(s->zero_bitmap, offset / s->granularity,
> +                         bytes / s->granularity);

again, align up for corner case

> +        }
>           co = qemu_coroutine_create(mirror_co_read, op);
>           break;
>       case MIRROR_METHOD_ZERO:
> +        /* s->zero_bitmap handled in mirror_co_zero */
>           co = qemu_coroutine_create(mirror_co_zero, op);
>           break;
>       case MIRROR_METHOD_DISCARD:
> +        if (s->zero_bitmap) {
> +            bitmap_clear(s->zero_bitmap, offset / s->granularity,
> +                         bytes / s->granularity);
> +        }
>           co = qemu_coroutine_create(mirror_co_discard, op);
>           break;
>       default:
> @@ -851,10 +881,17 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
>       }
>       bdrv_graph_co_rdunlock();
> 
> -    if (s->zero_target && ret <= 0) {
> +    if (s->zero_target) {
> +        int64_t length;
> +
>           if (ret < 0) {
>               return ret;
>           }
> +        length = DIV_ROUND_UP(s->bdev_length, s->granularity);
> +        s->zero_bitmap = bitmap_new(length);
> +        if (ret > 0) {
> +            bitmap_set(s->zero_bitmap, 0, length);

hmm, we should not continue zeroing target in case of ret > 0.

I didn't like that we set ret in one if-block, and handle in another, but now it gets even more confusing.

Maybe, just move bdrv_co_is_zero_fast() call into big "if (s->zero_target) {" ?



> +        }
>           if (!bdrv_can_write_zeroes_with_unmap(target_bs)) {
>               bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, s->bdev_length);
>               return 0;
> @@ -1169,6 +1206,7 @@ immediate_exit:
>       assert(s->in_flight == 0);
>       qemu_vfree(s->buf);
>       g_free(s->cow_bitmap);
> +    g_free(s->zero_bitmap);
>       g_free(s->in_flight_bitmap);
>       bdrv_dirty_iter_free(s->dbi);
> 
> @@ -1347,7 +1385,8 @@ do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
>   {
>       int ret;
>       size_t qiov_offset = 0;
> -    int64_t bitmap_offset, bitmap_end;
> +    int64_t dirty_bitmap_offset, dirty_bitmap_end;
> +    int64_t zero_bitmap_offset, zero_bitmap_end;
> 
>       if (!QEMU_IS_ALIGNED(offset, job->granularity) &&
>           bdrv_dirty_bitmap_get(job->dirty_bitmap, offset))
> @@ -1391,31 +1430,54 @@ do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
>       }
> 
>       /*
> -     * Tails are either clean or shrunk, so for bitmap resetting
> -     * we safely align the range down.
> +     * Tails are either clean or shrunk, so for dirty bitmap resetting
> +     * we safely align the range down.  But for zero bitmap, round range
> +     * up for checking or clearing, and down for setting.
>        */
> -    bitmap_offset = QEMU_ALIGN_UP(offset, job->granularity);
> -    bitmap_end = QEMU_ALIGN_DOWN(offset + bytes, job->granularity);
> -    if (bitmap_offset < bitmap_end) {
> -        bdrv_reset_dirty_bitmap(job->dirty_bitmap, bitmap_offset,
> -                                bitmap_end - bitmap_offset);
> +    dirty_bitmap_offset = QEMU_ALIGN_UP(offset, job->granularity);
> +    dirty_bitmap_end = QEMU_ALIGN_DOWN(offset + bytes, job->granularity);
> +    if (dirty_bitmap_offset < dirty_bitmap_end) {
> +        bdrv_reset_dirty_bitmap(job->dirty_bitmap, dirty_bitmap_offset,
> +                                dirty_bitmap_end - dirty_bitmap_offset);
>       }
> +    zero_bitmap_offset = offset / job->granularity;
> +    zero_bitmap_end = DIV_ROUND_UP(offset + bytes, job->granularity);
> 
>       job_progress_increase_remaining(&job->common.job, bytes);
>       job->active_write_bytes_in_flight += bytes;
> 
>       switch (method) {
>       case MIRROR_METHOD_COPY:
> +        if (job->zero_bitmap) {
> +            bitmap_clear(job->zero_bitmap, zero_bitmap_offset,
> +                         zero_bitmap_end - zero_bitmap_offset);
> +        }
>           ret = blk_co_pwritev_part(job->target, offset, bytes,
>                                     qiov, qiov_offset, flags);
>           break;
> 
>       case MIRROR_METHOD_ZERO:
> +        if (job->zero_bitmap) {
> +            if (find_next_zero_bit(job->zero_bitmap, zero_bitmap_end,
> +                                   zero_bitmap_offset) == zero_bitmap_end) {
> +                ret = 0;
> +                break;
> +            }
> +        }
>           assert(!qiov);
>           ret = blk_co_pwrite_zeroes(job->target, offset, bytes, flags);
> +        if (job->zero_bitmap && ret >= 0) {
> +            bitmap_set(job->zero_bitmap, dirty_bitmap_offset / job->granularity,
> +                       (dirty_bitmap_end - dirty_bitmap_offset) /
> +                       job->granularity);
> +        }

Same thing, probably we should clear the bitmap in case of write failure.

>           break;
> 
>       case MIRROR_METHOD_DISCARD:
> +        if (job->zero_bitmap) {
> +            bitmap_clear(job->zero_bitmap, zero_bitmap_offset,
> +                         zero_bitmap_end - zero_bitmap_offset);
> +        }
>           assert(!qiov);
>           ret = blk_co_pdiscard(job->target, offset, bytes);
>           break;
> @@ -1436,10 +1498,10 @@ do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
>            * at function start, and they must be still dirty, as we've locked
>            * the region for in-flight op.
>            */
> -        bitmap_offset = QEMU_ALIGN_DOWN(offset, job->granularity);
> -        bitmap_end = QEMU_ALIGN_UP(offset + bytes, job->granularity);
> -        bdrv_set_dirty_bitmap(job->dirty_bitmap, bitmap_offset,
> -                              bitmap_end - bitmap_offset);
> +        dirty_bitmap_offset = QEMU_ALIGN_DOWN(offset, job->granularity);
> +        dirty_bitmap_end = QEMU_ALIGN_UP(offset + bytes, job->granularity);
> +        bdrv_set_dirty_bitmap(job->dirty_bitmap, dirty_bitmap_offset,
> +                              dirty_bitmap_end - dirty_bitmap_offset);

Not really matter, but still, renaming in a separate patch would make this one a bit simpler.

>           qatomic_set(&job->actively_synced, false);
> 
>           action = mirror_error_action(job, false, -ret);

-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/6] file-posix: Allow lseek at offset 0 when !want_zero
  2025-04-15 15:22         ` Eric Blake
@ 2025-04-15 16:11           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 23+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-04-15 16:11 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, Kevin Wolf, Hanna Reitz, open list:raw,
	Denis V. Lunev

On 15.04.25 18:22, Eric Blake wrote:
> On Tue, Apr 15, 2025 at 03:37:39PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> On 14.04.25 21:12, Eric Blake wrote:
>>> On Mon, Apr 14, 2025 at 08:05:21PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>> On 11.04.25 04:04, Eric Blake wrote:
>>>>> The 'want_zero' parameter to raw_co_block_status() was added so that
>>>>> we can avoid potentially time-consuming lseek(SEEK_DATA) calls
>>>>> throughout the file (working around poor filesystems that have O(n)
>>>>> rather than O(1) extent probing).  But when it comes to learning if a
>>>>> file is completely sparse (for example, it was just created), always
>>>>> claiming that a file is all data without even checking offset 0 breaks
>>>>> what would otherwise be attempts at useful optimizations for a
>>>>> known-zero mirror destination.
>>>>>
>>>>> Note that this allows file-posix to report a file as completely zero
>>>>> if it was externally created (such as via 'truncate --size=$n file')
>>>>> as entirely sparse; however, it does NOT work for files created
>>>>> internally by blockdev-create.  That's because blockdev-create
>>>>> intentionally does a sequence of truncate(0), truncate(size),
>>>>> allocate_first_block(), in order to make it possible for gluster on
>>>>> XFS to probe the sector size for direct I/O (which doesn't work if the
>>>>> first block is sparse).  That will be addressed in a later patch.
>>>>>
>>>>> Signed-off-by: Eric Blake<eblake@redhat.com>
>>>>> ---
>>>>>     block/file-posix.c | 9 ++++++++-
>>>>>     1 file changed, 8 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/block/file-posix.c b/block/file-posix.c
>>>>> index 56d1972d156..67e83528cf5 100644
>>>>> --- a/block/file-posix.c
>>>>> +++ b/block/file-posix.c
>>>>> @@ -3217,7 +3217,14 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
>>>>>             return ret;
>>>>>         }
>>>>>
>>>>> -    if (!want_zero) {
>>>>> +    /*
>>>>> +     * If want_zero is clear, then the caller wants speed over
>>>>> +     * accuracy, and the only place where SEEK_DATA should be
>>>>> +     * attempted is at the start of the file to learn if the file has
>>>>> +     * any data at all (anywhere else, just blindly claim the entire
>>>>> +     * file is data).
>>>>> +     */
>>>>> +    if (!want_zero && offset) {
>>>>>             *pnum = bytes;
>>>>>             *map = offset;
>>>>>             *file = bs;
>>>> Looks like a hack. So we have bdrv_co_is_zero_fast() which do pass want_zero=false to block-status. But in case of mirror, which want to check the whole disk, we actually want want_zero=true, and detect it by offset=0..
>>>>
>>>> Isn't it better to add a kind of bdrv_is_zero_middle_speed() (which means, don't try to read the data to check, but be free to use suboptimal lseek call or something like this), which will pass want_zero=true, and use it from mirror? Mirror case differs from usage in qcow2 exactly by the fact that we call it only once.
>>> Which is exactly why I wrote patch 4/6 turning the want_zero bool into
>>> an enum so that we are being more explicit in WHY block status is
>>> being requested.
>>
>> Hmm, and this makes more strange that this hack for file-posix is kept after it. Don't we have other block drivers, where we should behave similarly in block_status for offset=0? Or I mean, could we just use different block-status modes in qcow2 and mirror, when call bdrv_co_is_zero_fast(), and don't handle offset==0 specially in file-posix?
> 
> I'll need to ajust this in v2 of my series.
> 
> The problem I'm trying to resolve: libvirt migration with
> --migrate-disks-detect-zeroes is causing the destination to become
> fully allocated if the destination does not use "discard":"unmap",
> whereas with QEMU 9.0 it did not.  The change was commit d05ae948c,
> where we fixed a problem with punching holes into the mirror
> destination even when the user had requested the file to not shrink;
> but it means that a file that is not allowed to shrink is now fully
> allocated rather than left sparse even when leaving it sparse would
> still be an accurate mirror result with less I/O.
> 
> And it turns out that when libvirt is driving the mirror into a raw
> destination, the mirroring is done over NBD, rather than directly to a
> file-posix destination.  Coupled with the oddity that 'qemu-img create
> -f raw' ends up pre-allocating up to 64k of the image with all zero
> contents to make alignment probing easier, it is no longer possible to
> quickly see that the entire NBD image reads as zero, just as it was
> not possible to quickly see that for file-posix.  So, in v2 of my
> patch series, I think I need to hoist the logic that I added to
> file-posix.c that reads an initial small sector in order to determine
> if the entire image is zero to instead live in io.c to be used across
> all protocols, NBD included.
> 
> So if you don't think offset=0 should be given any special treatment
> in file-posix, but instead have generic code in io.c that checks if an
> entire image has block_status of sparse and/or just a small initial
> non-sparse section that can be manually checked for zero contents,
> then that should still fix the issue of mirroring to a sparse
> destination with "discard":"ignore" but keeping the destination
> sparse.
> 

Thanks for explanation! Yes I think making such specific checks about first sector in some kind of generic bdrv_is_all_zeroes() would be cleaner.

-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/6] mirror: Skip pre-zeroing destination if it is already zero
  2025-04-11 19:54   ` Vladimir Sementsov-Ogievskiy
@ 2025-04-16 21:42     ` Eric Blake
  0 siblings, 0 replies; 23+ messages in thread
From: Eric Blake @ 2025-04-16 21:42 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, John Snow, Kevin Wolf, Hanna Reitz,
	open list:Block Jobs

On Fri, Apr 11, 2025 at 10:54:57PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 11.04.25 04:04, Eric Blake wrote:
> > When doing a sync=full mirroring, QMP drive-mirror requests full
> > zeroing if it did not just create the destination, and blockdev-mirror
> > requests full zeroing unconditionally.  This is because during a full
> > sync, we must ensure that the portions of the disk that are not
> > otherwise touched by the source still read as zero upon completion.
...

> > 
> > +++ b/block/mirror.c
> > @@ -841,14 +841,20 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
> >       int64_t offset;
> >       BlockDriverState *bs;
> >       BlockDriverState *target_bs = blk_bs(s->target);
> > -    int ret = -1;
> > +    int ret;
> 
> I think, it was to avoid Coverity false-positive around further code
> 
>         WITH_GRAPH_RDLOCK_GUARD() {
>             ret = bdrv_co_is_allocated_above(bs, s->base_overlay, true, offset,
>                                              bytes, &count);
>         }
>         if (ret < 0) {
>             return ret;
>         }
> 
> which you don't touch here. I think "= -1;" should be kept. Or I missed static analyzes revolution (if so, it should be mentioned in commit message).

Fair enough, for keeping something to avoid Coverity false positives.
HOWEVER, initializing to -1 is wrong, since, as you just showed,
elsewhere in the function we return -errno on failure, and -1 is not
always the right error.  If I'm going to keep an initializer, it will
be -EIO rather than -1.  Furthermore, if I understand
WITH_GRAPH_RDLOCK_GUARD() correctly, we could hoist the 'if (ret < 0)'
check inside the guarded region and still get the same behavior, but
with a different visibility to the analyzer.

> 
> >       int64_t count;
> > 
> >       bdrv_graph_co_rdlock();
> >       bs = s->mirror_top_bs->backing->bs;
> > +    if (s->zero_target) {
> > +        ret = bdrv_co_is_zero_fast(target_bs, 0, s->bdev_length);
> > +    }
> >       bdrv_graph_co_rdunlock();

The rdlock is important, but rewriting it into the
WITH_GRAPH_RDLOCK_GUARD() scope does make it look nicer.  Adjusting
that for v2.

> > 
> > -    if (s->zero_target) {
> > +    if (s->zero_target && ret <= 0) {
> > +        if (ret < 0) {
> > +            return ret;
> > +        }
> >           if (!bdrv_can_write_zeroes_with_unmap(target_bs)) {
> >               bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, s->bdev_length);
> >               return 0;

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/6] mirror: Skip writing zeroes when target is already zero
  2025-04-15 15:59   ` Vladimir Sementsov-Ogievskiy
@ 2025-04-16 21:51     ` Eric Blake
  2025-04-21  6:15       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Blake @ 2025-04-16 21:51 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, John Snow, Kevin Wolf, Hanna Reitz,
	open list:Block Jobs

On Tue, Apr 15, 2025 at 06:59:15PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 11.04.25 04:04, Eric Blake wrote:
> > When mirroring, the goal is to ensure that the destination reads the
> > same as the source; this goal is met whether the destination is sparse
> > or fully-allocated.  However, if the destination cannot efficiently
> > write zeroes, then any time the mirror operation wants to copy zeroes
> > from the source to the destination (either during the background over
> > sparse regions when doing a full mirror, or in the foreground when the
> > guest actively writes zeroes), we were causing the destination to
> > fully allocate that portion of the disk, even if it already read as
> > zeroes.
> > 
> > -    ret = blk_co_pwrite_zeroes(op->s->target, op->offset, op->bytes,
> > -                               op->s->unmap ? BDRV_REQ_MAY_UNMAP : 0);
> > +    if (op->s->zero_bitmap) {
> > +        unsigned long last = (op->offset + op->bytes) / op->s->granularity;
> 
> Maybe, call it "end", not "last, as it's not last element of the range, but first after the range.

Works for me.

> 
> Also, seems we need still do DIV_ROUND_UP, for ..
> 
> > +        assert(QEMU_IS_ALIGNED(op->offset, op->s->granularity));
> > +        assert(QEMU_IS_ALIGNED(op->bytes, op->s->granularity) ||
> > +               op->offset + op->bytes == op->s->bdev_length);
> 
> .. ^ this case, when bytes is unaligned to granularity but aligned to bdev_length.

Hmm, good point.  I'll have to revisit the logic and make sure it
still works in the case of a final partial region.

> 
> > +        if (find_next_zero_bit(op->s->zero_bitmap, last,
> > +                               op->offset / op->s->granularity) == last) {
> > +            write_needed = false;
> > +        }
> > +    }
> > +    if (write_needed) {
> > +        ret = blk_co_pwrite_zeroes(op->s->target, op->offset, op->bytes,
> > +                                   op->s->unmap ? BDRV_REQ_MAY_UNMAP : 0);
> > +    }
> > +    if (ret >= 0 && op->s->zero_bitmap) {
> > +        bitmap_set(op->s->zero_bitmap, op->offset / op->s->granularity,
> > +                   op->bytes / op->s->granularity);
> 
> and here we want to align up bytes, for the corner case
> 
> > +    }
> 
> Also, I'm not sure, what guarantees we have in case of write-zeroes failure. Should we clear the bitmap in this case, like we do MIRROR_METHOD_COPY and MIRROR_METHOD_DISCARD below

My goal was to clear the bitmap before any action that (potentially)
destroys zero regions (writes and discards), even if that action
fails; but only set the bitmap after success on any action that
creates zero regions (write-zero).  If the write-zero fails, and the
bitmap was already set, it doesn't matter that we left the bitmap set
(a write zeroes that fails AND causes the disk to no longer read as
zero should not happen); otherwise, we are correct in leaving the
bitmap unset on failure (even if the write-zeros partially succeeded,
the pessimistic answer of assuming a region is non-zero is always
correct even if sometimes slower).

> 
> >       mirror_write_complete(op, ret);
> >   }
> > 
> > @@ -441,6 +459,9 @@ static unsigned mirror_perform(MirrorBlockJob *s, int64_t offset,
> >       Coroutine *co;
> >       int64_t bytes_handled = -1;
> > 
> > +    assert(QEMU_IS_ALIGNED(offset, s->granularity));
> > +    assert(QEMU_IS_ALIGNED(bytes, s->granularity) ||
> > +           offset + bytes == s->bdev_length);
> >       op = g_new(MirrorOp, 1);
> >       *op = (MirrorOp){
> >           .s              = s,
> > @@ -452,12 +473,21 @@ static unsigned mirror_perform(MirrorBlockJob *s, int64_t offset,
> > 
> >       switch (mirror_method) {
> >       case MIRROR_METHOD_COPY:
> > +        if (s->zero_bitmap) {
> > +            bitmap_clear(s->zero_bitmap, offset / s->granularity,
> > +                         bytes / s->granularity);
> 
> again, align up for corner case
> 
> > +        }
> >           co = qemu_coroutine_create(mirror_co_read, op);
> >           break;
> >       case MIRROR_METHOD_ZERO:
> > +        /* s->zero_bitmap handled in mirror_co_zero */
> >           co = qemu_coroutine_create(mirror_co_zero, op);
> >           break;
> >       case MIRROR_METHOD_DISCARD:
> > +        if (s->zero_bitmap) {
> > +            bitmap_clear(s->zero_bitmap, offset / s->granularity,
> > +                         bytes / s->granularity);
> > +        }
> >           co = qemu_coroutine_create(mirror_co_discard, op);
> >           break;
> >       default:
> > @@ -851,10 +881,17 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
> >       }
> >       bdrv_graph_co_rdunlock();
> > 
> > -    if (s->zero_target && ret <= 0) {
> > +    if (s->zero_target) {
> > +        int64_t length;
> > +
> >           if (ret < 0) {
> >               return ret;
> >           }
> > +        length = DIV_ROUND_UP(s->bdev_length, s->granularity);
> > +        s->zero_bitmap = bitmap_new(length);
> > +        if (ret > 0) {
> > +            bitmap_set(s->zero_bitmap, 0, length);
> 
> hmm, we should not continue zeroing target in case of ret > 0.

Correct. If ret > 0, we set zero_bitmap to record that starting state,
and then want to proceed on with the rest of the code that populates
only the non-zero portions of the dirty bitmap.

> 
> I didn't like that we set ret in one if-block, and handle in another, but now it gets even more confusing.
> 
> Maybe, just move bdrv_co_is_zero_fast() call into big "if (s->zero_target) {" ?

That would require grabbing another RDLOCK - but I'll try and see if
that makes things easier to read.

...
> > +        if (job->zero_bitmap && ret >= 0) {
> > +            bitmap_set(job->zero_bitmap, dirty_bitmap_offset / job->granularity,
> > +                       (dirty_bitmap_end - dirty_bitmap_offset) /
> > +                       job->granularity);
> > +        }
> 
> Same thing, probably we should clear the bitmap in case of write failure.

Why? Either the bitmap was already clear, or we had a write failure on
writing zeroes on top of zeroes but the region should still read as
zeroes.

> 
> >           break;
> > 
> >       case MIRROR_METHOD_DISCARD:
> > +        if (job->zero_bitmap) {
> > +            bitmap_clear(job->zero_bitmap, zero_bitmap_offset,
> > +                         zero_bitmap_end - zero_bitmap_offset);
> > +        }
> >           assert(!qiov);
> >           ret = blk_co_pdiscard(job->target, offset, bytes);
> >           break;
> > @@ -1436,10 +1498,10 @@ do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
> >            * at function start, and they must be still dirty, as we've locked
> >            * the region for in-flight op.
> >            */
> > -        bitmap_offset = QEMU_ALIGN_DOWN(offset, job->granularity);
> > -        bitmap_end = QEMU_ALIGN_UP(offset + bytes, job->granularity);
> > -        bdrv_set_dirty_bitmap(job->dirty_bitmap, bitmap_offset,
> > -                              bitmap_end - bitmap_offset);
> > +        dirty_bitmap_offset = QEMU_ALIGN_DOWN(offset, job->granularity);
> > +        dirty_bitmap_end = QEMU_ALIGN_UP(offset + bytes, job->granularity);
> > +        bdrv_set_dirty_bitmap(job->dirty_bitmap, dirty_bitmap_offset,
> > +                              dirty_bitmap_end - dirty_bitmap_offset);
> 
> Not really matter, but still, renaming in a separate patch would make this one a bit simpler.

Okay, I'll split the variable rename.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/6] mirror: Skip writing zeroes when target is already zero
  2025-04-16 21:51     ` Eric Blake
@ 2025-04-21  6:15       ` Vladimir Sementsov-Ogievskiy
  2025-04-21 14:41         ` Eric Blake
  0 siblings, 1 reply; 23+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-04-21  6:15 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, John Snow, Kevin Wolf, Hanna Reitz,
	open list:Block Jobs

On 17.04.25 00:51, Eric Blake wrote:
> (a write zeroes that fails AND causes the disk to no longer read as
> zero should not happen)

I don't know, is there such a contract? write-zeroes may fallback to write(), which only state that:

        An error return value while performing write() using direct I/O
        does not mean the entire write has failed.  Partial data may be
        written and the data at the file offset on which the write() was
        attempted should be considered inconsistent.

So, I used to think that on failed write nothing is guaranteed.

What do we lose if we just unset the bitmap before write-zeroes, and set it again in case of success?

-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/6] mirror: Skip writing zeroes when target is already zero
  2025-04-21  6:15       ` Vladimir Sementsov-Ogievskiy
@ 2025-04-21 14:41         ` Eric Blake
  2025-04-22  9:08           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 23+ messages in thread
From: Eric Blake @ 2025-04-21 14:41 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel, John Snow, Kevin Wolf, Hanna Reitz,
	open list:Block Jobs

On Mon, Apr 21, 2025 at 09:15:33AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 17.04.25 00:51, Eric Blake wrote:
> > (a write zeroes that fails AND causes the disk to no longer read as
> > zero should not happen)
> 
> I don't know, is there such a contract? write-zeroes may fallback to write(), which only state that:
> 
>        An error return value while performing write() using direct I/O
>        does not mean the entire write has failed.  Partial data may be
>        written and the data at the file offset on which the write() was
>        attempted should be considered inconsistent.
> 
> So, I used to think that on failed write nothing is guaranteed.
> 
> What do we lose if we just unset the bitmap before write-zeroes, and set it again in case of success?
>

I still don't see the point.  Either the cluster was already non-zero
before the failed write-zero (so there's no bit to pre-clear); or the
cluster was already zero before the failed write-zero, and any failure
that corrupts the disk by actually turning zeroes into non-zero is not
worth worrying about, so pre-clearing the bit is not going to make
things any better.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/6] mirror: Skip writing zeroes when target is already zero
  2025-04-21 14:41         ` Eric Blake
@ 2025-04-22  9:08           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 23+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-04-22  9:08 UTC (permalink / raw)
  To: Eric Blake
  Cc: qemu-devel, John Snow, Kevin Wolf, Hanna Reitz,
	open list:Block Jobs

On 21.04.25 17:41, Eric Blake wrote:
> On Mon, Apr 21, 2025 at 09:15:33AM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> On 17.04.25 00:51, Eric Blake wrote:
>>> (a write zeroes that fails AND causes the disk to no longer read as
>>> zero should not happen)
>>
>> I don't know, is there such a contract? write-zeroes may fallback to write(), which only state that:
>>
>>         An error return value while performing write() using direct I/O
>>         does not mean the entire write has failed.  Partial data may be
>>         written and the data at the file offset on which the write() was
>>         attempted should be considered inconsistent.
>>
>> So, I used to think that on failed write nothing is guaranteed.
>>
>> What do we lose if we just unset the bitmap before write-zeroes, and set it again in case of success?
>>
> 
> I still don't see the point.  Either the cluster was already non-zero
> before the failed write-zero (so there's no bit to pre-clear);

[..]

> cluster was already zero before the failed write-zero, and any failure
> that corrupts the disk by actually turning zeroes into non-zero

Yes, I mean this case

> is not
> worth worrying about

And I don't follow why we should not care.

Usually we have no assumptions about data at offset, if write(offset, ..) failed. We just retry write.

Here we do this assumption "if there were zeroes, they will not be broken by writing zeroes even on failure scenarios", for any kind of block-driver and underlying file system.

You say "corrupts the disk". Of course, if disk is corrupted - everything is broken, and we should not care. The result of the mirror would be incorrect anyway. But we don't consider any EIO as disk corruption. We do retry the write and continue mirror operation.

Moreover, if EIO is returned due to some broken sector, we may hope that retrying the write will fix it, probably some other sector will be allocated by FS (I hope I'm not talking nonsense). But relying on the data still read as zero without rewriting would be wrong.

>, so pre-clearing the bit is not going to make
> things any better.
> 

-- 
Best regards,
Vladimir



^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2025-04-22  9:09 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-11  1:04 [PATCH 0/6] Make blockdev-mirror dest sparse in more cases Eric Blake
2025-04-11  1:04 ` [PATCH 1/6] mirror: Skip pre-zeroing destination if it is already zero Eric Blake
2025-04-11 19:54   ` Vladimir Sementsov-Ogievskiy
2025-04-16 21:42     ` Eric Blake
2025-04-14 16:29   ` Vladimir Sementsov-Ogievskiy
2025-04-11  1:04 ` [PATCH 2/6] file-posix: Allow lseek at offset 0 when !want_zero Eric Blake
2025-04-14 17:05   ` Vladimir Sementsov-Ogievskiy
2025-04-14 18:12     ` Eric Blake
2025-04-15 12:37       ` Vladimir Sementsov-Ogievskiy
2025-04-15 15:22         ` Eric Blake
2025-04-15 16:11           ` Vladimir Sementsov-Ogievskiy
2025-04-11  1:04 ` [PATCH 3/6] mirror: Skip writing zeroes when target is already zero Eric Blake
2025-04-15 15:59   ` Vladimir Sementsov-Ogievskiy
2025-04-16 21:51     ` Eric Blake
2025-04-21  6:15       ` Vladimir Sementsov-Ogievskiy
2025-04-21 14:41         ` Eric Blake
2025-04-22  9:08           ` Vladimir Sementsov-Ogievskiy
2025-04-11  1:04 ` [PATCH 4/6] block: Expand block status mode from bool to enum Eric Blake
2025-04-11  1:04 ` [PATCH 5/6] file-posix: Recognize blockdev-create file as starting all zero Eric Blake
2025-04-11  1:04 ` [PATCH 6/6] tests: Add iotest mirror-sparse for recent patches Eric Blake
2025-04-11 14:48 ` [PATCH 7/6] mirror: Allow QMP override to declare target already zero Eric Blake
2025-04-12  4:56   ` Markus Armbruster
2025-04-11 20:15 ` [PATCH 0/6] Make blockdev-mirror dest sparse in more cases Eric Blake

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).