* [PATCH v2 00/11] Make blockdev-mirror dest sparse in more cases
@ 2025-04-17 18:39 Eric Blake
2025-04-17 18:39 ` [PATCH v2 01/11] block: Expand block status mode from bool to enum Eric Blake
` (12 more replies)
0 siblings, 13 replies; 36+ messages in thread
From: Eric Blake @ 2025-04-17 18:39 UTC
To: qemu-devel; +Cc: qemu-block, vsementsov
V1 was here: https://lists.gnu.org/archive/html/qemu-devel/2025-04/msg01654.html
In v2:
- Rearrange and split several patches
- Fewer changes to file-posix.c; instead,
- New bdrv_co_is_all_zeroes() in io.c, which works on both file-posix and NBD
- Clean up several corner cases spotted during review
- Better iotest coverage, required pulling in Andrey's patch for disk_usage
Andrey Drobyshev (1):
iotests/common.rc: add disk_usage function
Eric Blake (10):
block: Expand block status mode from bool to enum
file-posix: Handle zero block status hint better
block: Let bdrv_co_is_zero_fast consolidate adjacent extents
block: Add new bdrv_co_is_all_zeroes() function
iotests: Improve iotest 194 to mirror data
mirror: Minor refactoring
mirror: Skip pre-zeroing destination if it is already zero
mirror: Skip writing zeroes when target is already zero
tests: Add iotest mirror-sparse for recent patches
mirror: Allow QMP override to declare target already zero
qapi/block-core.json | 8 +-
block/coroutines.h | 4 +-
include/block/block-common.h | 26 ++++
include/block/block-io.h | 2 +
include/block/block_int-common.h | 25 ++--
include/block/block_int-global-state.h | 3 +-
include/block/block_int-io.h | 4 +-
block/io.c | 120 ++++++++++++----
block/blkdebug.c | 6 +-
block/copy-before-write.c | 4 +-
block/file-posix.c | 5 +-
block/gluster.c | 4 +-
block/iscsi.c | 6 +-
block/mirror.c | 121 +++++++++++++---
block/nbd.c | 4 +-
block/null.c | 6 +-
block/parallels.c | 6 +-
block/qcow.c | 2 +-
block/qcow2.c | 6 +-
block/qed.c | 6 +-
block/quorum.c | 4 +-
block/raw-format.c | 4 +-
block/rbd.c | 6 +-
block/snapshot-access.c | 4 +-
block/vdi.c | 4 +-
block/vmdk.c | 2 +-
block/vpc.c | 2 +-
block/vvfat.c | 6 +-
blockdev.c | 18 ++-
tests/unit/test-block-iothread.c | 4 +-
tests/qemu-iotests/common.rc | 6 +
tests/qemu-iotests/194 | 1 +
tests/qemu-iotests/250 | 5 -
tests/qemu-iotests/tests/mirror-sparse | 109 +++++++++++++++
tests/qemu-iotests/tests/mirror-sparse.out | 153 +++++++++++++++++++++
35 files changed, 571 insertions(+), 125 deletions(-)
create mode 100755 tests/qemu-iotests/tests/mirror-sparse
create mode 100644 tests/qemu-iotests/tests/mirror-sparse.out
--
2.49.0
* [PATCH v2 01/11] block: Expand block status mode from bool to enum
2025-04-17 18:39 [PATCH v2 00/11] Make blockdev-mirror dest sparse in more cases Eric Blake
@ 2025-04-17 18:39 ` Eric Blake
2025-04-17 20:17 ` Stefan Hajnoczi
2025-04-17 18:39 ` [PATCH v2 02/11] file-posix: Handle zero block status hint better Eric Blake
` (11 subsequent siblings)
12 siblings, 1 reply; 36+ messages in thread
From: Eric Blake @ 2025-04-17 18:39 UTC
To: qemu-devel
Cc: qemu-block, vsementsov, Kevin Wolf, Hanna Reitz, John Snow,
Stefan Hajnoczi, Fam Zheng, Ronnie Sahlberg, Paolo Bonzini,
Peter Lieven, Denis V. Lunev, Alberto Garcia, Ilya Dryomov,
Stefan Weil, open list:GLUSTER
This patch is purely mechanical, changing bool want_zero into a new
enum BlockStatusMode. As of this patch, all implementations are
unchanged (the old want_zero==true is now mode==BDRV_BSTAT_PRECISE),
but the callers in io.c are set up so that future patches will be able
to differentiate between whether the caller cares more about allocation
or about reads-as-zero, for driver implementations that will actually
want to behave differently for those more-specific hints.
As for background on why this patch is useful: right now, the
file-posix driver recognizes that if allocation is being queried, the
entire image can be reported as allocated (there is no backing file to
refer to) - but this throws away information on whether the entire
image reads as zero (trivially true if lseek(SEEK_DATA) at offset 0
returns -ENXIO, a bit more complicated to prove if the raw file was
created with 'qemu-img create' since we intentionally allocate a small
chunk of all-zero data to help with alignment probing). The next
patches will add a generic algorithm for seeing if an entire file
reads as zeroes, as well as tweak the file-posix driver to react to
the new hints.
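As an illustration of the "trivially true" case above, here is a
minimal standalone sketch (not code from this series) that asks the
kernel whether a file descriptor has any data extent at all. It uses
SEEK_DATA, the whence value that fails with ENXIO when no data follows
the given offset. The helper names fd_is_entirely_hole() and
demo_empty_temp_file() are hypothetical, and the fallback value for
SEEK_DATA assumes Linux.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc exposes SEEK_DATA only with this */
#endif
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA 3 /* Linux value; assumption for when the macro is hidden */
#endif

/*
 * Hypothetical helper, not part of QEMU.
 * Returns 1 if fd has no data extent at or after offset 0 (so every
 * byte reads as zero), 0 if some data extent exists (its contents are
 * unknown without reading), or a negative errno value on failure.
 */
static int fd_is_entirely_hole(int fd)
{
    off_t offs;

    assert(fd >= 0);
    offs = lseek(fd, 0, SEEK_DATA);
    if (offs >= 0) {
        return 0;       /* data starts at offs */
    }
    if (errno == ENXIO) {
        return 1;       /* no data at or after offset 0 */
    }
    return -errno;      /* e.g. EINVAL where SEEK_DATA is unsupported */
}

/* Demo: a freshly created, empty temporary file has no data extents. */
static int demo_empty_temp_file(void)
{
    char path[] = "/tmp/hole-demo-XXXXXX";
    int fd = mkstemp(path);
    int ret;

    if (fd < 0) {
        return -errno;
    }
    unlink(path);
    ret = fd_is_entirely_hole(fd);
    close(fd);
    return ret;
}
```

A file created by 'qemu-img create -f raw' would make this helper
return 0 because of the small allocated chunk at the start, which is
the harder case described above.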
Signed-off-by: Eric Blake <eblake@redhat.com>
---
block/coroutines.h | 4 +--
include/block/block-common.h | 26 ++++++++++++++++
include/block/block_int-common.h | 25 +++++++++-------
include/block/block_int-io.h | 4 +--
block/io.c | 51 ++++++++++++++++----------------
block/blkdebug.c | 6 ++--
block/copy-before-write.c | 4 +--
block/file-posix.c | 4 +--
block/gluster.c | 4 +--
block/iscsi.c | 6 ++--
block/nbd.c | 4 +--
block/null.c | 6 ++--
block/parallels.c | 6 ++--
block/qcow.c | 2 +-
block/qcow2.c | 6 ++--
block/qed.c | 6 ++--
block/quorum.c | 4 +--
block/raw-format.c | 4 +--
block/rbd.c | 6 ++--
block/snapshot-access.c | 4 +--
block/vdi.c | 4 +--
block/vmdk.c | 2 +-
block/vpc.c | 2 +-
block/vvfat.c | 6 ++--
tests/unit/test-block-iothread.c | 2 +-
25 files changed, 114 insertions(+), 84 deletions(-)
diff --git a/block/coroutines.h b/block/coroutines.h
index 79e5efbf752..c8323aa67e6 100644
--- a/block/coroutines.h
+++ b/block/coroutines.h
@@ -47,7 +47,7 @@ int coroutine_fn GRAPH_RDLOCK
bdrv_co_common_block_status_above(BlockDriverState *bs,
BlockDriverState *base,
bool include_base,
- bool want_zero,
+ enum BlockStatusMode mode,
int64_t offset,
int64_t bytes,
int64_t *pnum,
@@ -78,7 +78,7 @@ int co_wrapper_mixed_bdrv_rdlock
bdrv_common_block_status_above(BlockDriverState *bs,
BlockDriverState *base,
bool include_base,
- bool want_zero,
+ enum BlockStatusMode mode,
int64_t offset,
int64_t bytes,
int64_t *pnum,
diff --git a/include/block/block-common.h b/include/block/block-common.h
index 0b831ef87b1..619e75b9c8d 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -508,6 +508,32 @@ enum BdrvChildRoleBits {
| BDRV_CHILD_PRIMARY,
};
+/* Modes for block status calls */
+enum BlockStatusMode {
+ /*
+ * Status should be as accurate as possible: _OFFSET_VALID
+ * and _ZERO should each be set where efficiently possible,
+ * extents may be smaller, and iteration through the entire block
+ * device may take more calls.
+ */
+ BDRV_BSTAT_PRECISE,
+
+ /*
+ * The caller is primarily concerned about overall allocation:
+ * favor larger *pnum, perhaps by coalescing extents and reporting
+ * _DATA instead of _ZERO, and without needing to read data or
+ * bothering with _OFFSET_VALID.
+ */
+ BDRV_BSTAT_ALLOCATED,
+
+ /*
+ * The caller is primarily concerned about whether the device
+ * reads as zero: favor a result of _ZERO, even if it requires
+ * reading a few sectors to verify, without needing _OFFSET_VALID.
+ */
+ BDRV_BSTAT_ZERO,
+};
+
/* Mask of BdrvChildRoleBits values */
typedef unsigned int BdrvChildRole;
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index ebb4e56a503..1fd94b2b568 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -610,13 +610,16 @@ struct BlockDriver {
* and/or BDRV_BLOCK_RAW; if the current layer defers to a backing
* layer, the result should be 0 (and not BDRV_BLOCK_ZERO). See
* block.h for the overall meaning of the bits. As a hint, the
- * flag want_zero is true if the caller cares more about precise
- * mappings (favor accurate _OFFSET_VALID/_ZERO) or false for
- * overall allocation (favor larger *pnum, perhaps by reporting
- * _DATA instead of _ZERO). The block layer guarantees input
- * clamped to bdrv_getlength() and aligned to request_alignment,
- * as well as non-NULL pnum, map, and file; in turn, the driver
- * must return an error or set pnum to an aligned non-zero value.
+ * flag @mode is BDRV_BSTAT_PRECISE if the caller cares more about
+ * precise mappings (favor accurate _OFFSET_VALID/_ZERO),
+ * BDRV_BSTAT_ALLOCATED for overall allocation (favor larger
+ * *pnum, perhaps by reporting _DATA instead of _ZERO), or
+ * BDRV_BSTAT_ZERO for overall reads-as-zero (favor _ZERO, even if
+ * it requires reading a few sectors to verify). The block layer
+ * guarantees input clamped to bdrv_getlength() and aligned to
+ * request_alignment, as well as non-NULL pnum, map, and file; in
+ * turn, the driver must return an error or set pnum to an aligned
+ * non-zero value.
*
* Note that @bytes is just a hint on how big of a region the
* caller wants to inspect. It is not a limit on *pnum.
@@ -628,8 +631,8 @@ struct BlockDriver {
* to clamping *pnum for return to its caller.
*/
int coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_block_status)(
- BlockDriverState *bs,
- bool want_zero, int64_t offset, int64_t bytes, int64_t *pnum,
+ BlockDriverState *bs, enum BlockStatusMode mode,
+ int64_t offset, int64_t bytes, int64_t *pnum,
int64_t *map, BlockDriverState **file);
/*
@@ -653,8 +656,8 @@ struct BlockDriver {
QEMUIOVector *qiov, size_t qiov_offset);
int coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_snapshot_block_status)(
- BlockDriverState *bs, bool want_zero, int64_t offset, int64_t bytes,
- int64_t *pnum, int64_t *map, BlockDriverState **file);
+ BlockDriverState *bs, enum BlockStatusMode mode, int64_t offset,
+ int64_t bytes, int64_t *pnum, int64_t *map, BlockDriverState **file);
int coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_pdiscard_snapshot)(
BlockDriverState *bs, int64_t offset, int64_t bytes);
diff --git a/include/block/block_int-io.h b/include/block/block_int-io.h
index 4a7cf2b4fdc..e019e81fa0c 100644
--- a/include/block/block_int-io.h
+++ b/include/block/block_int-io.h
@@ -38,8 +38,8 @@
int coroutine_fn GRAPH_RDLOCK bdrv_co_preadv_snapshot(BdrvChild *child,
int64_t offset, int64_t bytes, QEMUIOVector *qiov, size_t qiov_offset);
int coroutine_fn GRAPH_RDLOCK bdrv_co_snapshot_block_status(
- BlockDriverState *bs, bool want_zero, int64_t offset, int64_t bytes,
- int64_t *pnum, int64_t *map, BlockDriverState **file);
+ BlockDriverState *bs, enum BlockStatusMode mode, int64_t offset,
+ int64_t bytes, int64_t *pnum, int64_t *map, BlockDriverState **file);
int coroutine_fn GRAPH_RDLOCK bdrv_co_pdiscard_snapshot(BlockDriverState *bs,
int64_t offset, int64_t bytes);
diff --git a/block/io.c b/block/io.c
index 1ba8d1aeea1..73c96084e62 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2364,10 +2364,8 @@ int bdrv_flush_all(void)
* Drivers not implementing the functionality are assumed to not support
* backing files, hence all their sectors are reported as allocated.
*
- * If 'want_zero' is true, the caller is querying for mapping
- * purposes, with a focus on valid BDRV_BLOCK_OFFSET_VALID, _DATA, and
- * _ZERO where possible; otherwise, the result favors larger 'pnum',
- * with a focus on accurate BDRV_BLOCK_ALLOCATED.
+ * 'mode' serves as a hint as to which results are favored; see enum
+ * BlockStatusMode for details of the supported modes.
*
* If 'offset' is beyond the end of the disk image the return value is
* BDRV_BLOCK_EOF and 'pnum' is set to 0.
@@ -2387,7 +2385,7 @@ int bdrv_flush_all(void)
* set to the host mapping and BDS corresponding to the guest offset.
*/
static int coroutine_fn GRAPH_RDLOCK
-bdrv_co_do_block_status(BlockDriverState *bs, bool want_zero,
+bdrv_co_do_block_status(BlockDriverState *bs, enum BlockStatusMode mode,
int64_t offset, int64_t bytes,
int64_t *pnum, int64_t *map, BlockDriverState **file)
{
@@ -2476,7 +2474,7 @@ bdrv_co_do_block_status(BlockDriverState *bs, bool want_zero,
local_file = bs;
local_map = aligned_offset;
} else {
- ret = bs->drv->bdrv_co_block_status(bs, want_zero, aligned_offset,
+ ret = bs->drv->bdrv_co_block_status(bs, mode, aligned_offset,
aligned_bytes, pnum, &local_map,
&local_file);
@@ -2488,10 +2486,10 @@ bdrv_co_do_block_status(BlockDriverState *bs, bool want_zero,
* the cache requires an RCU update, so double check here to avoid
* such an update if possible.
*
- * Check want_zero, because we only want to update the cache when we
+ * Check mode, because we only want to update the cache when we
* have accurate information about what is zero and what is data.
*/
- if (want_zero &&
+ if (mode == BDRV_BSTAT_PRECISE &&
ret == (BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID) &&
QLIST_EMPTY(&bs->children))
{
@@ -2548,7 +2546,7 @@ bdrv_co_do_block_status(BlockDriverState *bs, bool want_zero,
if (ret & BDRV_BLOCK_RAW) {
assert(ret & BDRV_BLOCK_OFFSET_VALID && local_file);
- ret = bdrv_co_do_block_status(local_file, want_zero, local_map,
+ ret = bdrv_co_do_block_status(local_file, mode, local_map,
*pnum, pnum, &local_map, &local_file);
goto out;
}
@@ -2560,7 +2558,7 @@ bdrv_co_do_block_status(BlockDriverState *bs, bool want_zero,
if (!cow_bs) {
ret |= BDRV_BLOCK_ZERO;
- } else if (want_zero) {
+ } else if (mode == BDRV_BSTAT_PRECISE) {
int64_t size2 = bdrv_co_getlength(cow_bs);
if (size2 >= 0 && offset >= size2) {
@@ -2569,14 +2567,14 @@ bdrv_co_do_block_status(BlockDriverState *bs, bool want_zero,
}
}
- if (want_zero && ret & BDRV_BLOCK_RECURSE &&
+ if (mode == BDRV_BSTAT_PRECISE && ret & BDRV_BLOCK_RECURSE &&
local_file && local_file != bs &&
(ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
(ret & BDRV_BLOCK_OFFSET_VALID)) {
int64_t file_pnum;
int ret2;
- ret2 = bdrv_co_do_block_status(local_file, want_zero, local_map,
+ ret2 = bdrv_co_do_block_status(local_file, mode, local_map,
*pnum, &file_pnum, NULL, NULL);
if (ret2 >= 0) {
/* Ignore errors. This is just providing extra information, it
@@ -2627,7 +2625,7 @@ int coroutine_fn
bdrv_co_common_block_status_above(BlockDriverState *bs,
BlockDriverState *base,
bool include_base,
- bool want_zero,
+ enum BlockStatusMode mode,
int64_t offset,
int64_t bytes,
int64_t *pnum,
@@ -2654,7 +2652,7 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
return 0;
}
- ret = bdrv_co_do_block_status(bs, want_zero, offset, bytes, pnum,
+ ret = bdrv_co_do_block_status(bs, mode, offset, bytes, pnum,
map, file);
++*depth;
if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED || bs == base) {
@@ -2671,7 +2669,7 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
for (p = bdrv_filter_or_cow_bs(bs); include_base || p != base;
p = bdrv_filter_or_cow_bs(p))
{
- ret = bdrv_co_do_block_status(p, want_zero, offset, bytes, pnum,
+ ret = bdrv_co_do_block_status(p, mode, offset, bytes, pnum,
map, file);
++*depth;
if (ret < 0) {
@@ -2734,7 +2732,8 @@ int coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
BlockDriverState **file)
{
IO_CODE();
- return bdrv_co_common_block_status_above(bs, base, false, true, offset,
+ return bdrv_co_common_block_status_above(bs, base, false,
+ BDRV_BSTAT_PRECISE, offset,
bytes, pnum, map, file, NULL);
}
@@ -2765,8 +2764,9 @@ int coroutine_fn bdrv_co_is_zero_fast(BlockDriverState *bs, int64_t offset,
return 1;
}
- ret = bdrv_co_common_block_status_above(bs, NULL, false, false, offset,
- bytes, &pnum, NULL, NULL, NULL);
+ ret = bdrv_co_common_block_status_above(bs, NULL, false, BDRV_BSTAT_ZERO,
+ offset, bytes, &pnum, NULL, NULL,
+ NULL);
if (ret < 0) {
return ret;
@@ -2782,9 +2782,9 @@ int coroutine_fn bdrv_co_is_allocated(BlockDriverState *bs, int64_t offset,
int64_t dummy;
IO_CODE();
- ret = bdrv_co_common_block_status_above(bs, bs, true, false, offset,
- bytes, pnum ? pnum : &dummy, NULL,
- NULL, NULL);
+ ret = bdrv_co_common_block_status_above(bs, bs, true, BDRV_BSTAT_ALLOCATED,
+ offset, bytes, pnum ? pnum : &dummy,
+ NULL, NULL, NULL);
if (ret < 0) {
return ret;
}
@@ -2817,7 +2817,8 @@ int coroutine_fn bdrv_co_is_allocated_above(BlockDriverState *bs,
int ret;
IO_CODE();
- ret = bdrv_co_common_block_status_above(bs, base, include_base, false,
+ ret = bdrv_co_common_block_status_above(bs, base, include_base,
+ BDRV_BSTAT_ALLOCATED,
offset, bytes, pnum, NULL, NULL,
&depth);
if (ret < 0) {
@@ -3709,8 +3710,8 @@ bdrv_co_preadv_snapshot(BdrvChild *child, int64_t offset, int64_t bytes,
}
int coroutine_fn
-bdrv_co_snapshot_block_status(BlockDriverState *bs,
- bool want_zero, int64_t offset, int64_t bytes,
+bdrv_co_snapshot_block_status(BlockDriverState *bs, enum BlockStatusMode mode,
+ int64_t offset, int64_t bytes,
int64_t *pnum, int64_t *map,
BlockDriverState **file)
{
@@ -3728,7 +3729,7 @@ bdrv_co_snapshot_block_status(BlockDriverState *bs,
}
bdrv_inc_in_flight(bs);
- ret = drv->bdrv_co_snapshot_block_status(bs, want_zero, offset, bytes,
+ ret = drv->bdrv_co_snapshot_block_status(bs, mode, offset, bytes,
pnum, map, file);
bdrv_dec_in_flight(bs);
diff --git a/block/blkdebug.c b/block/blkdebug.c
index 1c1967f8e0a..f3eba9e6f27 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -751,9 +751,9 @@ blkdebug_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
}
static int coroutine_fn GRAPH_RDLOCK
-blkdebug_co_block_status(BlockDriverState *bs, bool want_zero, int64_t offset,
- int64_t bytes, int64_t *pnum, int64_t *map,
- BlockDriverState **file)
+blkdebug_co_block_status(BlockDriverState *bs, enum BlockStatusMode mode,
+ int64_t offset, int64_t bytes, int64_t *pnum,
+ int64_t *map, BlockDriverState **file)
{
int err;
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index fd470f5f926..e75e6925e50 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -291,8 +291,8 @@ cbw_co_preadv_snapshot(BlockDriverState *bs, int64_t offset, int64_t bytes,
}
static int coroutine_fn GRAPH_RDLOCK
-cbw_co_snapshot_block_status(BlockDriverState *bs,
- bool want_zero, int64_t offset, int64_t bytes,
+cbw_co_snapshot_block_status(BlockDriverState *bs, enum BlockStatusMode mode,
+ int64_t offset, int64_t bytes,
int64_t *pnum, int64_t *map,
BlockDriverState **file)
{
diff --git a/block/file-posix.c b/block/file-posix.c
index 56d1972d156..30129ec9b37 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -3201,7 +3201,7 @@ static int find_allocation(BlockDriverState *bs, off_t start,
* well exceed it.
*/
static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
- bool want_zero,
+ enum BlockStatusMode mode,
int64_t offset,
int64_t bytes, int64_t *pnum,
int64_t *map,
@@ -3217,7 +3217,7 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
return ret;
}
- if (!want_zero) {
+ if (mode != BDRV_BSTAT_PRECISE) {
*pnum = bytes;
*map = offset;
*file = bs;
diff --git a/block/gluster.c b/block/gluster.c
index c6d25ae7335..f5ee3cdcc1f 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -1465,7 +1465,7 @@ exit:
* (Based on raw_co_block_status() from file-posix.c.)
*/
static int coroutine_fn qemu_gluster_co_block_status(BlockDriverState *bs,
- bool want_zero,
+ enum BlockStatusMode mode,
int64_t offset,
int64_t bytes,
int64_t *pnum,
@@ -1482,7 +1482,7 @@ static int coroutine_fn qemu_gluster_co_block_status(BlockDriverState *bs,
return ret;
}
- if (!want_zero) {
+ if (mode != BDRV_BSTAT_PRECISE) {
*pnum = bytes;
*map = offset;
*file = bs;
diff --git a/block/iscsi.c b/block/iscsi.c
index 2f0f4dac097..c7b425597f0 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -694,9 +694,9 @@ out_unlock:
static int coroutine_fn iscsi_co_block_status(BlockDriverState *bs,
- bool want_zero, int64_t offset,
- int64_t bytes, int64_t *pnum,
- int64_t *map,
+ enum BlockStatusMode mode,
+ int64_t offset, int64_t bytes,
+ int64_t *pnum, int64_t *map,
BlockDriverState **file)
{
IscsiLun *iscsilun = bs->opaque;
diff --git a/block/nbd.c b/block/nbd.c
index 887841bc813..591dedde62e 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1397,8 +1397,8 @@ nbd_client_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
}
static int coroutine_fn GRAPH_RDLOCK nbd_client_co_block_status(
- BlockDriverState *bs, bool want_zero, int64_t offset, int64_t bytes,
- int64_t *pnum, int64_t *map, BlockDriverState **file)
+ BlockDriverState *bs, enum BlockStatusMode mode, int64_t offset,
+ int64_t bytes, int64_t *pnum, int64_t *map, BlockDriverState **file)
{
int ret, request_ret;
NBDExtent64 extent = { 0 };
diff --git a/block/null.c b/block/null.c
index dc0b1fdbd9b..66470787cfd 100644
--- a/block/null.c
+++ b/block/null.c
@@ -227,9 +227,9 @@ static int null_reopen_prepare(BDRVReopenState *reopen_state,
}
static int coroutine_fn null_co_block_status(BlockDriverState *bs,
- bool want_zero, int64_t offset,
- int64_t bytes, int64_t *pnum,
- int64_t *map,
+ enum BlockStatusMode mode,
+ int64_t offset, int64_t bytes,
+ int64_t *pnum, int64_t *map,
BlockDriverState **file)
{
BDRVNullState *s = bs->opaque;
diff --git a/block/parallels.c b/block/parallels.c
index 347ca127f34..93b42b47239 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -416,9 +416,9 @@ parallels_co_flush_to_os(BlockDriverState *bs)
}
static int coroutine_fn GRAPH_RDLOCK
-parallels_co_block_status(BlockDriverState *bs, bool want_zero, int64_t offset,
- int64_t bytes, int64_t *pnum, int64_t *map,
- BlockDriverState **file)
+parallels_co_block_status(BlockDriverState *bs, enum BlockStatusMode mode,
+ int64_t offset, int64_t bytes, int64_t *pnum,
+ int64_t *map, BlockDriverState **file)
{
BDRVParallelsState *s = bs->opaque;
int count;
diff --git a/block/qcow.c b/block/qcow.c
index da8ad4d2430..de7fb42c51c 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -530,7 +530,7 @@ get_cluster_offset(BlockDriverState *bs, uint64_t offset, int allocate,
}
static int coroutine_fn GRAPH_RDLOCK
-qcow_co_block_status(BlockDriverState *bs, bool want_zero,
+qcow_co_block_status(BlockDriverState *bs, enum BlockStatusMode mode,
int64_t offset, int64_t bytes, int64_t *pnum,
int64_t *map, BlockDriverState **file)
{
diff --git a/block/qcow2.c b/block/qcow2.c
index 7774e7f0909..14fa1c00df1 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2141,9 +2141,9 @@ static void qcow2_join_options(QDict *options, QDict *old_options)
}
static int coroutine_fn GRAPH_RDLOCK
-qcow2_co_block_status(BlockDriverState *bs, bool want_zero, int64_t offset,
- int64_t count, int64_t *pnum, int64_t *map,
- BlockDriverState **file)
+qcow2_co_block_status(BlockDriverState *bs, enum BlockStatusMode mode,
+ int64_t offset, int64_t count, int64_t *pnum,
+ int64_t *map, BlockDriverState **file)
{
BDRVQcow2State *s = bs->opaque;
uint64_t host_offset;
diff --git a/block/qed.c b/block/qed.c
index ac24449ffb3..6e57859d05f 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -833,9 +833,9 @@ fail:
}
static int coroutine_fn GRAPH_RDLOCK
-bdrv_qed_co_block_status(BlockDriverState *bs, bool want_zero, int64_t pos,
- int64_t bytes, int64_t *pnum, int64_t *map,
- BlockDriverState **file)
+bdrv_qed_co_block_status(BlockDriverState *bs, enum BlockStatusMode mode,
+ int64_t pos, int64_t bytes, int64_t *pnum,
+ int64_t *map, BlockDriverState **file)
{
BDRVQEDState *s = bs->opaque;
size_t len = MIN(bytes, SIZE_MAX);
diff --git a/block/quorum.c b/block/quorum.c
index 30747a6df93..97091136fcb 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1226,7 +1226,7 @@ static void quorum_child_perm(BlockDriverState *bs, BdrvChild *c,
* region contains zeroes, and BDRV_BLOCK_DATA otherwise.
*/
static int coroutine_fn GRAPH_RDLOCK
-quorum_co_block_status(BlockDriverState *bs, bool want_zero,
+quorum_co_block_status(BlockDriverState *bs, enum BlockStatusMode mode,
int64_t offset, int64_t count,
int64_t *pnum, int64_t *map, BlockDriverState **file)
{
@@ -1238,7 +1238,7 @@ quorum_co_block_status(BlockDriverState *bs, bool want_zero,
for (i = 0; i < s->num_children; i++) {
int64_t bytes;
ret = bdrv_co_common_block_status_above(s->children[i]->bs, NULL, false,
- want_zero, offset, count,
+ mode, offset, count,
&bytes, NULL, NULL, NULL);
if (ret < 0) {
quorum_report_bad(QUORUM_OP_TYPE_READ, offset, count,
diff --git a/block/raw-format.c b/block/raw-format.c
index e08526e2eca..0ff5367123b 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -283,8 +283,8 @@ fail:
}
static int coroutine_fn GRAPH_RDLOCK
-raw_co_block_status(BlockDriverState *bs, bool want_zero, int64_t offset,
- int64_t bytes, int64_t *pnum, int64_t *map,
+raw_co_block_status(BlockDriverState *bs, enum BlockStatusMode mode,
+ int64_t offset, int64_t bytes, int64_t *pnum, int64_t *map,
BlockDriverState **file)
{
BDRVRawState *s = bs->opaque;
diff --git a/block/rbd.c b/block/rbd.c
index af984fb7db4..abcdd5e4e76 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1504,9 +1504,9 @@ static int qemu_rbd_diff_iterate_cb(uint64_t offs, size_t len,
}
static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
- bool want_zero, int64_t offset,
- int64_t bytes, int64_t *pnum,
- int64_t *map,
+ enum BlockStatusMode mode,
+ int64_t offset, int64_t bytes,
+ int64_t *pnum, int64_t *map,
BlockDriverState **file)
{
BDRVRBDState *s = bs->opaque;
diff --git a/block/snapshot-access.c b/block/snapshot-access.c
index 71ac83c01f0..1d91b6df5d7 100644
--- a/block/snapshot-access.c
+++ b/block/snapshot-access.c
@@ -41,11 +41,11 @@ snapshot_access_co_preadv_part(BlockDriverState *bs,
static int coroutine_fn GRAPH_RDLOCK
snapshot_access_co_block_status(BlockDriverState *bs,
- bool want_zero, int64_t offset,
+ enum BlockStatusMode mode, int64_t offset,
int64_t bytes, int64_t *pnum,
int64_t *map, BlockDriverState **file)
{
- return bdrv_co_snapshot_block_status(bs->file->bs, want_zero, offset,
+ return bdrv_co_snapshot_block_status(bs->file->bs, mode, offset,
bytes, pnum, map, file);
}
diff --git a/block/vdi.c b/block/vdi.c
index a2da6ecab01..9a9d402c946 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -523,8 +523,8 @@ static int vdi_reopen_prepare(BDRVReopenState *state,
}
static int coroutine_fn GRAPH_RDLOCK
-vdi_co_block_status(BlockDriverState *bs, bool want_zero, int64_t offset,
- int64_t bytes, int64_t *pnum, int64_t *map,
+vdi_co_block_status(BlockDriverState *bs, enum BlockStatusMode mode,
+ int64_t offset, int64_t bytes, int64_t *pnum, int64_t *map,
BlockDriverState **file)
{
BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
diff --git a/block/vmdk.c b/block/vmdk.c
index 2adec499122..6e2fd8d16ef 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1777,7 +1777,7 @@ static inline uint64_t vmdk_find_offset_in_cluster(VmdkExtent *extent,
}
static int coroutine_fn GRAPH_RDLOCK
-vmdk_co_block_status(BlockDriverState *bs, bool want_zero,
+vmdk_co_block_status(BlockDriverState *bs, enum BlockStatusMode mode,
int64_t offset, int64_t bytes, int64_t *pnum,
int64_t *map, BlockDriverState **file)
{
diff --git a/block/vpc.c b/block/vpc.c
index 0309e319f60..4eac32fd1c4 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -726,7 +726,7 @@ fail:
}
static int coroutine_fn GRAPH_RDLOCK
-vpc_co_block_status(BlockDriverState *bs, bool want_zero,
+vpc_co_block_status(BlockDriverState *bs, enum BlockStatusMode mode,
int64_t offset, int64_t bytes,
int64_t *pnum, int64_t *map,
BlockDriverState **file)
diff --git a/block/vvfat.c b/block/vvfat.c
index 91d69b3cc83..336679cac12 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -3134,9 +3134,9 @@ vvfat_co_pwritev(BlockDriverState *bs, int64_t offset, int64_t bytes,
}
static int coroutine_fn vvfat_co_block_status(BlockDriverState *bs,
- bool want_zero, int64_t offset,
- int64_t bytes, int64_t *n,
- int64_t *map,
+ enum BlockStatusMode mode,
+ int64_t offset, int64_t bytes,
+ int64_t *n, int64_t *map,
BlockDriverState **file)
{
*n = bytes;
diff --git a/tests/unit/test-block-iothread.c b/tests/unit/test-block-iothread.c
index 2b358eaaa82..8189b32fd52 100644
--- a/tests/unit/test-block-iothread.c
+++ b/tests/unit/test-block-iothread.c
@@ -63,7 +63,7 @@ bdrv_test_co_truncate(BlockDriverState *bs, int64_t offset, bool exact,
}
static int coroutine_fn bdrv_test_co_block_status(BlockDriverState *bs,
- bool want_zero,
+ enum BlockStatusMode mode,
int64_t offset, int64_t count,
int64_t *pnum, int64_t *map,
BlockDriverState **file)
--
2.49.0
* [PATCH v2 02/11] file-posix: Handle zero block status hint better
2025-04-17 18:39 [PATCH v2 00/11] Make blockdev-mirror dest sparse in more cases Eric Blake
2025-04-17 18:39 ` [PATCH v2 01/11] block: Expand block status mode from bool to enum Eric Blake
@ 2025-04-17 18:39 ` Eric Blake
2025-04-17 20:58 ` Stefan Hajnoczi
2025-04-17 18:39 ` [PATCH v2 03/11] block: Let bdrv_co_is_zero_fast consolidate adjacent extents Eric Blake
` (10 subsequent siblings)
12 siblings, 1 reply; 36+ messages in thread
From: Eric Blake @ 2025-04-17 18:39 UTC
To: qemu-devel; +Cc: qemu-block, vsementsov, Kevin Wolf, Hanna Reitz
Although the previous patch to change 'bool want_zero' into an enum
made no semantic change, it is now time to differentiate. When the
caller specifically wants to know what parts of the file read as zero,
we need to use lseek and actually report holes, rather than
short-circuiting and advertising full allocation.
This change will be utilized in later patches to let mirroring
optimize for the case when the destination already reads as zeroes.
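To make the behavioral difference concrete, here is a toy model (a
sketch only, with hypothetical Toy* names rather than the real QEMU
types) of a driver without a backing file. Only the allocation hint
short-circuits to "everything is allocated"; the precise and
reads-as-zero hints both walk the extent map so that holes are
reported.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the mode enum and the status bits. */
enum ToyStatusMode { TOY_PRECISE, TOY_ALLOCATED, TOY_ZERO_HINT };
enum { TOY_BLOCK_DATA = 1, TOY_BLOCK_ZERO = 2 };

typedef struct ToyExtent {
    int64_t length;
    bool hole;
} ToyExtent;

typedef struct ToyFile {
    const ToyExtent *extents;
    size_t count;
} ToyFile;

static int toy_block_status(const ToyFile *f, enum ToyStatusMode mode,
                            int64_t offset, int64_t bytes, int64_t *pnum)
{
    int64_t start = 0;
    size_t i;

    assert(offset >= 0 && bytes > 0 && pnum);
    if (mode == TOY_ALLOCATED) {
        /* No backing file: every byte is allocated in this layer. */
        *pnum = bytes;
        return TOY_BLOCK_DATA;
    }
    /* PRECISE and ZERO_HINT both need hole information. */
    for (i = 0; i < f->count; i++) {
        int64_t end = start + f->extents[i].length;

        if (offset < end) {
            int64_t avail = end - offset;

            *pnum = avail < bytes ? avail : bytes;
            return f->extents[i].hole ? TOY_BLOCK_ZERO : TOY_BLOCK_DATA;
        }
        start = end;
    }
    *pnum = 0;  /* offset is at or past the end of the toy file */
    return 0;
}
```

With a 4k data extent followed by a large hole, a query inside the
hole yields TOY_BLOCK_DATA under TOY_ALLOCATED but TOY_BLOCK_ZERO under
TOY_ZERO_HINT, which is the distinction a mirror job can later exploit.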
Signed-off-by: Eric Blake <eblake@redhat.com>
---
block/file-posix.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/block/file-posix.c b/block/file-posix.c
index 30129ec9b37..8f3e7a8304f 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -3217,7 +3217,8 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
return ret;
}
- if (mode != BDRV_BSTAT_PRECISE) {
+ if (mode == BDRV_BSTAT_ALLOCATED) {
+ /* There is no backing file - all bytes are allocated in this file. */
*pnum = bytes;
*map = offset;
*file = bs;
--
2.49.0
* [PATCH v2 03/11] block: Let bdrv_co_is_zero_fast consolidate adjacent extents
2025-04-17 18:39 [PATCH v2 00/11] Make blockdev-mirror dest sparse in more cases Eric Blake
2025-04-17 18:39 ` [PATCH v2 01/11] block: Expand block status mode from bool to enum Eric Blake
2025-04-17 18:39 ` [PATCH v2 02/11] file-posix: Handle zero block status hint better Eric Blake
@ 2025-04-17 18:39 ` Eric Blake
2025-04-17 20:21 ` Stefan Hajnoczi
2025-04-17 18:39 ` [PATCH v2 04/11] block: Add new bdrv_co_is_all_zeroes() function Eric Blake
` (9 subsequent siblings)
12 siblings, 1 reply; 36+ messages in thread
From: Eric Blake @ 2025-04-17 18:39 UTC
To: qemu-devel
Cc: qemu-block, vsementsov, Stefan Hajnoczi, Fam Zheng, Kevin Wolf,
Hanna Reitz
Some BDS drivers have a cap on how much block status they can supply
in one query (for example, NBD talking to an older server cannot
inspect more than 4G per query; and qcow2 tends to cap its answers
rather than cross a cluster boundary of an L1 table). Although the
existing callers of bdrv_co_is_zero_fast are not passing in that large
of a 'bytes' parameter, an upcoming caller wants to query the entire
image at once, and will thus benefit from being able to treat adjacent
zero regions in a coalesced manner, rather than claiming the region is
non-zero merely because pnum was truncated and didn't match the
incoming bytes.
While refactoring this into a loop, note that there is no need to
assign pnum prior to calling bdrv_co_common_block_status_above() (it
is guaranteed to be assigned deeper in the callstack).
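The effect of a capped status query can be seen in a small toy model
(a sketch with hypothetical names; TOY_MAX_PER_QUERY stands in for
whatever limit a real driver imposes). The single-query version
mistakes a truncated pnum for non-zero content, while the looping
version keeps going until it either finds data or runs out of bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_PER_QUERY 4096  /* hypothetical per-query cap */

/* Toy image: bytes in [0, zero_end) read as zero, the rest is data. */
typedef struct ToyImage {
    int64_t zero_end;
    int64_t size;
} ToyImage;

/*
 * Reports whether the extent at offset reads as zero, and stores in
 * *pnum how many bytes that answer covers (capped per query).
 */
static bool toy_status(const ToyImage *img, int64_t offset, int64_t bytes,
                       int64_t *pnum)
{
    bool zero = offset < img->zero_end;
    int64_t n = (zero ? img->zero_end : img->size) - offset;

    assert(offset >= 0 && bytes > 0 && offset + bytes <= img->size);
    if (n > bytes) {
        n = bytes;
    }
    if (n > TOY_MAX_PER_QUERY) {
        n = TOY_MAX_PER_QUERY;
    }
    *pnum = n;
    return zero;
}

/* Old shape: one query, so a truncated pnum looks like "not zero". */
static int toy_is_zero_single(const ToyImage *img, int64_t offset,
                              int64_t bytes)
{
    int64_t pnum;

    if (!bytes) {
        return 1;
    }
    return toy_status(img, offset, bytes, &pnum) && pnum == bytes;
}

/* New shape: keep querying while each extent still reads as zero. */
static int toy_is_zero_loop(const ToyImage *img, int64_t offset,
                            int64_t bytes)
{
    int64_t pnum;

    while (bytes) {
        if (!toy_status(img, offset, bytes, &pnum)) {
            return 0;
        }
        offset += pnum;
        bytes -= pnum;
    }
    return 1;
}
```

For an image whose first 16k reads as zero, toy_is_zero_single() on
that 16k range returns 0 because only 4k comes back per query, whereas
toy_is_zero_loop() returns 1 after four iterations.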
Signed-off-by: Eric Blake <eblake@redhat.com>
---
block/io.c | 27 +++++++++++++++------------
1 file changed, 15 insertions(+), 12 deletions(-)
diff --git a/block/io.c b/block/io.c
index 73c96084e62..6ef78070915 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2757,22 +2757,25 @@ int coroutine_fn bdrv_co_is_zero_fast(BlockDriverState *bs, int64_t offset,
int64_t bytes)
{
int ret;
- int64_t pnum = bytes;
+ int64_t pnum;
IO_CODE();
- if (!bytes) {
- return 1;
+ while (bytes) {
+ ret = bdrv_co_common_block_status_above(bs, NULL, false,
+ BDRV_BSTAT_ZERO, offset, bytes,
+ &pnum, NULL, NULL, NULL);
+
+ if (ret < 0) {
+ return ret;
+ }
+ if (!(ret & BDRV_BLOCK_ZERO)) {
+ return 0;
+ }
+ offset += pnum;
+ bytes -= pnum;
}
- ret = bdrv_co_common_block_status_above(bs, NULL, false, BDRV_BSTAT_ZERO,
- offset, bytes, &pnum, NULL, NULL,
- NULL);
-
- if (ret < 0) {
- return ret;
- }
-
- return (pnum == bytes) && (ret & BDRV_BLOCK_ZERO);
+ return 1;
}
int coroutine_fn bdrv_co_is_allocated(BlockDriverState *bs, int64_t offset,
--
2.49.0
* [PATCH v2 04/11] block: Add new bdrv_co_is_all_zeroes() function
2025-04-17 18:39 [PATCH v2 00/11] Make blockdev-mirror dest sparse in more cases Eric Blake
` (2 preceding siblings ...)
2025-04-17 18:39 ` [PATCH v2 03/11] block: Let bdrv_co_is_zero_fast consolidate adjacent extents Eric Blake
@ 2025-04-17 18:39 ` Eric Blake
2025-04-17 20:35 ` Stefan Hajnoczi
2025-04-17 18:39 ` [PATCH v2 05/11] iotests: Improve iotest 194 to mirror data Eric Blake
` (8 subsequent siblings)
12 siblings, 1 reply; 36+ messages in thread
From: Eric Blake @ 2025-04-17 18:39 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-block, vsementsov, Stefan Hajnoczi, Fam Zheng, Kevin Wolf,
Hanna Reitz
There are some optimizations that require knowing if an image starts
out as reading all zeroes, such as making blockdev-mirror faster by
skipping the copying of source zeroes to the destination. The
existing bdrv_co_is_zero_fast() is a good building block for answering
this question, but it tends to give an answer of 0 for a file we just
created via QMP 'blockdev-create' or similar (such as 'qemu-img create
-f raw'). Why? Because file-posix.c insists on allocating a tiny
header to any file rather than leaving it 100% sparse, due to some
filesystems that are unable to answer alignment probes on a hole. But
teaching file-posix.c to read the tiny header doesn't scale - the
problem of a small header is also visible when libvirt sets up an NBD
client to a just-created file on a migration destination host.
So, we need a wrapper function that handles a bit more complexity in a
common manner for all block devices - when the BDS is mostly a hole,
but has a small non-hole header, it is still worth the time to read
that header and check if it reads as all zeroes before giving up and
returning a pessimistic answer.
Signed-off-by: Eric Blake <eblake@redhat.com>
---
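The decision order described above (trust block status where it says
zero, give up if the unknown head is large, otherwise read the small
head) can be sketched standalone as below. This is an illustration
under simplifying assumptions, not the function added by this patch:
SketchImage and the sketch_* helpers are hypothetical, the image is
reduced to "first extent plus remainder", and SKETCH_PAGE_SIZE is an
assumed stand-in for the host page size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in for the host page size. */
#define SKETCH_PAGE_SIZE 4096

/* Hypothetical image: one leading extent, then "everything else". */
typedef struct SketchImage {
    size_t head_len;           /* length of the first extent */
    bool head_reports_zero;    /* block status already says zero? */
    const uint8_t *head_data;  /* contents of the first extent */
    bool rest_reports_zero;    /* block status says the rest is zero? */
} SketchImage;

static bool sketch_buffer_is_zero(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i]) {
            return false;
        }
    }
    return true;
}

/* Return 1 if the image is known to read as all zeroes, else 0. */
static int sketch_is_all_zeroes(const SketchImage *img)
{
    /* First probe: the head is already known to be zero. */
    if (img->head_reports_zero) {
        return img->rest_reports_zero ? 1 : 0;
    }
    /* A large unknown head is not worth reading; be pessimistic. */
    if (img->head_len > SKETCH_PAGE_SIZE) {
        return 0;
    }
    /* The rest must be cheaply provable as zero. */
    if (!img->rest_reports_zero) {
        return 0;
    }
    /* Only the small head is unknown, so read and inspect it. */
    assert(img->head_data != NULL);
    return sketch_buffer_is_zero(img->head_data, img->head_len) ? 1 : 0;
}
```

Note that a return of 0 here only means "not proven zero"; the large
head in the third case might still contain nothing but zeroes.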
include/block/block-io.h | 2 ++
block/io.c | 58 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 60 insertions(+)
diff --git a/include/block/block-io.h b/include/block/block-io.h
index b49e0537dd4..b99cc98d265 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -161,6 +161,8 @@ bdrv_is_allocated_above(BlockDriverState *bs, BlockDriverState *base,
int coroutine_fn GRAPH_RDLOCK
bdrv_co_is_zero_fast(BlockDriverState *bs, int64_t offset, int64_t bytes);
+int coroutine_fn GRAPH_RDLOCK
+bdrv_co_is_all_zeroes(BlockDriverState *bs);
int GRAPH_RDLOCK
bdrv_apply_auto_read_only(BlockDriverState *bs, const char *errmsg,
diff --git a/block/io.c b/block/io.c
index 6ef78070915..dc1341e4029 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2778,6 +2778,64 @@ int coroutine_fn bdrv_co_is_zero_fast(BlockDriverState *bs, int64_t offset,
return 1;
}
+/*
+ * Check @bs (and its backing chain) to see if the entire image is known
+ * to read as zeroes.
+ * Return 1 if that is the case, 0 otherwise and -errno on error.
+ * This test is meant to be fast rather than accurate so returning 0
+ * does not guarantee non-zero data; however, it can report 1 in more
+ * cases than bdrv_co_is_zero_fast.
+ */
+int coroutine_fn bdrv_co_is_all_zeroes(BlockDriverState *bs)
+{
+ int ret;
+ int64_t pnum, bytes;
+ char *buf;
+ QEMUIOVector local_qiov;
+ IO_CODE();
+
+ bytes = bdrv_co_getlength(bs);
+ if (bytes < 0) {
+ return bytes;
+ }
+
+ /* First probe - see if the entire image reads as zero */
+ ret = bdrv_co_common_block_status_above(bs, NULL, false, BDRV_BSTAT_ZERO,
+ 0, bytes, &pnum, NULL, NULL,
+ NULL);
+ if (ret < 0) {
+ return ret;
+ }
+ if (ret & BDRV_BLOCK_ZERO) {
+ return bdrv_co_is_zero_fast(bs, pnum, bytes - pnum);
+ }
+
+ /*
+ * Because of the way 'blockdev-create' works, raw files tend to
+ * be created with a non-sparse region at the front to make
+ * alignment probing easier. If the block starts with only a
+ * small allocated region, it is still worth the effort to see if
+ * the rest of the image is still sparse, coupled with manually
+ * reading the first region to see if it reads zero after all.
+ */
+ if (pnum > qemu_real_host_page_size()) {
+ return 0;
+ }
+ ret = bdrv_co_is_zero_fast(bs, pnum, bytes - pnum);
+ if (ret <= 0) {
+ return ret;
+ }
+ /* Only the head of the image is unknown, and it's small. Read it. */
+ buf = qemu_blockalign(bs, pnum);
+ qemu_iovec_init_buf(&local_qiov, buf, pnum);
+ ret = bdrv_driver_preadv(bs, 0, pnum, &local_qiov, 0, 0);
+ if (ret >= 0) {
+ ret = buffer_is_zero(buf, pnum);
+ }
+ qemu_vfree(buf);
+ return ret;
+}
+
int coroutine_fn bdrv_co_is_allocated(BlockDriverState *bs, int64_t offset,
int64_t bytes, int64_t *pnum)
{
--
2.49.0
* [PATCH v2 05/11] iotests: Improve iotest 194 to mirror data
2025-04-17 18:39 [PATCH v2 00/11] Make blockdev-mirror dest sparse in more cases Eric Blake
` (3 preceding siblings ...)
2025-04-17 18:39 ` [PATCH v2 04/11] block: Add new bdrv_co_is_all_zeroes() function Eric Blake
@ 2025-04-17 18:39 ` Eric Blake
2025-04-17 20:39 ` Stefan Hajnoczi
2025-04-17 18:39 ` [PATCH v2 06/11] mirror: Minor refactoring Eric Blake
` (7 subsequent siblings)
12 siblings, 1 reply; 36+ messages in thread
From: Eric Blake @ 2025-04-17 18:39 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, vsementsov, Kevin Wolf, Hanna Reitz
Mirroring a completely sparse image to a sparse destination should be
practically instantaneous. It isn't yet, but the test will be more
realistic if it has some non-zero data to mirror as well as the holes.
Signed-off-by: Eric Blake <eblake@redhat.com>
---
tests/qemu-iotests/194 | 1 +
1 file changed, 1 insertion(+)
diff --git a/tests/qemu-iotests/194 b/tests/qemu-iotests/194
index c0ce82dd257..d0b9c084f5f 100755
--- a/tests/qemu-iotests/194
+++ b/tests/qemu-iotests/194
@@ -34,6 +34,7 @@ with iotests.FilePath('source.img') as source_img_path, \
img_size = '1G'
iotests.qemu_img_create('-f', iotests.imgfmt, source_img_path, img_size)
+ iotests.qemu_io('-f', iotests.imgfmt, '-c', 'write 512M 1M', source_img_path)
iotests.qemu_img_create('-f', iotests.imgfmt, dest_img_path, img_size)
iotests.log('Launching VMs...')
--
2.49.0
* [PATCH v2 06/11] mirror: Minor refactoring
2025-04-17 18:39 [PATCH v2 00/11] Make blockdev-mirror dest sparse in more cases Eric Blake
` (4 preceding siblings ...)
2025-04-17 18:39 ` [PATCH v2 05/11] iotests: Improve iotest 194 to mirror data Eric Blake
@ 2025-04-17 18:39 ` Eric Blake
2025-04-17 20:42 ` Stefan Hajnoczi
2025-04-17 18:39 ` [PATCH v2 07/11] mirror: Skip pre-zeroing destination if it is already zero Eric Blake
` (6 subsequent siblings)
12 siblings, 1 reply; 36+ messages in thread
From: Eric Blake @ 2025-04-17 18:39 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, vsementsov, John Snow, Kevin Wolf, Hanna Reitz
Commit 5791ba52 (v9.2) pre-initialized ret in mirror_dirty_init to
silence a false positive compiler warning, even though in all code
paths where ret is used, it was guaranteed to be reassigned
beforehand. But since the function returns -errno, and -1 is not
always the right errno, it's better to initialize to -EIO.
An upcoming patch wants to track two bitmaps in
do_sync_target_write(); this will be easier if the current variables
related to the dirty bitmap are renamed.
Signed-off-by: Eric Blake <eblake@redhat.com>
---
block/mirror.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/block/mirror.c b/block/mirror.c
index a53582f17bb..34c6c5252e1 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -841,7 +841,7 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
int64_t offset;
BlockDriverState *bs;
BlockDriverState *target_bs = blk_bs(s->target);
- int ret = -1;
+ int ret = -EIO;
int64_t count;
bdrv_graph_co_rdlock();
@@ -1341,7 +1341,7 @@ do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
{
int ret;
size_t qiov_offset = 0;
- int64_t bitmap_offset, bitmap_end;
+ int64_t dirty_bitmap_offset, dirty_bitmap_end;
if (!QEMU_IS_ALIGNED(offset, job->granularity) &&
bdrv_dirty_bitmap_get(job->dirty_bitmap, offset))
@@ -1388,11 +1388,11 @@ do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
* Tails are either clean or shrunk, so for bitmap resetting
* we safely align the range down.
*/
- bitmap_offset = QEMU_ALIGN_UP(offset, job->granularity);
- bitmap_end = QEMU_ALIGN_DOWN(offset + bytes, job->granularity);
- if (bitmap_offset < bitmap_end) {
- bdrv_reset_dirty_bitmap(job->dirty_bitmap, bitmap_offset,
- bitmap_end - bitmap_offset);
+ dirty_bitmap_offset = QEMU_ALIGN_UP(offset, job->granularity);
+ dirty_bitmap_end = QEMU_ALIGN_DOWN(offset + bytes, job->granularity);
+ if (dirty_bitmap_offset < dirty_bitmap_end) {
+ bdrv_reset_dirty_bitmap(job->dirty_bitmap, dirty_bitmap_offset,
+ dirty_bitmap_end - dirty_bitmap_offset);
}
job_progress_increase_remaining(&job->common.job, bytes);
@@ -1430,10 +1430,10 @@ do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
* at function start, and they must be still dirty, as we've locked
* the region for in-flight op.
*/
- bitmap_offset = QEMU_ALIGN_DOWN(offset, job->granularity);
- bitmap_end = QEMU_ALIGN_UP(offset + bytes, job->granularity);
- bdrv_set_dirty_bitmap(job->dirty_bitmap, bitmap_offset,
- bitmap_end - bitmap_offset);
+ dirty_bitmap_offset = QEMU_ALIGN_DOWN(offset, job->granularity);
+ dirty_bitmap_end = QEMU_ALIGN_UP(offset + bytes, job->granularity);
+ bdrv_set_dirty_bitmap(job->dirty_bitmap, dirty_bitmap_offset,
+ dirty_bitmap_end - dirty_bitmap_offset);
qatomic_set(&job->actively_synced, false);
action = mirror_error_action(job, false, -ret);
--
2.49.0
* [PATCH v2 07/11] mirror: Skip pre-zeroing destination if it is already zero
2025-04-17 18:39 [PATCH v2 00/11] Make blockdev-mirror dest sparse in more cases Eric Blake
` (5 preceding siblings ...)
2025-04-17 18:39 ` [PATCH v2 06/11] mirror: Minor refactoring Eric Blake
@ 2025-04-17 18:39 ` Eric Blake
2025-04-17 20:46 ` Stefan Hajnoczi
2025-04-24 17:10 ` Eric Blake
2025-04-17 18:39 ` [PATCH v2 08/11] mirror: Skip writing zeroes when target " Eric Blake
` (5 subsequent siblings)
12 siblings, 2 replies; 36+ messages in thread
From: Eric Blake @ 2025-04-17 18:39 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, vsementsov, John Snow, Kevin Wolf, Hanna Reitz
When doing a sync=full mirroring, QMP drive-mirror requests full
zeroing if it did not just create the destination, and blockdev-mirror
requests full zeroing unconditionally. This is because during a full
sync, we must ensure that the portions of the disk that are not
otherwise touched by the source still read as zero upon completion.
However, in mirror_dirty_init(), we were blindly assuming that if the
destination allows punching holes, we should pre-zero the entire
image; and if it does not allow punching holes, then treat the entire
source as dirty rather than mirroring just the allocated portions of
the source. Without the ability to punch holes, this results in the
destination file being fully allocated; and even when punching holes
is supported, it causes duplicate I/O to the portions of the
destination corresponding to chunks of the source that are allocated
but read as zero.
Smarter is to avoid the pre-zeroing pass over the destination if it
can be proved the destination already reads as zero. Note that a
later patch will then further improve things to skip writing to the
destination for parts of the image where the source is zero; but even
with just this patch, it is possible to see a difference for any BDS
that can quickly report that it already reads as zero.
Signed-off-by: Eric Blake <eblake@redhat.com>
---
block/mirror.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/block/mirror.c b/block/mirror.c
index 34c6c5252e1..234e3a55e60 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -849,13 +849,23 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
bdrv_graph_co_rdunlock();
if (s->zero_target) {
+ offset = 0;
+ bdrv_graph_co_rdlock();
+ ret = bdrv_co_is_all_zeroes(target_bs);
+ bdrv_graph_co_rdunlock();
+ if (ret < 0) {
+ return ret;
+ }
+ if (ret > 0) {
+ offset = s->bdev_length;
+ }
if (!bdrv_can_write_zeroes_with_unmap(target_bs)) {
bdrv_set_dirty_bitmap(s->dirty_bitmap, 0, s->bdev_length);
return 0;
}
s->initial_zeroing_ongoing = true;
- for (offset = 0; offset < s->bdev_length; ) {
+ while (offset < s->bdev_length) {
int bytes = MIN(s->bdev_length - offset,
QEMU_ALIGN_DOWN(INT_MAX, s->granularity));
--
2.49.0
* [PATCH v2 08/11] mirror: Skip writing zeroes when target is already zero
2025-04-17 18:39 [PATCH v2 00/11] Make blockdev-mirror dest sparse in more cases Eric Blake
` (6 preceding siblings ...)
2025-04-17 18:39 ` [PATCH v2 07/11] mirror: Skip pre-zeroing destination if it is already zero Eric Blake
@ 2025-04-17 18:39 ` Eric Blake
2025-04-17 20:54 ` Stefan Hajnoczi
2025-04-23 16:42 ` Sunny Zhu
2025-04-17 18:39 ` [PATCH v2 09/11] iotests/common.rc: add disk_usage function Eric Blake
` (4 subsequent siblings)
12 siblings, 2 replies; 36+ messages in thread
From: Eric Blake @ 2025-04-17 18:39 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, vsementsov, John Snow, Kevin Wolf, Hanna Reitz
When mirroring, the goal is to ensure that the destination reads the
same as the source; this goal is met whether the destination is sparse
or fully-allocated. However, if the destination cannot efficiently
write zeroes, then any time the mirror operation wants to copy zeroes
from the source to the destination (either in the background pass over
sparse regions when doing a full mirror, or in the foreground when the
guest actively writes zeroes), we were causing the destination to
fully allocate that portion of the disk, even if it already read as
zeroes.
The effect is especially pronounced when the source is a raw file.
That's because when the source is a qcow2 file, the dirty bitmap only
visits the portions of the source that are allocated, which tend to be
non-zero. But when the source is a raw file,
bdrv_co_is_allocated_above() reports the entire file as allocated so
mirror_dirty_init sets the entire dirty bitmap, and it is only later
during mirror_iteration that we change to consulting the more precise
bdrv_co_block_status_above() to learn where the source reads as zero.
Remember that since a mirror operation can write a cluster more than
once (every time the guest changes the source, the destination is also
changed to keep up), we can't take the shortcut of relying on
s->zero_target (which is static for the life of the job) in
mirror_co_zero() to see if the destination is already zero, because
that information may be stale. Any solution we use must be dynamic in
the face of the guest writing or discarding a cluster while the mirror
has been ongoing.
We could just teach mirror_co_zero() to do a block_status() probe of
the destination, and skip the zeroes if the destination already reads
as zero, but we know from past experience that extra block_status()
calls are not always cheap (tmpfs, anyone?), especially when they are
random access rather than linear. Use of block_status() of the source
by the background task in a linear fashion is not our bottleneck (it's
a background task, after all); but since mirroring can be done while
the source is actively being changed, we don't want a slow
block_status() of the destination to occur on the hot path of the
guest trying to do random-access writes to the source.
So this patch takes a slightly different approach: any time we have to
transfer the full image, we know that mirror_dirty_init() is _already_
doing a pre-zero pass over the entire destination. Therefore, if we
track which clusters of the destination are zero at any given moment,
we don't have to do a block_status() call on the destination, but can
instead just refer to the zero bitmap associated with the job.
With this patch, if I create a raw sparse destination file, connect it
with QMP 'blockdev-add' while leaving it at the default "discard":
"ignore", then run QMP 'blockdev-mirror' with "sync": "full", the
destination remains sparse rather than fully allocated.
Signed-off-by: Eric Blake <eblake@redhat.com>
---
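The rounding rules for the zero bitmap (round wider when checking or
clearing, narrower when setting) can be sketched standalone as below.
This is an illustration only, not the mirror.c code: ZeroTracker and
the tracker_* helpers are hypothetical, one bool per cluster replaces
the real bitmap, and a partial cluster at the end of the image is
conservatively left unset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_CLUSTERS 64

/* Hypothetical per-job tracker: zero[i] means cluster i reads as zero. */
typedef struct ZeroTracker {
    int64_t granularity;             /* cluster size in bytes */
    int64_t length;                  /* image length in bytes */
    bool zero[SKETCH_MAX_CLUSTERS];
} ZeroTracker;

static int64_t div_round_up(int64_t n, int64_t d)
{
    return (n + d - 1) / d;
}

static void tracker_check_range(const ZeroTracker *t, int64_t offset,
                                int64_t bytes)
{
    assert(offset >= 0 && bytes >= 0 && offset + bytes <= t->length);
    assert(div_round_up(t->length, t->granularity) <= SKETCH_MAX_CLUSTERS);
}

/* Checking rounds wider: every cluster touched must be known zero. */
static bool tracker_range_is_zero(const ZeroTracker *t, int64_t offset,
                                  int64_t bytes)
{
    tracker_check_range(t, offset, bytes);
    for (int64_t i = offset / t->granularity;
         i < div_round_up(offset + bytes, t->granularity); i++) {
        if (!t->zero[i]) {
            return false;
        }
    }
    return true;
}

/* Data writes and discards round wider: any touched cluster is suspect. */
static void tracker_note_data(ZeroTracker *t, int64_t offset, int64_t bytes)
{
    tracker_check_range(t, offset, bytes);
    for (int64_t i = offset / t->granularity;
         i < div_round_up(offset + bytes, t->granularity); i++) {
        t->zero[i] = false;
    }
}

/* Zero writes round narrower: only fully covered clusters become zero. */
static void tracker_note_zeroed(ZeroTracker *t, int64_t offset,
                                int64_t bytes)
{
    tracker_check_range(t, offset, bytes);
    for (int64_t i = div_round_up(offset, t->granularity);
         i < (offset + bytes) / t->granularity; i++) {
        t->zero[i] = true;
    }
}

/* Returns true if a real write-zeroes request would have to be issued. */
static bool tracker_zero_request(ZeroTracker *t, int64_t offset,
                                 int64_t bytes)
{
    if (tracker_range_is_zero(t, offset, bytes)) {
        return false;   /* destination already zero, skip the I/O */
    }
    /* Real code would write zeroes here and check for errors. */
    tracker_note_zeroed(t, offset, bytes);
    return true;
}
```

Because the tracker is updated on every copy, zero, and discard, it
stays accurate while the guest keeps writing, which a one-time flag
such as s->zero_target cannot do.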
block/mirror.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++----
1 file changed, 65 insertions(+), 5 deletions(-)
diff --git a/block/mirror.c b/block/mirror.c
index 234e3a55e60..4770d87abf6 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -73,6 +73,7 @@ typedef struct MirrorBlockJob {
size_t buf_size;
int64_t bdev_length;
unsigned long *cow_bitmap;
+ unsigned long *zero_bitmap;
BdrvDirtyBitmap *dirty_bitmap;
BdrvDirtyBitmapIter *dbi;
uint8_t *buf;
@@ -408,15 +409,33 @@ static void coroutine_fn mirror_co_read(void *opaque)
static void coroutine_fn mirror_co_zero(void *opaque)
{
MirrorOp *op = opaque;
- int ret;
+ bool write_needed = true;
+ int ret = 0;
op->s->in_flight++;
op->s->bytes_in_flight += op->bytes;
*op->bytes_handled = op->bytes;
op->is_in_flight = true;
- ret = blk_co_pwrite_zeroes(op->s->target, op->offset, op->bytes,
- op->s->unmap ? BDRV_REQ_MAY_UNMAP : 0);
+ if (op->s->zero_bitmap) {
+ unsigned long end = DIV_ROUND_UP(op->offset + op->bytes,
+ op->s->granularity);
+ assert(QEMU_IS_ALIGNED(op->offset, op->s->granularity));
+ assert(QEMU_IS_ALIGNED(op->bytes, op->s->granularity) ||
+ op->offset + op->bytes == op->s->bdev_length);
+ if (find_next_zero_bit(op->s->zero_bitmap, end,
+ op->offset / op->s->granularity) == end) {
+ write_needed = false;
+ }
+ }
+ if (write_needed) {
+ ret = blk_co_pwrite_zeroes(op->s->target, op->offset, op->bytes,
+ op->s->unmap ? BDRV_REQ_MAY_UNMAP : 0);
+ }
+ if (ret >= 0 && op->s->zero_bitmap) {
+ bitmap_set(op->s->zero_bitmap, op->offset / op->s->granularity,
+ DIV_ROUND_UP(op->bytes, op->s->granularity));
+ }
mirror_write_complete(op, ret);
}
@@ -441,6 +460,9 @@ static unsigned mirror_perform(MirrorBlockJob *s, int64_t offset,
Coroutine *co;
int64_t bytes_handled = -1;
+ assert(QEMU_IS_ALIGNED(offset, s->granularity));
+ assert(QEMU_IS_ALIGNED(bytes, s->granularity) ||
+ offset + bytes == s->bdev_length);
op = g_new(MirrorOp, 1);
*op = (MirrorOp){
.s = s,
@@ -452,12 +474,21 @@ static unsigned mirror_perform(MirrorBlockJob *s, int64_t offset,
switch (mirror_method) {
case MIRROR_METHOD_COPY:
+ if (s->zero_bitmap) {
+ bitmap_clear(s->zero_bitmap, offset / s->granularity,
+ DIV_ROUND_UP(bytes, s->granularity));
+ }
co = qemu_coroutine_create(mirror_co_read, op);
break;
case MIRROR_METHOD_ZERO:
+ /* s->zero_bitmap handled in mirror_co_zero */
co = qemu_coroutine_create(mirror_co_zero, op);
break;
case MIRROR_METHOD_DISCARD:
+ if (s->zero_bitmap) {
+ bitmap_clear(s->zero_bitmap, offset / s->granularity,
+ DIV_ROUND_UP(bytes, s->granularity));
+ }
co = qemu_coroutine_create(mirror_co_discard, op);
break;
default:
@@ -849,6 +880,8 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
bdrv_graph_co_rdunlock();
if (s->zero_target) {
+ int64_t bitmap_length = DIV_ROUND_UP(s->bdev_length, s->granularity);
+
offset = 0;
bdrv_graph_co_rdlock();
ret = bdrv_co_is_all_zeroes(target_bs);
@@ -856,7 +889,9 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
if (ret < 0) {
return ret;
}
+ s->zero_bitmap = bitmap_new(bitmap_length);
if (ret > 0) {
+ bitmap_set(s->zero_bitmap, 0, bitmap_length);
offset = s->bdev_length;
}
if (!bdrv_can_write_zeroes_with_unmap(target_bs)) {
@@ -1173,6 +1208,7 @@ immediate_exit:
assert(s->in_flight == 0);
qemu_vfree(s->buf);
g_free(s->cow_bitmap);
+ g_free(s->zero_bitmap);
g_free(s->in_flight_bitmap);
bdrv_dirty_iter_free(s->dbi);
@@ -1352,6 +1388,7 @@ do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
int ret;
size_t qiov_offset = 0;
int64_t dirty_bitmap_offset, dirty_bitmap_end;
+ int64_t zero_bitmap_offset, zero_bitmap_end;
if (!QEMU_IS_ALIGNED(offset, job->granularity) &&
bdrv_dirty_bitmap_get(job->dirty_bitmap, offset))
@@ -1395,8 +1432,9 @@ do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
}
/*
- * Tails are either clean or shrunk, so for bitmap resetting
- * we safely align the range down.
+ * Tails are either clean or shrunk, so for dirty bitmap resetting
+ * we safely align the range narrower. But for zero bitmap, round
+ * range wider for checking or clearing, and narrower for setting.
*/
dirty_bitmap_offset = QEMU_ALIGN_UP(offset, job->granularity);
dirty_bitmap_end = QEMU_ALIGN_DOWN(offset + bytes, job->granularity);
@@ -1404,22 +1442,44 @@ do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
bdrv_reset_dirty_bitmap(job->dirty_bitmap, dirty_bitmap_offset,
dirty_bitmap_end - dirty_bitmap_offset);
}
+ zero_bitmap_offset = offset / job->granularity;
+ zero_bitmap_end = DIV_ROUND_UP(offset + bytes, job->granularity);
job_progress_increase_remaining(&job->common.job, bytes);
job->active_write_bytes_in_flight += bytes;
switch (method) {
case MIRROR_METHOD_COPY:
+ if (job->zero_bitmap) {
+ bitmap_clear(job->zero_bitmap, zero_bitmap_offset,
+ zero_bitmap_end - zero_bitmap_offset);
+ }
ret = blk_co_pwritev_part(job->target, offset, bytes,
qiov, qiov_offset, flags);
break;
case MIRROR_METHOD_ZERO:
+ if (job->zero_bitmap) {
+ if (find_next_zero_bit(job->zero_bitmap, zero_bitmap_end,
+ zero_bitmap_offset) == zero_bitmap_end) {
+ ret = 0;
+ break;
+ }
+ }
assert(!qiov);
ret = blk_co_pwrite_zeroes(job->target, offset, bytes, flags);
+ if (job->zero_bitmap && ret >= 0) {
+ bitmap_set(job->zero_bitmap, dirty_bitmap_offset / job->granularity,
+ (dirty_bitmap_end - dirty_bitmap_offset) /
+ job->granularity);
+ }
break;
case MIRROR_METHOD_DISCARD:
+ if (job->zero_bitmap) {
+ bitmap_clear(job->zero_bitmap, zero_bitmap_offset,
+ zero_bitmap_end - zero_bitmap_offset);
+ }
assert(!qiov);
ret = blk_co_pdiscard(job->target, offset, bytes);
break;
--
2.49.0
* [PATCH v2 09/11] iotests/common.rc: add disk_usage function
2025-04-17 18:39 [PATCH v2 00/11] Make blockdev-mirror dest sparse in more cases Eric Blake
` (7 preceding siblings ...)
2025-04-17 18:39 ` [PATCH v2 08/11] mirror: Skip writing zeroes when target " Eric Blake
@ 2025-04-17 18:39 ` Eric Blake
2025-04-17 20:54 ` Stefan Hajnoczi
2025-04-17 18:39 ` [PATCH v2 10/11] tests: Add iotest mirror-sparse for recent patches Eric Blake
` (3 subsequent siblings)
12 siblings, 1 reply; 36+ messages in thread
From: Eric Blake @ 2025-04-17 18:39 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-block, vsementsov, Andrey Drobyshev, Alexander Ivanov,
Alberto Garcia, Kevin Wolf, Hanna Reitz
From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Move the definition from iotests/250 to common.rc. This is used to
detect real disk usage of sparse files. In particular, we want to use
it for checking subclusters-based discards.
Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
Reviewed-by: Alberto Garcia <berto@igalia.com>
Message-ID: <20240913163942.423050-6-andrey.drobyshev@virtuozzo.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
---
tests/qemu-iotests/common.rc | 6 ++++++
tests/qemu-iotests/250 | 5 -----
2 files changed, 6 insertions(+), 5 deletions(-)
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 95c12577dd4..237f746af88 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -140,6 +140,12 @@ _optstr_add()
fi
}
+# report real disk usage for sparse files
+disk_usage()
+{
+ du --block-size=1 "$1" | awk '{print $1}'
+}
+
# Set the variables to the empty string to turn Valgrind off
# for specific processes, e.g.
# $ VALGRIND_QEMU_IO= ./check -qcow2 -valgrind 015
diff --git a/tests/qemu-iotests/250 b/tests/qemu-iotests/250
index af48f83abac..c0a0dbc0ff1 100755
--- a/tests/qemu-iotests/250
+++ b/tests/qemu-iotests/250
@@ -52,11 +52,6 @@ _unsupported_imgopts data_file
# bdrv_co_truncate(bs->file) call in qcow2_co_truncate(), which might succeed
# anyway.
-disk_usage()
-{
- du --block-size=1 $1 | awk '{print $1}'
-}
-
size=2100M
_make_test_img -o "cluster_size=1M,preallocation=metadata" $size
--
2.49.0
* [PATCH v2 10/11] tests: Add iotest mirror-sparse for recent patches
2025-04-17 18:39 [PATCH v2 00/11] Make blockdev-mirror dest sparse in more cases Eric Blake
` (8 preceding siblings ...)
2025-04-17 18:39 ` [PATCH v2 09/11] iotests/common.rc: add disk_usage function Eric Blake
@ 2025-04-17 18:39 ` Eric Blake
2025-04-17 20:55 ` Stefan Hajnoczi
2025-04-17 18:39 ` [PATCH v2 11/11] mirror: Allow QMP override to declare target already zero Eric Blake
` (2 subsequent siblings)
12 siblings, 1 reply; 36+ messages in thread
From: Eric Blake @ 2025-04-17 18:39 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, vsementsov, Kevin Wolf, Hanna Reitz
Prove that blockdev-mirror can now result in sparse raw destination
files, regardless of whether the source is raw or qcow2. By making
this a separate test, it was possible to test effects of individual
patches for the various pieces that all have to work together for a
sparse mirror to be successful.
Note that ./check -file produces different job lengths than ./check
-qcow2 (the test uses a filter to normalize); that's because when
deciding how much of the image to mirror, the code looks at how
much of the source image was allocated (for qcow2, this is only the
written clusters; for raw, it is the entire file). But the important
part is that the destination file ends up smaller than 3M, rather than
the 20M it used to be before this patch series.
Signed-off-by: Eric Blake <eblake@redhat.com>
---
tests/qemu-iotests/tests/mirror-sparse | 109 +++++++++++++++
tests/qemu-iotests/tests/mirror-sparse.out | 153 +++++++++++++++++++++
2 files changed, 262 insertions(+)
create mode 100755 tests/qemu-iotests/tests/mirror-sparse
create mode 100644 tests/qemu-iotests/tests/mirror-sparse.out
diff --git a/tests/qemu-iotests/tests/mirror-sparse b/tests/qemu-iotests/tests/mirror-sparse
new file mode 100755
index 00000000000..ef68542a122
--- /dev/null
+++ b/tests/qemu-iotests/tests/mirror-sparse
@@ -0,0 +1,109 @@
+#!/usr/bin/env bash
+# group: rw auto quick
+#
+# Test blockdev-mirror with raw sparse destination
+#
+# Copyright (C) 2025 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program. If not, see <http://www.gnu.org/licenses/>.
+#
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+status=1 # failure is the default!
+
+_cleanup()
+{
+ _cleanup_test_img
+ _cleanup_qemu
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+cd ..
+. ./common.rc
+. ./common.filter
+. ./common.qemu
+
+_supported_fmt qcow2 raw # Format of the source. dst is always raw file
+_supported_proto file
+_supported_os Linux
+
+filter_len() {
+ sed -e 's/"len": [0-9]*/"len": LEN/g' \
+ -e 's/"offset": [0-9]*/"offset": OFFSET/g'
+}
+
+echo
+echo "=== Initial image setup ==="
+echo
+
+TEST_IMG="$TEST_IMG.base" _make_test_img 20M
+$QEMU_IO -c 'w 8M 2M' -f $IMGFMT "$TEST_IMG.base" | _filter_qemu_io
+
+_launch_qemu -machine q35 \
+ -blockdev '{"driver":"file", "cache":{"direct":true, "no-flush":false},
+ "filename":"'"$TEST_IMG.base"'", "node-name":"src-file"}' \
+ -blockdev '{"driver":"'$IMGFMT'", "node-name":"src", "file":"src-file"}'
+h1=$QEMU_HANDLE
+_send_qemu_cmd $h1 '{"execute": "qmp_capabilities"}' 'return'
+
+# Each of these combinations should result in a sparse destination;
+# the destination should only be fully allocated if pre-allocated
+for creation in external blockdev-create; do
+for discard in ignore unmap; do
+
+echo
+echo "=== Destination with $creation creation and discard=$discard ==="
+echo
+
+rm -f $TEST_IMG
+if test $creation = external; then
+ truncate --size=20M $TEST_IMG
+else
+ _send_qemu_cmd $h1 '{"execute": "blockdev-create", "arguments":
+ {"options": {"driver":"file", "filename":"'$TEST_IMG'",
+ "size":'$((20*1024*1024))'}, "job-id":"job1"}}' 'concluded'
+ _send_qemu_cmd $h1 '{"execute": "job-dismiss", "arguments":
+ {"id": "job1"}}' 'return'
+fi
+_send_qemu_cmd $h1 '{"execute": "blockdev-add", "arguments":
+ {"node-name": "dst", "driver":"file",
+ "filename":"'$TEST_IMG'", "aio":"threads",
+ "auto-read-only":true, "discard":"'$discard'"}}' 'return'
+_send_qemu_cmd $h1 '{"execute":"blockdev-mirror", "arguments":
+ {"sync":"full", "device":"src", "target":"dst",
+ "job-id":"job2"}}' 'return'
+_timed_wait_for $h1 '"ready"'
+_send_qemu_cmd $h1 '{"execute": "job-complete", "arguments":
+ {"id":"job2"}}' 'return' | filter_len
+_send_qemu_cmd $h1 '{"execute": "blockdev-del", "arguments":
+ {"node-name": "dst"}}' 'return' | filter_len
+$QEMU_IMG compare -U -f $IMGFMT -F raw $TEST_IMG.base $TEST_IMG
+if test $(disk_usage $TEST_IMG) -lt $((3*1024*1024)); then
+ echo "destination image is sparse"
+else
+ echo "destination not sparse as expected"
+fi
+
+done
+done
+
+_send_qemu_cmd $h1 '{"execute":"quit"}' ''
+
+# success, all done
+echo '*** done'
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/tests/mirror-sparse.out b/tests/qemu-iotests/tests/mirror-sparse.out
new file mode 100644
index 00000000000..ee569b92868
--- /dev/null
+++ b/tests/qemu-iotests/tests/mirror-sparse.out
@@ -0,0 +1,153 @@
+QA output created by mirror-sparse
+
+=== Initial image setup ===
+
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=20971520
+wrote 2097152/2097152 bytes at offset 8388608
+2 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+{"execute": "qmp_capabilities"}
+{"return": {}}
+
+=== Destination with external creation and discard=ignore ===
+
+{"execute": "blockdev-add", "arguments":
+ {"node-name": "dst", "driver":"file",
+ "filename":"TEST_DIR/t.IMGFMT", "aio":"threads",
+ "auto-read-only":true, "discard":"ignore"}}
+{"return": {}}
+{"execute":"blockdev-mirror", "arguments":
+ {"sync":"full", "device":"src", "target":"dst",
+ "job-id":"job2"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job2"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job2"}}
+{"return": {}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job2"}}
+{"execute": "job-complete", "arguments":
+ {"id":"job2"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job2", "len": LEN, "offset": OFFSET, "speed": 0, "type": "mirror"}}
+{"return": {}}
+{"execute": "blockdev-del", "arguments":
+ {"node-name": "dst"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job2"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job2"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job2", "len": LEN, "offset": OFFSET, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job2"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job2"}}
+{"return": {}}
+Images are identical.
+destination image is sparse
+
+=== Destination with external creation and discard=unmap ===
+
+{"execute": "blockdev-add", "arguments":
+ {"node-name": "dst", "driver":"file",
+ "filename":"TEST_DIR/t.IMGFMT", "aio":"threads",
+ "auto-read-only":true, "discard":"unmap"}}
+{"return": {}}
+{"execute":"blockdev-mirror", "arguments":
+ {"sync":"full", "device":"src", "target":"dst",
+ "job-id":"job2"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job2"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job2"}}
+{"return": {}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job2"}}
+{"execute": "job-complete", "arguments":
+ {"id":"job2"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job2", "len": LEN, "offset": OFFSET, "speed": 0, "type": "mirror"}}
+{"return": {}}
+{"execute": "blockdev-del", "arguments":
+ {"node-name": "dst"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job2"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job2"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job2", "len": LEN, "offset": OFFSET, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job2"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job2"}}
+{"return": {}}
+Images are identical.
+destination image is sparse
+
+=== Destination with blockdev-create creation and discard=ignore ===
+
+{"execute": "blockdev-create", "arguments":
+ {"options": {"driver":"file", "filename":"TEST_DIR/t.IMGFMT",
+ "size":20971520}, "job-id":"job1"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job1"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job1"}}
+{"return": {}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job1"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job1"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job1"}}
+{"execute": "job-dismiss", "arguments":
+ {"id": "job1"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job1"}}
+{"return": {}}
+{"execute": "blockdev-add", "arguments":
+ {"node-name": "dst", "driver":"file",
+ "filename":"TEST_DIR/t.IMGFMT", "aio":"threads",
+ "auto-read-only":true, "discard":"ignore"}}
+{"return": {}}
+{"execute":"blockdev-mirror", "arguments":
+ {"sync":"full", "device":"src", "target":"dst",
+ "job-id":"job2"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job2"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job2"}}
+{"return": {}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job2"}}
+{"execute": "job-complete", "arguments":
+ {"id":"job2"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job2", "len": LEN, "offset": OFFSET, "speed": 0, "type": "mirror"}}
+{"return": {}}
+{"execute": "blockdev-del", "arguments":
+ {"node-name": "dst"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job2"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job2"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job2", "len": LEN, "offset": OFFSET, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job2"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job2"}}
+{"return": {}}
+Images are identical.
+destination image is sparse
+
+=== Destination with blockdev-create creation and discard=unmap ===
+
+{"execute": "blockdev-create", "arguments":
+ {"options": {"driver":"file", "filename":"TEST_DIR/t.IMGFMT",
+ "size":20971520}, "job-id":"job1"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job1"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job1"}}
+{"return": {}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job1"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job1"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job1"}}
+{"execute": "job-dismiss", "arguments":
+ {"id": "job1"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job1"}}
+{"return": {}}
+{"execute": "blockdev-add", "arguments":
+ {"node-name": "dst", "driver":"file",
+ "filename":"TEST_DIR/t.IMGFMT", "aio":"threads",
+ "auto-read-only":true, "discard":"unmap"}}
+{"return": {}}
+{"execute":"blockdev-mirror", "arguments":
+ {"sync":"full", "device":"src", "target":"dst",
+ "job-id":"job2"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "created", "id": "job2"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "running", "id": "job2"}}
+{"return": {}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "job2"}}
+{"execute": "job-complete", "arguments":
+ {"id":"job2"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "job2", "len": LEN, "offset": OFFSET, "speed": 0, "type": "mirror"}}
+{"return": {}}
+{"execute": "blockdev-del", "arguments":
+ {"node-name": "dst"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "waiting", "id": "job2"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "pending", "id": "job2"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job2", "len": LEN, "offset": OFFSET, "speed": 0, "type": "mirror"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "concluded", "id": "job2"}}
+{"timestamp": {"seconds": TIMESTAMP, "microseconds": TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "null", "id": "job2"}}
+{"return": {}}
+Images are identical.
+destination image is sparse
+{"execute":"quit"}
+*** done
--
2.49.0
* [PATCH v2 11/11] mirror: Allow QMP override to declare target already zero
2025-04-17 18:39 [PATCH v2 00/11] Make blockdev-mirror dest sparse in more cases Eric Blake
` (9 preceding siblings ...)
2025-04-17 18:39 ` [PATCH v2 10/11] tests: Add iotest mirror-sparse for recent patches Eric Blake
@ 2025-04-17 18:39 ` Eric Blake
2025-04-17 20:57 ` Stefan Hajnoczi
2025-04-18 4:47 ` Markus Armbruster
2025-04-17 20:59 ` [PATCH v2 00/11] Make blockdev-mirror dest sparse in more cases Stefan Hajnoczi
2025-04-18 21:52 ` [PATCH v2.5 01/11] block: Expand block status mode from bool to flags Eric Blake
12 siblings, 2 replies; 36+ messages in thread
From: Eric Blake @ 2025-04-17 18:39 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-block, vsementsov, John Snow, Kevin Wolf, Hanna Reitz,
Markus Armbruster
QEMU's attempts to learn whether a destination file starts life with
all zero contents are just a heuristic. There may be cases where the
caller is aware of information that QEMU cannot learn quickly, in
which case telling QEMU what to assume about the destination can make
the mirror operation faster. Given our existing example of "qemu-img
convert --target-is-zero", it is time to expose this override in QMP
for blockdev-mirror as well.
Signed-off-by: Eric Blake <eblake@redhat.com>
---
qapi/block-core.json | 8 +++++++-
include/block/block_int-global-state.h | 3 ++-
block/mirror.c | 23 +++++++++++++++--------
blockdev.c | 18 +++++++++++-------
tests/unit/test-block-iothread.c | 2 +-
5 files changed, 36 insertions(+), 18 deletions(-)
diff --git a/qapi/block-core.json b/qapi/block-core.json
index b1937780e19..7f70ec6d3cb 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2538,6 +2538,11 @@
# disappear from the query list without user intervention.
# Defaults to true. (Since 3.1)
#
+# @target-is-zero: Assume the destination reads as all zeroes before
+# the mirror started. Setting this to true can speed up the
+# mirror. Setting this to true when the destination is not
+# actually all zero can corrupt the destination. (Since 10.1)
+#
# Since: 2.6
#
# .. qmp-example::
@@ -2557,7 +2562,8 @@
'*on-target-error': 'BlockdevOnError',
'*filter-node-name': 'str',
'*copy-mode': 'MirrorCopyMode',
- '*auto-finalize': 'bool', '*auto-dismiss': 'bool' },
+ '*auto-finalize': 'bool', '*auto-dismiss': 'bool',
+ '*target-is-zero': 'bool'},
'allow-preconfig': true }
##
diff --git a/include/block/block_int-global-state.h b/include/block/block_int-global-state.h
index eb2d92a2261..a2b96f90d44 100644
--- a/include/block/block_int-global-state.h
+++ b/include/block/block_int-global-state.h
@@ -140,6 +140,7 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
* @mode: Whether to collapse all images in the chain to the target.
* @backing_mode: How to establish the target's backing chain after completion.
* @zero_target: Whether the target should be explicitly zero-initialized
+ * @target_is_zero: Whether the target already is zero-initialized
* @on_source_error: The action to take upon error reading from the source.
* @on_target_error: The action to take upon error writing to the target.
* @unmap: Whether to unmap target where source sectors only contain zeroes.
@@ -159,7 +160,7 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
int creation_flags, int64_t speed,
uint32_t granularity, int64_t buf_size,
MirrorSyncMode mode, BlockMirrorBackingMode backing_mode,
- bool zero_target,
+ bool zero_target, bool target_is_zero,
BlockdevOnError on_source_error,
BlockdevOnError on_target_error,
bool unmap, const char *filter_node_name,
diff --git a/block/mirror.c b/block/mirror.c
index 4770d87abf6..22cca5d22aa 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -55,6 +55,8 @@ typedef struct MirrorBlockJob {
BlockMirrorBackingMode backing_mode;
/* Whether the target image requires explicit zero-initialization */
bool zero_target;
+ /* Whether the target should be assumed to be already zero initialized */
+ bool target_is_zero;
/*
* To be accesssed with atomics. Written only under the BQL (required by the
* current implementation of mirror_change()).
@@ -883,9 +885,13 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
int64_t bitmap_length = DIV_ROUND_UP(s->bdev_length, s->granularity);
offset = 0;
- bdrv_graph_co_rdlock();
- ret = bdrv_co_is_all_zeroes(target_bs);
- bdrv_graph_co_rdunlock();
+ if (s->target_is_zero) {
+ ret = 1;
+ } else {
+ bdrv_graph_co_rdlock();
+ ret = bdrv_co_is_all_zeroes(target_bs);
+ bdrv_graph_co_rdunlock();
+ }
if (ret < 0) {
return ret;
}
@@ -1782,7 +1788,7 @@ static BlockJob *mirror_start_job(
const char *replaces, int64_t speed,
uint32_t granularity, int64_t buf_size,
BlockMirrorBackingMode backing_mode,
- bool zero_target,
+ bool zero_target, bool target_is_zero,
BlockdevOnError on_source_error,
BlockdevOnError on_target_error,
bool unmap,
@@ -1951,6 +1957,7 @@ static BlockJob *mirror_start_job(
s->is_none_mode = is_none_mode;
s->backing_mode = backing_mode;
s->zero_target = zero_target;
+ s->target_is_zero = target_is_zero;
qatomic_set(&s->copy_mode, copy_mode);
s->base = base;
s->base_overlay = bdrv_find_overlay(bs, base);
@@ -2079,7 +2086,7 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
int creation_flags, int64_t speed,
uint32_t granularity, int64_t buf_size,
MirrorSyncMode mode, BlockMirrorBackingMode backing_mode,
- bool zero_target,
+ bool zero_target, bool target_is_zero,
BlockdevOnError on_source_error,
BlockdevOnError on_target_error,
bool unmap, const char *filter_node_name,
@@ -2104,8 +2111,8 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
mirror_start_job(job_id, bs, creation_flags, target, replaces,
speed, granularity, buf_size, backing_mode, zero_target,
- on_source_error, on_target_error, unmap, NULL, NULL,
- &mirror_job_driver, is_none_mode, base, false,
+ target_is_zero, on_source_error, on_target_error, unmap,
+ NULL, NULL, &mirror_job_driver, is_none_mode, base, false,
filter_node_name, true, copy_mode, false, errp);
}
@@ -2131,7 +2138,7 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
job = mirror_start_job(
job_id, bs, creation_flags, base, NULL, speed, 0, 0,
- MIRROR_LEAVE_BACKING_CHAIN, false,
+ MIRROR_LEAVE_BACKING_CHAIN, false, false,
on_error, on_error, true, cb, opaque,
&commit_active_job_driver, false, base, auto_complete,
filter_node_name, false, MIRROR_COPY_MODE_BACKGROUND,
diff --git a/blockdev.c b/blockdev.c
index 1d1f27cfff6..6f5373991c8 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2798,7 +2798,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
const char *replaces,
enum MirrorSyncMode sync,
BlockMirrorBackingMode backing_mode,
- bool zero_target,
+ bool zero_target, bool target_is_zero,
bool has_speed, int64_t speed,
bool has_granularity, uint32_t granularity,
bool has_buf_size, int64_t buf_size,
@@ -2909,11 +2909,10 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
/* pass the node name to replace to mirror start since it's loose coupling
* and will allow to check whether the node still exist at mirror completion
*/
- mirror_start(job_id, bs, target,
- replaces, job_flags,
+ mirror_start(job_id, bs, target, replaces, job_flags,
speed, granularity, buf_size, sync, backing_mode, zero_target,
- on_source_error, on_target_error, unmap, filter_node_name,
- copy_mode, errp);
+ target_is_zero, on_source_error, on_target_error, unmap,
+ filter_node_name, copy_mode, errp);
}
void qmp_drive_mirror(DriveMirror *arg, Error **errp)
@@ -2928,6 +2927,7 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
int64_t size;
const char *format = arg->format;
bool zero_target;
+ bool target_is_zero;
int ret;
bs = qmp_get_root_bs(arg->device, errp);
@@ -3044,6 +3044,8 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
zero_target = (arg->sync == MIRROR_SYNC_MODE_FULL &&
(arg->mode == NEW_IMAGE_MODE_EXISTING ||
!bdrv_has_zero_init(target_bs)));
+ target_is_zero = (arg->mode == NEW_IMAGE_MODE_ABSOLUTE_PATHS &&
+ bdrv_has_zero_init(target_bs));
bdrv_graph_rdunlock_main_loop();
@@ -3055,7 +3057,7 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
blockdev_mirror_common(arg->job_id, bs, target_bs,
arg->replaces, arg->sync,
- backing_mode, zero_target,
+ backing_mode, zero_target, target_is_zero,
arg->has_speed, arg->speed,
arg->has_granularity, arg->granularity,
arg->has_buf_size, arg->buf_size,
@@ -3085,6 +3087,7 @@ void qmp_blockdev_mirror(const char *job_id,
bool has_copy_mode, MirrorCopyMode copy_mode,
bool has_auto_finalize, bool auto_finalize,
bool has_auto_dismiss, bool auto_dismiss,
+ bool has_target_is_zero, bool target_is_zero,
Error **errp)
{
BlockDriverState *bs;
@@ -3115,7 +3118,8 @@ void qmp_blockdev_mirror(const char *job_id,
blockdev_mirror_common(job_id, bs, target_bs,
replaces, sync, backing_mode,
- zero_target, has_speed, speed,
+ zero_target, has_target_is_zero && target_is_zero,
+ has_speed, speed,
has_granularity, granularity,
has_buf_size, buf_size,
has_on_source_error, on_source_error,
diff --git a/tests/unit/test-block-iothread.c b/tests/unit/test-block-iothread.c
index 8189b32fd52..ffc878d401e 100644
--- a/tests/unit/test-block-iothread.c
+++ b/tests/unit/test-block-iothread.c
@@ -755,7 +755,7 @@ static void test_propagate_mirror(void)
/* Start a mirror job */
mirror_start("job0", src, target, NULL, JOB_DEFAULT, 0, 0, 0,
- MIRROR_SYNC_MODE_NONE, MIRROR_OPEN_BACKING_CHAIN, false,
+ MIRROR_SYNC_MODE_NONE, MIRROR_OPEN_BACKING_CHAIN, false, false,
BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
false, "filter_node", MIRROR_COPY_MODE_BACKGROUND,
&error_abort);
--
2.49.0
* Re: [PATCH v2 01/11] block: Expand block status mode from bool to enum
2025-04-17 18:39 ` [PATCH v2 01/11] block: Expand block status mode from bool to enum Eric Blake
@ 2025-04-17 20:17 ` Stefan Hajnoczi
2025-04-18 19:02 ` Eric Blake
0 siblings, 1 reply; 36+ messages in thread
From: Stefan Hajnoczi @ 2025-04-17 20:17 UTC (permalink / raw)
To: Eric Blake
Cc: qemu-devel, qemu-block, vsementsov, Kevin Wolf, Hanna Reitz,
John Snow, Fam Zheng, Ronnie Sahlberg, Paolo Bonzini,
Peter Lieven, Denis V. Lunev, Alberto Garcia, Ilya Dryomov,
Stefan Weil, open list:GLUSTER
On Thu, Apr 17, 2025 at 01:39:06PM -0500, Eric Blake wrote:
> This patch is purely mechanical, changing bool want_zero into a new
> enum BlockStatusMode. As of this patch, all implementations are
> unchanged (the old want_zero==true is now mode==BDRV_BSTAT_PRECISE),
> but the callers in io.c are set up so that future patches will be able
> to differente between whether the caller cares more about allocation
differentiate
> or about reads-as-zero, for driver implementations that will actually
> want to behave differently for those more-specific hints.
>
> As for the background why this patch is useful: right now, the
> file-posix driver recognizes that if allocation is being queried, the
> entire image can be reported as allocated (there is no backing file to
> refer to) - but this throws away information on whether the entire
> image reads as zero (trivially true if lseek(SEEK_HOLE) at offset 0
> returns -ENXIO, a bit more complicated to prove if the raw file was
> created with 'qemu-img create' since we intentionally allocate a small
> chunk of all-zero data to help with alignment probing). The next
> patches will add a generic algorithm for seeing if an entire file
> reads as zeroes, as well as tweak the file-posix driver to react to
> the new hints.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> block/coroutines.h | 4 +--
> include/block/block-common.h | 26 ++++++++++++++++
> include/block/block_int-common.h | 25 +++++++++-------
> include/block/block_int-io.h | 4 +--
> block/io.c | 51 ++++++++++++++++----------------
> block/blkdebug.c | 6 ++--
> block/copy-before-write.c | 4 +--
> block/file-posix.c | 4 +--
> block/gluster.c | 4 +--
> block/iscsi.c | 6 ++--
> block/nbd.c | 4 +--
> block/null.c | 6 ++--
> block/parallels.c | 6 ++--
> block/qcow.c | 2 +-
> block/qcow2.c | 6 ++--
> block/qed.c | 6 ++--
> block/quorum.c | 4 +--
> block/raw-format.c | 4 +--
> block/rbd.c | 6 ++--
> block/snapshot-access.c | 4 +--
> block/vdi.c | 4 +--
> block/vmdk.c | 2 +-
> block/vpc.c | 2 +-
> block/vvfat.c | 6 ++--
> tests/unit/test-block-iothread.c | 2 +-
> 25 files changed, 114 insertions(+), 84 deletions(-)
>
> diff --git a/block/coroutines.h b/block/coroutines.h
> index 79e5efbf752..c8323aa67e6 100644
> --- a/block/coroutines.h
> +++ b/block/coroutines.h
> @@ -47,7 +47,7 @@ int coroutine_fn GRAPH_RDLOCK
> bdrv_co_common_block_status_above(BlockDriverState *bs,
> BlockDriverState *base,
> bool include_base,
> - bool want_zero,
> + enum BlockStatusMode mode,
> int64_t offset,
> int64_t bytes,
> int64_t *pnum,
> @@ -78,7 +78,7 @@ int co_wrapper_mixed_bdrv_rdlock
> bdrv_common_block_status_above(BlockDriverState *bs,
> BlockDriverState *base,
> bool include_base,
> - bool want_zero,
> + enum BlockStatusMode mode,
> int64_t offset,
> int64_t bytes,
> int64_t *pnum,
> diff --git a/include/block/block-common.h b/include/block/block-common.h
> index 0b831ef87b1..619e75b9c8d 100644
> --- a/include/block/block-common.h
> +++ b/include/block/block-common.h
> @@ -508,6 +508,32 @@ enum BdrvChildRoleBits {
> | BDRV_CHILD_PRIMARY,
> };
>
> +/* Modes for block status calls */
> +enum BlockStatusMode {
> + /*
> + * Status should be as accurate as possible: _OFFSET_VALID
> + * and_OFFSET_ZERO should each be set where efficiently possible,
"and _OFFSET_ZERO"
> + * extents may be smaller, and iteration through the entire block
> + * device may take more calls.
> + */
> + BDRV_BSTAT_PRECISE,
> +
> + /*
> + * The caller is primarily concerned about overall allocation:
> + * favor larger *pnum, perhaps by coalescing extents and reporting
> + * _DATA instead of _ZERO, and without needing to read data or
> + * bothering with _OFFSET_VALID.
> + */
> + BDRV_BSTAT_ALLOCATED,
> +
> + /*
> + * The caller is primarily concerned about whether the device
> + * reads as zero: favor a result of _ZERO, even if it requires
> + * reading a few sectors to verify, without needing _OFFSET_VALID.
> + */
> + BDRV_BSTAT_ZERO,
> +};
I have trouble understanding what the exact semantics of these modes
are. Would it be possible to pass flags to block status calls that can
be ORed together instead: WANT_OFFSET_VALID, WANT_ZERO, etc? The flags
would be orthogonal and easier to understand than modes that seem to
combine multiple flag behaviors.
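A minimal sketch of what such orthogonal flags might look like. Only
WANT_OFFSET_VALID and WANT_ZERO come from the suggestion above;
WANT_ALLOCATED, WANT_PRECISE and both helper functions are hypothetical
names used for illustration, not existing QEMU definitions.

```c
#include <stdbool.h>

/* Hypothetical request flags for a block status call; each bit asks
 * for one independent piece of information. */
enum {
    WANT_OFFSET_VALID = 1u << 0, /* caller needs the host offset mapping */
    WANT_ZERO         = 1u << 1, /* caller cares about reads-as-zero */
    WANT_ALLOCATED    = 1u << 2, /* assumption: caller cares about allocation */
};

/* Assumption: the old want_zero == true behavior asks for everything. */
#define WANT_PRECISE (WANT_OFFSET_VALID | WANT_ZERO | WANT_ALLOCATED)

/* A driver would spend extra effort proving zeroes only when asked. */
static bool may_read_to_verify_zero(unsigned int flags)
{
    return (flags & WANT_ZERO) != 0;
}

/* A driver could skip the offset lookup when nobody wants it. */
static bool may_skip_offset_lookup(unsigned int flags)
{
    return (flags & WANT_OFFSET_VALID) == 0;
}
```

With this shape a caller combines only the bits it needs, for example
WANT_ZERO | WANT_ALLOCATED, and a driver tests each bit on its own.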
Stefan
* Re: [PATCH v2 03/11] block: Let bdrv_co_is_zero_fast consolidate adjacent extents
2025-04-17 18:39 ` [PATCH v2 03/11] block: Let bdrv_co_is_zero_fast consolidate adjacent extents Eric Blake
@ 2025-04-17 20:21 ` Stefan Hajnoczi
0 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2025-04-17 20:21 UTC (permalink / raw)
To: Eric Blake
Cc: qemu-devel, qemu-block, vsementsov, Fam Zheng, Kevin Wolf,
Hanna Reitz
On Thu, Apr 17, 2025 at 01:39:08PM -0500, Eric Blake wrote:
> Some BDS drivers have a cap on how much block status they can supply
> in one query (for example, NBD talking to an older server cannot
> inspect more than 4G per query; and qcow2 tends to cap its answers
> rather than cross a cluster boundary of an L1 table). Although the
> existing callers of bdrv_co_is_zero_fast are not passing in that large
> of a 'bytes' parameter, an upcoming caller wants to query the entire
> image at once, and will thus benefit from being able to treat adjacent
> zero regions in a coalesced manner, rather than claiming the region is
> non-zero merely because pnum was truncated and didn't match the
> incoming bytes.
>
> While refactoring this into a loop, note that there is no need to
> assign pnum prior to calling bdrv_co_common_block_status_above() (it
> is guaranteed to be assigned deeper in the callstack).
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> block/io.c | 27 +++++++++++++++------------
> 1 file changed, 15 insertions(+), 12 deletions(-)
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
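For readers following along, the coalescing described in the commit
message can be modelled outside QEMU. This is only a sketch under
stated assumptions: CappedImage, capped_block_status() and
is_zero_coalesced() are hypothetical stand-ins for a driver that caps
each answer, not the code in block/io.c.

```c
#include <stdint.h>

#define STATUS_ZERO 0x1

/* Toy image (hypothetical): bytes [0, zero_end) read as zero and the
 * rest is data; max_chunk (> 0) caps how much one query may describe. */
typedef struct CappedImage {
    int64_t zero_end;
    int64_t max_chunk;
} CappedImage;

/* One status query for [offset, offset + bytes), bytes > 0.  Stores
 * the length of the reported extent in *pnum, possibly truncated. */
static int capped_block_status(const CappedImage *img, int64_t offset,
                               int64_t bytes, int64_t *pnum)
{
    int zero = offset < img->zero_end;
    int64_t len = (zero ? img->zero_end : offset + bytes) - offset;

    if (len > bytes) {
        len = bytes;
    }
    if (len > img->max_chunk) {
        len = img->max_chunk;
    }
    *pnum = len;
    return zero ? STATUS_ZERO : 0;
}

/* Return 1 if the whole range reads as zero, 0 otherwise.  A short
 * pnum is not treated as "non-zero"; the loop simply asks again. */
static int is_zero_coalesced(const CappedImage *img, int64_t offset,
                             int64_t bytes)
{
    while (bytes > 0) {
        int64_t pnum;
        int ret = capped_block_status(img, offset, bytes, &pnum);

        if (!(ret & STATUS_ZERO)) {
            return 0;
        }
        offset += pnum;
        bytes -= pnum;
    }
    return 1;
}
```

A single query against such an image reports at most max_chunk bytes,
so comparing pnum against the requested length after one call would
misreport a fully zero range as non-zero; the loop avoids that.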
* Re: [PATCH v2 04/11] block: Add new bdrv_co_is_all_zeroes() function
2025-04-17 18:39 ` [PATCH v2 04/11] block: Add new bdrv_co_is_all_zeroes() function Eric Blake
@ 2025-04-17 20:35 ` Stefan Hajnoczi
2025-04-18 19:07 ` Eric Blake
0 siblings, 1 reply; 36+ messages in thread
From: Stefan Hajnoczi @ 2025-04-17 20:35 UTC (permalink / raw)
To: Eric Blake
Cc: qemu-devel, qemu-block, vsementsov, Fam Zheng, Kevin Wolf,
Hanna Reitz
On Thu, Apr 17, 2025 at 01:39:09PM -0500, Eric Blake wrote:
> There are some optimizations that require knowing if an image starts
> out as reading all zeroes, such as making blockdev-mirror faster by
> skipping the copying of source zeroes to the destination. The
> existing bdrv_co_is_zero_fast() is a good building block for answering
> this question, but it tends to give an answer of 0 for a file we just
> created via QMP 'blockdev-create' or similar (such as 'qemu-img create
> -f raw'). Why? Because file-posix.c insists on allocating a tiny
> header to any file rather than leaving it 100% sparse, due to some
> filesystems that are unable to answer alignment probes on a hole. But
> teaching file-posix.c to read the tiny header doesn't scale - the
> problem of a small header is also visible when libvirt sets up an NBD
> client to a just-created file on a migration destination host.
>
> So, we need a wrapper function that handles a bit more complexity in a
> common manner for all block devices - when the BDS is mostly a hole,
> but has a small non-hole header, it is still worth the time to read
> that header and check if it reads as all zeroes before giving up and
> returning a pessimistic answer.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> include/block/block-io.h | 2 ++
> block/io.c | 58 ++++++++++++++++++++++++++++++++++++++++
> 2 files changed, 60 insertions(+)
>
> diff --git a/include/block/block-io.h b/include/block/block-io.h
> index b49e0537dd4..b99cc98d265 100644
> --- a/include/block/block-io.h
> +++ b/include/block/block-io.h
> @@ -161,6 +161,8 @@ bdrv_is_allocated_above(BlockDriverState *bs, BlockDriverState *base,
>
> int coroutine_fn GRAPH_RDLOCK
> bdrv_co_is_zero_fast(BlockDriverState *bs, int64_t offset, int64_t bytes);
> +int coroutine_fn GRAPH_RDLOCK
> +bdrv_co_is_all_zeroes(BlockDriverState *bs);
>
> int GRAPH_RDLOCK
> bdrv_apply_auto_read_only(BlockDriverState *bs, const char *errmsg,
> diff --git a/block/io.c b/block/io.c
> index 6ef78070915..dc1341e4029 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2778,6 +2778,64 @@ int coroutine_fn bdrv_co_is_zero_fast(BlockDriverState *bs, int64_t offset,
> return 1;
> }
>
> +/*
> + * Check @bs (and its backing chain) to see if the entire image is known
> + * to read as zeroes.
> + * Return 1 if that is the case, 0 otherwise and -errno on error.
> + * This test is meant to be fast rather than accurate so returning 0
> + * does not guarantee non-zero data; however, it can report 1 in more
False negatives are possible; let's also document that false positives
are not possible:
This test is meant to be fast rather than accurate so returning 0 does
not guarantee non-zero data, but returning 1 does guarantee all zero
data; ...
> + * cases than bdrv_co_is_zero_fast.
> + */
> +int coroutine_fn bdrv_co_is_all_zeroes(BlockDriverState *bs)
> +{
> + int ret;
> + int64_t pnum, bytes;
> + char *buf;
> + QEMUIOVector local_qiov;
> + IO_CODE();
> +
> + bytes = bdrv_co_getlength(bs);
> + if (bytes < 0) {
> + return bytes;
> + }
> +
> + /* First probe - see if the entire image reads as zero */
> + ret = bdrv_co_common_block_status_above(bs, NULL, false, BDRV_BSTAT_ZERO,
> + 0, bytes, &pnum, NULL, NULL,
> + NULL);
> + if (ret < 0) {
> + return ret;
> + }
> + if (ret & BDRV_BLOCK_ZERO) {
> + return bdrv_co_is_zero_fast(bs, pnum, bytes - pnum);
> + }
> +
> + /*
> + * Because of the way 'blockdev-create' works, raw files tend to
> + * be created with a non-sparse region at the front to make
> + * alignment probing easier. If the block starts with only a
> + * small allocated region, it is still worth the effort to see if
> + * the rest of the image is still sparse, coupled with manually
> + * reading the first region to see if it reads zero after all.
> + */
> + if (pnum > qemu_real_host_page_size()) {
Probably not worth it for the corner case, but replacing
qemu_real_host_page_size() with 128 KiB would allow this to work on
images created on different CPU architectures (4 KiB vs 64 KiB page
sizes).
> + return 0;
> + }
> + ret = bdrv_co_is_zero_fast(bs, pnum, bytes - pnum);
> + if (ret <= 0) {
> + return ret;
> + }
> + /* Only the head of the image is unknown, and it's small. Read it. */
> + buf = qemu_blockalign(bs, pnum);
> + qemu_iovec_init_buf(&local_qiov, buf, pnum);
> + ret = bdrv_driver_preadv(bs, 0, pnum, &local_qiov, 0, 0);
> + if (ret >= 0) {
> + ret = buffer_is_zero(buf, pnum);
> + }
> + qemu_vfree(buf);
> + return ret;
> +}
> +
> int coroutine_fn bdrv_co_is_allocated(BlockDriverState *bs, int64_t offset,
> int64_t bytes, int64_t *pnum)
> {
> --
> 2.49.0
>
>
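To make the contract discussed above concrete, here is a small
standalone model of the head-probe idea. It is a sketch under
assumptions, not the QEMU implementation: HeadImage, buf_is_zero() and
toy_is_all_zeroes() are hypothetical names, and the size cap is passed
in as a parameter so that either the host page size or a fixed value
such as 128 KiB can be tried.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy image (hypothetical model, not a QEMU type): the first head_len
 * bytes are allocated and held in head[]; everything after them is
 * either a hole that reads as zero or data of unknown content. */
typedef struct HeadImage {
    const uint8_t *head;
    size_t head_len;
    int tail_is_hole;
} HeadImage;

/* Return 1 if every byte in buf[0..len) is zero, including len == 0. */
static int buf_is_zero(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i]) {
            return 0;
        }
    }
    return 1;
}

/* Return 1 only when the image provably reads as all zeroes.  A
 * return of 0 means "not proven" and may be a false negative. */
static int toy_is_all_zeroes(const HeadImage *img, size_t max_head)
{
    if (img->head_len > max_head) {
        return 0;   /* head too large to be worth reading */
    }
    if (!img->tail_is_hole) {
        return 0;   /* rest of the image is not known to be zero */
    }
    /* Only the small head is unknown, so read it and check. */
    return buf_is_zero(img->head, img->head_len);
}
```

Note how a 4 KiB zero head is accepted with a 128 KiB cap but rejected
with a 1 KiB cap: the answer 0 in the second case is a false negative,
while an answer of 1 is only ever produced after every byte has been
accounted for.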
* Re: [PATCH v2 05/11] iotests: Improve iotest 194 to mirror data
2025-04-17 18:39 ` [PATCH v2 05/11] iotests: Improve iotest 194 to mirror data Eric Blake
@ 2025-04-17 20:39 ` Stefan Hajnoczi
0 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2025-04-17 20:39 UTC (permalink / raw)
To: Eric Blake; +Cc: qemu-devel, qemu-block, vsementsov, Kevin Wolf, Hanna Reitz
On Thu, Apr 17, 2025 at 01:39:10PM -0500, Eric Blake wrote:
> Mirroring a completely sparse image to a sparse destination should be
> practically instantaneous. It isn't yet, but the test will be more
> realistic if it has some non-zero to mirror as well as the holes.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> tests/qemu-iotests/194 | 1 +
> 1 file changed, 1 insertion(+)
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* Re: [PATCH v2 06/11] mirror: Minor refactoring
2025-04-17 18:39 ` [PATCH v2 06/11] mirror: Minor refactoring Eric Blake
@ 2025-04-17 20:42 ` Stefan Hajnoczi
0 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2025-04-17 20:42 UTC (permalink / raw)
To: Eric Blake
Cc: qemu-devel, qemu-block, vsementsov, John Snow, Kevin Wolf,
Hanna Reitz
On Thu, Apr 17, 2025 at 01:39:11PM -0500, Eric Blake wrote:
> Commit 5791ba52 (v9.2) pre-initialized ret in mirror_dirty_init to
> silence a false positive compiler warning, even though in all code
> paths where ret is used, it was guaranteed to be reassigned
> beforehand. But since the function returns -errno, and -1 is not
> always the right errno, it's better to initialize to -EIO.
>
> An upcoming patch wants to track two bitmaps in
> do_sync_target_write(); this will be easier if the current variables
> related to the dirty bitmap are renamed.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> block/mirror.c | 22 +++++++++++-----------
> 1 file changed, 11 insertions(+), 11 deletions(-)
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* Re: [PATCH v2 07/11] mirror: Skip pre-zeroing destination if it is already zero
2025-04-17 18:39 ` [PATCH v2 07/11] mirror: Skip pre-zeroing destination if it is already zero Eric Blake
@ 2025-04-17 20:46 ` Stefan Hajnoczi
2025-04-24 17:10 ` Eric Blake
1 sibling, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2025-04-17 20:46 UTC (permalink / raw)
To: Eric Blake
Cc: qemu-devel, qemu-block, vsementsov, John Snow, Kevin Wolf,
Hanna Reitz
On Thu, Apr 17, 2025 at 01:39:12PM -0500, Eric Blake wrote:
> When doing a sync=full mirroring, QMP drive-mirror requests full
> zeroing if it did not just create the destination, and blockdev-mirror
> requests full zeroing unconditionally. This is because during a full
> sync, we must ensure that the portions of the disk that are not
> otherwise touched by the source still read as zero upon completion.
>
> However, in mirror_dirty_init(), we were blindly assuming that if the
> destination allows punching holes, we should pre-zero the entire
> image; and if it does not allow punching holes, then treat the entire
> source as dirty rather than mirroring just the allocated portions of
> the source. Without the ability to punch holes, this results in the
> destination file being fully allocated; and even when punching holes
> is supported, it causes duplicate I/O to the portions of the
> destination corresponding to chunks of the source that are allocated
> but read as zero.
>
> Smarter is to avoid the pre-zeroing pass over the destination if it
> can be proved the destination already reads as zero. Note that a
> later patch will then further improve things to skip writing to the
> destination for parts of the image where the source is zero; but even
> with just this patch, it is possible to see a difference for any BDS
> that can quickly report that it already reads as zero.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> block/mirror.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* Re: [PATCH v2 08/11] mirror: Skip writing zeroes when target is already zero
2025-04-17 18:39 ` [PATCH v2 08/11] mirror: Skip writing zeroes when target " Eric Blake
@ 2025-04-17 20:54 ` Stefan Hajnoczi
2025-04-23 16:42 ` Sunny Zhu
1 sibling, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2025-04-17 20:54 UTC (permalink / raw)
To: Eric Blake
Cc: qemu-devel, qemu-block, vsementsov, John Snow, Kevin Wolf,
Hanna Reitz
On Thu, Apr 17, 2025 at 01:39:13PM -0500, Eric Blake wrote:
> When mirroring, the goal is to ensure that the destination reads the
> same as the source; this goal is met whether the destination is sparse
> or fully-allocated. However, if the destination cannot efficiently
> write zeroes, then any time the mirror operation wants to copy zeroes
> from the source to the destination (either during the background over
> sparse regions when doing a full mirror, or in the foreground when the
> guest actively writes zeroes), we were causing the destination to
> fully allocate that portion of the disk, even if it already read as
> zeroes.
>
> The effect is especially pronounced when the source is a raw file.
> That's because when the source is a qcow2 file, the dirty bitmap only
> visits the portions of the source that are allocated, which tend to be
> non-zero. But when the source is a raw file,
> bdrv_co_is_allocated_above() reports the entire file as allocated so
> mirror_dirty_init sets the entire dirty bitmap, and it is only later
> during mirror_iteration that we change to consulting the more precise
> bdrv_co_block_status_above() to learn where the source reads as zero.
>
> Remember that since a mirror operation can write a cluster more than
> once (every time the guest changes the source, the destination is also
> changed to keep up), we can't take the shortcut of relying on
> s->zero_target (which is static for the life of the job) in
> mirror_co_zero() to see if the destination is already zero, because
> that information may be stale. Any solution we use must be dynamic in
> the face of the guest writing or discarding a cluster while the mirror
> has been ongoing.
>
> We could just teach mirror_co_zero() to do a block_status() probe of
> the destination, and skip the zeroes if the destination already reads
> as zero, but we know from past experience that extra block_status()
> calls are not always cheap (tmpfs, anyone?), especially when they are
> random access rather than linear. Use of block_status() of the source
> by the background task in a linear fashion is not our bottleneck (it's
> a background task, after all); but since mirroring can be done while
> the source is actively being changed, we don't want a slow
> block_status() of the destination to occur on the hot path of the
> guest trying to do random-access writes to the source.
>
> So this patch takes a slightly different approach: any time we have to
> transfer the full image, we know that mirror_dirty_init() is _already_
> doing a pre-zero pass over the entire destination. Therefore, if we
> track which clusters of the destination are zero at any given moment,
> we don't have to do a block_status() call on the destination, but can
> instead just refer to the zero bitmap associated with the job.
>
> With this patch, if I create a raw sparse destination file, connect it
> with QMP 'blockdev-add' while leaving it at the default "discard":
> "ignore", then run QMP 'blockdev-mirror' with "sync": "full", the
> destination remains sparse rather than fully allocated.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> block/mirror.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 65 insertions(+), 5 deletions(-)
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
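For readers following along, the zero-tracking idea described in the
commit message can be modeled in a few lines of standalone C. This is
only a sketch under stated assumptions, not the code in block/mirror.c:
MirrorModel, model_init(), model_write_data() and model_write_zeroes()
are hypothetical names, and a plain bool array stands in for whatever
bitmap the job really keeps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NB_CLUSTERS 8   /* illustrative image size, in clusters */

/* Hypothetical job state: one flag per destination cluster. */
typedef struct MirrorModel {
    bool dest_is_zero[NB_CLUSTERS]; /* true: cluster known to read as zero */
    unsigned zero_writes;           /* zero requests actually sent */
    unsigned data_writes;           /* data requests actually sent */
} MirrorModel;

/* Start of job: all clusters are zero if the target was zero or pre-zeroed. */
static void model_init(MirrorModel *m, bool target_all_zero)
{
    for (size_t i = 0; i < NB_CLUSTERS; i++) {
        m->dest_is_zero[i] = target_all_zero;
    }
    m->zero_writes = 0;
    m->data_writes = 0;
}

/* Copying data makes the cluster non-zero as far as the job knows. */
static void model_write_data(MirrorModel *m, size_t cluster)
{
    assert(cluster < NB_CLUSTERS);
    m->dest_is_zero[cluster] = false;
    m->data_writes++;
}

/* Copying zeroes is skipped when the map says the cluster is already zero. */
static void model_write_zeroes(MirrorModel *m, size_t cluster)
{
    assert(cluster < NB_CLUSTERS);
    if (m->dest_is_zero[cluster]) {
        return;                 /* nothing sent, destination stays sparse */
    }
    m->zero_writes++;
    m->dest_is_zero[cluster] = true;
}
```

Starting from an all-zero target, a zero request for a cluster sends
nothing; once data lands in that cluster, the next zero request is sent
and later repeats are skipped again. No block_status() call on the
destination is needed on that path.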
* Re: [PATCH v2 09/11] iotests/common.rc: add disk_usage function
2025-04-17 18:39 ` [PATCH v2 09/11] iotests/common.rc: add disk_usage function Eric Blake
@ 2025-04-17 20:54 ` Stefan Hajnoczi
0 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2025-04-17 20:54 UTC (permalink / raw)
To: Eric Blake
Cc: qemu-devel, qemu-block, vsementsov, Andrey Drobyshev,
Alexander Ivanov, Alberto Garcia, Kevin Wolf, Hanna Reitz
On Thu, Apr 17, 2025 at 01:39:14PM -0500, Eric Blake wrote:
> From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
>
> Move the definition from iotests/250 to common.rc. This is used to
> detect real disk usage of sparse files. In particular, we want to use
> it for checking subclusters-based discards.
>
> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> Reviewed-by: Alexander Ivanov <alexander.ivanov@virtuozzo.com>
> Reviewed-by: Alberto Garcia <berto@igalia.com>
> Message-ID: <20240913163942.423050-6-andrey.drobyshev@virtuozzo.com>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> tests/qemu-iotests/common.rc | 6 ++++++
> tests/qemu-iotests/250 | 5 -----
> 2 files changed, 6 insertions(+), 5 deletions(-)
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* Re: [PATCH v2 10/11] tests: Add iotest mirror-sparse for recent patches
2025-04-17 18:39 ` [PATCH v2 10/11] tests: Add iotest mirror-sparse for recent patches Eric Blake
@ 2025-04-17 20:55 ` Stefan Hajnoczi
0 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2025-04-17 20:55 UTC (permalink / raw)
To: Eric Blake; +Cc: qemu-devel, qemu-block, vsementsov, Kevin Wolf, Hanna Reitz
On Thu, Apr 17, 2025 at 01:39:15PM -0500, Eric Blake wrote:
> Prove that blockdev-mirror can now result in sparse raw destination
> files, regardless of whether the source is raw or qcow2. By making
> this a separate test, it was possible to test effects of individual
> patches for the various pieces that all have to work together for a
> sparse mirror to be successful.
>
> Note that ./check -file produces different job lengths than ./check
> -qcow2 (the test uses a filter to normalize); that's because when
> deciding how much of the image to be mirrored, the code looks at how
> much of the source image was allocated (for qcow2, this is only the
> written clusters; for raw, it is the entire file). But the important
> part is that the destination file ends up smaller than 3M, rather than
> the 20M it used to be before this patch series.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> tests/qemu-iotests/tests/mirror-sparse | 109 +++++++++++++++
> tests/qemu-iotests/tests/mirror-sparse.out | 153 +++++++++++++++++++++
> 2 files changed, 262 insertions(+)
> create mode 100755 tests/qemu-iotests/tests/mirror-sparse
> create mode 100644 tests/qemu-iotests/tests/mirror-sparse.out
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* Re: [PATCH v2 11/11] mirror: Allow QMP override to declare target already zero
2025-04-17 18:39 ` [PATCH v2 11/11] mirror: Allow QMP override to declare target already zero Eric Blake
@ 2025-04-17 20:57 ` Stefan Hajnoczi
2025-04-18 4:47 ` Markus Armbruster
1 sibling, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2025-04-17 20:57 UTC (permalink / raw)
To: Eric Blake
Cc: qemu-devel, qemu-block, vsementsov, John Snow, Kevin Wolf,
Hanna Reitz, Markus Armbruster
On Thu, Apr 17, 2025 at 01:39:16PM -0500, Eric Blake wrote:
> QEMU's attempts to learn whether a destination file starts life with
> all zero contents are just a heuristic. There may be cases where the
> caller is aware of information that QEMU cannot learn quickly, in
> which case telling QEMU what to assume about the destination can make
> the mirror operation faster. Given our existing example of "qemu-img
> convert --target-is-zero", it is time to expose this override in QMP
> for blockdev-mirror as well.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> qapi/block-core.json | 8 +++++++-
> include/block/block_int-global-state.h | 3 ++-
> block/mirror.c | 23 +++++++++++++++--------
> blockdev.c | 18 +++++++++++-------
> tests/unit/test-block-iothread.c | 2 +-
> 5 files changed, 36 insertions(+), 18 deletions(-)
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* Re: [PATCH v2 02/11] file-posix: Handle zero block status hint better
2025-04-17 18:39 ` [PATCH v2 02/11] file-posix: Handle zero block status hint better Eric Blake
@ 2025-04-17 20:58 ` Stefan Hajnoczi
0 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2025-04-17 20:58 UTC (permalink / raw)
To: Eric Blake; +Cc: qemu-devel, qemu-block, vsementsov, Kevin Wolf, Hanna Reitz
On Thu, Apr 17, 2025 at 01:39:07PM -0500, Eric Blake wrote:
> Although the previous patch to change 'bool want_zero' into an enum
> made no semantic change, it is now time to differentiate. When the
> caller specifically wants to know what parts of the file read as zero,
> we need to use lseek and actually report holes, rather than
> short-circuiting and advertising full allocation.
>
> This change will be utilized in later patches to let mirroring
> optimize for the case when the destination already reads as zeroes.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> block/file-posix.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
* Re: [PATCH v2 00/11] Make blockdev-mirror dest sparse in more cases
2025-04-17 18:39 [PATCH v2 00/11] Make blockdev-mirror dest sparse in more cases Eric Blake
` (10 preceding siblings ...)
2025-04-17 18:39 ` [PATCH v2 11/11] mirror: Allow QMP override to declare target already zero Eric Blake
@ 2025-04-17 20:59 ` Stefan Hajnoczi
2025-04-18 21:52 ` [PATCH v2.5 01/11] block: Expand block status mode from bool to flags Eric Blake
12 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2025-04-17 20:59 UTC (permalink / raw)
To: Eric Blake; +Cc: qemu-devel, qemu-block, vsementsov
Looks good overall. I left some comments on individual patches.
Maybe one day qemu-img convert and the mirror blockjob can be unified.
The write zeroes logic is duplicated.
Stefan
* Re: [PATCH v2 11/11] mirror: Allow QMP override to declare target already zero
2025-04-17 18:39 ` [PATCH v2 11/11] mirror: Allow QMP override to declare target already zero Eric Blake
2025-04-17 20:57 ` Stefan Hajnoczi
@ 2025-04-18 4:47 ` Markus Armbruster
1 sibling, 0 replies; 36+ messages in thread
From: Markus Armbruster @ 2025-04-18 4:47 UTC (permalink / raw)
To: Eric Blake
Cc: qemu-devel, qemu-block, vsementsov, John Snow, Kevin Wolf,
Hanna Reitz
Eric Blake <eblake@redhat.com> writes:
> QEMU's attempts to learn whether a destination file starts life with
> all zero contents are just a heuristic. There may be cases where the
> caller is aware of information that QEMU cannot learn quickly, in
> which case telling QEMU what to assume about the destination can make
> the mirror operation faster. Given our existing example of "qemu-img
> convert --target-is-zero", it is time to expose this override in QMP
> for blockdev-mirror as well.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
* Re: [PATCH v2 01/11] block: Expand block status mode from bool to enum
2025-04-17 20:17 ` Stefan Hajnoczi
@ 2025-04-18 19:02 ` Eric Blake
2025-04-18 21:55 ` Eric Blake
0 siblings, 1 reply; 36+ messages in thread
From: Eric Blake @ 2025-04-18 19:02 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, qemu-block, vsementsov, Kevin Wolf, Hanna Reitz,
John Snow, Fam Zheng, Ronnie Sahlberg, Paolo Bonzini,
Peter Lieven, Denis V. Lunev, Alberto Garcia, Ilya Dryomov,
Stefan Weil, open list:GLUSTER
On Thu, Apr 17, 2025 at 04:17:55PM -0400, Stefan Hajnoczi wrote:
> On Thu, Apr 17, 2025 at 01:39:06PM -0500, Eric Blake wrote:
> > This patch is purely mechanical, changing bool want_zero into a new
> > enum BlockStatusMode. As of this patch, all implementations are
> > unchanged (the old want_zero==true is now mode==BDRV_BSTAT_PRECISE),
> > but the callers in io.c are set up so that future patches will be able
> > to differente between whether the caller cares more about allocation
>
> differentiate
>
> > or about reads-as-zero, for driver implementations that will actually
> > want to behave differently for those more-specific hints.
> >
> > As for the background why this patch is useful: right now, the
> > file-posix driver recognizes that if allocation is being queried, the
> > entire image can be reported as allocated (there is no backing file to
> > refer to) - but this throws away information on whether the entire
> > image reads as zero (trivially true if lseek(SEEK_DATA) at offset 0
> > returns -ENXIO, a bit more complicated to prove if the raw file was
> > created with 'qemu-img create' since we intentionally allocate a small
> > chunk of all-zero data to help with alignment probing). The next
> > patches will add a generic algorithm for seeing if an entire file
> > reads as zeroes, as well as tweak the file-posix driver to react to
> > the new hints.
> >
> > +/* Modes for block status calls */
> > +enum BlockStatusMode {
> > + /*
> > + * Status should be as accurate as possible: _OFFSET_VALID
> > + * and_OFFSET_ZERO should each be set where efficiently possible,
>
> "and _OFFSET_ZERO"
>
> > + * extents may be smaller, and iteration through the entire block
> > + * device may take more calls.
> > + */
> > + BDRV_BSTAT_PRECISE,
> > +
> > + /*
> > + * The caller is primarily concerned about overall allocation:
> > + * favor larger *pnum, perhaps by coalescing extents and reporting
> > + * _DATA instead of _ZERO, and without needing to read data or
> > + * bothering with _OFFSET_VALID.
> > + */
> > + BDRV_BSTAT_ALLOCATED,
> > +
> > + /*
> > + * The caller is primarily concerned about whether the device
> > + * reads as zero: favor a result of _ZERO, even if it requires
> > + * reading a few sectors to verify, without needing _OFFSET_VALID.
> > + */
> > + BDRV_BSTAT_ZERO,
> > +};
>
> I have trouble understanding what the exact semantics of these modes
> are. Would it be possible to pass flags to block status calls that can
> be ORed together instead: WANT_OFFSET_VALID, WANT_ZERO, etc? The flags
> would be orthogonal and easier to understand than modes that seem to
> combine multiple flag behaviors.
I can give that a try. If I'm understanding the request correctly, I
would map it as follows:
BDRV_BSTAT_PRECISE => WANT_ZERO | WANT_OFFSET_VALID | WANT_ALLOCATED
BDRV_BSTAT_ALLOCATED => WANT_ALLOCATED
BDRV_BSTAT_ZERO => WANT_ZERO
while still trying to keep it a mechanical conversion in this patch.
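In standalone C, that mapping might look roughly like the following.
This is a sketch only: the WANT_* bit values are placeholders, and
mode_to_flags() and is_precise() are hypothetical helpers that exist
purely to illustrate the idea.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical hint bits; the values are placeholders for illustration. */
#define WANT_ZERO          (1u << 0)
#define WANT_OFFSET_VALID  (1u << 1)
#define WANT_ALLOCATED     (1u << 2)

/* The three modes from the enum version of the patch. */
enum BlockStatusMode {
    BDRV_BSTAT_PRECISE,
    BDRV_BSTAT_ALLOCATED,
    BDRV_BSTAT_ZERO,
};

/* Translate an enum mode into the equivalent bitwise-or of hints. */
static unsigned int mode_to_flags(enum BlockStatusMode mode)
{
    switch (mode) {
    case BDRV_BSTAT_PRECISE:
        return WANT_ZERO | WANT_OFFSET_VALID | WANT_ALLOCATED;
    case BDRV_BSTAT_ALLOCATED:
        return WANT_ALLOCATED;
    case BDRV_BSTAT_ZERO:
        return WANT_ZERO;
    }
    assert(!"unknown mode");
    return 0;
}

/* The old want_zero==true corresponds to asking for every hint at once. */
static bool is_precise(unsigned int flags)
{
    unsigned int all = WANT_ZERO | WANT_OFFSET_VALID | WANT_ALLOCATED;

    return (flags & all) == all;
}
```

With orthogonal bits, a driver can test for a single hint such as
(flags & WANT_ZERO) without caring what else the caller asked for.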
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org
* Re: [PATCH v2 04/11] block: Add new bdrv_co_is_all_zeroes() function
2025-04-17 20:35 ` Stefan Hajnoczi
@ 2025-04-18 19:07 ` Eric Blake
0 siblings, 0 replies; 36+ messages in thread
From: Eric Blake @ 2025-04-18 19:07 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, qemu-block, vsementsov, Fam Zheng, Kevin Wolf,
Hanna Reitz
On Thu, Apr 17, 2025 at 04:35:33PM -0400, Stefan Hajnoczi wrote:
> On Thu, Apr 17, 2025 at 01:39:09PM -0500, Eric Blake wrote:
> > There are some optimizations that require knowing if an image starts
> > out as reading all zeroes, such as making blockdev-mirror faster by
> > skipping the copying of source zeroes to the destination. The
> > existing bdrv_co_is_zero_fast() is a good building block for answering
> > this question, but it tends to give an answer of 0 for a file we just
> > created via QMP 'blockdev-create' or similar (such as 'qemu-img create
> > -f raw'). Why? Because file-posix.c insists on allocating a tiny
> > header to any file rather than leaving it 100% sparse, due to some
> > filesystems that are unable to answer alignment probes on a hole. But
> > teaching file-posix.c to read the tiny header doesn't scale - the
> > problem of a small header is also visible when libvirt sets up an NBD
> > client to a just-created file on a migration destination host.
> >
> > So, we need a wrapper function that handles a bit more complexity in a
> > common manner for all block devices - when the BDS is mostly a hole,
> > but has a small non-hole header, it is still worth the time to read
> > that header and check if it reads as all zeroes before giving up and
> > returning a pessimistic answer.
> >
> > Signed-off-by: Eric Blake <eblake@redhat.com>
> > ---
> > include/block/block-io.h | 2 ++
> > block/io.c | 58 ++++++++++++++++++++++++++++++++++++++++
> > 2 files changed, 60 insertions(+)
> >
> > diff --git a/include/block/block-io.h b/include/block/block-io.h
> > index b49e0537dd4..b99cc98d265 100644
> > --- a/include/block/block-io.h
> > +++ b/include/block/block-io.h
> > @@ -161,6 +161,8 @@ bdrv_is_allocated_above(BlockDriverState *bs, BlockDriverState *base,
> >
> > int coroutine_fn GRAPH_RDLOCK
> > bdrv_co_is_zero_fast(BlockDriverState *bs, int64_t offset, int64_t bytes);
> > +int coroutine_fn GRAPH_RDLOCK
> > +bdrv_co_is_all_zeroes(BlockDriverState *bs);
> >
> > int GRAPH_RDLOCK
> > bdrv_apply_auto_read_only(BlockDriverState *bs, const char *errmsg,
> > diff --git a/block/io.c b/block/io.c
> > index 6ef78070915..dc1341e4029 100644
> > --- a/block/io.c
> > +++ b/block/io.c
> > @@ -2778,6 +2778,64 @@ int coroutine_fn bdrv_co_is_zero_fast(BlockDriverState *bs, int64_t offset,
> > return 1;
> > }
> >
> > +/*
> > + * Check @bs (and its backing chain) to see if the entire image is known
> > + * to read as zeroes.
> > + * Return 1 if that is the case, 0 otherwise and -errno on error.
> > + * This test is meant to be fast rather than accurate so returning 0
> > + * does not guarantee non-zero data; however, it can report 1 in more
>
> False negatives are possible, let's also document that false positives
> are not possible:
>
> This test is meant to be fast rather than accurate so returning 0 does
> not guarantee non-zero data, but returning 1 does guarantee all zero
> data; ...
Copied from bdrv_co_is_zero_fast, but that wording can use a similar
treatment.
>
> > + * cases than bdrv_co_is_zero_fast.
> > + */
> > +int coroutine_fn bdrv_co_is_all_zeroes(BlockDriverState *bs)
> > +{
> > + int ret;
> > + int64_t pnum, bytes;
> > + char *buf;
> > + QEMUIOVector local_qiov;
> > + IO_CODE();
> > +
> > + bytes = bdrv_co_getlength(bs);
> > + if (bytes < 0) {
> > + return bytes;
> > + }
> > +
> > + /* First probe - see if the entire image reads as zero */
> > + ret = bdrv_co_common_block_status_above(bs, NULL, false, BDRV_BSTAT_ZERO,
> > + 0, bytes, &pnum, NULL, NULL,
> > + NULL);
> > + if (ret < 0) {
> > + return ret;
> > + }
> > + if (ret & BDRV_BLOCK_ZERO) {
> > + return bdrv_co_is_zero_fast(bs, pnum, bytes - pnum);
> > + }
> > +
> > + /*
> > + * Because of the way 'blockdev-create' works, raw files tend to
> > + * be created with a non-sparse region at the front to make
> > + * alignment probing easier. If the block starts with only a
> > + * small allocated region, it is still worth the effort to see if
> > + * the rest of the image is still sparse, coupled with manually
> > + * reading the first region to see if it reads zero after all.
> > + */
> > + if (pnum > qemu_real_host_page_size()) {
>
> Probably not worth it for the corner case, but replacing
> qemu_real_host_page_size() with 128 KiB would allow this to work on
> images created on different CPU architectures (4 KiB vs 64 KiB page
> sizes).
I picked the original value of qemu_real_host_page_size() based on
file-posix.c's allocate_first_block(); but agree that picking a
constant 64k or even 128k for all platforms (rather than tying it to
the host's page size) won't hurt. The key point remains that it
should be large enough to account for whatever file-posix.c does, yet
small enough that we aren't negating any potential optimization by the
time spent probing if the image reads as zeroes.
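To make the trade-off concrete, here is a rough standalone model of
the probe. It assumes a 128 KiB cap; Extent, MAX_HEADER_PROBE,
buf_all_zero() and image_is_all_zeroes() are hypothetical names, and an
in-memory extent list stands in for real block status calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one block-status extent of an image. */
typedef struct Extent {
    size_t len;           /* length of the extent in bytes */
    bool reports_zero;    /* block status claims the extent reads as zero */
    const uint8_t *data;  /* contents, consulted only when !reports_zero */
} Extent;

/* Assumed cap on how much leading data is worth reading. */
#define MAX_HEADER_PROBE (128 * 1024)

/* Byte-by-byte check; a real implementation would use a faster routine. */
static bool buf_all_zero(const uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i]) {
            return false;
        }
    }
    return true;
}

/*
 * Return 1 if the image is known to read as all zeroes, 0 otherwise.
 * 0 only means "not proven"; 1 is never returned for non-zero data.
 */
static int image_is_all_zeroes(const Extent *ext, size_t n)
{
    size_t i = 0;

    assert(ext != NULL || n == 0);
    if (n == 0) {
        return 1;
    }
    /* A small leading extent without zero status is read and checked. */
    if (!ext[0].reports_zero) {
        if (ext[0].len > MAX_HEADER_PROBE ||
            !buf_all_zero(ext[0].data, ext[0].len)) {
            return 0;
        }
        i = 1;
    }
    /* The rest must be reported as zero by status alone, with no reads. */
    for (; i < n; i++) {
        if (!ext[i].reports_zero) {
            return 0;
        }
    }
    return 1;
}
```

The cap bounds the cost of the fallback read: a leading extent larger
than the cap gets the pessimistic answer right away, without any data
being read.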
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org
* [PATCH v2.5 01/11] block: Expand block status mode from bool to flags
2025-04-17 18:39 [PATCH v2 00/11] Make blockdev-mirror dest sparse in more cases Eric Blake
` (11 preceding siblings ...)
2025-04-17 20:59 ` [PATCH v2 00/11] Make blockdev-mirror dest sparse in more cases Stefan Hajnoczi
@ 2025-04-18 21:52 ` Eric Blake
2025-04-18 21:52 ` [PATCH v2.5 02/11] file-posix, gluster: Handle zero block status hint better Eric Blake
` (2 more replies)
12 siblings, 3 replies; 36+ messages in thread
From: Eric Blake @ 2025-04-18 21:52 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-block, stefanha, Kevin Wolf, Hanna Reitz, John Snow,
Vladimir Sementsov-Ogievskiy, Fam Zheng, Ronnie Sahlberg,
Paolo Bonzini, Peter Lieven, Denis V. Lunev, Alberto Garcia,
Ilya Dryomov, Stefan Weil, open list:GLUSTER
This patch is purely mechanical, changing bool want_zero into an
unsigned int for bitwise-or of flags. As of this patch, all
implementations are unchanged (the old want_zero==true is now
mode==BDRV_WANT_PRECISE which is a superset of BDRV_WANT_ZERO); but
the callers in io.c that used to pass want_zero==false are now
prepared for future driver changes that can now distinguish between
BDRV_WANT_ZERO vs. BDRV_WANT_ALLOCATED. The next patch will actually
change the file-posix driver along those lines, now that we have
more-specific hints.
As for the background why this patch is useful: right now, the
file-posix driver recognizes that if allocation is being queried, the
entire image can be reported as allocated (there is no backing file to
refer to) - but this throws away information on whether the entire
image reads as zero (trivially true if lseek(SEEK_DATA) at offset 0
returns -ENXIO, a bit more complicated to prove if the raw file was
created with 'qemu-img create' since we intentionally allocate a small
chunk of all-zero data to help with alignment probing). Later patches
will add a generic algorithm for seeing if an entire file reads as
zeroes.
Signed-off-by: Eric Blake <eblake@redhat.com>
---
In response to Stefan's ask for what it would look like as a bitmask
of flags instead of an enum. Only the first two patches of the series
change.
block/coroutines.h | 4 +--
include/block/block-common.h | 11 +++++++
include/block/block_int-common.h | 27 +++++++++--------
include/block/block_int-io.h | 4 +--
block/io.c | 51 ++++++++++++++++----------------
block/blkdebug.c | 6 ++--
block/copy-before-write.c | 4 +--
block/file-posix.c | 4 +--
block/gluster.c | 4 +--
block/iscsi.c | 6 ++--
block/nbd.c | 4 +--
block/null.c | 6 ++--
block/parallels.c | 6 ++--
block/qcow.c | 2 +-
block/qcow2.c | 6 ++--
block/qed.c | 6 ++--
block/quorum.c | 4 +--
block/raw-format.c | 4 +--
block/rbd.c | 6 ++--
block/snapshot-access.c | 4 +--
block/vdi.c | 4 +--
block/vmdk.c | 2 +-
block/vpc.c | 2 +-
block/vvfat.c | 6 ++--
tests/unit/test-block-iothread.c | 2 +-
25 files changed, 99 insertions(+), 86 deletions(-)
diff --git a/block/coroutines.h b/block/coroutines.h
index 79e5efbf752..892646bb7aa 100644
--- a/block/coroutines.h
+++ b/block/coroutines.h
@@ -47,7 +47,7 @@ int coroutine_fn GRAPH_RDLOCK
bdrv_co_common_block_status_above(BlockDriverState *bs,
BlockDriverState *base,
bool include_base,
- bool want_zero,
+ unsigned int mode,
int64_t offset,
int64_t bytes,
int64_t *pnum,
@@ -78,7 +78,7 @@ int co_wrapper_mixed_bdrv_rdlock
bdrv_common_block_status_above(BlockDriverState *bs,
BlockDriverState *base,
bool include_base,
- bool want_zero,
+ unsigned int mode,
int64_t offset,
int64_t bytes,
int64_t *pnum,
diff --git a/include/block/block-common.h b/include/block/block-common.h
index 0b831ef87b1..c8c626daeaa 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -333,6 +333,17 @@ typedef enum {
#define BDRV_BLOCK_RECURSE 0x40
#define BDRV_BLOCK_COMPRESSED 0x80
+/*
+ * Block status hints: the bitwise-or of these flags emphasizes what
+ * the caller hopes to learn, and some drivers may be able to give
+ * faster answers by doing less work when the hint permits.
+ */
+#define BDRV_WANT_ZERO BDRV_BLOCK_ZERO
+#define BDRV_WANT_OFFSET_VALID BDRV_BLOCK_OFFSET_VALID
+#define BDRV_WANT_ALLOCATED BDRV_BLOCK_ALLOCATED
+#define BDRV_WANT_PRECISE (BDRV_WANT_ZERO | BDRV_WANT_OFFSET_VALID | \
+ BDRV_WANT_ALLOCATED)
+
typedef QTAILQ_HEAD(BlockReopenQueue, BlockReopenQueueEntry) BlockReopenQueue;
typedef struct BDRVReopenState {
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index ebb4e56a503..a9c0daa2a4d 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -608,15 +608,16 @@ struct BlockDriver {
* according to the current layer, and should only need to set
* BDRV_BLOCK_DATA, BDRV_BLOCK_ZERO, BDRV_BLOCK_OFFSET_VALID,
* and/or BDRV_BLOCK_RAW; if the current layer defers to a backing
- * layer, the result should be 0 (and not BDRV_BLOCK_ZERO). See
- * block.h for the overall meaning of the bits. As a hint, the
- * flag want_zero is true if the caller cares more about precise
- * mappings (favor accurate _OFFSET_VALID/_ZERO) or false for
- * overall allocation (favor larger *pnum, perhaps by reporting
- * _DATA instead of _ZERO). The block layer guarantees input
- * clamped to bdrv_getlength() and aligned to request_alignment,
- * as well as non-NULL pnum, map, and file; in turn, the driver
- * must return an error or set pnum to an aligned non-zero value.
+ * layer, the result should be 0 (and not BDRV_BLOCK_ZERO). The
+ * caller will synthesize BDRV_BLOCK_ALLOCATED based on the
+ * non-zero results. See block.h for the overall meaning of the
+ * bits. As a hint, the flags in @mode may include a bitwise-or
+ * of BDRV_WANT_ALLOCATED, BDRV_WANT_OFFSET_VALID, or
+ * BDRV_WANT_ZERO based on what the caller is looking for in the
+ * results. The block layer guarantees input clamped to
+ * bdrv_getlength() and aligned to request_alignment, as well as
+ * non-NULL pnum, map, and file; in turn, the driver must return
+ * an error or set pnum to an aligned non-zero value.
*
* Note that @bytes is just a hint on how big of a region the
* caller wants to inspect. It is not a limit on *pnum.
@@ -628,8 +629,8 @@ struct BlockDriver {
* to clamping *pnum for return to its caller.
*/
int coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_block_status)(
- BlockDriverState *bs,
- bool want_zero, int64_t offset, int64_t bytes, int64_t *pnum,
+ BlockDriverState *bs, unsigned int mode,
+ int64_t offset, int64_t bytes, int64_t *pnum,
int64_t *map, BlockDriverState **file);
/*
@@ -653,8 +654,8 @@ struct BlockDriver {
QEMUIOVector *qiov, size_t qiov_offset);
int coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_snapshot_block_status)(
- BlockDriverState *bs, bool want_zero, int64_t offset, int64_t bytes,
- int64_t *pnum, int64_t *map, BlockDriverState **file);
+ BlockDriverState *bs, unsigned int mode, int64_t offset,
+ int64_t bytes, int64_t *pnum, int64_t *map, BlockDriverState **file);
int coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_pdiscard_snapshot)(
BlockDriverState *bs, int64_t offset, int64_t bytes);
diff --git a/include/block/block_int-io.h b/include/block/block_int-io.h
index 4a7cf2b4fdc..4f94eb3c5a2 100644
--- a/include/block/block_int-io.h
+++ b/include/block/block_int-io.h
@@ -38,8 +38,8 @@
int coroutine_fn GRAPH_RDLOCK bdrv_co_preadv_snapshot(BdrvChild *child,
int64_t offset, int64_t bytes, QEMUIOVector *qiov, size_t qiov_offset);
int coroutine_fn GRAPH_RDLOCK bdrv_co_snapshot_block_status(
- BlockDriverState *bs, bool want_zero, int64_t offset, int64_t bytes,
- int64_t *pnum, int64_t *map, BlockDriverState **file);
+ BlockDriverState *bs, unsigned int mode, int64_t offset,
+ int64_t bytes, int64_t *pnum, int64_t *map, BlockDriverState **file);
int coroutine_fn GRAPH_RDLOCK bdrv_co_pdiscard_snapshot(BlockDriverState *bs,
int64_t offset, int64_t bytes);
diff --git a/block/io.c b/block/io.c
index 1ba8d1aeea1..f625bf2ac8b 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2364,10 +2364,8 @@ int bdrv_flush_all(void)
* Drivers not implementing the functionality are assumed to not support
* backing files, hence all their sectors are reported as allocated.
*
- * If 'want_zero' is true, the caller is querying for mapping
- * purposes, with a focus on valid BDRV_BLOCK_OFFSET_VALID, _DATA, and
- * _ZERO where possible; otherwise, the result favors larger 'pnum',
- * with a focus on accurate BDRV_BLOCK_ALLOCATED.
+ * 'mode' serves as a hint as to which results are favored; see enum
+ * BlockStatusMode for details of the supported modes.
*
* If 'offset' is beyond the end of the disk image the return value is
* BDRV_BLOCK_EOF and 'pnum' is set to 0.
@@ -2387,7 +2385,7 @@ int bdrv_flush_all(void)
* set to the host mapping and BDS corresponding to the guest offset.
*/
static int coroutine_fn GRAPH_RDLOCK
-bdrv_co_do_block_status(BlockDriverState *bs, bool want_zero,
+bdrv_co_do_block_status(BlockDriverState *bs, unsigned int mode,
int64_t offset, int64_t bytes,
int64_t *pnum, int64_t *map, BlockDriverState **file)
{
@@ -2476,7 +2474,7 @@ bdrv_co_do_block_status(BlockDriverState *bs, bool want_zero,
local_file = bs;
local_map = aligned_offset;
} else {
- ret = bs->drv->bdrv_co_block_status(bs, want_zero, aligned_offset,
+ ret = bs->drv->bdrv_co_block_status(bs, mode, aligned_offset,
aligned_bytes, pnum, &local_map,
&local_file);
@@ -2488,10 +2486,10 @@ bdrv_co_do_block_status(BlockDriverState *bs, bool want_zero,
* the cache requires an RCU update, so double check here to avoid
* such an update if possible.
*
- * Check want_zero, because we only want to update the cache when we
+ * Check mode, because we only want to update the cache when we
* have accurate information about what is zero and what is data.
*/
- if (want_zero &&
+ if (mode == BDRV_WANT_PRECISE &&
ret == (BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID) &&
QLIST_EMPTY(&bs->children))
{
@@ -2548,7 +2546,7 @@ bdrv_co_do_block_status(BlockDriverState *bs, bool want_zero,
if (ret & BDRV_BLOCK_RAW) {
assert(ret & BDRV_BLOCK_OFFSET_VALID && local_file);
- ret = bdrv_co_do_block_status(local_file, want_zero, local_map,
+ ret = bdrv_co_do_block_status(local_file, mode, local_map,
*pnum, pnum, &local_map, &local_file);
goto out;
}
@@ -2560,7 +2558,7 @@ bdrv_co_do_block_status(BlockDriverState *bs, bool want_zero,
if (!cow_bs) {
ret |= BDRV_BLOCK_ZERO;
- } else if (want_zero) {
+ } else if (mode == BDRV_WANT_PRECISE) {
int64_t size2 = bdrv_co_getlength(cow_bs);
if (size2 >= 0 && offset >= size2) {
@@ -2569,14 +2567,14 @@ bdrv_co_do_block_status(BlockDriverState *bs, bool want_zero,
}
}
- if (want_zero && ret & BDRV_BLOCK_RECURSE &&
+ if (mode == BDRV_WANT_PRECISE && ret & BDRV_BLOCK_RECURSE &&
local_file && local_file != bs &&
(ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
(ret & BDRV_BLOCK_OFFSET_VALID)) {
int64_t file_pnum;
int ret2;
- ret2 = bdrv_co_do_block_status(local_file, want_zero, local_map,
+ ret2 = bdrv_co_do_block_status(local_file, mode, local_map,
*pnum, &file_pnum, NULL, NULL);
if (ret2 >= 0) {
/* Ignore errors. This is just providing extra information, it
@@ -2627,7 +2625,7 @@ int coroutine_fn
bdrv_co_common_block_status_above(BlockDriverState *bs,
BlockDriverState *base,
bool include_base,
- bool want_zero,
+ unsigned int mode,
int64_t offset,
int64_t bytes,
int64_t *pnum,
@@ -2654,7 +2652,7 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
return 0;
}
- ret = bdrv_co_do_block_status(bs, want_zero, offset, bytes, pnum,
+ ret = bdrv_co_do_block_status(bs, mode, offset, bytes, pnum,
map, file);
++*depth;
if (ret < 0 || *pnum == 0 || ret & BDRV_BLOCK_ALLOCATED || bs == base) {
@@ -2671,7 +2669,7 @@ bdrv_co_common_block_status_above(BlockDriverState *bs,
for (p = bdrv_filter_or_cow_bs(bs); include_base || p != base;
p = bdrv_filter_or_cow_bs(p))
{
- ret = bdrv_co_do_block_status(p, want_zero, offset, bytes, pnum,
+ ret = bdrv_co_do_block_status(p, mode, offset, bytes, pnum,
map, file);
++*depth;
if (ret < 0) {
@@ -2734,7 +2732,8 @@ int coroutine_fn bdrv_co_block_status_above(BlockDriverState *bs,
BlockDriverState **file)
{
IO_CODE();
- return bdrv_co_common_block_status_above(bs, base, false, true, offset,
+ return bdrv_co_common_block_status_above(bs, base, false,
+ BDRV_WANT_PRECISE, offset,
bytes, pnum, map, file, NULL);
}
@@ -2765,8 +2764,9 @@ int coroutine_fn bdrv_co_is_zero_fast(BlockDriverState *bs, int64_t offset,
return 1;
}
- ret = bdrv_co_common_block_status_above(bs, NULL, false, false, offset,
- bytes, &pnum, NULL, NULL, NULL);
+ ret = bdrv_co_common_block_status_above(bs, NULL, false, BDRV_WANT_ZERO,
+ offset, bytes, &pnum, NULL, NULL,
+ NULL);
if (ret < 0) {
return ret;
@@ -2782,9 +2782,9 @@ int coroutine_fn bdrv_co_is_allocated(BlockDriverState *bs, int64_t offset,
int64_t dummy;
IO_CODE();
- ret = bdrv_co_common_block_status_above(bs, bs, true, false, offset,
- bytes, pnum ? pnum : &dummy, NULL,
- NULL, NULL);
+ ret = bdrv_co_common_block_status_above(bs, bs, true, BDRV_WANT_ALLOCATED,
+ offset, bytes, pnum ? pnum : &dummy,
+ NULL, NULL, NULL);
if (ret < 0) {
return ret;
}
@@ -2817,7 +2817,8 @@ int coroutine_fn bdrv_co_is_allocated_above(BlockDriverState *bs,
int ret;
IO_CODE();
- ret = bdrv_co_common_block_status_above(bs, base, include_base, false,
+ ret = bdrv_co_common_block_status_above(bs, base, include_base,
+ BDRV_WANT_ALLOCATED,
offset, bytes, pnum, NULL, NULL,
&depth);
if (ret < 0) {
@@ -3709,8 +3710,8 @@ bdrv_co_preadv_snapshot(BdrvChild *child, int64_t offset, int64_t bytes,
}
int coroutine_fn
-bdrv_co_snapshot_block_status(BlockDriverState *bs,
- bool want_zero, int64_t offset, int64_t bytes,
+bdrv_co_snapshot_block_status(BlockDriverState *bs, unsigned int mode,
+ int64_t offset, int64_t bytes,
int64_t *pnum, int64_t *map,
BlockDriverState **file)
{
@@ -3728,7 +3729,7 @@ bdrv_co_snapshot_block_status(BlockDriverState *bs,
}
bdrv_inc_in_flight(bs);
- ret = drv->bdrv_co_snapshot_block_status(bs, want_zero, offset, bytes,
+ ret = drv->bdrv_co_snapshot_block_status(bs, mode, offset, bytes,
pnum, map, file);
bdrv_dec_in_flight(bs);
diff --git a/block/blkdebug.c b/block/blkdebug.c
index 1c1967f8e0a..c54aee0c84b 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -751,9 +751,9 @@ blkdebug_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
}
static int coroutine_fn GRAPH_RDLOCK
-blkdebug_co_block_status(BlockDriverState *bs, bool want_zero, int64_t offset,
- int64_t bytes, int64_t *pnum, int64_t *map,
- BlockDriverState **file)
+blkdebug_co_block_status(BlockDriverState *bs, unsigned int mode,
+ int64_t offset, int64_t bytes, int64_t *pnum,
+ int64_t *map, BlockDriverState **file)
{
int err;
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index fd470f5f926..2badb3a8856 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -291,8 +291,8 @@ cbw_co_preadv_snapshot(BlockDriverState *bs, int64_t offset, int64_t bytes,
}
static int coroutine_fn GRAPH_RDLOCK
-cbw_co_snapshot_block_status(BlockDriverState *bs,
- bool want_zero, int64_t offset, int64_t bytes,
+cbw_co_snapshot_block_status(BlockDriverState *bs, unsigned int mode,
+ int64_t offset, int64_t bytes,
int64_t *pnum, int64_t *map,
BlockDriverState **file)
{
diff --git a/block/file-posix.c b/block/file-posix.c
index 56d1972d156..91deb5bf5af 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -3201,7 +3201,7 @@ static int find_allocation(BlockDriverState *bs, off_t start,
* well exceed it.
*/
static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
- bool want_zero,
+ unsigned int mode,
int64_t offset,
int64_t bytes, int64_t *pnum,
int64_t *map,
@@ -3217,7 +3217,7 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
return ret;
}
- if (!want_zero) {
+ if (mode != BDRV_WANT_PRECISE) {
*pnum = bytes;
*map = offset;
*file = bs;
diff --git a/block/gluster.c b/block/gluster.c
index c6d25ae7335..8197b0ecefa 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -1465,7 +1465,7 @@ exit:
* (Based on raw_co_block_status() from file-posix.c.)
*/
static int coroutine_fn qemu_gluster_co_block_status(BlockDriverState *bs,
- bool want_zero,
+ unsigned int mode,
int64_t offset,
int64_t bytes,
int64_t *pnum,
@@ -1482,7 +1482,7 @@ static int coroutine_fn qemu_gluster_co_block_status(BlockDriverState *bs,
return ret;
}
- if (!want_zero) {
+ if (mode != BDRV_WANT_PRECISE) {
*pnum = bytes;
*map = offset;
*file = bs;
diff --git a/block/iscsi.c b/block/iscsi.c
index 2f0f4dac097..15b96ee8800 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -694,9 +694,9 @@ out_unlock:
static int coroutine_fn iscsi_co_block_status(BlockDriverState *bs,
- bool want_zero, int64_t offset,
- int64_t bytes, int64_t *pnum,
- int64_t *map,
+ unsigned int mode,
+ int64_t offset, int64_t bytes,
+ int64_t *pnum, int64_t *map,
BlockDriverState **file)
{
IscsiLun *iscsilun = bs->opaque;
diff --git a/block/nbd.c b/block/nbd.c
index 887841bc813..d5a2b21c6d1 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1397,8 +1397,8 @@ nbd_client_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
}
static int coroutine_fn GRAPH_RDLOCK nbd_client_co_block_status(
- BlockDriverState *bs, bool want_zero, int64_t offset, int64_t bytes,
- int64_t *pnum, int64_t *map, BlockDriverState **file)
+ BlockDriverState *bs, unsigned int mode, int64_t offset,
+ int64_t bytes, int64_t *pnum, int64_t *map, BlockDriverState **file)
{
int ret, request_ret;
NBDExtent64 extent = { 0 };
diff --git a/block/null.c b/block/null.c
index dc0b1fdbd9b..4e448d593d7 100644
--- a/block/null.c
+++ b/block/null.c
@@ -227,9 +227,9 @@ static int null_reopen_prepare(BDRVReopenState *reopen_state,
}
static int coroutine_fn null_co_block_status(BlockDriverState *bs,
- bool want_zero, int64_t offset,
- int64_t bytes, int64_t *pnum,
- int64_t *map,
+ unsigned int mode,
+ int64_t offset, int64_t bytes,
+ int64_t *pnum, int64_t *map,
BlockDriverState **file)
{
BDRVNullState *s = bs->opaque;
diff --git a/block/parallels.c b/block/parallels.c
index 347ca127f34..3a375e2a8ab 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -416,9 +416,9 @@ parallels_co_flush_to_os(BlockDriverState *bs)
}
static int coroutine_fn GRAPH_RDLOCK
-parallels_co_block_status(BlockDriverState *bs, bool want_zero, int64_t offset,
- int64_t bytes, int64_t *pnum, int64_t *map,
- BlockDriverState **file)
+parallels_co_block_status(BlockDriverState *bs, unsigned int mode,
+ int64_t offset, int64_t bytes, int64_t *pnum,
+ int64_t *map, BlockDriverState **file)
{
BDRVParallelsState *s = bs->opaque;
int count;
diff --git a/block/qcow.c b/block/qcow.c
index da8ad4d2430..8a3e7591a92 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -530,7 +530,7 @@ get_cluster_offset(BlockDriverState *bs, uint64_t offset, int allocate,
}
static int coroutine_fn GRAPH_RDLOCK
-qcow_co_block_status(BlockDriverState *bs, bool want_zero,
+qcow_co_block_status(BlockDriverState *bs, unsigned int mode,
int64_t offset, int64_t bytes, int64_t *pnum,
int64_t *map, BlockDriverState **file)
{
diff --git a/block/qcow2.c b/block/qcow2.c
index 7774e7f0909..66fba89b414 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -2141,9 +2141,9 @@ static void qcow2_join_options(QDict *options, QDict *old_options)
}
static int coroutine_fn GRAPH_RDLOCK
-qcow2_co_block_status(BlockDriverState *bs, bool want_zero, int64_t offset,
- int64_t count, int64_t *pnum, int64_t *map,
- BlockDriverState **file)
+qcow2_co_block_status(BlockDriverState *bs, unsigned int mode,
+ int64_t offset, int64_t count, int64_t *pnum,
+ int64_t *map, BlockDriverState **file)
{
BDRVQcow2State *s = bs->opaque;
uint64_t host_offset;
diff --git a/block/qed.c b/block/qed.c
index ac24449ffb3..4a36fb39294 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -833,9 +833,9 @@ fail:
}
static int coroutine_fn GRAPH_RDLOCK
-bdrv_qed_co_block_status(BlockDriverState *bs, bool want_zero, int64_t pos,
- int64_t bytes, int64_t *pnum, int64_t *map,
- BlockDriverState **file)
+bdrv_qed_co_block_status(BlockDriverState *bs, unsigned int mode,
+ int64_t pos, int64_t bytes, int64_t *pnum,
+ int64_t *map, BlockDriverState **file)
{
BDRVQEDState *s = bs->opaque;
size_t len = MIN(bytes, SIZE_MAX);
diff --git a/block/quorum.c b/block/quorum.c
index 30747a6df93..ed8ce801ee3 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -1226,7 +1226,7 @@ static void quorum_child_perm(BlockDriverState *bs, BdrvChild *c,
* region contains zeroes, and BDRV_BLOCK_DATA otherwise.
*/
static int coroutine_fn GRAPH_RDLOCK
-quorum_co_block_status(BlockDriverState *bs, bool want_zero,
+quorum_co_block_status(BlockDriverState *bs, unsigned int mode,
int64_t offset, int64_t count,
int64_t *pnum, int64_t *map, BlockDriverState **file)
{
@@ -1238,7 +1238,7 @@ quorum_co_block_status(BlockDriverState *bs, bool want_zero,
for (i = 0; i < s->num_children; i++) {
int64_t bytes;
ret = bdrv_co_common_block_status_above(s->children[i]->bs, NULL, false,
- want_zero, offset, count,
+ mode, offset, count,
&bytes, NULL, NULL, NULL);
if (ret < 0) {
quorum_report_bad(QUORUM_OP_TYPE_READ, offset, count,
diff --git a/block/raw-format.c b/block/raw-format.c
index e08526e2eca..df16ac1ea25 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -283,8 +283,8 @@ fail:
}
static int coroutine_fn GRAPH_RDLOCK
-raw_co_block_status(BlockDriverState *bs, bool want_zero, int64_t offset,
- int64_t bytes, int64_t *pnum, int64_t *map,
+raw_co_block_status(BlockDriverState *bs, unsigned int mode,
+ int64_t offset, int64_t bytes, int64_t *pnum, int64_t *map,
BlockDriverState **file)
{
BDRVRawState *s = bs->opaque;
diff --git a/block/rbd.c b/block/rbd.c
index af984fb7db4..4f3d42a8e7f 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1504,9 +1504,9 @@ static int qemu_rbd_diff_iterate_cb(uint64_t offs, size_t len,
}
static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
- bool want_zero, int64_t offset,
- int64_t bytes, int64_t *pnum,
- int64_t *map,
+ unsigned int mode,
+ int64_t offset, int64_t bytes,
+ int64_t *pnum, int64_t *map,
BlockDriverState **file)
{
BDRVRBDState *s = bs->opaque;
diff --git a/block/snapshot-access.c b/block/snapshot-access.c
index 71ac83c01f0..17ed2402db8 100644
--- a/block/snapshot-access.c
+++ b/block/snapshot-access.c
@@ -41,11 +41,11 @@ snapshot_access_co_preadv_part(BlockDriverState *bs,
static int coroutine_fn GRAPH_RDLOCK
snapshot_access_co_block_status(BlockDriverState *bs,
- bool want_zero, int64_t offset,
+ unsigned int mode, int64_t offset,
int64_t bytes, int64_t *pnum,
int64_t *map, BlockDriverState **file)
{
- return bdrv_co_snapshot_block_status(bs->file->bs, want_zero, offset,
+ return bdrv_co_snapshot_block_status(bs->file->bs, mode, offset,
bytes, pnum, map, file);
}
diff --git a/block/vdi.c b/block/vdi.c
index a2da6ecab01..3ddc62a5690 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -523,8 +523,8 @@ static int vdi_reopen_prepare(BDRVReopenState *state,
}
static int coroutine_fn GRAPH_RDLOCK
-vdi_co_block_status(BlockDriverState *bs, bool want_zero, int64_t offset,
- int64_t bytes, int64_t *pnum, int64_t *map,
+vdi_co_block_status(BlockDriverState *bs, unsigned int mode,
+ int64_t offset, int64_t bytes, int64_t *pnum, int64_t *map,
BlockDriverState **file)
{
BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
diff --git a/block/vmdk.c b/block/vmdk.c
index 2adec499122..9c7ab037e14 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1777,7 +1777,7 @@ static inline uint64_t vmdk_find_offset_in_cluster(VmdkExtent *extent,
}
static int coroutine_fn GRAPH_RDLOCK
-vmdk_co_block_status(BlockDriverState *bs, bool want_zero,
+vmdk_co_block_status(BlockDriverState *bs, unsigned int mode,
int64_t offset, int64_t bytes, int64_t *pnum,
int64_t *map, BlockDriverState **file)
{
diff --git a/block/vpc.c b/block/vpc.c
index 0309e319f60..801ff5793f8 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -726,7 +726,7 @@ fail:
}
static int coroutine_fn GRAPH_RDLOCK
-vpc_co_block_status(BlockDriverState *bs, bool want_zero,
+vpc_co_block_status(BlockDriverState *bs, unsigned int mode,
int64_t offset, int64_t bytes,
int64_t *pnum, int64_t *map,
BlockDriverState **file)
diff --git a/block/vvfat.c b/block/vvfat.c
index 91d69b3cc83..814796d9185 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -3134,9 +3134,9 @@ vvfat_co_pwritev(BlockDriverState *bs, int64_t offset, int64_t bytes,
}
static int coroutine_fn vvfat_co_block_status(BlockDriverState *bs,
- bool want_zero, int64_t offset,
- int64_t bytes, int64_t *n,
- int64_t *map,
+ unsigned int mode,
+ int64_t offset, int64_t bytes,
+ int64_t *n, int64_t *map,
BlockDriverState **file)
{
*n = bytes;
diff --git a/tests/unit/test-block-iothread.c b/tests/unit/test-block-iothread.c
index 2b358eaaa82..e26b3be5939 100644
--- a/tests/unit/test-block-iothread.c
+++ b/tests/unit/test-block-iothread.c
@@ -63,7 +63,7 @@ bdrv_test_co_truncate(BlockDriverState *bs, int64_t offset, bool exact,
}
static int coroutine_fn bdrv_test_co_block_status(BlockDriverState *bs,
- bool want_zero,
+ unsigned int mode,
int64_t offset, int64_t count,
int64_t *pnum, int64_t *map,
BlockDriverState **file)
--
2.49.0
^ permalink raw reply related [flat|nested] 36+ messages in thread
* [PATCH v2.5 02/11] file-posix, gluster: Handle zero block status hint better
2025-04-18 21:52 ` [PATCH v2.5 01/11] block: Expand block status mode from bool to flags Eric Blake
@ 2025-04-18 21:52 ` Eric Blake
2025-04-22 14:43 ` Stefan Hajnoczi
2025-04-22 14:43 ` [PATCH v2.5 01/11] block: Expand block status mode from bool to flags Stefan Hajnoczi
2025-04-24 18:08 ` Eric Blake
2 siblings, 1 reply; 36+ messages in thread
From: Eric Blake @ 2025-04-18 21:52 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-block, stefanha, Kevin Wolf, Hanna Reitz, open list:GLUSTER
Although the previous patch to change 'bool want_zero' into a bitmask
made no semantic change, it is now time to differentiate. When the
caller specifically wants to know what parts of the file read as zero,
we need to use lseek and actually report holes, rather than
short-circuiting and advertising full allocation.
This change will be utilized in later patches to let mirroring
optimize for the case when the destination already reads as zeroes.
Signed-off-by: Eric Blake <eblake@redhat.com>
---
Compared to v2, also adjust the gluster driver which copies file-posix.
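
For readers skimming the hunks below, here is a minimal sketch of how the
short-circuit test changes. The flag names and bit values are hypothetical
stand-ins (the real BDRV_WANT_* definitions come from patch 1 and are not
shown here); the only property assumed is that the "precise" mode is the
bitwise-or of the other three flags, as described in the cover text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the BDRV_WANT_* flags; values are assumptions. */
enum {
    WANT_ZERO         = 1u << 0,
    WANT_OFFSET_VALID = 1u << 1,
    WANT_ALLOCATED    = 1u << 2,
    WANT_PRECISE      = WANT_ZERO | WANT_OFFSET_VALID | WANT_ALLOCATED,
};

/* Before this patch: every query that is not fully precise is answered
 * with "the whole range is allocated", without consulting lseek. */
static bool short_circuit_before(unsigned int mode)
{
    assert((mode & ~WANT_PRECISE) == 0);   /* no unknown bits in this sketch */
    return mode != WANT_PRECISE;
}

/* After this patch: only skip lseek when the caller did not ask which
 * parts read as zero. */
static bool short_circuit_after(unsigned int mode)
{
    assert((mode & ~WANT_PRECISE) == 0);   /* no unknown bits in this sketch */
    return !(mode & WANT_ZERO);
}
```

The only mode whose answer differs is a zero-only query: it used to be
short-circuited and now reaches the hole detection.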
block/file-posix.c | 3 ++-
block/gluster.c | 2 +-
2 files changed, 3 insertions(+), 2 deletions(-)
diff --git a/block/file-posix.c b/block/file-posix.c
index 91deb5bf5af..575cbfba07d 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -3217,7 +3217,8 @@ static int coroutine_fn raw_co_block_status(BlockDriverState *bs,
return ret;
}
- if (mode != BDRV_WANT_PRECISE) {
+ if (!(mode & BDRV_WANT_ZERO)) {
+ /* There is no backing file - all bytes are allocated in this file. */
*pnum = bytes;
*map = offset;
*file = bs;
diff --git a/block/gluster.c b/block/gluster.c
index 8197b0ecefa..e702666cbce 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -1482,7 +1482,7 @@ static int coroutine_fn qemu_gluster_co_block_status(BlockDriverState *bs,
return ret;
}
- if (mode != BDRV_WANT_PRECISE) {
+ if (!(mode & BDRV_WANT_ZERO)) {
*pnum = bytes;
*map = offset;
*file = bs;
--
2.49.0
^ permalink raw reply related [flat|nested] 36+ messages in thread
* Re: [PATCH v2 01/11] block: Expand block status mode from bool to enum
2025-04-18 19:02 ` Eric Blake
@ 2025-04-18 21:55 ` Eric Blake
0 siblings, 0 replies; 36+ messages in thread
From: Eric Blake @ 2025-04-18 21:55 UTC (permalink / raw)
To: Stefan Hajnoczi
Cc: qemu-devel, qemu-block, vsementsov, Kevin Wolf, Hanna Reitz,
John Snow, Fam Zheng, Ronnie Sahlberg, Paolo Bonzini,
Peter Lieven, Denis V. Lunev, Alberto Garcia, Ilya Dryomov,
Stefan Weil, open list:GLUSTER
On Fri, Apr 18, 2025 at 02:02:20PM -0500, Eric Blake wrote:
> > I have trouble understanding what the exact semantics of these modes
> > are. Would it be possible to pass flags to block status calls that can
> > be ORed together instead: WANT_OFFSET_VALID, WANT_ZERO, etc? The flags
> > would be orthogonal and easier to understand than modes that seem to
> > combine multiple flag behaviors.
>
> I can give that a try. If I'm understanding the request correctly, I
> would map it as follows:
>
> BDRV_BSTAT_PRECISE => WANT_ZERO | WANT_OFFSET_VALID | WANT_ALLOCATED
> BDRV_BSTAT_ALLOCATED => WANT_ALLOCATED
> BDRV_BSTAT_ZERO => WANT_ZERO
>
> while still trying to keep it a mechanical conversion in this patch.
I've done that as a v2.5 reply to patch 1 and 2 of this series (the
rest of the series is unchanged except for one obvious word change in
the addition of bdrv_co_is_all_zeroes). If we like it better, I can
resend the full series as v3.
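
As a rough illustration of why the conversion stays mechanical, the sketch
below maps the three kinds of io.c callers touched in patch 1 onto flag
combinations. The enum names, the Caller type and the bit values are all
hypothetical, chosen only for this example; the caller-to-mode pairing is
taken from the io.c hunks (block_status_above asks for precise,
is_zero_fast for zero, is_allocated for allocated).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; only the superset relation matters here. */
enum {
    WANT_ZERO         = 1u << 0,
    WANT_OFFSET_VALID = 1u << 1,
    WANT_ALLOCATED    = 1u << 2,
    WANT_PRECISE      = WANT_ZERO | WANT_OFFSET_VALID | WANT_ALLOCATED,
};

/* Which io.c entry point is asking (names invented for this sketch). */
typedef enum {
    CALLER_BLOCK_STATUS_ABOVE,  /* used to pass want_zero == true  */
    CALLER_IS_ZERO_FAST,        /* used to pass want_zero == false */
    CALLER_IS_ALLOCATED,        /* used to pass want_zero == false */
} Caller;

static unsigned int mode_for_caller(Caller c)
{
    switch (c) {
    case CALLER_BLOCK_STATUS_ABOVE:
        return WANT_PRECISE;
    case CALLER_IS_ZERO_FAST:
        return WANT_ZERO;
    case CALLER_IS_ALLOCATED:
        return WANT_ALLOCATED;
    }
    assert(0 && "unknown caller");
    return 0;
}

/* A driver that compares against the full precise mode behaves exactly
 * as it did with the old bool, which keeps patch 1 free of semantic change. */
static bool legacy_want_zero(unsigned int mode)
{
    return mode == WANT_PRECISE;
}
```

Both of the old want_zero==false callers still look "false" to an
unconverted driver, but a converted driver can now tell them apart.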
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2.5 01/11] block: Expand block status mode from bool to flags
2025-04-18 21:52 ` [PATCH v2.5 01/11] block: Expand block status mode from bool to flags Eric Blake
2025-04-18 21:52 ` [PATCH v2.5 02/11] file-posix, gluster: Handle zero block status hint better Eric Blake
@ 2025-04-22 14:43 ` Stefan Hajnoczi
2025-04-24 18:08 ` Eric Blake
2 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2025-04-22 14:43 UTC (permalink / raw)
To: Eric Blake
Cc: qemu-devel, qemu-block, Kevin Wolf, Hanna Reitz, John Snow,
Vladimir Sementsov-Ogievskiy, Fam Zheng, Ronnie Sahlberg,
Paolo Bonzini, Peter Lieven, Denis V. Lunev, Alberto Garcia,
Ilya Dryomov, Stefan Weil, open list:GLUSTER
[-- Attachment #1: Type: text/plain, Size: 2825 bytes --]
On Fri, Apr 18, 2025 at 04:52:46PM -0500, Eric Blake wrote:
> This patch is purely mechanical, changing bool want_zero into an
> unsigned int for bitwise-or of flags. As of this patch, all
> implementations are unchanged (the old want_zero==true is now
> mode==BDRV_WANT_PRECISE which is a superset of BDRV_WANT_ZERO); but
> the callers in io.c that used to pass want_zero==false are now
> prepared for future driver changes that can now distinguish between
> BDRV_WANT_ZERO vs. BDRV_WANT_ALLOCATED. The next patch will actually
> change the file-posix driver along those lines, now that we have
> more-specific hints.
>
> As for the background why this patch is useful: right now, the
> file-posix driver recognizes that if allocation is being queried, the
> entire image can be reported as allocated (there is no backing file to
> refer to) - but this throws away information on whether the entire
> image reads as zero (trivially true if lseek(SEEK_HOLE) at offset 0
> returns -ENXIO, a bit more complicated to prove if the raw file was
> created with 'qemu-img create' since we intentionally allocate a small
> chunk of all-zero data to help with alignment probing). Later patches
> will add a generic algorithm for seeing if an entire file reads as
> zeroes.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>
> In response to Stefan's ask for what it would look like as a bitmask
> of flags instead of an enum. Only the first two patches of the series
> change.
>
> block/coroutines.h | 4 +--
> include/block/block-common.h | 11 +++++++
> include/block/block_int-common.h | 27 +++++++++--------
> include/block/block_int-io.h | 4 +--
> block/io.c | 51 ++++++++++++++++----------------
> block/blkdebug.c | 6 ++--
> block/copy-before-write.c | 4 +--
> block/file-posix.c | 4 +--
> block/gluster.c | 4 +--
> block/iscsi.c | 6 ++--
> block/nbd.c | 4 +--
> block/null.c | 6 ++--
> block/parallels.c | 6 ++--
> block/qcow.c | 2 +-
> block/qcow2.c | 6 ++--
> block/qed.c | 6 ++--
> block/quorum.c | 4 +--
> block/raw-format.c | 4 +--
> block/rbd.c | 6 ++--
> block/snapshot-access.c | 4 +--
> block/vdi.c | 4 +--
> block/vmdk.c | 2 +-
> block/vpc.c | 2 +-
> block/vvfat.c | 6 ++--
> tests/unit/test-block-iothread.c | 2 +-
> 25 files changed, 99 insertions(+), 86 deletions(-)
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 484 bytes --]
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2.5 02/11] file-posix, gluster: Handle zero block status hint better
2025-04-18 21:52 ` [PATCH v2.5 02/11] file-posix, gluster: Handle zero block status hint better Eric Blake
@ 2025-04-22 14:43 ` Stefan Hajnoczi
0 siblings, 0 replies; 36+ messages in thread
From: Stefan Hajnoczi @ 2025-04-22 14:43 UTC (permalink / raw)
To: Eric Blake
Cc: qemu-devel, qemu-block, Kevin Wolf, Hanna Reitz,
open list:GLUSTER
[-- Attachment #1: Type: text/plain, Size: 850 bytes --]
On Fri, Apr 18, 2025 at 04:52:47PM -0500, Eric Blake wrote:
> Although the previous patch to change 'bool want_zero' into a bitmask
> made no semantic change, it is now time to differentiate. When the
> caller specifically wants to know what parts of the file read as zero,
> we need to use lseek and actually report holes, rather than
> short-circuiting and advertising full allocation.
>
> This change will be utilized in later patches to let mirroring
> optimize for the case when the destination already reads as zeroes.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>
> Compared to v2, also adjust the gluster driver which copies file-posix.
>
> block/file-posix.c | 3 ++-
> block/gluster.c | 2 +-
> 2 files changed, 3 insertions(+), 2 deletions(-)
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2 08/11] mirror: Skip writing zeroes when target is already zero
2025-04-17 18:39 ` [PATCH v2 08/11] mirror: Skip writing zeroes when target " Eric Blake
2025-04-17 20:54 ` Stefan Hajnoczi
@ 2025-04-23 16:42 ` Sunny Zhu
2025-04-23 19:12 ` Eric Blake
1 sibling, 1 reply; 36+ messages in thread
From: Sunny Zhu @ 2025-04-23 16:42 UTC (permalink / raw)
To: eblake; +Cc: hreitz, jsnow, kwolf, qemu-block, qemu-devel, vsementsov,
stefanha
On Thu, 17 Apr 2025 13:39:13 -0500, Eric Blake wrote:
> When mirroring, the goal is to ensure that the destination reads the
> same as the source; this goal is met whether the destination is sparse
> or fully-allocated. However, if the destination cannot efficiently
> write zeroes, then any time the mirror operation wants to copy zeroes
> from the source to the destination (either during the background over
> sparse regions when doing a full mirror, or in the foreground when the
> guest actively writes zeroes), we were causing the destination to
> fully allocate that portion of the disk, even if it already read as
> zeroes.
>
> The effect is especially pronounced when the source is a raw file.
> That's because when the source is a qcow2 file, the dirty bitmap only
> visits the portions of the source that are allocated, which tend to be
> non-zero. But when the source is a raw file,
> bdrv_co_is_allocated_above() reports the entire file as allocated so
> mirror_dirty_init sets the entire dirty bitmap, and it is only later
> during mirror_iteration that we change to consulting the more precise
> bdrv_co_block_status_above() to learn where the source reads as zero.
>
> Remember that since a mirror operation can write a cluster more than
> once (every time the guest changes the source, the destination is also
> changed to keep up), we can't take the shortcut of relying on
> s->zero_target (which is static for the life of the job) in
> mirror_co_zero() to see if the destination is already zero, because
> that information may be stale. Any solution we use must be dynamic in
> the face of the guest writing or discarding a cluster while the mirror
> has been ongoing.
>
> We could just teach mirror_co_zero() to do a block_status() probe of
> the destination, and skip the zeroes if the destination already reads
> as zero, but we know from past experience that extra block_status()
> calls are not always cheap (tmpfs, anyone?), especially when they are
> random access rather than linear. Use of block_status() of the source
> by the background task in a linear fashion is not our bottleneck (it's
> a background task, after all); but since mirroring can be done while
> the source is actively being changed, we don't want a slow
> block_status() of the destination to occur on the hot path of the
> guest trying to do random-access writes to the source.
>
> So this patch takes a slightly different approach: any time we have to
> transfer the full image, we know that mirror_dirty_init() is _already_
> doing a pre-zero pass over the entire destination. Therefore, if we
> track which clusters of the destination are zero at any given moment,
> we don't have to do a block_status() call on the destination, but can
> instead just refer to the zero bitmap associated with the job.
>
> With this patch, if I create a raw sparse destination file, connect it
> with QMP 'blockdev-add' while leaving it at the default "discard":
> "ignore", then run QMP 'blockdev-mirror' with "sync": "full", the
> destination remains sparse rather than fully allocated.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> block/mirror.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++----
> 1 file changed, 65 insertions(+), 5 deletions(-)
>
> diff --git a/block/mirror.c b/block/mirror.c
> index 234e3a55e60..4770d87abf6 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -73,6 +73,7 @@ typedef struct MirrorBlockJob {
> size_t buf_size;
> int64_t bdev_length;
> unsigned long *cow_bitmap;
> + unsigned long *zero_bitmap;
> BdrvDirtyBitmap *dirty_bitmap;
> BdrvDirtyBitmapIter *dbi;
> uint8_t *buf;
> @@ -408,15 +409,33 @@ static void coroutine_fn mirror_co_read(void *opaque)
> static void coroutine_fn mirror_co_zero(void *opaque)
> {
> MirrorOp *op = opaque;
> - int ret;
> + bool write_needed = true;
> + int ret = 0;
>
> op->s->in_flight++;
> op->s->bytes_in_flight += op->bytes;
> *op->bytes_handled = op->bytes;
> op->is_in_flight = true;
>
> - ret = blk_co_pwrite_zeroes(op->s->target, op->offset, op->bytes,
> - op->s->unmap ? BDRV_REQ_MAY_UNMAP : 0);
> + if (op->s->zero_bitmap) {
> + unsigned long end = DIV_ROUND_UP(op->offset + op->bytes,
> + op->s->granularity);
> + assert(QEMU_IS_ALIGNED(op->offset, op->s->granularity));
> + assert(QEMU_IS_ALIGNED(op->bytes, op->s->granularity) ||
> + op->offset + op->bytes == op->s->bdev_length);
> + if (find_next_zero_bit(op->s->zero_bitmap, end,
> + op->offset / op->s->granularity) == end) {
> + write_needed = false;
> + }
> + }
> + if (write_needed) {
> + ret = blk_co_pwrite_zeroes(op->s->target, op->offset, op->bytes,
> + op->s->unmap ? BDRV_REQ_MAY_UNMAP : 0);
> + }
> + if (ret >= 0 && op->s->zero_bitmap) {
> + bitmap_set(op->s->zero_bitmap, op->offset / op->s->granularity,
> + DIV_ROUND_UP(op->bytes, op->s->granularity));
> + }
> mirror_write_complete(op, ret);
> }
>
> @@ -441,6 +460,9 @@ static unsigned mirror_perform(MirrorBlockJob *s, int64_t offset,
> Coroutine *co;
> int64_t bytes_handled = -1;
>
> + assert(QEMU_IS_ALIGNED(offset, s->granularity));
> + assert(QEMU_IS_ALIGNED(bytes, s->granularity) ||
> + offset + bytes == s->bdev_length);
> op = g_new(MirrorOp, 1);
> *op = (MirrorOp){
> .s = s,
> @@ -452,12 +474,21 @@ static unsigned mirror_perform(MirrorBlockJob *s, int64_t offset,
>
> switch (mirror_method) {
> case MIRROR_METHOD_COPY:
> + if (s->zero_bitmap) {
> + bitmap_clear(s->zero_bitmap, offset / s->granularity,
> + DIV_ROUND_UP(bytes, s->granularity));
> + }
> co = qemu_coroutine_create(mirror_co_read, op);
> break;
> case MIRROR_METHOD_ZERO:
> + /* s->zero_bitmap handled in mirror_co_zero */
> co = qemu_coroutine_create(mirror_co_zero, op);
> break;
> case MIRROR_METHOD_DISCARD:
> + if (s->zero_bitmap) {
> + bitmap_clear(s->zero_bitmap, offset / s->granularity,
> + DIV_ROUND_UP(bytes, s->granularity));
> + }
> co = qemu_coroutine_create(mirror_co_discard, op);
> break;
> default:
>
If we have performed the skip-zero operation, it should not be constrained
by mirror job bandwidth limits. Therefore, it is preferable to exclude it
from rate limiting.
bool skip_write_zero = false;
io_bytes = mirror_perform(s, offset, io_bytes, mirror_method, &skip_write_zero);
if (skip_write_zero || (mirror_method != MIRROR_METHOD_COPY && write_zeroes_ok)) {
io_bytes_acct = 0;
} ..
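
To make the idea concrete, here is a small standalone sketch, not the
mirror.c code: a per-chunk zero map decides whether the write can be
skipped, and a skipped request is charged zero bytes against the rate
limit. ZeroMap, CHUNKS and mirror_zero_sketch() are names invented for
this example, and the sketch only models the skip case, not the existing
write_zeroes_ok condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CHUNKS 64   /* hypothetical: the sketch tracks at most 64 chunks */

typedef struct {
    uint64_t zero_bits;    /* bit i set => chunk i of the target reads as zero */
    int64_t granularity;   /* bytes per chunk */
} ZeroMap;

/* True if every chunk touched by [offset, offset + bytes) is known zero. */
static bool all_zero(const ZeroMap *m, int64_t offset, int64_t bytes)
{
    int64_t first = offset / m->granularity;
    int64_t end = (offset + bytes + m->granularity - 1) / m->granularity;

    assert(offset % m->granularity == 0 && end <= CHUNKS);
    for (int64_t i = first; i < end; i++) {
        if (!(m->zero_bits & (UINT64_C(1) << i))) {
            return false;
        }
    }
    return true;
}

/* Set or clear the zero bit of every chunk touched by the range. */
static void mark(ZeroMap *m, int64_t offset, int64_t bytes, bool zero)
{
    int64_t first = offset / m->granularity;
    int64_t end = (offset + bytes + m->granularity - 1) / m->granularity;

    assert(offset % m->granularity == 0 && end <= CHUNKS);
    for (int64_t i = first; i < end; i++) {
        if (zero) {
            m->zero_bits |= UINT64_C(1) << i;
        } else {
            m->zero_bits &= ~(UINT64_C(1) << i);
        }
    }
}

/* Returns the number of bytes to charge against the rate limit. */
static int64_t mirror_zero_sketch(ZeroMap *m, int64_t offset, int64_t bytes,
                                  bool *skipped)
{
    *skipped = all_zero(m, offset, bytes);
    /* A real job would issue the write-zeroes request here if !*skipped. */
    mark(m, offset, bytes, true);
    return *skipped ? 0 : bytes;
}
```

A copy or discard over the same range would call mark(..., false) first,
so a later zero request is charged (and written) again.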
> @@ -849,6 +880,8 @@ static int coroutine_fn GRAPH_UNLOCKED mirror_dirty_init(MirrorBlockJob *s)
> bdrv_graph_co_rdunlock();
>
> if (s->zero_target) {
> + int64_t bitmap_length = DIV_ROUND_UP(s->bdev_length, s->granularity);
> +
> offset = 0;
> bdrv_graph_co_rdlock();
> ret = bdrv_co_is_all_zeroes(target_bs);
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2 08/11] mirror: Skip writing zeroes when target is already zero
2025-04-23 16:42 ` Sunny Zhu
@ 2025-04-23 19:12 ` Eric Blake
0 siblings, 0 replies; 36+ messages in thread
From: Eric Blake @ 2025-04-23 19:12 UTC (permalink / raw)
To: Sunny Zhu
Cc: hreitz, jsnow, kwolf, qemu-block, qemu-devel, vsementsov,
stefanha
On Thu, Apr 24, 2025 at 12:42:45AM +0800, Sunny Zhu wrote:
> on Thu, 17 Apr 2025 13:39:13 -0500, Eric Blake wrote:
> > When mirroring, the goal is to ensure that the destination reads the
> > same as the source; this goal is met whether the destination is sparse
> > or fully-allocated. However, if the destination cannot efficiently
> > write zeroes, then any time the mirror operation wants to copy zeroes
> > from the source to the destination (either during the background over
> > sparse regions when doing a full mirror, or in the foreground when the
> > guest actively writes zeroes), we were causing the destination to
> > fully allocate that portion of the disk, even if it already read as
> > zeroes.
> >
> > @@ -452,12 +474,21 @@ static unsigned mirror_perform(MirrorBlockJob *s, int64_t offset,
> >
> > switch (mirror_method) {
> > case MIRROR_METHOD_COPY:
> > + if (s->zero_bitmap) {
> > + bitmap_clear(s->zero_bitmap, offset / s->granularity,
> > + DIV_ROUND_UP(bytes, s->granularity));
> > + }
> > co = qemu_coroutine_create(mirror_co_read, op);
> > break;
> > case MIRROR_METHOD_ZERO:
> > + /* s->zero_bitmap handled in mirror_co_zero */
> > co = qemu_coroutine_create(mirror_co_zero, op);
> > break;
> > case MIRROR_METHOD_DISCARD:
> > + if (s->zero_bitmap) {
> > + bitmap_clear(s->zero_bitmap, offset / s->granularity,
> > + DIV_ROUND_UP(bytes, s->granularity));
> > + }
> > co = qemu_coroutine_create(mirror_co_discard, op);
> > break;
> > default:
> >
>
> If we have performed the skip-zero operation, it should not be constrained
> by mirror job bandwidth limits. Therefore, it is preferable to exclude it
> from rate limiting.
Indeed, that makes sense. And it may impact the iotests: test 194
should have a smaller amount of bytes transferred, due to skipping
zeroes, so I may need to hoist the filtering that I added in the later
patch for iotest mirror-sparse into common code.
>
> bool skip_write_zero = false;
>
> io_bytes = mirror_perform(s, offset, io_bytes, mirror_method, &skip_write_zero);
> if (skip_write_zero || (mirror_method != MIRROR_METHOD_COPY && write_zeroes_ok)) {
> io_bytes_acct = 0;
> } ..
>
Thanks; that's helpful. I'll incorporate it into v3.
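
For readers following the thread, here is a minimal standalone sketch of
the idea being agreed on: a zero request over a range the target is
already known to read as zero is skipped, and a skipped request is
charged 0 bytes against the rate limit. Everything here (DemoJob,
demo_zero, demo_copy, GRANULARITY, NCHUNKS) is a hypothetical model,
not the mirror.c code, and unlike the snippet above it charges a real
zero write in full, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GRANULARITY 65536u  /* assumed chunk size in bytes */
#define NCHUNKS 16u         /* assumed image size, in chunks */

/* Hypothetical stand-in for the job state; not QEMU's MirrorBlockJob. */
typedef struct {
    bool zero[NCHUNKS];      /* true if the target chunk is known to read as zero */
    uint64_t bytes_written;  /* bytes actually sent to the target */
} DemoJob;

/* Returns true if every chunk in [first, first + count) is known zero. */
static bool demo_range_is_zero(const DemoJob *job, uint64_t first,
                               uint64_t count)
{
    for (uint64_t i = first; i < first + count; i++) {
        if (!job->zero[i]) {
            return false;
        }
    }
    return true;
}

/*
 * Zero a chunk-aligned range.  Returns the number of bytes to charge
 * against the rate limit: 0 when the write was skipped.
 */
static uint64_t demo_zero(DemoJob *job, uint64_t offset, uint64_t bytes)
{
    assert(offset % GRANULARITY == 0 && bytes % GRANULARITY == 0);
    uint64_t first = offset / GRANULARITY;
    uint64_t count = bytes / GRANULARITY;
    assert(first + count <= NCHUNKS);

    if (demo_range_is_zero(job, first, count)) {
        return 0;  /* nothing written, so nothing to account */
    }
    job->bytes_written += bytes;  /* models the write-zeroes request */
    for (uint64_t i = first; i < first + count; i++) {
        job->zero[i] = true;
    }
    return bytes;
}

/* Copy data into a chunk-aligned range; the range is no longer known zero. */
static uint64_t demo_copy(DemoJob *job, uint64_t offset, uint64_t bytes)
{
    assert(offset % GRANULARITY == 0 && bytes % GRANULARITY == 0);
    uint64_t first = offset / GRANULARITY;
    uint64_t count = bytes / GRANULARITY;
    assert(first + count <= NCHUNKS);

    for (uint64_t i = first; i < first + count; i++) {
        job->zero[i] = false;
    }
    job->bytes_written += bytes;
    return bytes;
}
```

In this model, zeroing the same range twice charges the rate limit only
once, and a copy into the range makes the next zero request count again.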
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2 07/11] mirror: Skip pre-zeroing destination if it is already zero
2025-04-17 18:39 ` [PATCH v2 07/11] mirror: Skip pre-zeroing destination if it is already zero Eric Blake
2025-04-17 20:46 ` Stefan Hajnoczi
@ 2025-04-24 17:10 ` Eric Blake
1 sibling, 0 replies; 36+ messages in thread
From: Eric Blake @ 2025-04-24 17:10 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, vsementsov, John Snow, Kevin Wolf, Hanna Reitz
On Thu, Apr 17, 2025 at 01:39:12PM -0500, Eric Blake wrote:
> When doing a sync=full mirroring, QMP drive-mirror requests full
> zeroing if it did not just create the destination, and blockdev-mirror
> requests full zeroing unconditionally. This is because during a full
> sync, we must ensure that the portions of the disk that are not
> otherwise touched by the source still read as zero upon completion.
>
> However, in mirror_dirty_init(), we were blindly assuming that if the
> destination allows punching holes, we should pre-zero the entire
> image; and if it does not allow punching holes, then treat the entire
> source as dirty rather than mirroring just the allocated portions of
> the source. Without the ability to punch holes, this results in the
> destination file being fully allocated; and even when punching holes
> is supported, it causes duplicate I/O to the portions of the
> destination corresponding to chunks of the source that are allocated
> but read as zero.
>
> Smarter is to avoid the pre-zeroing pass over the destination if it
> can be proved the destination already reads as zero. Note that a
> later patch will then further improve things to skip writing to the
> destination for parts of the image where the source is zero; but even
> with just this patch, it is possible to see a difference for any BDS
> that can quickly report that it already reads as zero.
Hmm. When the destination reads as all zeroes, but is not (yet)
sparse, and the user has opened the destination image with
"discard":"unmap" and "detect-zeroes":"unmap", then pre-patch this
would sparsify the destination, but post-patch it leaves the
destination allocated.
When "detect-zeroes" is at its default of 'off', or even at 'on'
(which says optimize zero writes, but don't worry about punching
holes), that's not a problem. But when "detect-zeroes" is at 'unmap',
this is a regression in behavior. I'll see if I can quickly adjust
that in v3.
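
One possible shape for that adjustment, as a standalone sketch only:
skip the pre-zeroing pass when the destination already reads as zero,
unless the user asked for holes to be punched. The enum and the
function demo_can_skip_prezero() are hypothetical stand-ins, and this
is an assumption about what a fix could look like, not what v3 does.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the three "detect-zeroes" settings discussed above. */
typedef enum {
    DEMO_DETECT_ZEROES_OFF,
    DEMO_DETECT_ZEROES_ON,
    DEMO_DETECT_ZEROES_UNMAP
} DemoDetectZeroes;

/*
 * Decide whether the pre-zeroing pass over the destination may be
 * skipped.  discard_unmap models "discard":"unmap" on the destination.
 */
static bool demo_can_skip_prezero(bool target_reads_zero, bool discard_unmap,
                                  DemoDetectZeroes dz)
{
    assert(dz <= DEMO_DETECT_ZEROES_UNMAP);

    if (!target_reads_zero) {
        return false;  /* stale data must be zeroed */
    }
    if (discard_unmap && dz == DEMO_DETECT_ZEROES_UNMAP) {
        /* Keep the pass so an allocated all-zero target is sparsified. */
        return false;
    }
    return true;
}
```

With 'off' or 'on' the pass is skipped for an all-zero destination; only
the "discard":"unmap" plus "detect-zeroes":"unmap" combination keeps it.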
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org
^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH v2.5 01/11] block: Expand block status mode from bool to flags
2025-04-18 21:52 ` [PATCH v2.5 01/11] block: Expand block status mode from bool to flags Eric Blake
2025-04-18 21:52 ` [PATCH v2.5 02/11] file-posix, gluster: Handle zero block status hint better Eric Blake
2025-04-22 14:43 ` [PATCH v2.5 01/11] block: Expand block status mode from bool to flags Stefan Hajnoczi
@ 2025-04-24 18:08 ` Eric Blake
2 siblings, 0 replies; 36+ messages in thread
From: Eric Blake @ 2025-04-24 18:08 UTC (permalink / raw)
To: qemu-devel
Cc: qemu-block, stefanha, Kevin Wolf, Hanna Reitz, John Snow,
Vladimir Sementsov-Ogievskiy, Fam Zheng, Ronnie Sahlberg,
Paolo Bonzini, Peter Lieven, Denis V. Lunev, Alberto Garcia,
Ilya Dryomov, Stefan Weil, open list:GLUSTER
On Fri, Apr 18, 2025 at 04:52:46PM -0500, Eric Blake wrote:
> This patch is purely mechanical, changing bool want_zero into an
> unsigned int for bitwise-or of flags. As of this patch, all
> implementations are unchanged (the old want_zero==true is now
> mode==BDRV_WANT_PRECISE which is a superset of BDRV_WANT_ZERO); but
> the callers in io.c that used to pass want_zero==false are now
> prepared for future driver changes that can now distinguish between
> BDRV_WANT_ZERO vs. BDRV_WANT_ALLOCATED. The next patch will actually
> change the file-posix driver along those lines, now that we have
> more-specific hints.
>
> As for the background why this patch is useful: right now, the
> file-posix driver recognizes that if allocation is being queried, the
> entire image can be reported as allocated (there is no backing file to
> refer to) - but this throws away information on whether the entire
> image reads as zero (trivially true if lseek(SEEK_HOLE) at offset 0
> returns -ENXIO, a bit more complicated to prove if the raw file was
> created with 'qemu-img create' since we intentionally allocate a small
> chunk of all-zero data to help with alignment probing). Later patches
> will add a generic algorithm for seeing if an entire file reads as
> zeroes.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>
> In response to Stefan's ask for what it would look like as a bitmask
> of flags instead of an enum. Only the first two patches of the series
> change.
> +++ b/block/io.c
> @@ -2364,10 +2364,8 @@ int bdrv_flush_all(void)
> * Drivers not implementing the functionality are assumed to not support
> * backing files, hence all their sectors are reported as allocated.
> *
> - * If 'want_zero' is true, the caller is querying for mapping
> - * purposes, with a focus on valid BDRV_BLOCK_OFFSET_VALID, _DATA, and
> - * _ZERO where possible; otherwise, the result favors larger 'pnum',
> - * with a focus on accurate BDRV_BLOCK_ALLOCATED.
> + * 'mode' serves as a hint as to which results are favored; see enum
> + * BlockStatusMode for details of the supported modes.
Stale comment applicable to v2 but not v2.5; will fix for v3.
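
To illustrate the bool-to-flags change described in the commit message,
here is a small standalone sketch. The DEMO_WANT_* names echo the
BDRV_WANT_* names from the patch, but the numeric values, the exact
composition of DEMO_WANT_PRECISE, and both helper functions are
assumptions made for illustration; only "PRECISE is a superset of ZERO"
comes from the text above.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; the real definitions may differ. */
enum {
    DEMO_WANT_ZERO      = 1u << 0,  /* caller cares about reads-as-zero */
    DEMO_WANT_ALLOCATED = 1u << 1,  /* caller cares about allocation */
    /* Assumed here to be the union of the hints above. */
    DEMO_WANT_PRECISE   = DEMO_WANT_ZERO | DEMO_WANT_ALLOCATED
};

/* Returns true if 'mode' requests everything in 'flag'. */
static bool demo_mode_has(unsigned int mode, unsigned int flag)
{
    assert(flag != 0);  /* an empty flag would match trivially */
    return (mode & flag) == flag;
}

/*
 * Models a driver without backing files: it may claim the whole image
 * is allocated in one answer only when zero information is not wanted.
 */
static bool demo_may_report_all_allocated(unsigned int mode)
{
    return !demo_mode_has(mode, DEMO_WANT_ZERO);
}
```

A caller passing only DEMO_WANT_ALLOCATED gets the fast whole-image
answer, while DEMO_WANT_ZERO or DEMO_WANT_PRECISE forces the driver to
look at where the zeroes actually are.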
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org
^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2025-04-24 18:09 UTC | newest]
Thread overview: 36+ messages
2025-04-17 18:39 [PATCH v2 00/11] Make blockdev-mirror dest sparse in more cases Eric Blake
2025-04-17 18:39 ` [PATCH v2 01/11] block: Expand block status mode from bool to enum Eric Blake
2025-04-17 20:17 ` Stefan Hajnoczi
2025-04-18 19:02 ` Eric Blake
2025-04-18 21:55 ` Eric Blake
2025-04-17 18:39 ` [PATCH v2 02/11] file-posix: Handle zero block status hint better Eric Blake
2025-04-17 20:58 ` Stefan Hajnoczi
2025-04-17 18:39 ` [PATCH v2 03/11] block: Let bdrv_co_is_zero_fast consolidate adjacent extents Eric Blake
2025-04-17 20:21 ` Stefan Hajnoczi
2025-04-17 18:39 ` [PATCH v2 04/11] block: Add new bdrv_co_is_all_zeroes() function Eric Blake
2025-04-17 20:35 ` Stefan Hajnoczi
2025-04-18 19:07 ` Eric Blake
2025-04-17 18:39 ` [PATCH v2 05/11] iotests: Improve iotest 194 to mirror data Eric Blake
2025-04-17 20:39 ` Stefan Hajnoczi
2025-04-17 18:39 ` [PATCH v2 06/11] mirror: Minor refactoring Eric Blake
2025-04-17 20:42 ` Stefan Hajnoczi
2025-04-17 18:39 ` [PATCH v2 07/11] mirror: Skip pre-zeroing destination if it is already zero Eric Blake
2025-04-17 20:46 ` Stefan Hajnoczi
2025-04-24 17:10 ` Eric Blake
2025-04-17 18:39 ` [PATCH v2 08/11] mirror: Skip writing zeroes when target " Eric Blake
2025-04-17 20:54 ` Stefan Hajnoczi
2025-04-23 16:42 ` Sunny Zhu
2025-04-23 19:12 ` Eric Blake
2025-04-17 18:39 ` [PATCH v2 09/11] iotests/common.rc: add disk_usage function Eric Blake
2025-04-17 20:54 ` Stefan Hajnoczi
2025-04-17 18:39 ` [PATCH v2 10/11] tests: Add iotest mirror-sparse for recent patches Eric Blake
2025-04-17 20:55 ` Stefan Hajnoczi
2025-04-17 18:39 ` [PATCH v2 11/11] mirror: Allow QMP override to declare target already zero Eric Blake
2025-04-17 20:57 ` Stefan Hajnoczi
2025-04-18 4:47 ` Markus Armbruster
2025-04-17 20:59 ` [PATCH v2 00/11] Make blockdev-mirror dest sparse in more cases Stefan Hajnoczi
2025-04-18 21:52 ` [PATCH v2.5 01/11] block: Expand block status mode from bool to flags Eric Blake
2025-04-18 21:52 ` [PATCH v2.5 02/11] file-posix, gluster: Handle zero block status hint better Eric Blake
2025-04-22 14:43 ` Stefan Hajnoczi
2025-04-22 14:43 ` [PATCH v2.5 01/11] block: Expand block status mode from bool to flags Stefan Hajnoczi
2025-04-24 18:08 ` Eric Blake