[PULL 00/27] Block layer patches

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [PULL 00/27] Block layer patches
@ 2023-10-31 18:58 Kevin Wolf
  2023-10-31 18:58 ` [PULL 01/27] qemu-img: rebase: stop when reaching EOF of old backing file Kevin Wolf
                   ` (27 more replies)
  0 siblings, 28 replies; 32+ messages in thread
From: Kevin Wolf @ 2023-10-31 18:58 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

The following changes since commit 516fffc9933cb21fad41ca8f7bf465d238d4d375:

  Merge tag 'pull-lu-20231030' of https://gitlab.com/rth7680/qemu into staging (2023-10-31 07:12:40 +0900)

are available in the Git repository at:

  https://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to 900e7d413d630ebd3f5d64bae0e6249621ec0c7f:

  iotests: add test for changing mirror's copy_mode (2023-10-31 19:46:51 +0100)

----------------------------------------------------------------
Block layer patches

- virtio-blk: use blk_io_plug_call() instead of notification BH
- mirror: allow switching from background to active mode
- qemu-img rebase: add compression support
- Fix locking in media change monitor commands
- Fix a few blockjob-related deadlocks when using iothread

----------------------------------------------------------------
Andrey Drobyshev (8):
      qemu-img: rebase: stop when reaching EOF of old backing file
      qemu-iotests: 024: add rebasing test case for overlay_size > backing_size
      qemu-img: rebase: use backing files' BlockBackend for buffer alignment
      qemu-img: add chunk size parameter to compare_buffers()
      qemu-img: rebase: avoid unnecessary COW operations
      iotests/{024, 271}: add testcases for qemu-img rebase
      qemu-img: add compression option to rebase subcommand
      iotests: add tests for "qemu-img rebase" with compression

Fiona Ebner (13):
      blockjob: drop AioContext lock before calling bdrv_graph_wrlock()
      block: avoid potential deadlock during bdrv_graph_wrlock() in bdrv_close()
      blockdev: mirror: avoid potential deadlock when using iothread
      blockjob: introduce block-job-change QMP command
      block/mirror: set actively_synced even after the job is ready
      block/mirror: move dirty bitmap to filter
      block/mirror: determine copy_to_target only once
      mirror: implement mirror_change method
      qapi/block-core: use JobType for BlockJobInfo's type
      qapi/block-core: turn BlockJobInfo into a union
      blockjob: query driver-specific info via a new 'query' driver method
      mirror: return mirror-specific information upon query
      iotests: add test for changing mirror's copy_mode

Kevin Wolf (2):
      block: Fix locking in media change monitor commands
      iotests: Test media change with iothreads

Stefan Hajnoczi (4):
      block: rename blk_io_plug_call() API to defer_call()
      util/defer-call: move defer_call() to util/
      virtio: use defer_call() in virtio_irqfd_notify()
      virtio-blk: remove batch notification BH

 qapi/block-core.json                               |  59 ++++++-
 qapi/job.json                                      |   4 +-
 docs/tools/qemu-img.rst                            |   6 +-
 include/block/blockjob.h                           |  11 ++
 include/block/blockjob_int.h                       |  12 ++
 include/qemu/defer-call.h                          |  16 ++
 include/sysemu/block-backend-io.h                  |   4 -
 block.c                                            |   2 +-
 block/blkio.c                                      |   9 +-
 block/io_uring.c                                   |  11 +-
 block/linux-aio.c                                  |   9 +-
 block/mirror.c                                     | 131 ++++++++++----
 block/monitor/block-hmp-cmds.c                     |   4 +-
 block/nvme.c                                       |   5 +-
 block/plug.c                                       | 159 -----------------
 block/qapi-sysemu.c                                |   5 +
 blockdev.c                                         |  28 ++-
 blockjob.c                                         |  30 +++-
 hw/block/dataplane/virtio-blk.c                    |  48 +----
 hw/block/dataplane/xen-block.c                     |  11 +-
 hw/block/virtio-blk.c                              |   5 +-
 hw/scsi/virtio-scsi.c                              |   7 +-
 hw/virtio/virtio.c                                 |  13 +-
 job.c                                              |   1 +
 qemu-img.c                                         | 136 +++++++++++----
 util/defer-call.c                                  | 156 +++++++++++++++++
 util/thread-pool.c                                 |   5 +
 MAINTAINERS                                        |   3 +-
 block/meson.build                                  |   1 -
 hw/virtio/trace-events                             |   1 +
 qemu-img-cmds.hx                                   |   4 +-
 tests/qemu-iotests/024                             | 117 +++++++++++++
 tests/qemu-iotests/024.out                         |  73 ++++++++
 tests/qemu-iotests/109.out                         |  24 +--
 tests/qemu-iotests/118                             |   6 +-
 tests/qemu-iotests/271                             | 131 ++++++++++++++
 tests/qemu-iotests/271.out                         |  82 +++++++++
 tests/qemu-iotests/314                             | 165 ++++++++++++++++++
 tests/qemu-iotests/314.out                         |  75 ++++++++
 tests/qemu-iotests/tests/mirror-change-copy-mode   | 193 +++++++++++++++++++++
 .../qemu-iotests/tests/mirror-change-copy-mode.out |   5 +
 util/meson.build                                   |   1 +
 42 files changed, 1437 insertions(+), 331 deletions(-)
 create mode 100644 include/qemu/defer-call.h
 delete mode 100644 block/plug.c
 create mode 100644 util/defer-call.c
 create mode 100755 tests/qemu-iotests/314
 create mode 100644 tests/qemu-iotests/314.out
 create mode 100755 tests/qemu-iotests/tests/mirror-change-copy-mode
 create mode 100644 tests/qemu-iotests/tests/mirror-change-copy-mode.out



^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PULL 01/27] qemu-img: rebase: stop when reaching EOF of old backing file
  2023-10-31 18:58 [PULL 00/27] Block layer patches Kevin Wolf
@ 2023-10-31 18:58 ` Kevin Wolf
  2023-10-31 18:58 ` [PULL 02/27] qemu-iotests: 024: add rebasing test case for overlay_size > backing_size Kevin Wolf
                   ` (26 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2023-10-31 18:58 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>

In case when we're rebasing within one backing chain, and when target image
is larger than old backing file, bdrv_is_allocated_above() ends up setting
*pnum = 0.  As a result, target offset isn't getting incremented, and we
get stuck in an infinite for loop.  Let's detect this case and proceed
further down the loop body, as the offsets beyond the old backing size need
to be explicitly zeroed.

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20230919165804.439110-2-andrey.drobyshev@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 qemu-img.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/qemu-img.c b/qemu-img.c
index 585b65640f..2b2a3a86ca 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3814,6 +3814,8 @@ static int img_rebase(int argc, char **argv)
             }
 
             if (prefix_chain_bs) {
+                uint64_t bytes = n;
+
                 /*
                  * If cluster wasn't changed since prefix_chain, we don't need
                  * to take action
@@ -3826,9 +3828,18 @@ static int img_rebase(int argc, char **argv)
                                  strerror(-ret));
                     goto out;
                 }
-                if (!ret) {
+                if (!ret && n) {
                     continue;
                 }
+                if (!n) {
+                    /*
+                     * If we've reached EOF of the old backing, it means that
+                     * offsets beyond the old backing size were read as zeroes.
+                     * Now we will need to explicitly zero the cluster in
+                     * order to preserve that state after the rebase.
+                     */
+                    n = bytes;
+                }
             }
 
             /*
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PULL 02/27] qemu-iotests: 024: add rebasing test case for overlay_size > backing_size
  2023-10-31 18:58 [PULL 00/27] Block layer patches Kevin Wolf
  2023-10-31 18:58 ` [PULL 01/27] qemu-img: rebase: stop when reaching EOF of old backing file Kevin Wolf
@ 2023-10-31 18:58 ` Kevin Wolf
  2023-10-31 18:58 ` [PULL 03/27] qemu-img: rebase: use backing files' BlockBackend for buffer alignment Kevin Wolf
                   ` (25 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2023-10-31 18:58 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>

Before previous commit, rebase was getting infitely stuck in case of
rebasing within the same backing chain and when overlay_size > backing_size.
Let's add this case to the rebasing test 024 to make sure it doesn't
break again.

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20230919165804.439110-3-andrey.drobyshev@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/qemu-iotests/024     | 57 ++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/024.out | 30 ++++++++++++++++++++
 2 files changed, 87 insertions(+)

diff --git a/tests/qemu-iotests/024 b/tests/qemu-iotests/024
index 25a564a150..98a7c8fd65 100755
--- a/tests/qemu-iotests/024
+++ b/tests/qemu-iotests/024
@@ -199,6 +199,63 @@ echo
 # $BASE_OLD and $BASE_NEW)
 $QEMU_IMG map "$OVERLAY" | _filter_qemu_img_map
 
+# Check that rebase within the chain is working when
+# overlay_size > old_backing_size
+#
+# base_new <-- base_old <-- overlay
+#
+# Backing (new): 11 11 11 11 11
+# Backing (old): 22 22 22 22
+# Overlay:       -- -- -- -- --
+#
+# As a result, overlay should contain data identical to base_old, with the
+# last cluster remaining unallocated.
+
+echo
+echo "=== Test rebase within one backing chain ==="
+echo
+
+echo "Creating backing chain"
+echo
+
+TEST_IMG=$BASE_NEW _make_test_img $(( CLUSTER_SIZE * 5 ))
+TEST_IMG=$BASE_OLD _make_test_img -b "$BASE_NEW" -F $IMGFMT \
+    $(( CLUSTER_SIZE * 4 ))
+TEST_IMG=$OVERLAY _make_test_img -b "$BASE_OLD" -F $IMGFMT \
+    $(( CLUSTER_SIZE * 5 ))
+
+echo
+echo "Fill backing files with data"
+echo
+
+$QEMU_IO "$BASE_NEW" -c "write -P 0x11 0 $(( CLUSTER_SIZE * 5 ))" \
+    | _filter_qemu_io
+$QEMU_IO "$BASE_OLD" -c "write -P 0x22 0 $(( CLUSTER_SIZE * 4 ))" \
+    | _filter_qemu_io
+
+echo
+echo "Check the last cluster is zeroed in overlay before the rebase"
+echo
+$QEMU_IO "$OVERLAY" -c "read -P 0x00 $(( CLUSTER_SIZE * 4 )) $CLUSTER_SIZE" \
+    | _filter_qemu_io
+
+echo
+echo "Rebase onto another image in the same chain"
+echo
+
+$QEMU_IMG rebase -b "$BASE_NEW" -F $IMGFMT "$OVERLAY"
+
+echo "Verify that data is read the same before and after rebase"
+echo
+
+# Verify the first 4 clusters are still read the same as in the old base
+$QEMU_IO "$OVERLAY" -c "read -P 0x22 0 $(( CLUSTER_SIZE * 4 ))" \
+    | _filter_qemu_io
+# Verify the last cluster still reads as zeroes
+$QEMU_IO "$OVERLAY" -c "read -P 0x00 $(( CLUSTER_SIZE * 4 )) $CLUSTER_SIZE" \
+    | _filter_qemu_io
+
+echo
 
 # success, all done
 echo "*** done"
diff --git a/tests/qemu-iotests/024.out b/tests/qemu-iotests/024.out
index 973a5a3711..245fe8b1d1 100644
--- a/tests/qemu-iotests/024.out
+++ b/tests/qemu-iotests/024.out
@@ -171,4 +171,34 @@ read 65536/65536 bytes at offset 196608
 Offset          Length          File
 0               0x30000         TEST_DIR/subdir/t.IMGFMT
 0x30000         0x10000         TEST_DIR/subdir/t.IMGFMT.base_new
+
+=== Test rebase within one backing chain ===
+
+Creating backing chain
+
+Formatting 'TEST_DIR/subdir/t.IMGFMT.base_new', fmt=IMGFMT size=327680
+Formatting 'TEST_DIR/subdir/t.IMGFMT.base_old', fmt=IMGFMT size=262144 backing_file=TEST_DIR/subdir/t.IMGFMT.base_new backing_fmt=IMGFMT
+Formatting 'TEST_DIR/subdir/t.IMGFMT', fmt=IMGFMT size=327680 backing_file=TEST_DIR/subdir/t.IMGFMT.base_old backing_fmt=IMGFMT
+
+Fill backing files with data
+
+wrote 327680/327680 bytes at offset 0
+320 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 262144/262144 bytes at offset 0
+256 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+Check the last cluster is zeroed in overlay before the rebase
+
+read 65536/65536 bytes at offset 262144
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+Rebase onto another image in the same chain
+
+Verify that data is read the same before and after rebase
+
+read 262144/262144 bytes at offset 0
+256 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 65536/65536 bytes at offset 262144
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
 *** done
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PULL 03/27] qemu-img: rebase: use backing files' BlockBackend for buffer alignment
  2023-10-31 18:58 [PULL 00/27] Block layer patches Kevin Wolf
  2023-10-31 18:58 ` [PULL 01/27] qemu-img: rebase: stop when reaching EOF of old backing file Kevin Wolf
  2023-10-31 18:58 ` [PULL 02/27] qemu-iotests: 024: add rebasing test case for overlay_size > backing_size Kevin Wolf
@ 2023-10-31 18:58 ` Kevin Wolf
  2023-10-31 18:58 ` [PULL 04/27] qemu-img: add chunk size parameter to compare_buffers() Kevin Wolf
                   ` (24 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2023-10-31 18:58 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>

Since commit bb1c05973cf ("qemu-img: Use qemu_blockalign"), buffers for
the data read from the old and new backing files are aligned using
BlockDriverState (or BlockBackend later on) referring to the target image.
However, this isn't quite right, because buf_new is only being used for
reading from the new backing, while buf_old is being used for both reading
from the old backing and writing to the target.  Let's take that into account
and use more appropriate values as alignments.

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Message-ID: <20230919165804.439110-4-andrey.drobyshev@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 qemu-img.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 2b2a3a86ca..e61d996e0f 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3759,8 +3759,13 @@ static int img_rebase(int argc, char **argv)
         int64_t n;
         float local_progress = 0;
 
-        buf_old = blk_blockalign(blk, IO_BUF_SIZE);
-        buf_new = blk_blockalign(blk, IO_BUF_SIZE);
+        if (blk_old_backing && bdrv_opt_mem_align(blk_bs(blk_old_backing)) >
+            bdrv_opt_mem_align(blk_bs(blk))) {
+            buf_old = blk_blockalign(blk_old_backing, IO_BUF_SIZE);
+        } else {
+            buf_old = blk_blockalign(blk, IO_BUF_SIZE);
+        }
+        buf_new = blk_blockalign(blk_new_backing, IO_BUF_SIZE);
 
         size = blk_getlength(blk);
         if (size < 0) {
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PULL 04/27] qemu-img: add chunk size parameter to compare_buffers()
  2023-10-31 18:58 [PULL 00/27] Block layer patches Kevin Wolf
                   ` (2 preceding siblings ...)
  2023-10-31 18:58 ` [PULL 03/27] qemu-img: rebase: use backing files' BlockBackend for buffer alignment Kevin Wolf
@ 2023-10-31 18:58 ` Kevin Wolf
  2023-10-31 18:58 ` [PULL 05/27] qemu-img: rebase: avoid unnecessary COW operations Kevin Wolf
                   ` (23 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2023-10-31 18:58 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>

Add @chsize param to the function which, if non-zero, would represent
the chunk size to be used for comparison.  If it's zero, then
BDRV_SECTOR_SIZE is used as default chunk size, which is the previous
behaviour.

In particular, we're going to use this param in img_rebase() to make the
write requests aligned to a predefined alignment value.

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20230919165804.439110-5-andrey.drobyshev@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 qemu-img.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index e61d996e0f..a64a664a37 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1274,23 +1274,29 @@ static int is_allocated_sectors_min(const uint8_t *buf, int n, int *pnum,
 }
 
 /*
- * Compares two buffers sector by sector. Returns 0 if the first
- * sector of each buffer matches, non-zero otherwise.
+ * Compares two buffers chunk by chunk, where @chsize is the chunk size.
+ * If @chsize is 0, default chunk size of BDRV_SECTOR_SIZE is used.
+ * Returns 0 if the first chunk of each buffer matches, non-zero otherwise.
  *
- * pnum is set to the sector-aligned size of the buffer prefix that
- * has the same matching status as the first sector.
+ * @pnum is set to the size of the buffer prefix aligned to @chsize that
+ * has the same matching status as the first chunk.
  */
 static int compare_buffers(const uint8_t *buf1, const uint8_t *buf2,
-                           int64_t bytes, int64_t *pnum)
+                           int64_t bytes, uint64_t chsize, int64_t *pnum)
 {
     bool res;
-    int64_t i = MIN(bytes, BDRV_SECTOR_SIZE);
+    int64_t i;
 
     assert(bytes > 0);
 
+    if (!chsize) {
+        chsize = BDRV_SECTOR_SIZE;
+    }
+    i = MIN(bytes, chsize);
+
     res = !!memcmp(buf1, buf2, i);
     while (i < bytes) {
-        int64_t len = MIN(bytes - i, BDRV_SECTOR_SIZE);
+        int64_t len = MIN(bytes - i, chsize);
 
         if (!!memcmp(buf1 + i, buf2 + i, len) != res) {
             break;
@@ -1559,7 +1565,7 @@ static int img_compare(int argc, char **argv)
                     ret = 4;
                     goto out;
                 }
-                ret = compare_buffers(buf1, buf2, chunk, &pnum);
+                ret = compare_buffers(buf1, buf2, chunk, 0, &pnum);
                 if (ret || pnum != chunk) {
                     qprintf(quiet, "Content mismatch at offset %" PRId64 "!\n",
                             offset + (ret ? 0 : pnum));
@@ -3887,7 +3893,7 @@ static int img_rebase(int argc, char **argv)
                 int64_t pnum;
 
                 if (compare_buffers(buf_old + written, buf_new + written,
-                                    n - written, &pnum))
+                                    n - written, 0, &pnum))
                 {
                     if (buf_old_is_zero) {
                         ret = blk_pwrite_zeroes(blk, offset + written, pnum, 0);
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PULL 05/27] qemu-img: rebase: avoid unnecessary COW operations
  2023-10-31 18:58 [PULL 00/27] Block layer patches Kevin Wolf
                   ` (3 preceding siblings ...)
  2023-10-31 18:58 ` [PULL 04/27] qemu-img: add chunk size parameter to compare_buffers() Kevin Wolf
@ 2023-10-31 18:58 ` Kevin Wolf
  2023-10-31 18:58 ` [PULL 06/27] iotests/{024, 271}: add testcases for qemu-img rebase Kevin Wolf
                   ` (22 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2023-10-31 18:58 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>

When rebasing an image from one backing file to another, we need to
compare data from old and new backings.  If the diff between that data
happens to be unaligned to the target cluster size, we might end up
doing partial writes, which would lead to copy-on-write and additional IO.

Consider the following simple case (virtual_size == cluster_size == 64K):

base <-- inc1 <-- inc2

qemu-io -c "write -P 0xaa 0 32K" base.qcow2
qemu-io -c "write -P 0xcc 32K 32K" base.qcow2
qemu-io -c "write -P 0xbb 0 32K" inc1.qcow2
qemu-io -c "write -P 0xcc 32K 32K" inc1.qcow2
qemu-img rebase -f qcow2 -b base.qcow2 -F qcow2 inc2.qcow2

While doing rebase, we'll write a half of the cluster to inc2, and block
layer will have to read the 2nd half of the same cluster from the base image
inc1 while doing this write operation, although the whole cluster is already
read earlier to perform data comparison.

In order to avoid these unnecessary IO cycles, let's make sure every
write request is aligned to the overlay subcluster boundaries.  Using
subcluster size is universal as for the images which don't have them
this size equals to the cluster size. so in any case we end up aligning
to the smallest unit of allocation.

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Message-ID: <20230919165804.439110-6-andrey.drobyshev@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 qemu-img.c | 74 +++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 54 insertions(+), 20 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index a64a664a37..01de77295e 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3530,6 +3530,7 @@ static int img_rebase(int argc, char **argv)
     uint8_t *buf_new = NULL;
     BlockDriverState *bs = NULL, *prefix_chain_bs = NULL;
     BlockDriverState *unfiltered_bs;
+    BlockDriverInfo bdi = {0};
     char *filename;
     const char *fmt, *cache, *src_cache, *out_basefmt, *out_baseimg;
     int c, flags, src_flags, ret;
@@ -3540,6 +3541,7 @@ static int img_rebase(int argc, char **argv)
     bool quiet = false;
     Error *local_err = NULL;
     bool image_opts = false;
+    int64_t write_align;
 
     /* Parse commandline parameters */
     fmt = NULL;
@@ -3663,6 +3665,20 @@ static int img_rebase(int argc, char **argv)
         }
     }
 
+    /*
+     * We need overlay subcluster size to make sure write requests are
+     * aligned.
+     */
+    ret = bdrv_get_info(unfiltered_bs, &bdi);
+    if (ret < 0) {
+        error_report("could not get block driver info");
+        goto out;
+    } else if (bdi.subcluster_size == 0) {
+        bdi.subcluster_size = 1;
+    }
+
+    write_align = bdi.subcluster_size;
+
     /* For safe rebasing we need to compare old and new backing file */
     if (!unsafe) {
         QDict *options = NULL;
@@ -3762,7 +3778,7 @@ static int img_rebase(int argc, char **argv)
         int64_t old_backing_size = 0;
         int64_t new_backing_size = 0;
         uint64_t offset;
-        int64_t n;
+        int64_t n, n_old = 0, n_new = 0;
         float local_progress = 0;
 
         if (blk_old_backing && bdrv_opt_mem_align(blk_bs(blk_old_backing)) >
@@ -3808,7 +3824,8 @@ static int img_rebase(int argc, char **argv)
         }
 
         for (offset = 0; offset < size; offset += n) {
-            bool buf_old_is_zero = false;
+            bool old_backing_eof = false;
+            int64_t n_alloc;
 
             /* How many bytes can we handle with the next read? */
             n = MIN(IO_BUF_SIZE, size - offset);
@@ -3853,33 +3870,46 @@ static int img_rebase(int argc, char **argv)
                 }
             }
 
+            /*
+             * At this point we know that the region [offset; offset + n)
+             * is unallocated within the target image.  This region might be
+             * unaligned to the target image's (sub)cluster boundaries, as
+             * old backing may have smaller clusters (or have subclusters).
+             * We extend it to the aligned boundaries to avoid CoW on
+             * partial writes in blk_pwrite(),
+             */
+            n += offset - QEMU_ALIGN_DOWN(offset, write_align);
+            offset = QEMU_ALIGN_DOWN(offset, write_align);
+            n += QEMU_ALIGN_UP(offset + n, write_align) - (offset + n);
+            n = MIN(n, size - offset);
+            assert(!bdrv_is_allocated(unfiltered_bs, offset, n, &n_alloc) &&
+                   n_alloc == n);
+
+            /*
+             * Much like with the target image, we'll try to read as much
+             * of the old and new backings as we can.
+             */
+            n_old = MIN(n, MAX(0, old_backing_size - (int64_t) offset));
+            n_new = MIN(n, MAX(0, new_backing_size - (int64_t) offset));
+
             /*
              * Read old and new backing file and take into consideration that
              * backing files may be smaller than the COW image.
              */
-            if (offset >= old_backing_size) {
-                memset(buf_old, 0, n);
-                buf_old_is_zero = true;
+            memset(buf_old + n_old, 0, n - n_old);
+            if (!n_old) {
+                old_backing_eof = true;
             } else {
-                if (offset + n > old_backing_size) {
-                    n = old_backing_size - offset;
-                }
-
-                ret = blk_pread(blk_old_backing, offset, n, buf_old, 0);
+                ret = blk_pread(blk_old_backing, offset, n_old, buf_old, 0);
                 if (ret < 0) {
                     error_report("error while reading from old backing file");
                     goto out;
                 }
             }
 
-            if (offset >= new_backing_size || !blk_new_backing) {
-                memset(buf_new, 0, n);
-            } else {
-                if (offset + n > new_backing_size) {
-                    n = new_backing_size - offset;
-                }
-
-                ret = blk_pread(blk_new_backing, offset, n, buf_new, 0);
+            memset(buf_new + n_new, 0, n - n_new);
+            if (n_new) {
+                ret = blk_pread(blk_new_backing, offset, n_new, buf_new, 0);
                 if (ret < 0) {
                     error_report("error while reading from new backing file");
                     goto out;
@@ -3893,11 +3923,12 @@ static int img_rebase(int argc, char **argv)
                 int64_t pnum;
 
                 if (compare_buffers(buf_old + written, buf_new + written,
-                                    n - written, 0, &pnum))
+                                    n - written, write_align, &pnum))
                 {
-                    if (buf_old_is_zero) {
+                    if (old_backing_eof) {
                         ret = blk_pwrite_zeroes(blk, offset + written, pnum, 0);
                     } else {
+                        assert(written + pnum <= IO_BUF_SIZE);
                         ret = blk_pwrite(blk, offset + written, pnum,
                                          buf_old + written, 0);
                     }
@@ -3909,6 +3940,9 @@ static int img_rebase(int argc, char **argv)
                 }
 
                 written += pnum;
+                if (offset + written >= old_backing_size) {
+                    old_backing_eof = true;
+                }
             }
             qemu_progress_print(local_progress, 100);
         }
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PULL 06/27] iotests/{024, 271}: add testcases for qemu-img rebase
  2023-10-31 18:58 [PULL 00/27] Block layer patches Kevin Wolf
                   ` (4 preceding siblings ...)
  2023-10-31 18:58 ` [PULL 05/27] qemu-img: rebase: avoid unnecessary COW operations Kevin Wolf
@ 2023-10-31 18:58 ` Kevin Wolf
  2024-07-22  7:18   ` Thomas Huth
  2023-10-31 18:58 ` [PULL 07/27] qemu-img: add compression option to rebase subcommand Kevin Wolf
                   ` (21 subsequent siblings)
  27 siblings, 1 reply; 32+ messages in thread
From: Kevin Wolf @ 2023-10-31 18:58 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>

As the previous commit changes the logic of "qemu-img rebase" (it's using
write alignment now), let's add a couple more test cases which would
ensure it works correctly.  In particular, the following scenarios:

024: add test case for rebase within one backing chain when the overlay
     cluster size > backings cluster size;
271: add test case for rebase images that contain subclusters.  Check
     that no extra allocations are being made.

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20230919165804.439110-7-andrey.drobyshev@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/qemu-iotests/024     | 60 ++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/024.out | 43 +++++++++++++++++++++++++
 tests/qemu-iotests/271     | 66 ++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/271.out | 42 ++++++++++++++++++++++++
 4 files changed, 211 insertions(+)

diff --git a/tests/qemu-iotests/024 b/tests/qemu-iotests/024
index 98a7c8fd65..285f17e79f 100755
--- a/tests/qemu-iotests/024
+++ b/tests/qemu-iotests/024
@@ -257,6 +257,66 @@ $QEMU_IO "$OVERLAY" -c "read -P 0x00 $(( CLUSTER_SIZE * 4 )) $CLUSTER_SIZE" \
 
 echo
 
+# Check that rebase within the chain is working when
+# overlay cluster size > backings cluster size
+# (here overlay cluster size == 2 * backings cluster size)
+#
+# base_new <-- base_old <-- overlay
+#
+# Backing (new): -- -- -- -- -- --
+# Backing (old): -- 11 -- -- 22 --
+# Overlay:      |-- --|-- --|-- --|
+#
+# We should end up having 1st and 3rd cluster allocated, and their halves
+# being read as zeroes.
+
+echo
+echo "=== Test rebase with different cluster sizes ==="
+echo
+
+echo "Creating backing chain"
+echo
+
+TEST_IMG=$BASE_NEW _make_test_img $(( CLUSTER_SIZE * 6 ))
+TEST_IMG=$BASE_OLD _make_test_img -b "$BASE_NEW" -F $IMGFMT \
+    $(( CLUSTER_SIZE * 6 ))
+CLUSTER_SIZE=$(( CLUSTER_SIZE * 2 )) TEST_IMG=$OVERLAY \
+    _make_test_img -b "$BASE_OLD" -F $IMGFMT $(( CLUSTER_SIZE * 6 ))
+
+TEST_IMG=$OVERLAY _img_info
+
+echo
+echo "Fill backing files with data"
+echo
+
+$QEMU_IO "$BASE_OLD" -c "write -P 0x11 $CLUSTER_SIZE $CLUSTER_SIZE" \
+    -c "write -P 0x22 $(( CLUSTER_SIZE * 4 )) $CLUSTER_SIZE" \
+    | _filter_qemu_io
+
+echo
+echo "Rebase onto another image in the same chain"
+echo
+
+$QEMU_IMG rebase -b "$BASE_NEW" -F $IMGFMT "$OVERLAY"
+
+echo "Verify that data is read the same before and after rebase"
+echo
+
+$QEMU_IO "$OVERLAY" -c "read -P 0x00 0 $CLUSTER_SIZE" \
+    -c "read -P 0x11 $CLUSTER_SIZE $CLUSTER_SIZE" \
+    -c "read -P 0x00 $(( CLUSTER_SIZE * 2 )) $(( CLUSTER_SIZE * 2 ))" \
+    -c "read -P 0x22 $(( CLUSTER_SIZE * 4 )) $CLUSTER_SIZE" \
+    -c "read -P 0x00 $(( CLUSTER_SIZE * 5 )) $CLUSTER_SIZE" \
+    | _filter_qemu_io
+
+echo
+echo "Verify that untouched cluster remains unallocated"
+echo
+
+$QEMU_IMG map "$OVERLAY" | _filter_qemu_img_map
+
+echo
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/024.out b/tests/qemu-iotests/024.out
index 245fe8b1d1..e1e8eea863 100644
--- a/tests/qemu-iotests/024.out
+++ b/tests/qemu-iotests/024.out
@@ -201,4 +201,47 @@ read 262144/262144 bytes at offset 0
 read 65536/65536 bytes at offset 262144
 64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 
+
+=== Test rebase with different cluster sizes ===
+
+Creating backing chain
+
+Formatting 'TEST_DIR/subdir/t.IMGFMT.base_new', fmt=IMGFMT size=393216
+Formatting 'TEST_DIR/subdir/t.IMGFMT.base_old', fmt=IMGFMT size=393216 backing_file=TEST_DIR/subdir/t.IMGFMT.base_new backing_fmt=IMGFMT
+Formatting 'TEST_DIR/subdir/t.IMGFMT', fmt=IMGFMT size=393216 backing_file=TEST_DIR/subdir/t.IMGFMT.base_old backing_fmt=IMGFMT
+image: TEST_DIR/subdir/t.IMGFMT
+file format: IMGFMT
+virtual size: 384 KiB (393216 bytes)
+cluster_size: 131072
+backing file: TEST_DIR/subdir/t.IMGFMT.base_old
+backing file format: IMGFMT
+
+Fill backing files with data
+
+wrote 65536/65536 bytes at offset 65536
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 65536/65536 bytes at offset 262144
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+Rebase onto another image in the same chain
+
+Verify that data is read the same before and after rebase
+
+read 65536/65536 bytes at offset 0
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 65536/65536 bytes at offset 65536
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 131072/131072 bytes at offset 131072
+128 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 65536/65536 bytes at offset 262144
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 65536/65536 bytes at offset 327680
+64 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+Verify that untouched cluster remains unallocated
+
+Offset          Length          File
+0               0x20000         TEST_DIR/subdir/t.IMGFMT
+0x40000         0x20000         TEST_DIR/subdir/t.IMGFMT
+
 *** done
diff --git a/tests/qemu-iotests/271 b/tests/qemu-iotests/271
index c7c2cadda0..e243f57ba7 100755
--- a/tests/qemu-iotests/271
+++ b/tests/qemu-iotests/271
@@ -899,6 +899,72 @@ _concurrent_io     | $QEMU_IO | _filter_qemu_io | \
     sed -e 's/\(20480\|40960\)/OFFSET/'
 _concurrent_verify | $QEMU_IO | _filter_qemu_io
 
+############################################################
+############################################################
+############################################################
+
+echo
+echo "### Rebase of qcow2 images with subclusters ###"
+echo
+
+l2_offset=$((0x400000))
+
+# Check that rebase operation preserve holes between allocated subclusters
+# within one cluster (i.e. does not allocate extra space).  Check that the
+# data is preserved as well.
+#
+# Base (new backing): -- -- -- ... -- -- --
+# Mid (old backing):  -- 11 -- ... -- 22 --
+# Top:                -- -- -- ... -- -- --
+
+echo "### Preservation of unallocated holes after rebase ###"
+echo
+
+echo "# create backing chain"
+echo
+
+TEST_IMG="$TEST_IMG.base" _make_test_img -o cluster_size=1M,extended_l2=on 1M
+TEST_IMG="$TEST_IMG.mid" _make_test_img -o cluster_size=1M,extended_l2=on \
+    -b "$TEST_IMG.base" -F qcow2 1M
+TEST_IMG="$TEST_IMG.top" _make_test_img -o cluster_size=1M,extended_l2=on \
+    -b "$TEST_IMG.mid" -F qcow2 1M
+
+echo
+echo "# fill old backing with data (separate subclusters within cluster)"
+echo
+
+$QEMU_IO -c "write -P 0x11 32k 32k" \
+         -c "write -P 0x22 $(( 30 * 32 ))k 32k" \
+         "$TEST_IMG.mid" | _filter_qemu_io
+
+echo
+echo "# rebase topmost image onto the new backing"
+echo
+
+$QEMU_IMG rebase -b "$TEST_IMG.base" -F qcow2 "$TEST_IMG.top"
+
+echo "# verify that data is read the same before and after rebase"
+echo
+
+$QEMU_IO -c "read -P 0x00 0 32k" \
+         -c "read -P 0x11 32k 32k" \
+         -c "read -P 0x00 64k $(( 28 * 32 ))k" \
+         -c "read -P 0x22 $(( 30 * 32 ))k 32k" \
+         -c "read -P 0x00 $(( 31 * 32 ))k 32k" \
+         "$TEST_IMG.top" | _filter_qemu_io
+
+echo
+echo "# verify that only selected subclusters remain allocated"
+echo
+
+$QEMU_IMG map "$TEST_IMG.top" | _filter_testdir
+
+echo
+echo "# verify image bitmap"
+echo
+
+TEST_IMG="$TEST_IMG.top" alloc="1 30" zero="" _verify_l2_bitmap 0
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/271.out b/tests/qemu-iotests/271.out
index 5be780de76..c335a6c608 100644
--- a/tests/qemu-iotests/271.out
+++ b/tests/qemu-iotests/271.out
@@ -723,4 +723,46 @@ wrote 2048/2048 bytes at offset OFFSET
 2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 wrote 2048/2048 bytes at offset OFFSET
 2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+### Rebase of qcow2 images with subclusters ###
+
+### Preservation of unallocated holes after rebase ###
+
+# create backing chain
+
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=1048576
+Formatting 'TEST_DIR/t.IMGFMT.mid', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
+Formatting 'TEST_DIR/t.IMGFMT.top', fmt=IMGFMT size=1048576 backing_file=TEST_DIR/t.IMGFMT.mid backing_fmt=IMGFMT
+
+# fill old backing with data (separate subclusters within cluster)
+
+wrote 32768/32768 bytes at offset 32768
+32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 32768/32768 bytes at offset 983040
+32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+# rebase topmost image onto the new backing
+
+# verify that data is read the same before and after rebase
+
+read 32768/32768 bytes at offset 0
+32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 32768/32768 bytes at offset 32768
+32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 917504/917504 bytes at offset 65536
+896 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 32768/32768 bytes at offset 983040
+32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 32768/32768 bytes at offset 1015808
+32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+# verify that only selected subclusters remain allocated
+
+Offset          Length          Mapped to       File
+0x8000          0x8000          0x508000        TEST_DIR/t.qcow2.top
+0xf0000         0x8000          0x5f0000        TEST_DIR/t.qcow2.top
+
+# verify image bitmap
+
+L2 entry #0: 0x8000000000500000 0000000040000002
 *** done
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PULL 06/27] iotests/{024, 271}: add testcases for qemu-img rebase
  2023-10-31 18:58 ` [PULL 06/27] iotests/{024, 271}: add testcases for qemu-img rebase Kevin Wolf
@ 2024-07-22  7:18   ` Thomas Huth
  2024-07-30  9:43     ` Andrey Drobyshev
  0 siblings, 1 reply; 32+ messages in thread
From: Thomas Huth @ 2024-07-22  7:18 UTC (permalink / raw)
  To: Kevin Wolf, qemu-block, Andrey Drobyshev; +Cc: qemu-devel

On 31/10/2023 19.58, Kevin Wolf wrote:
> From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> 
> As the previous commit changes the logic of "qemu-img rebase" (it's using
> write alignment now), let's add a couple more test cases which would
> ensure it works correctly.  In particular, the following scenarios:
> 
> 024: add test case for rebase within one backing chain when the overlay
>       cluster size > backings cluster size;
> 271: add test case for rebase images that contain subclusters.  Check
>       that no extra allocations are being made.
> 
> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
> Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
> Message-ID: <20230919165804.439110-7-andrey.drobyshev@virtuozzo.com>
> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   tests/qemu-iotests/024     | 60 ++++++++++++++++++++++++++++++++++
>   tests/qemu-iotests/024.out | 43 +++++++++++++++++++++++++
>   tests/qemu-iotests/271     | 66 ++++++++++++++++++++++++++++++++++++++
>   tests/qemu-iotests/271.out | 42 ++++++++++++++++++++++++
>   4 files changed, 211 insertions(+)

  Hi!

Seems like this patch now breaks the iotests when running with -qed :

$ ./check -qed 024
QEMU          -- ".../qemu-build/qemu-system-x86_64" -nodefaults -display 
none -accel qtest
QEMU_IMG      -- ".../qemu-build/qemu-img"
QEMU_IO       -- ".../qemu-build/qemu-io" --cache writeback --aio threads -f qed
QEMU_NBD      -- ".../qemu-build/qemu-nbd"
IMGFMT        -- qed
IMGPROTO      -- file
PLATFORM      -- Linux/x86_64 thuth-p1g4 6.9.9-200.fc40.x86_64
TEST_DIR      -- .../qemu-build/tests/qemu-iotests/scratch
SOCK_DIR      -- /tmp/qemu-iotests-b84qth8b
GDB_OPTIONS   --
VALGRIND_QEMU --
PRINT_QEMU_OUTPUT --

024   fail       [09:14:06] [09:14:09]   2.9s                 output 
mismatch (see 
.../qemu-build/tests/qemu-iotests/scratch/qed-file-024/024.out.bad)
--- .../qemu/tests/qemu-iotests/024.out
+++ .../qemu-build/tests/qemu-iotests/scratch/qed-file-024/024.out.bad
@@ -214,7 +214,6 @@
  virtual size: 384 KiB (393216 bytes)
  cluster_size: 131072
  backing file: TEST_DIR/subdir/t.IMGFMT.base_old
-backing file format: IMGFMT

  Fill backing files with data

Failures: 024
Failed 1 of 1 iotests

Could you please have a look at it?

  Thanks,
   Thomas



^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PULL 06/27] iotests/{024, 271}: add testcases for qemu-img rebase
  2024-07-22  7:18   ` Thomas Huth
@ 2024-07-30  9:43     ` Andrey Drobyshev
  0 siblings, 0 replies; 32+ messages in thread
From: Andrey Drobyshev @ 2024-07-30  9:43 UTC (permalink / raw)
  To: Thomas Huth, Kevin Wolf, qemu-block; +Cc: qemu-devel

On 7/22/24 10:18 AM, Thomas Huth wrote:
> [Вы нечасто получаете письма от thuth@redhat.com. Узнайте, почему это
> важно, по адресу https://aka.ms/LearnAboutSenderIdentification ]
> 
> On 31/10/2023 19.58, Kevin Wolf wrote:
>> From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
>>
>> As the previous commit changes the logic of "qemu-img rebase" (it's using
>> write alignment now), let's add a couple more test cases which would
>> ensure it works correctly.  In particular, the following scenarios:
>>
>> 024: add test case for rebase within one backing chain when the overlay
>>       cluster size > backings cluster size;
>> 271: add test case for rebase images that contain subclusters.  Check
>>       that no extra allocations are being made.
>>
>> Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
>> Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
>> Message-ID: <20230919165804.439110-7-andrey.drobyshev@virtuozzo.com>
>> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>> ---
>>   tests/qemu-iotests/024     | 60 ++++++++++++++++++++++++++++++++++
>>   tests/qemu-iotests/024.out | 43 +++++++++++++++++++++++++
>>   tests/qemu-iotests/271     | 66 ++++++++++++++++++++++++++++++++++++++
>>   tests/qemu-iotests/271.out | 42 ++++++++++++++++++++++++
>>   4 files changed, 211 insertions(+)
> 
>  Hi!
> 
> Seems like this patch now breaks the iotests when running with -qed :
> 
> $ ./check -qed 024
> QEMU          -- ".../qemu-build/qemu-system-x86_64" -nodefaults -display
> none -accel qtest
> QEMU_IMG      -- ".../qemu-build/qemu-img"
> QEMU_IO       -- ".../qemu-build/qemu-io" --cache writeback --aio
> threads -f qed
> QEMU_NBD      -- ".../qemu-build/qemu-nbd"
> IMGFMT        -- qed
> IMGPROTO      -- file
> PLATFORM      -- Linux/x86_64 thuth-p1g4 6.9.9-200.fc40.x86_64
> TEST_DIR      -- .../qemu-build/tests/qemu-iotests/scratch
> SOCK_DIR      -- /tmp/qemu-iotests-b84qth8b
> GDB_OPTIONS   --
> VALGRIND_QEMU --
> PRINT_QEMU_OUTPUT --
> 
> 024   fail       [09:14:06] [09:14:09]   2.9s                 output
> mismatch (see
> .../qemu-build/tests/qemu-iotests/scratch/qed-file-024/024.out.bad)
> --- .../qemu/tests/qemu-iotests/024.out
> +++ .../qemu-build/tests/qemu-iotests/scratch/qed-file-024/024.out.bad
> @@ -214,7 +214,6 @@
>  virtual size: 384 KiB (393216 bytes)
>  cluster_size: 131072
>  backing file: TEST_DIR/subdir/t.IMGFMT.base_old
> -backing file format: IMGFMT
> 
>  Fill backing files with data
> 
> Failures: 024
> Failed 1 of 1 iotests
> 
> Could you please have a look at it?
> 
>  Thanks,
>   Thomas
> 

Hi Thomas,

Thanks for the catch.  That seems to be a minor issue, apparently
'qemu-img info' doesn't report the backing file format field for qed (as
it does for qcow2):

# qemu-img create -f qed base.qed 1M && qemu-img create -f qed -b
base.qed -F qed top.qed 1M
# qemu-img create -f qcow2 base.qcow2 1M && qemu-img create -f qcow2 -b
base.qcow2 -F qcow2 top.qcow2 1M
# qemu-img info top.qed | grep 'backing file format'
# qemu-img info top.qcow2 | grep 'backing file format'
backing file format: qcow2

I think we can just filter the field out and remove it from the expected
output.

Andrey


^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PULL 07/27] qemu-img: add compression option to rebase subcommand
  2023-10-31 18:58 [PULL 00/27] Block layer patches Kevin Wolf
                   ` (5 preceding siblings ...)
  2023-10-31 18:58 ` [PULL 06/27] iotests/{024, 271}: add testcases for qemu-img rebase Kevin Wolf
@ 2023-10-31 18:58 ` Kevin Wolf
  2023-10-31 18:58 ` [PULL 08/27] iotests: add tests for "qemu-img rebase" with compression Kevin Wolf
                   ` (20 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2023-10-31 18:58 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>

If we rebase an image whose backing file has compressed clusters, we
might end up wasting disk space since the copied clusters are now
uncompressed.  In order to have better control over this, let's add
"--compress" option to the "qemu-img rebase" command.

Note that this option affects only the clusters which are actually being
copied from the original backing file.  The clusters which were
uncompressed in the target image will remain so.

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20230919165804.439110-8-andrey.drobyshev@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 docs/tools/qemu-img.rst |  6 ++++--
 qemu-img.c              | 26 ++++++++++++++++++++------
 qemu-img-cmds.hx        |  4 ++--
 3 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index ca5a2773cf..4459c065f1 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -667,7 +667,7 @@ Command description:
 
   List, apply, create or delete snapshots in image *FILENAME*.
 
-.. option:: rebase [--object OBJECTDEF] [--image-opts] [-U] [-q] [-f FMT] [-t CACHE] [-T SRC_CACHE] [-p] [-u] -b BACKING_FILE [-F BACKING_FMT] FILENAME
+.. option:: rebase [--object OBJECTDEF] [--image-opts] [-U] [-q] [-f FMT] [-t CACHE] [-T SRC_CACHE] [-p] [-u] [-c] -b BACKING_FILE [-F BACKING_FMT] FILENAME
 
   Changes the backing file of an image. Only the formats ``qcow2`` and
   ``qed`` support changing the backing file.
@@ -694,7 +694,9 @@ Command description:
 
     In order to achieve this, any clusters that differ between
     *BACKING_FILE* and the old backing file of *FILENAME* are merged
-    into *FILENAME* before actually changing the backing file.
+    into *FILENAME* before actually changing the backing file. With the
+    ``-c`` option specified, the clusters which are being merged (but not
+    the entire *FILENAME* image) are compressed when written.
 
     Note that the safe mode is an expensive operation, comparable to
     converting an image. It only works if the old backing file still
diff --git a/qemu-img.c b/qemu-img.c
index 01de77295e..369c2e8ddf 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3534,11 +3534,13 @@ static int img_rebase(int argc, char **argv)
     char *filename;
     const char *fmt, *cache, *src_cache, *out_basefmt, *out_baseimg;
     int c, flags, src_flags, ret;
+    BdrvRequestFlags write_flags = 0;
     bool writethrough, src_writethrough;
     int unsafe = 0;
     bool force_share = false;
     int progress = 0;
     bool quiet = false;
+    bool compress = false;
     Error *local_err = NULL;
     bool image_opts = false;
     int64_t write_align;
@@ -3555,9 +3557,10 @@ static int img_rebase(int argc, char **argv)
             {"object", required_argument, 0, OPTION_OBJECT},
             {"image-opts", no_argument, 0, OPTION_IMAGE_OPTS},
             {"force-share", no_argument, 0, 'U'},
+            {"compress", no_argument, 0, 'c'},
             {0, 0, 0, 0}
         };
-        c = getopt_long(argc, argv, ":hf:F:b:upt:T:qU",
+        c = getopt_long(argc, argv, ":hf:F:b:upt:T:qUc",
                         long_options, NULL);
         if (c == -1) {
             break;
@@ -3605,6 +3608,9 @@ static int img_rebase(int argc, char **argv)
         case 'U':
             force_share = true;
             break;
+        case 'c':
+            compress = true;
+            break;
         }
     }
 
@@ -3657,6 +3663,14 @@ static int img_rebase(int argc, char **argv)
 
     unfiltered_bs = bdrv_skip_filters(bs);
 
+    if (compress && !block_driver_can_compress(unfiltered_bs->drv)) {
+        error_report("Compression not supported for this file format");
+        ret = -1;
+        goto out;
+    } else if (compress) {
+        write_flags |= BDRV_REQ_WRITE_COMPRESSED;
+    }
+
     if (out_basefmt != NULL) {
         if (bdrv_find_format(out_basefmt) == NULL) {
             error_report("Invalid format name: '%s'", out_basefmt);
@@ -3666,18 +3680,18 @@ static int img_rebase(int argc, char **argv)
     }
 
     /*
-     * We need overlay subcluster size to make sure write requests are
-     * aligned.
+     * We need overlay subcluster size (or cluster size in case writes are
+     * compressed) to make sure write requests are aligned.
      */
     ret = bdrv_get_info(unfiltered_bs, &bdi);
     if (ret < 0) {
         error_report("could not get block driver info");
         goto out;
     } else if (bdi.subcluster_size == 0) {
-        bdi.subcluster_size = 1;
+        bdi.cluster_size = bdi.subcluster_size = 1;
     }
 
-    write_align = bdi.subcluster_size;
+    write_align = compress ? bdi.cluster_size : bdi.subcluster_size;
 
     /* For safe rebasing we need to compare old and new backing file */
     if (!unsafe) {
@@ -3930,7 +3944,7 @@ static int img_rebase(int argc, char **argv)
                     } else {
                         assert(written + pnum <= IO_BUF_SIZE);
                         ret = blk_pwrite(blk, offset + written, pnum,
-                                         buf_old + written, 0);
+                                         buf_old + written, write_flags);
                     }
                     if (ret < 0) {
                         error_report("Error while writing to COW image: %s",
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 1b1dab5b17..068692d13e 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -88,9 +88,9 @@ SRST
 ERST
 
 DEF("rebase", img_rebase,
-    "rebase [--object objectdef] [--image-opts] [-U] [-q] [-f fmt] [-t cache] [-T src_cache] [-p] [-u] -b backing_file [-F backing_fmt] filename")
+    "rebase [--object objectdef] [--image-opts] [-U] [-q] [-f fmt] [-t cache] [-T src_cache] [-p] [-u] [-c] -b backing_file [-F backing_fmt] filename")
 SRST
-.. option:: rebase [--object OBJECTDEF] [--image-opts] [-U] [-q] [-f FMT] [-t CACHE] [-T SRC_CACHE] [-p] [-u] -b BACKING_FILE [-F BACKING_FMT] FILENAME
+.. option:: rebase [--object OBJECTDEF] [--image-opts] [-U] [-q] [-f FMT] [-t CACHE] [-T SRC_CACHE] [-p] [-u] [-c] -b BACKING_FILE [-F BACKING_FMT] FILENAME
 ERST
 
 DEF("resize", img_resize,
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PULL 08/27] iotests: add tests for "qemu-img rebase" with compression
  2023-10-31 18:58 [PULL 00/27] Block layer patches Kevin Wolf
                   ` (6 preceding siblings ...)
  2023-10-31 18:58 ` [PULL 07/27] qemu-img: add compression option to rebase subcommand Kevin Wolf
@ 2023-10-31 18:58 ` Kevin Wolf
  2023-10-31 18:59 ` [PULL 09/27] block: Fix locking in media change monitor commands Kevin Wolf
                   ` (19 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2023-10-31 18:58 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>

The test cases considered so far:

314 (new test suite):

1. Check that compression mode isn't compatible with "-f raw" (raw
   format doesn't support compression).
2. Check that rebasing an image onto no backing file preserves the data
   and writes the copied clusters actually compressed.
3. Same as 2, but with a raw backing file (i.e. the clusters copied from the
   backing are originally uncompressed -- we check they end up compressed
   after being merged).
4. Remove a single delta from a backing chain, perform the same checks
   as in 2.
5. Check that even when backing and overlay are initially uncompressed,
   copied clusters end up compressed when rebase with compression is
   performed.

271:

1. Check that when target image has subclusters, rebase with compression
   will make an entire cluster containing the written subcluster
   compressed.

Signed-off-by: Andrey Drobyshev <andrey.drobyshev@virtuozzo.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Message-ID: <20230919165804.439110-9-andrey.drobyshev@virtuozzo.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/qemu-iotests/271     |  65 +++++++++++++++
 tests/qemu-iotests/271.out |  40 +++++++++
 tests/qemu-iotests/314     | 165 +++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/314.out |  75 +++++++++++++++++
 4 files changed, 345 insertions(+)
 create mode 100755 tests/qemu-iotests/314
 create mode 100644 tests/qemu-iotests/314.out

diff --git a/tests/qemu-iotests/271 b/tests/qemu-iotests/271
index e243f57ba7..59a6fafa2f 100755
--- a/tests/qemu-iotests/271
+++ b/tests/qemu-iotests/271
@@ -965,6 +965,71 @@ echo
 
 TEST_IMG="$TEST_IMG.top" alloc="1 30" zero="" _verify_l2_bitmap 0
 
+# Check that rebase with compression works correctly with images containing
+# subclusters.  When compression is enabled and we allocate a new
+# subcluster within the target (overlay) image, we expect the entire cluster
+# containing that subcluster to become compressed.
+#
+# Here we expect 1st and 3rd clusters of the top (overlay) image to become
+# compressed after the rebase, while cluster 2 to remain unallocated and
+# be read from the base (new backing) image.
+#
+# Base (new backing): |-- -- .. -- --|11 11 .. 11 11|-- -- .. -- --|
+# Mid (old backing):  |-- -- .. -- 22|-- -- .. -- --|33 -- .. -- --|
+# Top:                |-- -- .. -- --|-- -- -- -- --|-- -- .. -- --|
+
+echo
+echo "### Rebase with compression for images with subclusters ###"
+echo
+
+echo "# create backing chain"
+echo
+
+TEST_IMG="$TEST_IMG.base" _make_test_img -o cluster_size=1M,extended_l2=on 3M
+TEST_IMG="$TEST_IMG.mid" _make_test_img -o cluster_size=1M,extended_l2=on \
+    -b "$TEST_IMG.base" -F qcow2 3M
+TEST_IMG="$TEST_IMG.top" _make_test_img -o cluster_size=1M,extended_l2=on \
+    -b "$TEST_IMG.mid" -F qcow2 3M
+
+echo
+echo "# fill old and new backing with data"
+echo
+
+$QEMU_IO -c "write -P 0x11 1M 1M" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -P 0x22 $(( 31 * 32 ))k 32k" \
+         -c "write -P 0x33 $(( 64 * 32 ))k 32k" \
+         "$TEST_IMG.mid" | _filter_qemu_io
+
+echo
+echo "# rebase topmost image onto the new backing, with compression"
+echo
+
+$QEMU_IMG rebase -c -b "$TEST_IMG.base" -F qcow2 "$TEST_IMG.top"
+
+echo "# verify that the 1st and 3rd clusters've become compressed"
+echo
+
+$QEMU_IMG map --output=json "$TEST_IMG.top" | _filter_testdir
+
+echo
+echo "# verify that data is read the same before and after rebase"
+echo
+
+$QEMU_IO -c "read -P 0x22 $(( 31 * 32 ))k 32k" \
+         -c "read -P 0x11 1M 1M" \
+         -c "read -P 0x33 $(( 64 * 32 ))k 32k" \
+         "$TEST_IMG.top" | _filter_qemu_io
+
+echo
+echo "# verify image bitmap"
+echo
+
+# For compressed clusters bitmap is always 0.  For unallocated cluster
+# there should be no entry at all, thus bitmap is also 0.
+TEST_IMG="$TEST_IMG.top" alloc="" zero="" _verify_l2_bitmap 0
+TEST_IMG="$TEST_IMG.top" alloc="" zero="" _verify_l2_bitmap 1
+TEST_IMG="$TEST_IMG.top" alloc="" zero="" _verify_l2_bitmap 2
+
 # success, all done
 echo "*** done"
 rm -f $seq.full
diff --git a/tests/qemu-iotests/271.out b/tests/qemu-iotests/271.out
index c335a6c608..0b24d50159 100644
--- a/tests/qemu-iotests/271.out
+++ b/tests/qemu-iotests/271.out
@@ -765,4 +765,44 @@ Offset          Length          Mapped to       File
 # verify image bitmap
 
 L2 entry #0: 0x8000000000500000 0000000040000002
+
+### Rebase with compression for images with subclusters ###
+
+# create backing chain
+
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=3145728
+Formatting 'TEST_DIR/t.IMGFMT.mid', fmt=IMGFMT size=3145728 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
+Formatting 'TEST_DIR/t.IMGFMT.top', fmt=IMGFMT size=3145728 backing_file=TEST_DIR/t.IMGFMT.mid backing_fmt=IMGFMT
+
+# fill old and new backing with data
+
+wrote 1048576/1048576 bytes at offset 1048576
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 32768/32768 bytes at offset 1015808
+32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 32768/32768 bytes at offset 2097152
+32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+# rebase topmost image onto the new backing, with compression
+
+# verify that the 1st and 3rd clusters've become compressed
+
+[{ "start": 0, "length": 1048576, "depth": 0, "present": true, "zero": false, "data": true, "compressed": true},
+{ "start": 1048576, "length": 1048576, "depth": 1, "present": true, "zero": false, "data": true, "compressed": false, "offset": 5242880},
+{ "start": 2097152, "length": 1048576, "depth": 0, "present": true, "zero": false, "data": true, "compressed": true}]
+
+# verify that data is read the same before and after rebase
+
+read 32768/32768 bytes at offset 1015808
+32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 1048576/1048576 bytes at offset 1048576
+1 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 32768/32768 bytes at offset 2097152
+32 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+# verify image bitmap
+
+L2 entry #0: 0x4008000000500000 0000000000000000
+L2 entry #1: 0x0000000000000000 0000000000000000
+L2 entry #2: 0x400800000050040b 0000000000000000
 *** done
diff --git a/tests/qemu-iotests/314 b/tests/qemu-iotests/314
new file mode 100755
index 0000000000..96d7b4d258
--- /dev/null
+++ b/tests/qemu-iotests/314
@@ -0,0 +1,165 @@
+#!/usr/bin/env bash
+# group: rw backing auto quick
+#
+# Test qemu-img rebase with compression
+#
+# Copyright (c) 2023 Virtuozzo International GmbH.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=andrey.drobyshev@virtuozzo.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+status=1	# failure is the default!
+
+_cleanup()
+{
+    _cleanup_test_img
+    _rm_test_img "$TEST_IMG.base"
+    _rm_test_img "$TEST_IMG.itmd"
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+# Want the size divisible by 2 and 3
+size=$(( 48 * 1024 * 1024 ))
+half_size=$(( size / 2 ))
+third_size=$(( size / 3 ))
+
+# 1. "qemu-img rebase -c" should refuse working with any format which doesn't
+# support compression.  We only check "-f raw" here.
+echo
+echo "=== Testing compressed rebase format compatibility ==="
+echo
+
+$QEMU_IMG create -f raw "$TEST_IMG" "$size" | _filter_img_create
+$QEMU_IMG rebase -c -f raw -b "" "$TEST_IMG"
+
+# 2. Write the 1st half of $size to backing file (compressed), 2nd half -- to
+# the top image (also compressed).  Rebase the top image onto no backing file,
+# with compression (i.e. "qemu-img -c -b ''").  Check that the resulting image
+# has the written data preserved, and "qemu-img check" reports 100% clusters
+# as compressed.
+echo
+echo "=== Testing rebase with compression onto no backing file ==="
+echo
+
+TEST_IMG="$TEST_IMG.base" _make_test_img $size
+_make_test_img -b "$TEST_IMG.base" -F $IMGFMT $size
+
+$QEMU_IO -c "write -c -P 0xaa 0 $half_size" "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -c -P 0xbb $half_size $half_size" "$TEST_IMG" \
+    | _filter_qemu_io
+
+$QEMU_IMG rebase -c -f $IMGFMT -b "" "$TEST_IMG"
+
+$QEMU_IO -c "read -P 0xaa 0 $half_size" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0xbb $half_size $half_size" "$TEST_IMG" | _filter_qemu_io
+
+$QEMU_IMG check "$TEST_IMG" | _filter_testdir
+
+# 3. Same as the previous one, but with raw backing file (hence write to
+# the backing is uncompressed).
+echo
+echo "=== Testing rebase with compression with raw backing file ==="
+echo
+
+$QEMU_IMG create -f raw "$TEST_IMG.base" "$half_size" | _filter_img_create
+_make_test_img -b "$TEST_IMG.base" -F raw $size
+
+$QEMU_IO -f raw -c "write -P 0xaa 0 $half_size" "$TEST_IMG.base" \
+    | _filter_qemu_io
+$QEMU_IO -c "write -c -P 0xbb $half_size $half_size" \
+    "$TEST_IMG" | _filter_qemu_io
+
+$QEMU_IMG rebase -c -f $IMGFMT -b "" "$TEST_IMG"
+
+$QEMU_IO -c "read -P 0xaa 0 $half_size" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0xbb $half_size $half_size" "$TEST_IMG" | _filter_qemu_io
+
+$QEMU_IMG check "$TEST_IMG" | _filter_testdir
+
+# 4. Create a backing chain base<--itmd<--img, filling 1st, 2nd and 3rd
+# thirds of them, respectively (with compression).  Rebase img onto base,
+# effectively deleting itmd from the chain, and check that written data is
+# preserved in the resulting image.  Also check that "qemu-img check" reports
+# 100% clusters as compressed.
+echo
+echo "=== Testing compressed rebase removing single delta from the chain ==="
+echo
+
+TEST_IMG="$TEST_IMG.base" _make_test_img $size
+TEST_IMG="$TEST_IMG.itmd" _make_test_img -b "$TEST_IMG.base" -F $IMGFMT $size
+_make_test_img -b "$TEST_IMG.itmd" -F $IMGFMT $size
+
+$QEMU_IO -c "write -c -P 0xaa 0 $third_size" \
+    "$TEST_IMG.base" | _filter_qemu_io
+$QEMU_IO -c "write -c -P 0xbb $third_size $third_size" \
+    "$TEST_IMG.itmd" | _filter_qemu_io
+$QEMU_IO -c "write -c -P 0xcc $((third_size * 2 )) $third_size" \
+    "$TEST_IMG" | _filter_qemu_io
+
+$QEMU_IMG rebase -c -f $IMGFMT -b "$TEST_IMG.base" -F $IMGFMT "$TEST_IMG"
+
+$QEMU_IO -c "read -P 0xaa 0 $third_size" "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0xbb $third_size $third_size" \
+    "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0xcc $(( third_size * 2 )) $third_size" \
+    "$TEST_IMG" | _filter_qemu_io
+
+$QEMU_IMG check "$TEST_IMG" | _filter_testdir
+
+# 5. Create one-cluster backing and overlay images, and fill only the first
+# (half - 1) bytes of the backing with data (uncompressed).  Rebase the
+# overlay onto no backing file with compression.  Check that data is still
+# read correctly, and that cluster is now really compressed ("qemu-img check"
+# reports 100% clusters as compressed.
+echo
+echo "=== Testing compressed rebase with unaligned unmerged data ==="
+echo
+
+CLUSTER_SIZE=65536
+
+TEST_IMG="$TEST_IMG.base" _make_test_img $CLUSTER_SIZE
+_make_test_img -b "$TEST_IMG.base" -F $IMGFMT $CLUSTER_SIZE
+
+$QEMU_IO -c "write -P 0xaa 0 $(( CLUSTER_SIZE / 2 - 1 ))" $TEST_IMG.base \
+    | _filter_qemu_io
+
+$QEMU_IMG rebase -c -f $IMGFMT -b "" "$TEST_IMG"
+
+$QEMU_IO -c "read -P 0xaa 0 $(( CLUSTER_SIZE / 2 - 1 ))" "$TEST_IMG" \
+    | _filter_qemu_io
+$QEMU_IO -c \
+    "read -P 0x00 $(( CLUSTER_SIZE / 2 - 1 )) $(( CLUSTER_SIZE / 2 + 1 ))" \
+    "$TEST_IMG" | _filter_qemu_io
+
+$QEMU_IMG check "$TEST_IMG" | _filter_testdir
+
+# success, all done
+echo
+echo '*** done'
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/314.out b/tests/qemu-iotests/314.out
new file mode 100644
index 0000000000..ac9337a543
--- /dev/null
+++ b/tests/qemu-iotests/314.out
@@ -0,0 +1,75 @@
+QA output created by 314
+
+=== Testing compressed rebase format compatibility ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=raw size=50331648
+qemu-img: Compression not supported for this file format
+
+=== Testing rebase with compression onto no backing file ===
+
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=50331648
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=50331648 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
+wrote 25165824/25165824 bytes at offset 0
+24 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 25165824/25165824 bytes at offset 25165824
+24 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 25165824/25165824 bytes at offset 0
+24 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 25165824/25165824 bytes at offset 25165824
+24 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+No errors were found on the image.
+768/768 = 100.00% allocated, 100.00% fragmented, 100.00% compressed clusters
+Image end offset: 458752
+
+=== Testing rebase with compression with raw backing file ===
+
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=raw size=25165824
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=50331648 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=raw
+wrote 25165824/25165824 bytes at offset 0
+24 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 25165824/25165824 bytes at offset 25165824
+24 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 25165824/25165824 bytes at offset 0
+24 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 25165824/25165824 bytes at offset 25165824
+24 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+No errors were found on the image.
+768/768 = 100.00% allocated, 100.00% fragmented, 100.00% compressed clusters
+Image end offset: 458752
+
+=== Testing compressed rebase removing single delta from the chain ===
+
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=50331648
+Formatting 'TEST_DIR/t.IMGFMT.itmd', fmt=IMGFMT size=50331648 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=50331648 backing_file=TEST_DIR/t.IMGFMT.itmd backing_fmt=IMGFMT
+wrote 16777216/16777216 bytes at offset 0
+16 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 16777216/16777216 bytes at offset 16777216
+16 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 16777216/16777216 bytes at offset 33554432
+16 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 16777216/16777216 bytes at offset 0
+16 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 16777216/16777216 bytes at offset 16777216
+16 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 16777216/16777216 bytes at offset 33554432
+16 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+No errors were found on the image.
+512/768 = 66.67% allocated, 100.00% fragmented, 100.00% compressed clusters
+Image end offset: 458752
+
+=== Testing compressed rebase with unaligned unmerged data ===
+
+Formatting 'TEST_DIR/t.IMGFMT.base', fmt=IMGFMT size=65536
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=65536 backing_file=TEST_DIR/t.IMGFMT.base backing_fmt=IMGFMT
+wrote 32767/32767 bytes at offset 0
+31.999 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 32767/32767 bytes at offset 0
+31.999 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+read 32769/32769 bytes at offset 32767
+32.001 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+No errors were found on the image.
+1/1 = 100.00% allocated, 100.00% fragmented, 100.00% compressed clusters
+Image end offset: 393216
+
+*** done
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PULL 09/27] block: Fix locking in media change monitor commands
  2023-10-31 18:58 [PULL 00/27] Block layer patches Kevin Wolf
                   ` (7 preceding siblings ...)
  2023-10-31 18:58 ` [PULL 08/27] iotests: add tests for "qemu-img rebase" with compression Kevin Wolf
@ 2023-10-31 18:59 ` Kevin Wolf
  2023-10-31 18:59 ` [PULL 10/27] iotests: Test media change with iothreads Kevin Wolf
                   ` (18 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2023-10-31 18:59 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

blk_insert_bs() requires that the caller holds the AioContext lock for
the node to be inserted. Since commit c066e808e11, neglecting to do so
causes a crash when the child has to be moved to a different AioContext
to attach it to the BlockBackend.

This fixes qmp_blockdev_insert_anon_medium(), which is called for the
QMP commands 'blockdev-insert-medium' and 'blockdev-change-medium', to
correctly take the lock.

Cc: qemu-stable@nongnu.org
Fixes: https://issues.redhat.com/browse/RHEL-3922
Fixes: c066e808e11a5c181b625537b6c78e0de27a4801
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231013153302.39234-2-kwolf@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/qapi-sysemu.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/block/qapi-sysemu.c b/block/qapi-sysemu.c
index 3f614cbc04..1618cd225a 100644
--- a/block/qapi-sysemu.c
+++ b/block/qapi-sysemu.c
@@ -237,6 +237,7 @@ static void qmp_blockdev_insert_anon_medium(BlockBackend *blk,
                                             BlockDriverState *bs, Error **errp)
 {
     Error *local_err = NULL;
+    AioContext *ctx;
     bool has_device;
     int ret;
 
@@ -258,7 +259,11 @@ static void qmp_blockdev_insert_anon_medium(BlockBackend *blk,
         return;
     }
 
+    ctx = bdrv_get_aio_context(bs);
+    aio_context_acquire(ctx);
     ret = blk_insert_bs(blk, bs, errp);
+    aio_context_release(ctx);
+
     if (ret < 0) {
         return;
     }
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PULL 10/27] iotests: Test media change with iothreads
  2023-10-31 18:58 [PULL 00/27] Block layer patches Kevin Wolf
                   ` (8 preceding siblings ...)
  2023-10-31 18:59 ` [PULL 09/27] block: Fix locking in media change monitor commands Kevin Wolf
@ 2023-10-31 18:59 ` Kevin Wolf
  2023-10-31 18:59 ` [PULL 11/27] blockjob: drop AioContext lock before calling bdrv_graph_wrlock() Kevin Wolf
                   ` (17 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2023-10-31 18:59 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

iotests case 118 already tests all relevant operations for media change
with multiple devices, however never with iothreads. This changes the
test so that the virtio-scsi tests run with an iothread.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-ID: <20231013153302.39234-3-kwolf@redhat.com>
Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/qemu-iotests/118 | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tests/qemu-iotests/118 b/tests/qemu-iotests/118
index 10dc47459f..6a4210c219 100755
--- a/tests/qemu-iotests/118
+++ b/tests/qemu-iotests/118
@@ -277,7 +277,8 @@ class TestInitiallyFilled(GeneralChangeTestsBaseClass):
                                    'file.driver=file',
                                    'file.filename=%s' % old_img ])
         if self.interface == 'scsi':
-            self.vm.add_device('virtio-scsi-pci')
+            self.vm.add_object('iothread,id=iothread0')
+            self.vm.add_device('virtio-scsi-pci,iothread=iothread0')
         self.vm.add_device('%s,drive=drive0,id=%s' %
                            (interface_to_device_name(self.interface),
                             self.device_name))
@@ -312,7 +313,8 @@ class TestInitiallyEmpty(GeneralChangeTestsBaseClass):
         if self.use_drive:
             self.vm.add_drive(None, 'media=%s' % self.media, 'none')
         if self.interface == 'scsi':
-            self.vm.add_device('virtio-scsi-pci')
+            self.vm.add_object('iothread,id=iothread0')
+            self.vm.add_device('virtio-scsi-pci,iothread=iothread0')
         self.vm.add_device('%s,%sid=%s' %
                            (interface_to_device_name(self.interface),
                             'drive=drive0,' if self.use_drive else '',
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PULL 11/27] blockjob: drop AioContext lock before calling bdrv_graph_wrlock()
  2023-10-31 18:58 [PULL 00/27] Block layer patches Kevin Wolf
                   ` (9 preceding siblings ...)
  2023-10-31 18:59 ` [PULL 10/27] iotests: Test media change with iothreads Kevin Wolf
@ 2023-10-31 18:59 ` Kevin Wolf
  2023-10-31 18:59 ` [PULL 12/27] block: avoid potential deadlock during bdrv_graph_wrlock() in bdrv_close() Kevin Wolf
                   ` (16 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2023-10-31 18:59 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

Same rationale as in 31b2ddfea3 ("graph-lock: Unlock the AioContext
while polling"). Otherwise, a deadlock can happen.

The alternative would be to pass a BlockDriverState along to
bdrv_graph_wrlock(), but there is no BlockDriverState readily
available and it's also better conceptually, because the lock is held
for the job.

The function is always called with the job's AioContext lock held, via
one of the .abort, .clean, .free or .prepare job driver functions.
Thus, it's safe to drop it.

While mirror_exit_common() does hold a second AioContext lock while
calling block_job_remove_all_bdrv(), that is for the main thread's
AioContext and does not need to be dropped (bdrv_graph_wrlock(bs) also
skips dropping the lock if bdrv_get_aio_context(bs) ==
qemu_get_aio_context()).

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20231019131936.414246-2-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 blockjob.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/blockjob.c b/blockjob.c
index 807f992b59..953dc1b6dc 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -198,7 +198,9 @@ void block_job_remove_all_bdrv(BlockJob *job)
      * one to make sure that such a concurrent access does not attempt
      * to process an already freed BdrvChild.
      */
+    aio_context_release(job->job.aio_context);
     bdrv_graph_wrlock(NULL);
+    aio_context_acquire(job->job.aio_context);
     while (job->nodes) {
         GSList *l = job->nodes;
         BdrvChild *c = l->data;
-- 
2.41.0

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PULL 12/27] block: avoid potential deadlock during bdrv_graph_wrlock() in bdrv_close()
  2023-10-31 18:58 [PULL 00/27] Block layer patches Kevin Wolf
                   ` (10 preceding siblings ...)
  2023-10-31 18:59 ` [PULL 11/27] blockjob: drop AioContext lock before calling bdrv_graph_wrlock() Kevin Wolf
@ 2023-10-31 18:59 ` Kevin Wolf
  2023-10-31 18:59 ` [PULL 13/27] blockdev: mirror: avoid potential deadlock when using iothread Kevin Wolf
                   ` (15 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2023-10-31 18:59 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

by passing the BlockDriverState along, so the held AioContext can be
dropped before polling. See commit 31b2ddfea3 ("graph-lock: Unlock the
AioContext while polling") which introduced this functionality for
more information.

The only way to reach bdrv_close() is via bdrv_unref() and for calling
that the BlockDriverState's AioContext lock is supposed to be held.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20231019131936.414246-3-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block.c b/block.c
index f9cf05ddcf..a527aa1a4c 100644
--- a/block.c
+++ b/block.c
@@ -5200,7 +5200,7 @@ static void bdrv_close(BlockDriverState *bs)
         bs->drv = NULL;
     }
 
-    bdrv_graph_wrlock(NULL);
+    bdrv_graph_wrlock(bs);
     QLIST_FOREACH_SAFE(child, &bs->children, next, next) {
         bdrv_unref_child(bs, child);
     }
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PULL 13/27] blockdev: mirror: avoid potential deadlock when using iothread
  2023-10-31 18:58 [PULL 00/27] Block layer patches Kevin Wolf
                   ` (11 preceding siblings ...)
  2023-10-31 18:59 ` [PULL 12/27] block: avoid potential deadlock during bdrv_graph_wrlock() in bdrv_close() Kevin Wolf
@ 2023-10-31 18:59 ` Kevin Wolf
  2023-10-31 18:59 ` [PULL 14/27] block: rename blk_io_plug_call() API to defer_call() Kevin Wolf
                   ` (14 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2023-10-31 18:59 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

The bdrv_getlength() function is a generated co-wrapper and uses
AIO_WAIT_WHILE() to wait for the spawned coroutine. AIO_WAIT_WHILE()
expects the lock to be acquired exactly once.

Fix a case where it may be acquired twice. This can happen when the
source node is explicitly specified as the @replaces parameter or if the
source node is a filter node.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20231019131936.414246-4-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 blockdev.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index a01c62596b..877e3a26d4 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2968,6 +2968,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
 
     if (replaces) {
         BlockDriverState *to_replace_bs;
+        AioContext *aio_context;
         AioContext *replace_aio_context;
         int64_t bs_size, replace_size;
 
@@ -2982,10 +2983,19 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
             return;
         }
 
+        aio_context = bdrv_get_aio_context(bs);
         replace_aio_context = bdrv_get_aio_context(to_replace_bs);
-        aio_context_acquire(replace_aio_context);
+        /*
+         * bdrv_getlength() is a co-wrapper and uses AIO_WAIT_WHILE. Be sure not
+         * to acquire the same AioContext twice.
+         */
+        if (replace_aio_context != aio_context) {
+            aio_context_acquire(replace_aio_context);
+        }
         replace_size = bdrv_getlength(to_replace_bs);
-        aio_context_release(replace_aio_context);
+        if (replace_aio_context != aio_context) {
+            aio_context_release(replace_aio_context);
+        }
 
         if (replace_size < 0) {
             error_setg_errno(errp, -replace_size,
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PULL 14/27] block: rename blk_io_plug_call() API to defer_call()
  2023-10-31 18:58 [PULL 00/27] Block layer patches Kevin Wolf
                   ` (12 preceding siblings ...)
  2023-10-31 18:59 ` [PULL 13/27] blockdev: mirror: avoid potential deadlock when using iothread Kevin Wolf
@ 2023-10-31 18:59 ` Kevin Wolf
  2023-10-31 18:59 ` [PULL 15/27] util/defer-call: move defer_call() to util/ Kevin Wolf
                   ` (13 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2023-10-31 18:59 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Stefan Hajnoczi <stefanha@redhat.com>

Prepare to move the blk_io_plug_call() API out of the block layer so
that other subsystems call use this deferred call mechanism. Rename it
to defer_call() but leave the code in block/plug.c.

The next commit will move the code out of the block layer.

Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Paul Durrant <paul@xen.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230913200045.1024233-2-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/sysemu/block-backend-io.h |   6 +-
 block/blkio.c                     |   8 +--
 block/io_uring.c                  |   4 +-
 block/linux-aio.c                 |   4 +-
 block/nvme.c                      |   4 +-
 block/plug.c                      | 109 +++++++++++++++---------------
 hw/block/dataplane/xen-block.c    |  10 +--
 hw/block/virtio-blk.c             |   4 +-
 hw/scsi/virtio-scsi.c             |   6 +-
 9 files changed, 76 insertions(+), 79 deletions(-)

diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
index be4dcef59d..cfcfd85c1d 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -100,9 +100,9 @@ void blk_iostatus_set_err(BlockBackend *blk, int error);
 int blk_get_max_iov(BlockBackend *blk);
 int blk_get_max_hw_iov(BlockBackend *blk);
 
-void blk_io_plug(void);
-void blk_io_unplug(void);
-void blk_io_plug_call(void (*fn)(void *), void *opaque);
+void defer_call_begin(void);
+void defer_call_end(void);
+void defer_call(void (*fn)(void *), void *opaque);
 
 AioContext *blk_get_aio_context(BlockBackend *blk);
 BlockAcctStats *blk_get_stats(BlockBackend *blk);
diff --git a/block/blkio.c b/block/blkio.c
index 1dd495617c..7cf6d61f47 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -312,10 +312,10 @@ static void blkio_detach_aio_context(BlockDriverState *bs)
 }
 
 /*
- * Called by blk_io_unplug() or immediately if not plugged. Called without
- * blkio_lock.
+ * Called by defer_call_end() or immediately if not in a deferred section.
+ * Called without blkio_lock.
  */
-static void blkio_unplug_fn(void *opaque)
+static void blkio_deferred_fn(void *opaque)
 {
     BDRVBlkioState *s = opaque;
 
@@ -332,7 +332,7 @@ static void blkio_submit_io(BlockDriverState *bs)
 {
     BDRVBlkioState *s = bs->opaque;
 
-    blk_io_plug_call(blkio_unplug_fn, s);
+    defer_call(blkio_deferred_fn, s);
 }
 
 static int coroutine_fn
diff --git a/block/io_uring.c b/block/io_uring.c
index 69d9820928..8429f341be 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -306,7 +306,7 @@ static void ioq_init(LuringQueue *io_q)
     io_q->blocked = false;
 }
 
-static void luring_unplug_fn(void *opaque)
+static void luring_deferred_fn(void *opaque)
 {
     LuringState *s = opaque;
     trace_luring_unplug_fn(s, s->io_q.blocked, s->io_q.in_queue,
@@ -367,7 +367,7 @@ static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,
             return ret;
         }
 
-        blk_io_plug_call(luring_unplug_fn, s);
+        defer_call(luring_deferred_fn, s);
     }
     return 0;
 }
diff --git a/block/linux-aio.c b/block/linux-aio.c
index 1a51503271..49a37174c2 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -353,7 +353,7 @@ static uint64_t laio_max_batch(LinuxAioState *s, uint64_t dev_max_batch)
     return max_batch;
 }
 
-static void laio_unplug_fn(void *opaque)
+static void laio_deferred_fn(void *opaque)
 {
     LinuxAioState *s = opaque;
 
@@ -393,7 +393,7 @@ static int laio_do_submit(int fd, struct qemu_laiocb *laiocb, off_t offset,
         if (s->io_q.in_queue >= laio_max_batch(s, dev_max_batch)) {
             ioq_submit(s);
         } else {
-            blk_io_plug_call(laio_unplug_fn, s);
+            defer_call(laio_deferred_fn, s);
         }
     }
 
diff --git a/block/nvme.c b/block/nvme.c
index b6e95f0b7e..dfbd1085fd 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -476,7 +476,7 @@ static void nvme_trace_command(const NvmeCmd *cmd)
     }
 }
 
-static void nvme_unplug_fn(void *opaque)
+static void nvme_deferred_fn(void *opaque)
 {
     NVMeQueuePair *q = opaque;
 
@@ -503,7 +503,7 @@ static void nvme_submit_command(NVMeQueuePair *q, NVMeRequest *req,
     q->need_kick++;
     qemu_mutex_unlock(&q->lock);
 
-    blk_io_plug_call(nvme_unplug_fn, q);
+    defer_call(nvme_deferred_fn, q);
 }
 
 static void nvme_admin_cmd_sync_cb(void *opaque, int ret)
diff --git a/block/plug.c b/block/plug.c
index 98a155d2f4..f26173559c 100644
--- a/block/plug.c
+++ b/block/plug.c
@@ -1,24 +1,21 @@
 /* SPDX-License-Identifier: GPL-2.0-or-later */
 /*
- * Block I/O plugging
+ * Deferred calls
  *
  * Copyright Red Hat.
  *
- * This API defers a function call within a blk_io_plug()/blk_io_unplug()
+ * This API defers a function call within a defer_call_begin()/defer_call_end()
  * section, allowing multiple calls to batch up. This is a performance
  * optimization that is used in the block layer to submit several I/O requests
  * at once instead of individually:
  *
- *   blk_io_plug(); <-- start of plugged region
+ *   defer_call_begin(); <-- start of section
  *   ...
- *   blk_io_plug_call(my_func, my_obj); <-- deferred my_func(my_obj) call
- *   blk_io_plug_call(my_func, my_obj); <-- another
- *   blk_io_plug_call(my_func, my_obj); <-- another
+ *   defer_call(my_func, my_obj); <-- deferred my_func(my_obj) call
+ *   defer_call(my_func, my_obj); <-- another
+ *   defer_call(my_func, my_obj); <-- another
  *   ...
- *   blk_io_unplug(); <-- end of plugged region, my_func(my_obj) is called once
- *
- * This code is actually generic and not tied to the block layer. If another
- * subsystem needs this functionality, it could be renamed.
+ *   defer_call_end(); <-- end of section, my_func(my_obj) is called once
  */
 
 #include "qemu/osdep.h"
@@ -27,66 +24,66 @@
 #include "qemu/thread.h"
 #include "sysemu/block-backend.h"
 
-/* A function call that has been deferred until unplug() */
+/* A function call that has been deferred until defer_call_end() */
 typedef struct {
     void (*fn)(void *);
     void *opaque;
-} UnplugFn;
+} DeferredCall;
 
 /* Per-thread state */
 typedef struct {
-    unsigned count;       /* how many times has plug() been called? */
-    GArray *unplug_fns;   /* functions to call at unplug time */
-} Plug;
+    unsigned nesting_level;
+    GArray *deferred_call_array;
+} DeferCallThreadState;
 
-/* Use get_ptr_plug() to fetch this thread-local value */
-QEMU_DEFINE_STATIC_CO_TLS(Plug, plug);
+/* Use get_ptr_defer_call_thread_state() to fetch this thread-local value */
+QEMU_DEFINE_STATIC_CO_TLS(DeferCallThreadState, defer_call_thread_state);
 
 /* Called at thread cleanup time */
-static void blk_io_plug_atexit(Notifier *n, void *value)
+static void defer_call_atexit(Notifier *n, void *value)
 {
-    Plug *plug = get_ptr_plug();
-    g_array_free(plug->unplug_fns, TRUE);
+    DeferCallThreadState *thread_state = get_ptr_defer_call_thread_state();
+    g_array_free(thread_state->deferred_call_array, TRUE);
 }
 
 /* This won't involve coroutines, so use __thread */
-static __thread Notifier blk_io_plug_atexit_notifier;
+static __thread Notifier defer_call_atexit_notifier;
 
 /**
- * blk_io_plug_call:
+ * defer_call:
  * @fn: a function pointer to be invoked
  * @opaque: a user-defined argument to @fn()
  *
- * Call @fn(@opaque) immediately if not within a blk_io_plug()/blk_io_unplug()
- * section.
+ * Call @fn(@opaque) immediately if not within a
+ * defer_call_begin()/defer_call_end() section.
  *
  * Otherwise defer the call until the end of the outermost
- * blk_io_plug()/blk_io_unplug() section in this thread. If the same
+ * defer_call_begin()/defer_call_end() section in this thread. If the same
  * @fn/@opaque pair has already been deferred, it will only be called once upon
- * blk_io_unplug() so that accumulated calls are batched into a single call.
+ * defer_call_end() so that accumulated calls are batched into a single call.
  *
  * The caller must ensure that @opaque is not freed before @fn() is invoked.
  */
-void blk_io_plug_call(void (*fn)(void *), void *opaque)
+void defer_call(void (*fn)(void *), void *opaque)
 {
-    Plug *plug = get_ptr_plug();
+    DeferCallThreadState *thread_state = get_ptr_defer_call_thread_state();
 
-    /* Call immediately if we're not plugged */
-    if (plug->count == 0) {
+    /* Call immediately if we're not deferring calls */
+    if (thread_state->nesting_level == 0) {
         fn(opaque);
         return;
     }
 
-    GArray *array = plug->unplug_fns;
+    GArray *array = thread_state->deferred_call_array;
     if (!array) {
-        array = g_array_new(FALSE, FALSE, sizeof(UnplugFn));
-        plug->unplug_fns = array;
-        blk_io_plug_atexit_notifier.notify = blk_io_plug_atexit;
-        qemu_thread_atexit_add(&blk_io_plug_atexit_notifier);
+        array = g_array_new(FALSE, FALSE, sizeof(DeferredCall));
+        thread_state->deferred_call_array = array;
+        defer_call_atexit_notifier.notify = defer_call_atexit;
+        qemu_thread_atexit_add(&defer_call_atexit_notifier);
     }
 
-    UnplugFn *fns = (UnplugFn *)array->data;
-    UnplugFn new_fn = {
+    DeferredCall *fns = (DeferredCall *)array->data;
+    DeferredCall new_fn = {
         .fn = fn,
         .opaque = opaque,
     };
@@ -106,46 +103,46 @@ void blk_io_plug_call(void (*fn)(void *), void *opaque)
 }
 
 /**
- * blk_io_plug: Defer blk_io_plug_call() functions until blk_io_unplug()
+ * defer_call_begin: Defer defer_call() functions until defer_call_end()
  *
- * blk_io_plug/unplug are thread-local operations. This means that multiple
- * threads can simultaneously call plug/unplug, but the caller must ensure that
- * each unplug() is called in the same thread of the matching plug().
+ * defer_call_begin() and defer_call_end() are thread-local operations. The
+ * caller must ensure that each defer_call_begin() has a matching
+ * defer_call_end() in the same thread.
  *
- * Nesting is supported. blk_io_plug_call() functions are only called at the
- * outermost blk_io_unplug().
+ * Nesting is supported. defer_call() functions are only called at the
+ * outermost defer_call_end().
  */
-void blk_io_plug(void)
+void defer_call_begin(void)
 {
-    Plug *plug = get_ptr_plug();
+    DeferCallThreadState *thread_state = get_ptr_defer_call_thread_state();
 
-    assert(plug->count < UINT32_MAX);
+    assert(thread_state->nesting_level < UINT32_MAX);
 
-    plug->count++;
+    thread_state->nesting_level++;
 }
 
 /**
- * blk_io_unplug: Run any pending blk_io_plug_call() functions
+ * defer_call_end: Run any pending defer_call() functions
  *
- * There must have been a matching blk_io_plug() call in the same thread prior
- * to this blk_io_unplug() call.
+ * There must have been a matching defer_call_begin() call in the same thread
+ * prior to this defer_call_end() call.
  */
-void blk_io_unplug(void)
+void defer_call_end(void)
 {
-    Plug *plug = get_ptr_plug();
+    DeferCallThreadState *thread_state = get_ptr_defer_call_thread_state();
 
-    assert(plug->count > 0);
+    assert(thread_state->nesting_level > 0);
 
-    if (--plug->count > 0) {
+    if (--thread_state->nesting_level > 0) {
         return;
     }
 
-    GArray *array = plug->unplug_fns;
+    GArray *array = thread_state->deferred_call_array;
     if (!array) {
         return;
     }
 
-    UnplugFn *fns = (UnplugFn *)array->data;
+    DeferredCall *fns = (DeferredCall *)array->data;
 
     for (guint i = 0; i < array->len; i++) {
         fns[i].fn(fns[i].opaque);
diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c
index 3b6f2b0aa2..e9dd8f8a99 100644
--- a/hw/block/dataplane/xen-block.c
+++ b/hw/block/dataplane/xen-block.c
@@ -509,7 +509,7 @@ static int xen_block_get_request(XenBlockDataPlane *dataplane,
 
 /*
  * Threshold of in-flight requests above which we will start using
- * blk_io_plug()/blk_io_unplug() to batch requests.
+ * defer_call_begin()/defer_call_end() to batch requests.
  */
 #define IO_PLUG_THRESHOLD 1
 
@@ -537,7 +537,7 @@ static bool xen_block_handle_requests(XenBlockDataPlane *dataplane)
      * is below us.
      */
     if (inflight_atstart > IO_PLUG_THRESHOLD) {
-        blk_io_plug();
+        defer_call_begin();
     }
     while (rc != rp) {
         /* pull request from ring */
@@ -577,12 +577,12 @@ static bool xen_block_handle_requests(XenBlockDataPlane *dataplane)
 
         if (inflight_atstart > IO_PLUG_THRESHOLD &&
             batched >= inflight_atstart) {
-            blk_io_unplug();
+            defer_call_end();
         }
         xen_block_do_aio(request);
         if (inflight_atstart > IO_PLUG_THRESHOLD) {
             if (batched >= inflight_atstart) {
-                blk_io_plug();
+                defer_call_begin();
                 batched = 0;
             } else {
                 batched++;
@@ -590,7 +590,7 @@ static bool xen_block_handle_requests(XenBlockDataPlane *dataplane)
         }
     }
     if (inflight_atstart > IO_PLUG_THRESHOLD) {
-        blk_io_unplug();
+        defer_call_end();
     }
 
     return done_something;
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 39e7f23fab..6a45033d15 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1134,7 +1134,7 @@ void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
     bool suppress_notifications = virtio_queue_get_notification(vq);
 
     aio_context_acquire(blk_get_aio_context(s->blk));
-    blk_io_plug();
+    defer_call_begin();
 
     do {
         if (suppress_notifications) {
@@ -1158,7 +1158,7 @@ void virtio_blk_handle_vq(VirtIOBlock *s, VirtQueue *vq)
         virtio_blk_submit_multireq(s, &mrb);
     }
 
-    blk_io_unplug();
+    defer_call_end();
     aio_context_release(blk_get_aio_context(s->blk));
 }
 
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index fa53f0902c..42fcbcb45f 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -799,7 +799,7 @@ static int virtio_scsi_handle_cmd_req_prepare(VirtIOSCSI *s, VirtIOSCSIReq *req)
         return -ENOBUFS;
     }
     scsi_req_ref(req->sreq);
-    blk_io_plug();
+    defer_call_begin();
     object_unref(OBJECT(d));
     return 0;
 }
@@ -810,7 +810,7 @@ static void virtio_scsi_handle_cmd_req_submit(VirtIOSCSI *s, VirtIOSCSIReq *req)
     if (scsi_req_enqueue(sreq)) {
         scsi_req_continue(sreq);
     }
-    blk_io_unplug();
+    defer_call_end();
     scsi_req_unref(sreq);
 }
 
@@ -836,7 +836,7 @@ static void virtio_scsi_handle_cmd_vq(VirtIOSCSI *s, VirtQueue *vq)
                 while (!QTAILQ_EMPTY(&reqs)) {
                     req = QTAILQ_FIRST(&reqs);
                     QTAILQ_REMOVE(&reqs, req, next);
-                    blk_io_unplug();
+                    defer_call_end();
                     scsi_req_unref(req->sreq);
                     virtqueue_detach_element(req->vq, &req->elem, 0);
                     virtio_scsi_free_req(req);
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PULL 15/27] util/defer-call: move defer_call() to util/
  2023-10-31 18:58 [PULL 00/27] Block layer patches Kevin Wolf
                   ` (13 preceding siblings ...)
  2023-10-31 18:59 ` [PULL 14/27] block: rename blk_io_plug_call() API to defer_call() Kevin Wolf
@ 2023-10-31 18:59 ` Kevin Wolf
  2023-10-31 18:59 ` [PULL 16/27] virtio: use defer_call() in virtio_irqfd_notify() Kevin Wolf
                   ` (12 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2023-10-31 18:59 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Stefan Hajnoczi <stefanha@redhat.com>

The networking subsystem may wish to use defer_call(), so move the code
to util/ where it can be reused.

As a reminder of what defer_call() does:

This API defers a function call within a defer_call_begin()/defer_call_end()
section, allowing multiple calls to batch up. This is a performance
optimization that is used in the block layer to submit several I/O requests
at once instead of individually:

  defer_call_begin(); <-- start of section
  ...
  defer_call(my_func, my_obj); <-- deferred my_func(my_obj) call
  defer_call(my_func, my_obj); <-- another
  defer_call(my_func, my_obj); <-- another
  ...
  defer_call_end(); <-- end of section, my_func(my_obj) is called once

Suggested-by: Ilya Maximets <i.maximets@ovn.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230913200045.1024233-3-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/qemu/defer-call.h         | 16 ++++++++++++++++
 include/sysemu/block-backend-io.h |  4 ----
 block/blkio.c                     |  1 +
 block/io_uring.c                  |  1 +
 block/linux-aio.c                 |  1 +
 block/nvme.c                      |  1 +
 hw/block/dataplane/xen-block.c    |  1 +
 hw/block/virtio-blk.c             |  1 +
 hw/scsi/virtio-scsi.c             |  1 +
 block/plug.c => util/defer-call.c |  2 +-
 MAINTAINERS                       |  3 ++-
 block/meson.build                 |  1 -
 util/meson.build                  |  1 +
 13 files changed, 27 insertions(+), 7 deletions(-)
 create mode 100644 include/qemu/defer-call.h
 rename block/plug.c => util/defer-call.c (99%)

diff --git a/include/qemu/defer-call.h b/include/qemu/defer-call.h
new file mode 100644
index 0000000000..e2c1d24572
--- /dev/null
+++ b/include/qemu/defer-call.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Deferred calls
+ *
+ * Copyright Red Hat.
+ */
+
+#ifndef QEMU_DEFER_CALL_H
+#define QEMU_DEFER_CALL_H
+
+/* See documentation in util/defer-call.c */
+void defer_call_begin(void);
+void defer_call_end(void);
+void defer_call(void (*fn)(void *), void *opaque);
+
+#endif /* QEMU_DEFER_CALL_H */
diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h
index cfcfd85c1d..d174275a5c 100644
--- a/include/sysemu/block-backend-io.h
+++ b/include/sysemu/block-backend-io.h
@@ -100,10 +100,6 @@ void blk_iostatus_set_err(BlockBackend *blk, int error);
 int blk_get_max_iov(BlockBackend *blk);
 int blk_get_max_hw_iov(BlockBackend *blk);
 
-void defer_call_begin(void);
-void defer_call_end(void);
-void defer_call(void (*fn)(void *), void *opaque);
-
 AioContext *blk_get_aio_context(BlockBackend *blk);
 BlockAcctStats *blk_get_stats(BlockBackend *blk);
 void *blk_aio_get(const AIOCBInfo *aiocb_info, BlockBackend *blk,
diff --git a/block/blkio.c b/block/blkio.c
index 7cf6d61f47..0a0a6c0f5f 100644
--- a/block/blkio.c
+++ b/block/blkio.c
@@ -13,6 +13,7 @@
 #include "block/block_int.h"
 #include "exec/memory.h"
 #include "exec/cpu-common.h" /* for qemu_ram_get_fd() */
+#include "qemu/defer-call.h"
 #include "qapi/error.h"
 #include "qemu/error-report.h"
 #include "qapi/qmp/qdict.h"
diff --git a/block/io_uring.c b/block/io_uring.c
index 8429f341be..3a1e1f45b3 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -15,6 +15,7 @@
 #include "block/block.h"
 #include "block/raw-aio.h"
 #include "qemu/coroutine.h"
+#include "qemu/defer-call.h"
 #include "qapi/error.h"
 #include "sysemu/block-backend.h"
 #include "trace.h"
diff --git a/block/linux-aio.c b/block/linux-aio.c
index 49a37174c2..a2670b3e46 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -14,6 +14,7 @@
 #include "block/raw-aio.h"
 #include "qemu/event_notifier.h"
 #include "qemu/coroutine.h"
+#include "qemu/defer-call.h"
 #include "qapi/error.h"
 #include "sysemu/block-backend.h"
 
diff --git a/block/nvme.c b/block/nvme.c
index dfbd1085fd..96b3f8f2fa 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -16,6 +16,7 @@
 #include "qapi/error.h"
 #include "qapi/qmp/qdict.h"
 #include "qapi/qmp/qstring.h"
+#include "qemu/defer-call.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c
index e9dd8f8a99..c4bb28c66f 100644
--- a/hw/block/dataplane/xen-block.c
+++ b/hw/block/dataplane/xen-block.c
@@ -19,6 +19,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/defer-call.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
 #include "qemu/memalign.h"
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 6a45033d15..a1f8e15522 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -12,6 +12,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/defer-call.h"
 #include "qapi/error.h"
 #include "qemu/iov.h"
 #include "qemu/module.h"
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 42fcbcb45f..9c751bf296 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -18,6 +18,7 @@
 #include "standard-headers/linux/virtio_ids.h"
 #include "hw/virtio/virtio-scsi.h"
 #include "migration/qemu-file-types.h"
+#include "qemu/defer-call.h"
 #include "qemu/error-report.h"
 #include "qemu/iov.h"
 #include "qemu/module.h"
diff --git a/block/plug.c b/util/defer-call.c
similarity index 99%
rename from block/plug.c
rename to util/defer-call.c
index f26173559c..037dc0abf0 100644
--- a/block/plug.c
+++ b/util/defer-call.c
@@ -22,7 +22,7 @@
 #include "qemu/coroutine-tls.h"
 #include "qemu/notify.h"
 #include "qemu/thread.h"
-#include "sysemu/block-backend.h"
+#include "qemu/defer-call.h"
 
 /* A function call that has been deferred until defer_call_end() */
 typedef struct {
diff --git a/MAINTAINERS b/MAINTAINERS
index cd8d6b140f..018ed62560 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2755,12 +2755,13 @@ S: Supported
 F: util/async.c
 F: util/aio-*.c
 F: util/aio-*.h
+F: util/defer-call.c
 F: util/fdmon-*.c
 F: block/io.c
-F: block/plug.c
 F: migration/block*
 F: include/block/aio.h
 F: include/block/aio-wait.h
+F: include/qemu/defer-call.h
 F: scripts/qemugdb/aio.py
 F: tests/unit/test-fdmon-epoll.c
 T: git https://github.com/stefanha/qemu.git block
diff --git a/block/meson.build b/block/meson.build
index f351b9d0d3..59ff6d380c 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -21,7 +21,6 @@ block_ss.add(files(
   'mirror.c',
   'nbd.c',
   'null.c',
-  'plug.c',
   'preallocate.c',
   'progress_meter.c',
   'qapi.c',
diff --git a/util/meson.build b/util/meson.build
index c4827fd70a..769b24f2e0 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -28,6 +28,7 @@ util_ss.add(when: 'CONFIG_WIN32', if_true: pathcch)
 if glib_has_gslice
   util_ss.add(files('qtree.c'))
 endif
+util_ss.add(files('defer-call.c'))
 util_ss.add(files('envlist.c', 'path.c', 'module.c'))
 util_ss.add(files('host-utils.c'))
 util_ss.add(files('bitmap.c', 'bitops.c'))
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PULL 16/27] virtio: use defer_call() in virtio_irqfd_notify()
  2023-10-31 18:58 [PULL 00/27] Block layer patches Kevin Wolf
                   ` (14 preceding siblings ...)
  2023-10-31 18:59 ` [PULL 15/27] util/defer-call: move defer_call() to util/ Kevin Wolf
@ 2023-10-31 18:59 ` Kevin Wolf
  2023-10-31 18:59 ` [PULL 17/27] virtio-blk: remove batch notification BH Kevin Wolf
                   ` (11 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2023-10-31 18:59 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Stefan Hajnoczi <stefanha@redhat.com>

virtio-blk and virtio-scsi invoke virtio_irqfd_notify() to send Used
Buffer Notifications from an IOThread. This involves an eventfd
write(2) syscall. Calling this repeatedly when completing multiple I/O
requests in a row is wasteful.

Use the defer_call() API to batch together virtio_irqfd_notify() calls
made during thread pool (aio=threads), Linux AIO (aio=native), and
io_uring (aio=io_uring) completion processing.

Behavior is unchanged for emulated devices that do not use
defer_call_begin()/defer_call_end() since defer_call() immediately
invokes the callback when called outside a
defer_call_begin()/defer_call_end() region.

fio rw=randread bs=4k iodepth=64 numjobs=8 IOPS increases by ~9% with a
single IOThread and 8 vCPUs. iodepth=1 decreases by ~1% but this could
be noise. Detailed performance data and configuration specifics are
available here:
https://gitlab.com/stefanha/virt-playbooks/-/tree/blk_io_plug-irqfd

This duplicates the BH that virtio-blk uses for batching. The next
commit will remove it.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230913200045.1024233-4-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/io_uring.c       |  6 ++++++
 block/linux-aio.c      |  4 ++++
 hw/virtio/virtio.c     | 13 ++++++++++++-
 util/thread-pool.c     |  5 +++++
 hw/virtio/trace-events |  1 +
 5 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/block/io_uring.c b/block/io_uring.c
index 3a1e1f45b3..7cdd00e9f1 100644
--- a/block/io_uring.c
+++ b/block/io_uring.c
@@ -125,6 +125,9 @@ static void luring_process_completions(LuringState *s)
 {
     struct io_uring_cqe *cqes;
     int total_bytes;
+
+    defer_call_begin();
+
     /*
      * Request completion callbacks can run the nested event loop.
      * Schedule ourselves so the nested event loop will "see" remaining
@@ -217,7 +220,10 @@ end:
             aio_co_wake(luringcb->co);
         }
     }
+
     qemu_bh_cancel(s->completion_bh);
+
+    defer_call_end();
 }
 
 static int ioq_submit(LuringState *s)
diff --git a/block/linux-aio.c b/block/linux-aio.c
index a2670b3e46..ec05d946f3 100644
--- a/block/linux-aio.c
+++ b/block/linux-aio.c
@@ -205,6 +205,8 @@ static void qemu_laio_process_completions(LinuxAioState *s)
 {
     struct io_event *events;
 
+    defer_call_begin();
+
     /* Reschedule so nested event loops see currently pending completions */
     qemu_bh_schedule(s->completion_bh);
 
@@ -231,6 +233,8 @@ static void qemu_laio_process_completions(LinuxAioState *s)
      * own `for` loop.  If we are the last all counters dropped to zero. */
     s->event_max = 0;
     s->event_idx = 0;
+
+    defer_call_end();
 }
 
 static void qemu_laio_process_completions_and_submit(LinuxAioState *s)
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index fb24bc927b..e5105571cf 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -15,6 +15,7 @@
 #include "qapi/error.h"
 #include "qapi/qapi-commands-virtio.h"
 #include "trace.h"
+#include "qemu/defer-call.h"
 #include "qemu/error-report.h"
 #include "qemu/log.h"
 #include "qemu/main-loop.h"
@@ -2445,6 +2446,16 @@ static bool virtio_should_notify(VirtIODevice *vdev, VirtQueue *vq)
     }
 }
 
+/* Batch irqs while inside a defer_call_begin()/defer_call_end() section */
+static void virtio_notify_irqfd_deferred_fn(void *opaque)
+{
+    EventNotifier *notifier = opaque;
+    VirtQueue *vq = container_of(notifier, VirtQueue, guest_notifier);
+
+    trace_virtio_notify_irqfd_deferred_fn(vq->vdev, vq);
+    event_notifier_set(notifier);
+}
+
 void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
 {
     WITH_RCU_READ_LOCK_GUARD() {
@@ -2471,7 +2482,7 @@ void virtio_notify_irqfd(VirtIODevice *vdev, VirtQueue *vq)
      * to an atomic operation.
      */
     virtio_set_isr(vq->vdev, 0x1);
-    event_notifier_set(&vq->guest_notifier);
+    defer_call(virtio_notify_irqfd_deferred_fn, &vq->guest_notifier);
 }
 
 static void virtio_irq(VirtQueue *vq)
diff --git a/util/thread-pool.c b/util/thread-pool.c
index 22f9ba3286..27eb777e85 100644
--- a/util/thread-pool.c
+++ b/util/thread-pool.c
@@ -15,6 +15,7 @@
  * GNU GPL, version 2 or (at your option) any later version.
  */
 #include "qemu/osdep.h"
+#include "qemu/defer-call.h"
 #include "qemu/queue.h"
 #include "qemu/thread.h"
 #include "qemu/coroutine.h"
@@ -175,6 +176,8 @@ static void thread_pool_completion_bh(void *opaque)
     ThreadPool *pool = opaque;
     ThreadPoolElement *elem, *next;
 
+    defer_call_begin(); /* cb() may use defer_call() to coalesce work */
+
 restart:
     QLIST_FOREACH_SAFE(elem, &pool->head, all, next) {
         if (elem->state != THREAD_DONE) {
@@ -208,6 +211,8 @@ restart:
             qemu_aio_unref(elem);
         }
     }
+
+    defer_call_end();
 }
 
 static void thread_pool_cancel(BlockAIOCB *acb)
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 1cb9027d1e..0af7a2886c 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -73,6 +73,7 @@ virtqueue_fill(void *vq, const void *elem, unsigned int len, unsigned int idx) "
 virtqueue_flush(void *vq, unsigned int count) "vq %p count %u"
 virtqueue_pop(void *vq, void *elem, unsigned int in_num, unsigned int out_num) "vq %p elem %p in_num %u out_num %u"
 virtio_queue_notify(void *vdev, int n, void *vq) "vdev %p n %d vq %p"
+virtio_notify_irqfd_deferred_fn(void *vdev, void *vq) "vdev %p vq %p"
 virtio_notify_irqfd(void *vdev, void *vq) "vdev %p vq %p"
 virtio_notify(void *vdev, void *vq) "vdev %p vq %p"
 virtio_set_status(void *vdev, uint8_t val) "vdev %p val %u"
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PULL 17/27] virtio-blk: remove batch notification BH
  2023-10-31 18:58 [PULL 00/27] Block layer patches Kevin Wolf
                   ` (15 preceding siblings ...)
  2023-10-31 18:59 ` [PULL 16/27] virtio: use defer_call() in virtio_irqfd_notify() Kevin Wolf
@ 2023-10-31 18:59 ` Kevin Wolf
  2023-10-31 18:59 ` [PULL 18/27] blockjob: introduce block-job-change QMP command Kevin Wolf
                   ` (10 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2023-10-31 18:59 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Stefan Hajnoczi <stefanha@redhat.com>

There is a batching mechanism for virtio-blk Used Buffer Notifications
that is no longer needed because the previous commit added batching to
virtio_notify_irqfd().

Note that this mechanism was rarely used in practice because it is only
enabled when EVENT_IDX is not negotiated by the driver. Modern drivers
enable EVENT_IDX.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-ID: <20230913200045.1024233-5-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 hw/block/dataplane/virtio-blk.c | 48 +--------------------------------
 1 file changed, 1 insertion(+), 47 deletions(-)

diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index da36fcfd0b..f83bb0f116 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -31,9 +31,6 @@ struct VirtIOBlockDataPlane {
 
     VirtIOBlkConf *conf;
     VirtIODevice *vdev;
-    QEMUBH *bh;                     /* bh for guest notification */
-    unsigned long *batch_notify_vqs;
-    bool batch_notifications;
 
     /* Note that these EventNotifiers are assigned by value.  This is
      * fine as long as you do not call event_notifier_cleanup on them
@@ -47,36 +44,7 @@ struct VirtIOBlockDataPlane {
 /* Raise an interrupt to signal guest, if necessary */
 void virtio_blk_data_plane_notify(VirtIOBlockDataPlane *s, VirtQueue *vq)
 {
-    if (s->batch_notifications) {
-        set_bit(virtio_get_queue_index(vq), s->batch_notify_vqs);
-        qemu_bh_schedule(s->bh);
-    } else {
-        virtio_notify_irqfd(s->vdev, vq);
-    }
-}
-
-static void notify_guest_bh(void *opaque)
-{
-    VirtIOBlockDataPlane *s = opaque;
-    unsigned nvqs = s->conf->num_queues;
-    unsigned long bitmap[BITS_TO_LONGS(nvqs)];
-    unsigned j;
-
-    memcpy(bitmap, s->batch_notify_vqs, sizeof(bitmap));
-    memset(s->batch_notify_vqs, 0, sizeof(bitmap));
-
-    for (j = 0; j < nvqs; j += BITS_PER_LONG) {
-        unsigned long bits = bitmap[j / BITS_PER_LONG];
-
-        while (bits != 0) {
-            unsigned i = j + ctzl(bits);
-            VirtQueue *vq = virtio_get_queue(s->vdev, i);
-
-            virtio_notify_irqfd(s->vdev, vq);
-
-            bits &= bits - 1; /* clear right-most bit */
-        }
-    }
+    virtio_notify_irqfd(s->vdev, vq);
 }
 
 /* Context: QEMU global mutex held */
@@ -126,9 +94,6 @@ bool virtio_blk_data_plane_create(VirtIODevice *vdev, VirtIOBlkConf *conf,
     } else {
         s->ctx = qemu_get_aio_context();
     }
-    s->bh = aio_bh_new_guarded(s->ctx, notify_guest_bh, s,
-                               &DEVICE(vdev)->mem_reentrancy_guard);
-    s->batch_notify_vqs = bitmap_new(conf->num_queues);
 
     *dataplane = s;
 
@@ -146,8 +111,6 @@ void virtio_blk_data_plane_destroy(VirtIOBlockDataPlane *s)
 
     vblk = VIRTIO_BLK(s->vdev);
     assert(!vblk->dataplane_started);
-    g_free(s->batch_notify_vqs);
-    qemu_bh_delete(s->bh);
     if (s->iothread) {
         object_unref(OBJECT(s->iothread));
     }
@@ -173,12 +136,6 @@ int virtio_blk_data_plane_start(VirtIODevice *vdev)
 
     s->starting = true;
 
-    if (!virtio_vdev_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX)) {
-        s->batch_notifications = true;
-    } else {
-        s->batch_notifications = false;
-    }
-
     /* Set up guest notifier (irq) */
     r = k->set_guest_notifiers(qbus->parent, nvqs, true);
     if (r != 0) {
@@ -370,9 +327,6 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
 
     aio_context_release(s->ctx);
 
-    qemu_bh_cancel(s->bh);
-    notify_guest_bh(s); /* final chance to notify guest */
-
     /* Clean up guest notifier (irq) */
     k->set_guest_notifiers(qbus->parent, nvqs, false);
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PULL 18/27] blockjob: introduce block-job-change QMP command
  2023-10-31 18:58 [PULL 00/27] Block layer patches Kevin Wolf
                   ` (16 preceding siblings ...)
  2023-10-31 18:59 ` [PULL 17/27] virtio-blk: remove batch notification BH Kevin Wolf
@ 2023-10-31 18:59 ` Kevin Wolf
  2023-10-31 18:59 ` [PULL 19/27] block/mirror: set actively_synced even after the job is ready Kevin Wolf
                   ` (9 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2023-10-31 18:59 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

which will allow changing job-type-specific options after job
creation.

In the JobVerbTable, the same allow bits as for set-speed are used,
because set-speed can be considered an existing change command.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20231031135431.393137-2-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 qapi/block-core.json         | 26 ++++++++++++++++++++++++++
 qapi/job.json                |  4 +++-
 include/block/blockjob.h     | 11 +++++++++++
 include/block/blockjob_int.h |  7 +++++++
 blockdev.c                   | 14 ++++++++++++++
 blockjob.c                   | 20 ++++++++++++++++++++
 job.c                        |  1 +
 7 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 89751d81f2..c6f31a9399 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3044,6 +3044,32 @@
 { 'command': 'block-job-finalize', 'data': { 'id': 'str' },
   'allow-preconfig': true }
 
+##
+# @BlockJobChangeOptions:
+#
+# Block job options that can be changed after job creation.
+#
+# @id: The job identifier
+#
+# @type: The job type
+#
+# Since 8.2
+##
+{ 'union': 'BlockJobChangeOptions',
+  'base': { 'id': 'str', 'type': 'JobType' },
+  'discriminator': 'type',
+  'data': {} }
+
+##
+# @block-job-change:
+#
+# Change the block job's options.
+#
+# Since: 8.2
+##
+{ 'command': 'block-job-change',
+  'data': 'BlockJobChangeOptions', 'boxed': true }
+
 ##
 # @BlockdevDiscardOptions:
 #
diff --git a/qapi/job.json b/qapi/job.json
index 7f0ba090de..b3957207a4 100644
--- a/qapi/job.json
+++ b/qapi/job.json
@@ -105,11 +105,13 @@
 #
 # @finalize: see @job-finalize
 #
+# @change: see @block-job-change (since 8.2)
+#
 # Since: 2.12
 ##
 { 'enum': 'JobVerb',
   'data': ['cancel', 'pause', 'resume', 'set-speed', 'complete', 'dismiss',
-           'finalize' ] }
+           'finalize', 'change' ] }
 
 ##
 # @JOB_STATUS_CHANGE:
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 058b0c824c..95854f1477 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -172,6 +172,17 @@ bool block_job_has_bdrv(BlockJob *job, BlockDriverState *bs);
  */
 bool block_job_set_speed_locked(BlockJob *job, int64_t speed, Error **errp);
 
+/**
+ * block_job_change_locked:
+ * @job: The job to change.
+ * @opts: The new options.
+ * @errp: Error object.
+ *
+ * Change the job according to opts.
+ */
+void block_job_change_locked(BlockJob *job, BlockJobChangeOptions *opts,
+                             Error **errp);
+
 /**
  * block_job_query_locked:
  * @job: The job to get information about.
diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index 104824040c..a4656d4cb5 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -67,6 +67,13 @@ struct BlockJobDriver {
     void (*attached_aio_context)(BlockJob *job, AioContext *new_context);
 
     void (*set_speed)(BlockJob *job, int64_t speed);
+
+    /*
+     * Change the @job's options according to @opts.
+     *
+     * Note that this can already be called before the job coroutine is running.
+     */
+    void (*change)(BlockJob *job, BlockJobChangeOptions *opts, Error **errp);
 };
 
 /*
diff --git a/blockdev.c b/blockdev.c
index 877e3a26d4..1517dc6210 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3392,6 +3392,20 @@ void qmp_block_job_dismiss(const char *id, Error **errp)
     job_dismiss_locked(&job, errp);
 }
 
+void qmp_block_job_change(BlockJobChangeOptions *opts, Error **errp)
+{
+    BlockJob *job;
+
+    JOB_LOCK_GUARD();
+    job = find_block_job_locked(opts->id, errp);
+
+    if (!job) {
+        return;
+    }
+
+    block_job_change_locked(job, opts, errp);
+}
+
 void qmp_change_backing_file(const char *device,
                              const char *image_node_name,
                              const char *backing_file,
diff --git a/blockjob.c b/blockjob.c
index 953dc1b6dc..f0505ad232 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -330,6 +330,26 @@ static bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
     return block_job_set_speed_locked(job, speed, errp);
 }
 
+void block_job_change_locked(BlockJob *job, BlockJobChangeOptions *opts,
+                             Error **errp)
+{
+    const BlockJobDriver *drv = block_job_driver(job);
+
+    GLOBAL_STATE_CODE();
+
+    if (job_apply_verb_locked(&job->job, JOB_VERB_CHANGE, errp)) {
+        return;
+    }
+
+    if (drv->change) {
+        job_unlock();
+        drv->change(job, opts, errp);
+        job_lock();
+    } else {
+        error_setg(errp, "Job type does not support change");
+    }
+}
+
 void block_job_ratelimit_processed_bytes(BlockJob *job, uint64_t n)
 {
     IO_CODE();
diff --git a/job.c b/job.c
index 72d57f0934..99a2e54b54 100644
--- a/job.c
+++ b/job.c
@@ -80,6 +80,7 @@ bool JobVerbTable[JOB_VERB__MAX][JOB_STATUS__MAX] = {
     [JOB_VERB_COMPLETE]             = {0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0},
     [JOB_VERB_FINALIZE]             = {0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0},
     [JOB_VERB_DISMISS]              = {0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0},
+    [JOB_VERB_CHANGE]               = {0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0},
 };
 
 /* Transactional group of jobs */
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PULL 19/27] block/mirror: set actively_synced even after the job is ready
  2023-10-31 18:58 [PULL 00/27] Block layer patches Kevin Wolf
                   ` (17 preceding siblings ...)
  2023-10-31 18:59 ` [PULL 18/27] blockjob: introduce block-job-change QMP command Kevin Wolf
@ 2023-10-31 18:59 ` Kevin Wolf
  2023-10-31 18:59 ` [PULL 20/27] block/mirror: move dirty bitmap to filter Kevin Wolf
                   ` (8 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2023-10-31 18:59 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

In preparation to allow switching from background to active mode. This
ensures that setting actively_synced will not be missed when the
switch happens after the job is ready.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20231031135431.393137-3-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/mirror.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index dcd88de2e3..1c2c00ee1d 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1074,9 +1074,9 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
                  * the target in a consistent state.
                  */
                 job_transition_to_ready(&s->common.job);
-                if (s->copy_mode != MIRROR_COPY_MODE_BACKGROUND) {
-                    s->actively_synced = true;
-                }
+            }
+            if (s->copy_mode != MIRROR_COPY_MODE_BACKGROUND) {
+                s->actively_synced = true;
             }
 
             should_complete = s->should_complete ||
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PULL 20/27] block/mirror: move dirty bitmap to filter
  2023-10-31 18:58 [PULL 00/27] Block layer patches Kevin Wolf
                   ` (18 preceding siblings ...)
  2023-10-31 18:59 ` [PULL 19/27] block/mirror: set actively_synced even after the job is ready Kevin Wolf
@ 2023-10-31 18:59 ` Kevin Wolf
  2023-10-31 18:59 ` [PULL 21/27] block/mirror: determine copy_to_target only once Kevin Wolf
                   ` (7 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2023-10-31 18:59 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

In preparation to allow switching to active mode without draining.
Initialization of the bitmap in mirror_dirty_init() still happens with
the original/backing BlockDriverState, which should be fine, because
the mirror top has the same length.

Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20231031135431.393137-4-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/mirror.c | 17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 1c2c00ee1d..914d723446 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1500,6 +1500,11 @@ bdrv_mirror_top_do_write(BlockDriverState *bs, MirrorMethod method,
         abort();
     }
 
+    if (!copy_to_target && s->job && s->job->dirty_bitmap) {
+        s->job->actively_synced = false;
+        bdrv_set_dirty_bitmap(s->job->dirty_bitmap, offset, bytes);
+    }
+
     if (ret < 0) {
         goto out;
     }
@@ -1823,13 +1828,17 @@ static BlockJob *mirror_start_job(
         s->should_complete = true;
     }
 
-    s->dirty_bitmap = bdrv_create_dirty_bitmap(bs, granularity, NULL, errp);
+    s->dirty_bitmap = bdrv_create_dirty_bitmap(s->mirror_top_bs, granularity,
+                                               NULL, errp);
     if (!s->dirty_bitmap) {
         goto fail;
     }
-    if (s->copy_mode == MIRROR_COPY_MODE_WRITE_BLOCKING) {
-        bdrv_disable_dirty_bitmap(s->dirty_bitmap);
-    }
+
+    /*
+     * The dirty bitmap is set by bdrv_mirror_top_do_write() when not in active
+     * mode.
+     */
+    bdrv_disable_dirty_bitmap(s->dirty_bitmap);
 
     ret = block_job_add_bdrv(&s->common, "source", bs, 0,
                              BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE |
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PULL 21/27] block/mirror: determine copy_to_target only once
  2023-10-31 18:58 [PULL 00/27] Block layer patches Kevin Wolf
                   ` (19 preceding siblings ...)
  2023-10-31 18:59 ` [PULL 20/27] block/mirror: move dirty bitmap to filter Kevin Wolf
@ 2023-10-31 18:59 ` Kevin Wolf
  2023-10-31 18:59 ` [PULL 22/27] mirror: implement mirror_change method Kevin Wolf
                   ` (6 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2023-10-31 18:59 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

In preparation to allow changing the copy_mode via QMP. When running
in an iothread, it could be that copy_mode is changed from the main
thread in between reading copy_mode in bdrv_mirror_top_pwritev() and
reading copy_mode in bdrv_mirror_top_do_write(), so they might end up
disagreeing about whether copy_to_target is true or false. Avoid that
scenario by determining copy_to_target only once and passing it to
bdrv_mirror_top_do_write() as an argument.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20231031135431.393137-5-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/mirror.c | 41 ++++++++++++++++++-----------------------
 1 file changed, 18 insertions(+), 23 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index 914d723446..31da1526eb 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1463,21 +1463,21 @@ bdrv_mirror_top_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
     return bdrv_co_preadv(bs->backing, offset, bytes, qiov, flags);
 }
 
+static bool should_copy_to_target(MirrorBDSOpaque *s)
+{
+    return s->job && s->job->ret >= 0 &&
+        !job_is_cancelled(&s->job->common.job) &&
+        s->job->copy_mode == MIRROR_COPY_MODE_WRITE_BLOCKING;
+}
+
 static int coroutine_fn GRAPH_RDLOCK
 bdrv_mirror_top_do_write(BlockDriverState *bs, MirrorMethod method,
-                         uint64_t offset, uint64_t bytes, QEMUIOVector *qiov,
-                         int flags)
+                         bool copy_to_target, uint64_t offset, uint64_t bytes,
+                         QEMUIOVector *qiov, int flags)
 {
     MirrorOp *op = NULL;
     MirrorBDSOpaque *s = bs->opaque;
     int ret = 0;
-    bool copy_to_target = false;
-
-    if (s->job) {
-        copy_to_target = s->job->ret >= 0 &&
-                         !job_is_cancelled(&s->job->common.job) &&
-                         s->job->copy_mode == MIRROR_COPY_MODE_WRITE_BLOCKING;
-    }
 
     if (copy_to_target) {
         op = active_write_prepare(s->job, offset, bytes);
@@ -1524,17 +1524,10 @@ static int coroutine_fn GRAPH_RDLOCK
 bdrv_mirror_top_pwritev(BlockDriverState *bs, int64_t offset, int64_t bytes,
                         QEMUIOVector *qiov, BdrvRequestFlags flags)
 {
-    MirrorBDSOpaque *s = bs->opaque;
     QEMUIOVector bounce_qiov;
     void *bounce_buf;
     int ret = 0;
-    bool copy_to_target = false;
-
-    if (s->job) {
-        copy_to_target = s->job->ret >= 0 &&
-                         !job_is_cancelled(&s->job->common.job) &&
-                         s->job->copy_mode == MIRROR_COPY_MODE_WRITE_BLOCKING;
-    }
+    bool copy_to_target = should_copy_to_target(bs->opaque);
 
     if (copy_to_target) {
         /* The guest might concurrently modify the data to write; but
@@ -1551,8 +1544,8 @@ bdrv_mirror_top_pwritev(BlockDriverState *bs, int64_t offset, int64_t bytes,
         flags &= ~BDRV_REQ_REGISTERED_BUF;
     }
 
-    ret = bdrv_mirror_top_do_write(bs, MIRROR_METHOD_COPY, offset, bytes, qiov,
-                                   flags);
+    ret = bdrv_mirror_top_do_write(bs, MIRROR_METHOD_COPY, copy_to_target,
+                                   offset, bytes, qiov, flags);
 
     if (copy_to_target) {
         qemu_iovec_destroy(&bounce_qiov);
@@ -1575,15 +1568,17 @@ static int coroutine_fn GRAPH_RDLOCK
 bdrv_mirror_top_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
                               int64_t bytes, BdrvRequestFlags flags)
 {
-    return bdrv_mirror_top_do_write(bs, MIRROR_METHOD_ZERO, offset, bytes, NULL,
-                                    flags);
+    bool copy_to_target = should_copy_to_target(bs->opaque);
+    return bdrv_mirror_top_do_write(bs, MIRROR_METHOD_ZERO, copy_to_target,
+                                    offset, bytes, NULL, flags);
 }
 
 static int coroutine_fn GRAPH_RDLOCK
 bdrv_mirror_top_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
 {
-    return bdrv_mirror_top_do_write(bs, MIRROR_METHOD_DISCARD, offset, bytes,
-                                    NULL, 0);
+    bool copy_to_target = should_copy_to_target(bs->opaque);
+    return bdrv_mirror_top_do_write(bs, MIRROR_METHOD_DISCARD, copy_to_target,
+                                    offset, bytes, NULL, 0);
 }
 
 static void bdrv_mirror_top_refresh_filename(BlockDriverState *bs)
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PULL 22/27] mirror: implement mirror_change method
  2023-10-31 18:58 [PULL 00/27] Block layer patches Kevin Wolf
                   ` (20 preceding siblings ...)
  2023-10-31 18:59 ` [PULL 21/27] block/mirror: determine copy_to_target only once Kevin Wolf
@ 2023-10-31 18:59 ` Kevin Wolf
  2023-10-31 18:59 ` [PULL 23/27] qapi/block-core: use JobType for BlockJobInfo's type Kevin Wolf
                   ` (5 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2023-10-31 18:59 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

which allows switching the @copy-mode from 'background' to
'write-blocking'.

This is useful for management applications, so they can start out in
background mode to avoid limiting guest write speed and switch to
active mode when certain criteria are fulfilled.

In presence of an iothread, the copy_mode member is now shared between
the iothread and the main thread, so turn accesses to it atomic.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20231031135431.393137-6-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 qapi/block-core.json | 13 ++++++++++++-
 block/mirror.c       | 44 +++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 53 insertions(+), 4 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index c6f31a9399..6369207be2 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3044,6 +3044,17 @@
 { 'command': 'block-job-finalize', 'data': { 'id': 'str' },
   'allow-preconfig': true }
 
+##
+# @BlockJobChangeOptionsMirror:
+#
+# @copy-mode: Switch to this copy mode.  Currently, only the switch
+#     from 'background' to 'write-blocking' is implemented.
+#
+# Since: 8.2
+##
+{ 'struct': 'BlockJobChangeOptionsMirror',
+  'data': { 'copy-mode' : 'MirrorCopyMode' } }
+
 ##
 # @BlockJobChangeOptions:
 #
@@ -3058,7 +3069,7 @@
 { 'union': 'BlockJobChangeOptions',
   'base': { 'id': 'str', 'type': 'JobType' },
   'discriminator': 'type',
-  'data': {} }
+  'data': { 'mirror': 'BlockJobChangeOptionsMirror' } }
 
 ##
 # @block-job-change:
diff --git a/block/mirror.c b/block/mirror.c
index 31da1526eb..4016d89253 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -55,6 +55,10 @@ typedef struct MirrorBlockJob {
     BlockMirrorBackingMode backing_mode;
     /* Whether the target image requires explicit zero-initialization */
     bool zero_target;
+    /*
+     * To be accesssed with atomics. Written only under the BQL (required by the
+     * current implementation of mirror_change()).
+     */
     MirrorCopyMode copy_mode;
     BlockdevOnError on_source_error, on_target_error;
     /* Set when the target is synced (dirty bitmap is clean, nothing
@@ -1075,7 +1079,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
                  */
                 job_transition_to_ready(&s->common.job);
             }
-            if (s->copy_mode != MIRROR_COPY_MODE_BACKGROUND) {
+            if (qatomic_read(&s->copy_mode) != MIRROR_COPY_MODE_BACKGROUND) {
                 s->actively_synced = true;
             }
 
@@ -1246,6 +1250,39 @@ static bool commit_active_cancel(Job *job, bool force)
     return force || !job_is_ready(job);
 }
 
+static void mirror_change(BlockJob *job, BlockJobChangeOptions *opts,
+                          Error **errp)
+{
+    MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
+    BlockJobChangeOptionsMirror *change_opts = &opts->u.mirror;
+    MirrorCopyMode current;
+
+    /*
+     * The implementation relies on the fact that copy_mode is only written
+     * under the BQL. Otherwise, further synchronization would be required.
+     */
+
+    GLOBAL_STATE_CODE();
+
+    if (qatomic_read(&s->copy_mode) == change_opts->copy_mode) {
+        return;
+    }
+
+    if (change_opts->copy_mode != MIRROR_COPY_MODE_WRITE_BLOCKING) {
+        error_setg(errp, "Change to copy mode '%s' is not implemented",
+                   MirrorCopyMode_str(change_opts->copy_mode));
+        return;
+    }
+
+    current = qatomic_cmpxchg(&s->copy_mode, MIRROR_COPY_MODE_BACKGROUND,
+                              change_opts->copy_mode);
+    if (current != MIRROR_COPY_MODE_BACKGROUND) {
+        error_setg(errp, "Expected current copy mode '%s', got '%s'",
+                   MirrorCopyMode_str(MIRROR_COPY_MODE_BACKGROUND),
+                   MirrorCopyMode_str(current));
+    }
+}
+
 static const BlockJobDriver mirror_job_driver = {
     .job_driver = {
         .instance_size          = sizeof(MirrorBlockJob),
@@ -1260,6 +1297,7 @@ static const BlockJobDriver mirror_job_driver = {
         .cancel                 = mirror_cancel,
     },
     .drained_poll           = mirror_drained_poll,
+    .change                 = mirror_change,
 };
 
 static const BlockJobDriver commit_active_job_driver = {
@@ -1467,7 +1505,7 @@ static bool should_copy_to_target(MirrorBDSOpaque *s)
 {
     return s->job && s->job->ret >= 0 &&
         !job_is_cancelled(&s->job->common.job) &&
-        s->job->copy_mode == MIRROR_COPY_MODE_WRITE_BLOCKING;
+        qatomic_read(&s->job->copy_mode) == MIRROR_COPY_MODE_WRITE_BLOCKING;
 }
 
 static int coroutine_fn GRAPH_RDLOCK
@@ -1813,7 +1851,7 @@ static BlockJob *mirror_start_job(
     s->is_none_mode = is_none_mode;
     s->backing_mode = backing_mode;
     s->zero_target = zero_target;
-    s->copy_mode = copy_mode;
+    qatomic_set(&s->copy_mode, copy_mode);
     s->base = base;
     s->base_overlay = bdrv_find_overlay(bs, base);
     s->granularity = granularity;
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PULL 23/27] qapi/block-core: use JobType for BlockJobInfo's type
  2023-10-31 18:58 [PULL 00/27] Block layer patches Kevin Wolf
                   ` (21 preceding siblings ...)
  2023-10-31 18:59 ` [PULL 22/27] mirror: implement mirror_change method Kevin Wolf
@ 2023-10-31 18:59 ` Kevin Wolf
  2023-10-31 18:59 ` [PULL 24/27] qapi/block-core: turn BlockJobInfo into a union Kevin Wolf
                   ` (4 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2023-10-31 18:59 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

In preparation to turn BlockJobInfo into a union with @type as the
discriminator. That requires it to be an enum. Even without that
requirement, it's nicer to have an enum instead of a str here.

No functional change is intended.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-ID: <20231031135431.393137-7-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 qapi/block-core.json           | 2 +-
 block/monitor/block-hmp-cmds.c | 4 ++--
 blockjob.c                     | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 6369207be2..9d03210664 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1396,7 +1396,7 @@
 # Since: 1.1
 ##
 { 'struct': 'BlockJobInfo',
-  'data': {'type': 'str', 'device': 'str', 'len': 'int',
+  'data': {'type': 'JobType', 'device': 'str', 'len': 'int',
            'offset': 'int', 'busy': 'bool', 'paused': 'bool', 'speed': 'int',
            'io-status': 'BlockDeviceIoStatus', 'ready': 'bool',
            'status': 'JobStatus',
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 7645c7e5fb..5b2c597e7a 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -846,7 +846,7 @@ void hmp_info_block_jobs(Monitor *mon, const QDict *qdict)
     }
 
     while (list) {
-        if (strcmp(list->value->type, "stream") == 0) {
+        if (list->value->type == JOB_TYPE_STREAM) {
             monitor_printf(mon, "Streaming device %s: Completed %" PRId64
                            " of %" PRId64 " bytes, speed limit %" PRId64
                            " bytes/s\n",
@@ -858,7 +858,7 @@ void hmp_info_block_jobs(Monitor *mon, const QDict *qdict)
             monitor_printf(mon, "Type %s, device %s: Completed %" PRId64
                            " of %" PRId64 " bytes, speed limit %" PRId64
                            " bytes/s\n",
-                           list->value->type,
+                           JobType_str(list->value->type),
                            list->value->device,
                            list->value->offset,
                            list->value->len,
diff --git a/blockjob.c b/blockjob.c
index f0505ad232..5b4786a70f 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -390,7 +390,7 @@ BlockJobInfo *block_job_query_locked(BlockJob *job, Error **errp)
                           &progress_total);
 
     info = g_new0(BlockJobInfo, 1);
-    info->type      = g_strdup(job_type_str(&job->job));
+    info->type      = job_type(&job->job);
     info->device    = g_strdup(job->job.id);
     info->busy      = job->job.busy;
     info->paused    = job->job.pause_count > 0;
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PULL 24/27] qapi/block-core: turn BlockJobInfo into a union
  2023-10-31 18:58 [PULL 00/27] Block layer patches Kevin Wolf
                   ` (22 preceding siblings ...)
  2023-10-31 18:59 ` [PULL 23/27] qapi/block-core: use JobType for BlockJobInfo's type Kevin Wolf
@ 2023-10-31 18:59 ` Kevin Wolf
  2023-10-31 18:59 ` [PULL 25/27] blockjob: query driver-specific info via a new 'query' driver method Kevin Wolf
                   ` (3 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2023-10-31 18:59 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

In preparation to additionally return job-type-specific information.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Message-ID: <20231031135431.393137-8-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 qapi/block-core.json | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 9d03210664..dca0e94bb0 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1395,13 +1395,15 @@
 #
 # Since: 1.1
 ##
-{ 'struct': 'BlockJobInfo',
-  'data': {'type': 'JobType', 'device': 'str', 'len': 'int',
+{ 'union': 'BlockJobInfo',
+  'base': {'type': 'JobType', 'device': 'str', 'len': 'int',
            'offset': 'int', 'busy': 'bool', 'paused': 'bool', 'speed': 'int',
            'io-status': 'BlockDeviceIoStatus', 'ready': 'bool',
            'status': 'JobStatus',
            'auto-finalize': 'bool', 'auto-dismiss': 'bool',
-           '*error': 'str' } }
+           '*error': 'str' },
+  'discriminator': 'type',
+  'data': {} }
 
 ##
 # @query-block-jobs:
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PULL 25/27] blockjob: query driver-specific info via a new 'query' driver method
  2023-10-31 18:58 [PULL 00/27] Block layer patches Kevin Wolf
                   ` (23 preceding siblings ...)
  2023-10-31 18:59 ` [PULL 24/27] qapi/block-core: turn BlockJobInfo into a union Kevin Wolf
@ 2023-10-31 18:59 ` Kevin Wolf
  2023-10-31 18:59 ` [PULL 26/27] mirror: return mirror-specific information upon query Kevin Wolf
                   ` (2 subsequent siblings)
  27 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2023-10-31 18:59 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20231031135431.393137-9-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/blockjob_int.h | 5 +++++
 blockjob.c                   | 6 ++++++
 2 files changed, 11 insertions(+)

diff --git a/include/block/blockjob_int.h b/include/block/blockjob_int.h
index a4656d4cb5..18ee6f7bf0 100644
--- a/include/block/blockjob_int.h
+++ b/include/block/blockjob_int.h
@@ -74,6 +74,11 @@ struct BlockJobDriver {
      * Note that this can already be called before the job coroutine is running.
      */
     void (*change)(BlockJob *job, BlockJobChangeOptions *opts, Error **errp);
+
+    /*
+     * Query information specific to this kind of block job.
+     */
+    void (*query)(BlockJob *job, BlockJobInfo *info);
 };
 
 /*
diff --git a/blockjob.c b/blockjob.c
index 5b4786a70f..5b24de356d 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -378,6 +378,7 @@ BlockJobInfo *block_job_query_locked(BlockJob *job, Error **errp)
 {
     BlockJobInfo *info;
     uint64_t progress_current, progress_total;
+    const BlockJobDriver *drv = block_job_driver(job);
 
     GLOBAL_STATE_CODE();
 
@@ -407,6 +408,11 @@ BlockJobInfo *block_job_query_locked(BlockJob *job, Error **errp)
                         g_strdup(error_get_pretty(job->job.err)) :
                         g_strdup(strerror(-job->job.ret));
     }
+    if (drv->query) {
+        job_unlock();
+        drv->query(job, info);
+        job_lock();
+    }
     return info;
 }
 
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PULL 26/27] mirror: return mirror-specific information upon query
  2023-10-31 18:58 [PULL 00/27] Block layer patches Kevin Wolf
                   ` (24 preceding siblings ...)
  2023-10-31 18:59 ` [PULL 25/27] blockjob: query driver-specific info via a new 'query' driver method Kevin Wolf
@ 2023-10-31 18:59 ` Kevin Wolf
  2023-10-31 18:59 ` [PULL 27/27] iotests: add test for changing mirror's copy_mode Kevin Wolf
  2023-10-31 23:31 ` [PULL 00/27] Block layer patches Stefan Hajnoczi
  27 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2023-10-31 18:59 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

To start out, only actively-synced is returned.

For example, this is useful for jobs that started out in background
mode and switched to active mode. Once actively-synced is true, it's
clear that the mode switch has been completed. Note that completion of
the switch might happen much earlier, e.g. if the switch happens
before the job is ready, once all background operations have finished.
It's assumed that whether the disks are actively-synced or not is more
interesting than whether the mode switch completed. That information
can still be added if required in the future.

In presence of an iothread, the actively_synced member is now shared
between the iothread and the main thread, so turn accesses to it
atomic.

Requires to adapt the output for iotest 109.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20231031135431.393137-10-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 qapi/block-core.json       | 16 +++++++++++++++-
 block/mirror.c             | 31 +++++++++++++++++++++++--------
 tests/qemu-iotests/109.out | 24 ++++++++++++------------
 3 files changed, 50 insertions(+), 21 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index dca0e94bb0..99961256f2 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1352,6 +1352,20 @@
 { 'enum': 'MirrorCopyMode',
   'data': ['background', 'write-blocking'] }
 
+##
+# @BlockJobInfoMirror:
+#
+# Information specific to mirror block jobs.
+#
+# @actively-synced: Whether the source is actively synced to the
+#     target, i.e. same data and new writes are done synchronously to
+#     both.
+#
+# Since 8.2
+##
+{ 'struct': 'BlockJobInfoMirror',
+  'data': { 'actively-synced': 'bool' } }
+
 ##
 # @BlockJobInfo:
 #
@@ -1403,7 +1417,7 @@
            'auto-finalize': 'bool', 'auto-dismiss': 'bool',
            '*error': 'str' },
   'discriminator': 'type',
-  'data': {} }
+  'data': { 'mirror': 'BlockJobInfoMirror' } }
 
 ##
 # @query-block-jobs:
diff --git a/block/mirror.c b/block/mirror.c
index 4016d89253..c839542774 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -61,8 +61,12 @@ typedef struct MirrorBlockJob {
      */
     MirrorCopyMode copy_mode;
     BlockdevOnError on_source_error, on_target_error;
-    /* Set when the target is synced (dirty bitmap is clean, nothing
-     * in flight) and the job is running in active mode */
+    /*
+     * To be accessed with atomics.
+     *
+     * Set when the target is synced (dirty bitmap is clean, nothing in flight)
+     * and the job is running in active mode.
+     */
     bool actively_synced;
     bool should_complete;
     int64_t granularity;
@@ -126,7 +130,7 @@ typedef enum MirrorMethod {
 static BlockErrorAction mirror_error_action(MirrorBlockJob *s, bool read,
                                             int error)
 {
-    s->actively_synced = false;
+    qatomic_set(&s->actively_synced, false);
     if (read) {
         return block_job_error_action(&s->common, s->on_source_error,
                                       true, error);
@@ -966,7 +970,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
     if (s->bdev_length == 0) {
         /* Transition to the READY state and wait for complete. */
         job_transition_to_ready(&s->common.job);
-        s->actively_synced = true;
+        qatomic_set(&s->actively_synced, true);
         while (!job_cancel_requested(&s->common.job) && !s->should_complete) {
             job_yield(&s->common.job);
         }
@@ -1080,7 +1084,7 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
                 job_transition_to_ready(&s->common.job);
             }
             if (qatomic_read(&s->copy_mode) != MIRROR_COPY_MODE_BACKGROUND) {
-                s->actively_synced = true;
+                qatomic_set(&s->actively_synced, true);
             }
 
             should_complete = s->should_complete ||
@@ -1283,6 +1287,15 @@ static void mirror_change(BlockJob *job, BlockJobChangeOptions *opts,
     }
 }
 
+static void mirror_query(BlockJob *job, BlockJobInfo *info)
+{
+    MirrorBlockJob *s = container_of(job, MirrorBlockJob, common);
+
+    info->u.mirror = (BlockJobInfoMirror) {
+        .actively_synced = qatomic_read(&s->actively_synced),
+    };
+}
+
 static const BlockJobDriver mirror_job_driver = {
     .job_driver = {
         .instance_size          = sizeof(MirrorBlockJob),
@@ -1298,6 +1311,7 @@ static const BlockJobDriver mirror_job_driver = {
     },
     .drained_poll           = mirror_drained_poll,
     .change                 = mirror_change,
+    .query                  = mirror_query,
 };
 
 static const BlockJobDriver commit_active_job_driver = {
@@ -1416,7 +1430,7 @@ do_sync_target_write(MirrorBlockJob *job, MirrorMethod method,
         bitmap_end = QEMU_ALIGN_UP(offset + bytes, job->granularity);
         bdrv_set_dirty_bitmap(job->dirty_bitmap, bitmap_offset,
                               bitmap_end - bitmap_offset);
-        job->actively_synced = false;
+        qatomic_set(&job->actively_synced, false);
 
         action = mirror_error_action(job, false, -ret);
         if (action == BLOCK_ERROR_ACTION_REPORT) {
@@ -1475,7 +1489,8 @@ static void coroutine_fn GRAPH_RDLOCK active_write_settle(MirrorOp *op)
     uint64_t end_chunk = DIV_ROUND_UP(op->offset + op->bytes,
                                       op->s->granularity);
 
-    if (!--op->s->in_active_write_counter && op->s->actively_synced) {
+    if (!--op->s->in_active_write_counter &&
+        qatomic_read(&op->s->actively_synced)) {
         BdrvChild *source = op->s->mirror_top_bs->backing;
 
         if (QLIST_FIRST(&source->bs->parents) == source &&
@@ -1539,7 +1554,7 @@ bdrv_mirror_top_do_write(BlockDriverState *bs, MirrorMethod method,
     }
 
     if (!copy_to_target && s->job && s->job->dirty_bitmap) {
-        s->job->actively_synced = false;
+        qatomic_set(&s->job->actively_synced, false);
         bdrv_set_dirty_bitmap(s->job->dirty_bitmap, offset, bytes);
     }
 
diff --git a/tests/qemu-iotests/109.out b/tests/qemu-iotests/109.out
index 2611d6a40f..965c9a6a0a 100644
--- a/tests/qemu-iotests/109.out
+++ b/tests/qemu-iotests/109.out
@@ -38,7 +38,7 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "src", "len": 1024, "offset": 1024, "speed": 0, "type": "mirror"}}
 {"execute":"query-block-jobs"}
-{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 1024, "offset": 1024, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
+{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 1024, "offset": 1024, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror", "actively-synced": false}]}
 {"execute":"quit"}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
@@ -90,7 +90,7 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "src", "len": 197120, "offset": 197120, "speed": 0, "type": "mirror"}}
 {"execute":"query-block-jobs"}
-{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 197120, "offset": 197120, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
+{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 197120, "offset": 197120, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror", "actively-synced": false}]}
 {"execute":"quit"}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
@@ -142,7 +142,7 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "src", "len": 327680, "offset": 327680, "speed": 0, "type": "mirror"}}
 {"execute":"query-block-jobs"}
-{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 327680, "offset": 327680, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
+{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 327680, "offset": 327680, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror", "actively-synced": false}]}
 {"execute":"quit"}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
@@ -194,7 +194,7 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "src", "len": 1024, "offset": 1024, "speed": 0, "type": "mirror"}}
 {"execute":"query-block-jobs"}
-{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 1024, "offset": 1024, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
+{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 1024, "offset": 1024, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror", "actively-synced": false}]}
 {"execute":"quit"}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
@@ -246,7 +246,7 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "src", "len": 65536, "offset": 65536, "speed": 0, "type": "mirror"}}
 {"execute":"query-block-jobs"}
-{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 65536, "offset": 65536, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
+{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 65536, "offset": 65536, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror", "actively-synced": false}]}
 {"execute":"quit"}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
@@ -298,7 +298,7 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "src", "len": 2560, "offset": 2560, "speed": 0, "type": "mirror"}}
 {"execute":"query-block-jobs"}
-{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 2560, "offset": 2560, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
+{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 2560, "offset": 2560, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror", "actively-synced": false}]}
 {"execute":"quit"}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
@@ -349,7 +349,7 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "src", "len": 2560, "offset": 2560, "speed": 0, "type": "mirror"}}
 {"execute":"query-block-jobs"}
-{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 2560, "offset": 2560, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
+{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 2560, "offset": 2560, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror", "actively-synced": false}]}
 {"execute":"quit"}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
@@ -400,7 +400,7 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "src", "len": 31457280, "offset": 31457280, "speed": 0, "type": "mirror"}}
 {"execute":"query-block-jobs"}
-{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 31457280, "offset": 31457280, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
+{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 31457280, "offset": 31457280, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror", "actively-synced": false}]}
 {"execute":"quit"}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
@@ -451,7 +451,7 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "src", "len": 327680, "offset": 327680, "speed": 0, "type": "mirror"}}
 {"execute":"query-block-jobs"}
-{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 327680, "offset": 327680, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
+{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 327680, "offset": 327680, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror", "actively-synced": false}]}
 {"execute":"quit"}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
@@ -502,7 +502,7 @@ read 512/512 bytes at offset 0
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "src", "len": 2048, "offset": 2048, "speed": 0, "type": "mirror"}}
 {"execute":"query-block-jobs"}
-{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 2048, "offset": 2048, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
+{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 2048, "offset": 2048, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror", "actively-synced": false}]}
 {"execute":"quit"}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
@@ -533,7 +533,7 @@ WARNING: Image format was not specified for 'TEST_DIR/t.raw' and probing guessed
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "src", "len": 512, "offset": 512, "speed": 0, "type": "mirror"}}
 {"execute":"query-block-jobs"}
-{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 512, "offset": 512, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
+{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 512, "offset": 512, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror", "actively-synced": false}]}
 {"execute":"quit"}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
@@ -557,7 +557,7 @@ Images are identical.
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "JOB_STATUS_CHANGE", "data": {"status": "ready", "id": "src"}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "BLOCK_JOB_READY", "data": {"device": "src", "len": 512, "offset": 512, "speed": 0, "type": "mirror"}}
 {"execute":"query-block-jobs"}
-{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 512, "offset": 512, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror"}]}
+{"return": [{"auto-finalize": true, "io-status": "ok", "device": "src", "auto-dismiss": true, "busy": false, "len": 512, "offset": 512, "status": "ready", "paused": false, "speed": 0, "ready": true, "type": "mirror", "actively-synced": false}]}
 {"execute":"quit"}
 {"return": {}}
 {"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": "SHUTDOWN", "data": {"guest": false, "reason": "host-qmp-quit"}}
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* [PULL 27/27] iotests: add test for changing mirror's copy_mode
  2023-10-31 18:58 [PULL 00/27] Block layer patches Kevin Wolf
                   ` (25 preceding siblings ...)
  2023-10-31 18:59 ` [PULL 26/27] mirror: return mirror-specific information upon query Kevin Wolf
@ 2023-10-31 18:59 ` Kevin Wolf
  2023-10-31 23:31 ` [PULL 00/27] Block layer patches Stefan Hajnoczi
  27 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2023-10-31 18:59 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

From: Fiona Ebner <f.ebner@proxmox.com>

One part of the test is using a throttled source to ensure that there
are no obvious issues when changing the copy_mode while there are
ongoing requests (source and target images are compared at the very
end).

The other part of the test is using a throttled target to ensure that
the change to active mode actually happened. This is done by hitting
the throttling limit, issuing a synchronous write and then immediately
verifying the target side. QSD is used, because otherwise, a
synchronous write would hang there.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Message-ID: <20231031135431.393137-11-f.ebner@proxmox.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 .../tests/mirror-change-copy-mode             | 193 ++++++++++++++++++
 .../tests/mirror-change-copy-mode.out         |   5 +
 2 files changed, 198 insertions(+)
 create mode 100755 tests/qemu-iotests/tests/mirror-change-copy-mode
 create mode 100644 tests/qemu-iotests/tests/mirror-change-copy-mode.out

diff --git a/tests/qemu-iotests/tests/mirror-change-copy-mode b/tests/qemu-iotests/tests/mirror-change-copy-mode
new file mode 100755
index 0000000000..51788b85c7
--- /dev/null
+++ b/tests/qemu-iotests/tests/mirror-change-copy-mode
@@ -0,0 +1,193 @@
+#!/usr/bin/env python3
+# group: rw
+#
+# Test for changing mirror copy mode from background to active
+#
+# Copyright (C) 2023 Proxmox Server Solutions GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+import os
+import time
+
+import iotests
+from iotests import qemu_img, QemuStorageDaemon
+
+iops_target = 8
+iops_source = iops_target * 2
+image_size = 1 * 1024 * 1024
+source_img = os.path.join(iotests.test_dir, 'source.' + iotests.imgfmt)
+target_img = os.path.join(iotests.test_dir, 'target.' + iotests.imgfmt)
+nbd_sock = os.path.join(iotests.sock_dir, 'nbd.sock')
+
+class TestMirrorChangeCopyMode(iotests.QMPTestCase):
+
+    def setUp(self):
+        qemu_img('create', '-f', iotests.imgfmt, source_img, str(image_size))
+        qemu_img('create', '-f', iotests.imgfmt, target_img, str(image_size))
+
+        self.qsd = QemuStorageDaemon('--nbd-server',
+                                     f'addr.type=unix,addr.path={nbd_sock}',
+                                     qmp=True)
+
+        self.qsd.cmd('object-add', {
+            'qom-type': 'throttle-group',
+            'id': 'thrgr-target',
+            'limits': {
+                'iops-write': iops_target,
+                'iops-write-max': iops_target
+            }
+        })
+
+        self.qsd.cmd('blockdev-add', {
+            'node-name': 'target',
+            'driver': 'throttle',
+            'throttle-group': 'thrgr-target',
+            'file': {
+                'driver': iotests.imgfmt,
+                'file': {
+                    'driver': 'file',
+                    'filename': target_img
+                }
+            }
+        })
+
+        self.qsd.cmd('block-export-add', {
+            'id': 'exp0',
+            'type': 'nbd',
+            'node-name': 'target',
+            'writable': True
+        })
+
+        self.vm = iotests.VM()
+        self.vm.add_args('-drive',
+                         f'file={source_img},if=none,format={iotests.imgfmt},'
+                         f'iops_wr={iops_source},'
+                         f'iops_wr_max={iops_source},'
+                         'id=source')
+        self.vm.launch()
+
+        self.vm.cmd('blockdev-add', {
+            'node-name': 'target',
+            'driver': 'nbd',
+            'export': 'target',
+            'server': {
+                'type': 'unix',
+                'path': nbd_sock
+            }
+        })
+
+
+    def tearDown(self):
+        self.vm.shutdown()
+        self.qsd.stop()
+        self.check_qemu_io_errors()
+        self.check_images_identical()
+        os.remove(source_img)
+        os.remove(target_img)
+
+    # Once the VM is shut down we can parse the log and see if qemu-io ran
+    # without errors.
+    def check_qemu_io_errors(self):
+        self.assertFalse(self.vm.is_running())
+        log = self.vm.get_log()
+        for line in log.split("\n"):
+            assert not line.startswith("Pattern verification failed")
+
+    def check_images_identical(self):
+        qemu_img('compare', '-f', iotests.imgfmt, source_img, target_img)
+
+    def start_mirror(self):
+        self.vm.cmd('blockdev-mirror',
+                    job_id='mirror',
+                    device='source',
+                    target='target',
+                    filter_node_name='mirror-top',
+                    sync='full',
+                    copy_mode='background')
+
+    def test_background_to_active(self):
+        self.vm.hmp_qemu_io('source', f'write 0 {image_size}')
+        self.vm.hmp_qemu_io('target', f'write 0 {image_size}')
+
+        self.start_mirror()
+
+        result = self.vm.cmd('query-block-jobs')
+        assert not result[0]['actively-synced']
+
+        self.vm.event_wait('BLOCK_JOB_READY')
+
+        result = self.vm.cmd('query-block-jobs')
+        assert not result[0]['actively-synced']
+
+        # Start some background requests.
+        reqs = 4 * iops_source
+        req_size = image_size // reqs
+        for i in range(0, reqs):
+            req = f'aio_write -P 7 {req_size * i} {req_size}'
+            self.vm.hmp_qemu_io('source', req)
+
+        # Wait for the first few requests.
+        time.sleep(1)
+        self.vm.qtest(f'clock_step {1 * 1000 * 1000 * 1000}')
+
+        result = self.vm.cmd('query-block-jobs')
+        # There should've been new requests.
+        assert result[0]['len'] > image_size
+        # To verify later that not all requests were completed at this point.
+        len_before_change = result[0]['len']
+
+        # Change the copy mode while requests are happening.
+        self.vm.cmd('block-job-change',
+                    id='mirror',
+                    type='mirror',
+                    copy_mode='write-blocking')
+
+        # Wait until image is actively synced.
+        while True:
+            time.sleep(0.1)
+            self.vm.qtest(f'clock_step {100 * 1000 * 1000}')
+            result = self.vm.cmd('query-block-jobs')
+            if result[0]['actively-synced']:
+                break
+
+        # Because of throttling, not all requests should have been completed
+        # above.
+        result = self.vm.cmd('query-block-jobs')
+        assert result[0]['len'] > len_before_change
+
+        # Issue enough requests for a few seconds only touching the first half
+        # of the image.
+        reqs = 4 * iops_target
+        req_size = image_size // 2 // reqs
+        for i in range(0, reqs):
+            req = f'aio_write -P 19 {req_size * i} {req_size}'
+            self.vm.hmp_qemu_io('source', req)
+
+        # Now issue a synchronous write in the second half of the image and
+        # immediately verify that it was written to the target too. This would
+        # fail without switching the copy mode. Note that this only produces a
+        # log line and the actual checking happens during tearDown().
+        req_args = f'-P 37 {3 * (image_size // 4)} {req_size}'
+        self.vm.hmp_qemu_io('source', f'write {req_args}')
+        self.vm.hmp_qemu_io('target', f'read {req_args}')
+
+        self.vm.cmd('block-job-cancel', device='mirror')
+        while len(self.vm.cmd('query-block-jobs')) > 0:
+            time.sleep(0.1)
+
+if __name__ == '__main__':
+    iotests.main(supported_fmts=['qcow2', 'raw'],
+                 supported_protocols=['file'])
diff --git a/tests/qemu-iotests/tests/mirror-change-copy-mode.out b/tests/qemu-iotests/tests/mirror-change-copy-mode.out
new file mode 100644
index 0000000000..ae1213e6f8
--- /dev/null
+++ b/tests/qemu-iotests/tests/mirror-change-copy-mode.out
@@ -0,0 +1,5 @@
+.
+----------------------------------------------------------------------
+Ran 1 tests
+
+OK
-- 
2.41.0



^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [PULL 00/27] Block layer patches
  2023-10-31 18:58 [PULL 00/27] Block layer patches Kevin Wolf
                   ` (26 preceding siblings ...)
  2023-10-31 18:59 ` [PULL 27/27] iotests: add test for changing mirror's copy_mode Kevin Wolf
@ 2023-10-31 23:31 ` Stefan Hajnoczi
  27 siblings, 0 replies; 32+ messages in thread
From: Stefan Hajnoczi @ 2023-10-31 23:31 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, kwolf, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 115 bytes --]

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.2 for any user-visible changes.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PULL 00/27] Block layer patches
@ 2025-11-04 17:53 Kevin Wolf
  0 siblings, 0 replies; 32+ messages in thread
From: Kevin Wolf @ 2025-11-04 17:53 UTC (permalink / raw)
  To: qemu-block; +Cc: kwolf, qemu-devel

The following changes since commit a8e63c013016f9ff981689189c5b063551d04559:

  Merge tag 'igvm-20251103--pull-request' of https://gitlab.com/kraxel/qemu into staging (2025-11-03 10:21:01 +0100)

are available in the Git repository at:

  https://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to 4d0de416dd06c405906735a61c2521912aa3d72c:

  qcow2, vmdk: Restrict creation with secondary file using protocol (2025-11-04 18:25:47 +0100)

----------------------------------------------------------------
Block layer patches

- stream: Fix potential crash during job completion
- aio: add the aio_add_sqe() io_uring API
- qcow2: put discards in discard queue when discard-no-unref is enabled
- qcow2, vmdk: Restrict creation with secondary file using protocol
- iotests: Run iotests with sanitizers
- iotests: Add more image formats to the thorough testing
- iotests: Improve the dry run list to speed up thorough testing
- Code cleanup

----------------------------------------------------------------
Akihiko Odaki (2):
      qemu-img: Fix amend option parse error handling
      iotests: Run iotests with sanitizers

Eric Blake (2):
      block: Allow drivers to control protocol prefix at creation
      qcow2, vmdk: Restrict creation with secondary file using protocol

Jean-Louis Dupond (2):
      qcow2: rename update_refcount_discard to queue_discard
      qcow2: put discards in discard queue when discard-no-unref is enabled

Kevin Wolf (1):
      iotests: Test resizing file node under raw with size/offset

Stefan Hajnoczi (15):
      aio-posix: fix race between io_uring CQE and AioHandler deletion
      aio-posix: fix fdmon-io_uring.c timeout stack variable lifetime
      aio-posix: fix spurious return from ->wait() due to signals
      aio-posix: keep polling enabled with fdmon-io_uring.c
      tests/unit: skip test-nested-aio-poll with io_uring
      aio-posix: integrate fdmon into glib event loop
      aio: remove aio_context_use_g_source()
      aio: free AioContext when aio_context_new() fails
      aio: add errp argument to aio_context_setup()
      aio-posix: gracefully handle io_uring_queue_init() failure
      aio-posix: unindent fdmon_io_uring_destroy()
      aio-posix: add fdmon_ops->dispatch()
      aio-posix: add aio_add_sqe() API for user-defined io_uring requests
      block/io_uring: use aio_add_sqe()
      block/io_uring: use non-vectored read/write when possible

Thomas Huth (3):
      tests/qemu-iotests/184: Fix skip message for qemu-img without throttle
      tests/qemu-iotests: Improve the dry run list to speed up thorough testing
      tests/qemu-iotest: Add more image formats to the thorough testing

Wesley Hershberger (1):
      block: Drop detach_subchain for bdrv_replace_node

Yeqi Fu (1):
      block: replace TABs with space

 block/qcow2.h                                 |   4 +
 include/block/aio.h                           | 156 +++++++-
 include/block/block-global-state.h            |   3 +-
 include/block/nbd.h                           |   2 +-
 include/block/raw-aio.h                       |   5 -
 util/aio-posix.h                              |  18 +-
 block.c                                       |  42 +--
 block/bochs.c                                 |  14 +-
 block/crypto.c                                |   2 +-
 block/file-posix.c                            |  98 +++--
 block/file-win32.c                            |  38 +-
 block/io_uring.c                              | 505 +++++++-------------------
 block/parallels.c                             |   2 +-
 block/qcow.c                                  |  12 +-
 block/qcow2-cluster.c                         |  16 +-
 block/qcow2-refcount.c                        |  25 +-
 block/qcow2.c                                 |   4 +-
 block/qed.c                                   |   2 +-
 block/raw-format.c                            |   2 +-
 block/vdi.c                                   |   2 +-
 block/vhdx.c                                  |   2 +-
 block/vmdk.c                                  |   2 +-
 block/vpc.c                                   |   2 +-
 qemu-img.c                                    |   2 +-
 stubs/io_uring.c                              |  32 --
 tests/unit/test-aio.c                         |   7 +-
 tests/unit/test-nested-aio-poll.c             |  13 +-
 util/aio-posix.c                              | 137 ++++---
 util/aio-win32.c                              |   7 +-
 util/async.c                                  |  71 ++--
 util/fdmon-epoll.c                            |  34 +-
 util/fdmon-io_uring.c                         | 247 ++++++++++---
 util/fdmon-poll.c                             |  85 ++++-
 tests/qemu-iotests/testrunner.py              |  12 +
 block/trace-events                            |  12 +-
 stubs/meson.build                             |   3 -
 tests/qemu-iotests/184                        |   2 +-
 tests/qemu-iotests/257                        |   8 +-
 tests/qemu-iotests/257.out                    |  14 +-
 tests/qemu-iotests/check                      |  42 ++-
 tests/qemu-iotests/meson.build                |  11 +-
 tests/qemu-iotests/tests/resize-below-raw     |  51 ++-
 tests/qemu-iotests/tests/resize-below-raw.out |   4 +-
 util/trace-events                             |   4 +
 44 files changed, 956 insertions(+), 800 deletions(-)
 delete mode 100644 stubs/io_uring.c



^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2025-11-04 17:54 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-31 18:58 [PULL 00/27] Block layer patches Kevin Wolf
2023-10-31 18:58 ` [PULL 01/27] qemu-img: rebase: stop when reaching EOF of old backing file Kevin Wolf
2023-10-31 18:58 ` [PULL 02/27] qemu-iotests: 024: add rebasing test case for overlay_size > backing_size Kevin Wolf
2023-10-31 18:58 ` [PULL 03/27] qemu-img: rebase: use backing files' BlockBackend for buffer alignment Kevin Wolf
2023-10-31 18:58 ` [PULL 04/27] qemu-img: add chunk size parameter to compare_buffers() Kevin Wolf
2023-10-31 18:58 ` [PULL 05/27] qemu-img: rebase: avoid unnecessary COW operations Kevin Wolf
2023-10-31 18:58 ` [PULL 06/27] iotests/{024, 271}: add testcases for qemu-img rebase Kevin Wolf
2024-07-22  7:18   ` Thomas Huth
2024-07-30  9:43     ` Andrey Drobyshev
2023-10-31 18:58 ` [PULL 07/27] qemu-img: add compression option to rebase subcommand Kevin Wolf
2023-10-31 18:58 ` [PULL 08/27] iotests: add tests for "qemu-img rebase" with compression Kevin Wolf
2023-10-31 18:59 ` [PULL 09/27] block: Fix locking in media change monitor commands Kevin Wolf
2023-10-31 18:59 ` [PULL 10/27] iotests: Test media change with iothreads Kevin Wolf
2023-10-31 18:59 ` [PULL 11/27] blockjob: drop AioContext lock before calling bdrv_graph_wrlock() Kevin Wolf
2023-10-31 18:59 ` [PULL 12/27] block: avoid potential deadlock during bdrv_graph_wrlock() in bdrv_close() Kevin Wolf
2023-10-31 18:59 ` [PULL 13/27] blockdev: mirror: avoid potential deadlock when using iothread Kevin Wolf
2023-10-31 18:59 ` [PULL 14/27] block: rename blk_io_plug_call() API to defer_call() Kevin Wolf
2023-10-31 18:59 ` [PULL 15/27] util/defer-call: move defer_call() to util/ Kevin Wolf
2023-10-31 18:59 ` [PULL 16/27] virtio: use defer_call() in virtio_irqfd_notify() Kevin Wolf
2023-10-31 18:59 ` [PULL 17/27] virtio-blk: remove batch notification BH Kevin Wolf
2023-10-31 18:59 ` [PULL 18/27] blockjob: introduce block-job-change QMP command Kevin Wolf
2023-10-31 18:59 ` [PULL 19/27] block/mirror: set actively_synced even after the job is ready Kevin Wolf
2023-10-31 18:59 ` [PULL 20/27] block/mirror: move dirty bitmap to filter Kevin Wolf
2023-10-31 18:59 ` [PULL 21/27] block/mirror: determine copy_to_target only once Kevin Wolf
2023-10-31 18:59 ` [PULL 22/27] mirror: implement mirror_change method Kevin Wolf
2023-10-31 18:59 ` [PULL 23/27] qapi/block-core: use JobType for BlockJobInfo's type Kevin Wolf
2023-10-31 18:59 ` [PULL 24/27] qapi/block-core: turn BlockJobInfo into a union Kevin Wolf
2023-10-31 18:59 ` [PULL 25/27] blockjob: query driver-specific info via a new 'query' driver method Kevin Wolf
2023-10-31 18:59 ` [PULL 26/27] mirror: return mirror-specific information upon query Kevin Wolf
2023-10-31 18:59 ` [PULL 27/27] iotests: add test for changing mirror's copy_mode Kevin Wolf
2023-10-31 23:31 ` [PULL 00/27] Block layer patches Stefan Hajnoczi
  -- strict thread matches above, loose matches on Subject: below --
2025-11-04 17:53 Kevin Wolf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).