qemu-devel.nongnu.org archive mirror
* [PATCH v5 0/6] block: seriously improve savevm performance
@ 2020-06-19 10:07 Denis V. Lunev
  2020-06-19 10:07 ` [PATCH 1/6] migration/savevm: respect qemu_fclose() error code in save_snapshot() Denis V. Lunev
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Denis V. Lunev @ 2020-06-19 10:07 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Juan Quintela, Dr. David Alan Gilbert, Max Reitz, Denis Plotnikov,
	Stefan Hajnoczi, Denis V . Lunev

This series does two standard things:
- it creates an intermediate buffer for all writes from the QEMU migration
  code to the QCOW2 image,
- this buffer is sent to disk asynchronously, allowing several writes to
  run in parallel.

In general, the migration code is fantastically inefficient (by
observation): buffers are not aligned and are sent in arbitrary pieces,
often less than 100 bytes per chunk, which results in read-modify-write
operations on non-cached file descriptors. It should also be noted that all
operations are performed into unallocated image blocks, which also suffer
due to partial writes to such new clusters.
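
As a minimal sketch of the pattern the series follows (illustrative only,
not the actual patch code; the *_example names are hypothetical): small
unaligned chunks are coalesced into a large aligned buffer, and only full
buffers are submitted as asynchronous writes. The partially filled tail and
the wait for all outstanding writes are handled at finalize time in the
patches below and are not shown here.

#include <stdint.h>
#include <string.h>

#define BUF_CAPACITY_EXAMPLE (1024 * 1024)  /* e.g. MAX(cluster size, 1 MiB) */

/* hypothetical helpers standing in for the real asynchronous submission */
void submit_async_write_example(uint8_t *buf, int64_t pos, size_t len);
uint8_t *alloc_aligned_buffer_example(size_t len);

typedef struct VMStateBufExample {
    uint8_t *buf;      /* aligned bounce buffer currently being filled */
    size_t filled;     /* bytes accumulated so far */
    int64_t pos;       /* image offset the current buffer will be written to */
} VMStateBufExample;

/* Coalesce one small, unaligned migration chunk into the bounce buffer. */
static void append_chunk_example(VMStateBufExample *b,
                                 const uint8_t *data, size_t len)
{
    while (len > 0) {
        size_t n = BUF_CAPACITY_EXAMPLE - b->filled;
        if (n > len) {
            n = len;
        }
        memcpy(b->buf + b->filled, data, n);
        b->filled += n;
        data += n;
        len -= n;

        if (b->filled == BUF_CAPACITY_EXAMPLE) {
            /* Buffer is full: submit it asynchronously.  The in-flight
             * write owns it now, so continue with a fresh buffer. */
            submit_async_write_example(b->buf, b->pos, BUF_CAPACITY_EXAMPLE);
            b->buf = alloc_aligned_buffer_example(BUF_CAPACITY_EXAMPLE);
            b->pos += BUF_CAPACITY_EXAMPLE;
            b->filled = 0;
        }
    }
}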

This patch series is an implementation of the idea discussed in the RFC
posted by Denis Plotnikov:
https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg01925.html
Results with this series over NVMe are better than with the original code:
                original     rfc    this
cached:          1.79s      2.38s   1.27s
non-cached:      3.29s      1.31s   0.81s

Changes from v4:
- added patch 4 with blk_save_vmstate() cleanup
- added R-By
- bdrv_flush_vmstate -> bdrv_finalize_vmstate
- fixed return code of bdrv_co_do_save_vmstate
- fixed typos in comments (Eric, thanks!)
- fixed patchew warnings

Changes from v3:
- rebased to master
- added patch 3 which removes aio_task_pool_wait_one()
- added R-By to patch 1
- patch 4 is rewritten via bdrv_run_co
- error path in blk_save_vmstate() is rewritten to call bdrv_flush_vmstate
  unconditionally
- added some comments
- fixed initialization in bdrv_co_vmstate_save_task_entry as suggested

Changes from v2:
- code moved from QCOW2 level to generic block level
- created bdrv_flush_vmstate helper to fix 022, 029 tests
- added recursion into bs->file in bdrv_co_flush_vmstate (fixes 267)
- fixed blk_save_vmstate helper
- fixed coroutine wait as Vladimir suggested with waiting fixes from me

Changes from v1:
- patchew warning fixed
- fixed validation that only 1 waiter is allowed in patch 1

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <fam@euphon.net>
CC: Juan Quintela <quintela@redhat.com>
CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
CC: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
CC: Denis Plotnikov <dplotnikov@virtuozzo.com>




^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/6] migration/savevm: respect qemu_fclose() error code in save_snapshot()
  2020-06-19 10:07 [PATCH v5 0/6] block: seriously improve savevm performance Denis V. Lunev
@ 2020-06-19 10:07 ` Denis V. Lunev
  2020-06-19 10:07 ` [PATCH 2/6] block/aio_task: allow start/wait task from any coroutine Denis V. Lunev
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Denis V. Lunev @ 2020-06-19 10:07 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Juan Quintela, Max Reitz, Denis Plotnikov,
	Stefan Hajnoczi, Denis V. Lunev

qemu_fclose() can return an error, e.g. if bdrv_co_flush() returns an
error.

This validation will become more important once we start waiting for
asynchronous I/O operations, started from bdrv_write_vmstate(), which is
coming soon.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <fam@euphon.net>
CC: Juan Quintela <quintela@redhat.com>
CC: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 migration/savevm.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index b979ea6e7f..da3dead4e9 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2628,7 +2628,7 @@ int save_snapshot(const char *name, Error **errp)
 {
     BlockDriverState *bs, *bs1;
     QEMUSnapshotInfo sn1, *sn = &sn1, old_sn1, *old_sn = &old_sn1;
-    int ret = -1;
+    int ret = -1, ret2;
     QEMUFile *f;
     int saved_vm_running;
     uint64_t vm_state_size;
@@ -2712,10 +2712,14 @@ int save_snapshot(const char *name, Error **errp)
     }
     ret = qemu_savevm_state(f, errp);
     vm_state_size = qemu_ftell(f);
-    qemu_fclose(f);
+    ret2 = qemu_fclose(f);
     if (ret < 0) {
         goto the_end;
     }
+    if (ret2 < 0) {
+        ret = ret2;
+        goto the_end;
+    }
 
     /* The bdrv_all_create_snapshot() call that follows acquires the AioContext
      * for itself.  BDRV_POLL_WHILE() does not support nested locking because
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/6] block/aio_task: allow start/wait task from any coroutine
  2020-06-19 10:07 [PATCH v5 0/6] block: seriously improve savevm performance Denis V. Lunev
  2020-06-19 10:07 ` [PATCH 1/6] migration/savevm: respect qemu_fclose() error code in save_snapshot() Denis V. Lunev
@ 2020-06-19 10:07 ` Denis V. Lunev
  2020-06-19 10:07 ` [PATCH 3/6] block/aio_task: drop aio_task_pool_wait_one() helper Denis V. Lunev
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Denis V. Lunev @ 2020-06-19 10:07 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Juan Quintela, Dr. David Alan Gilbert, Max Reitz, Denis Plotnikov,
	Stefan Hajnoczi, Denis V . Lunev

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

Currently, the aio task pool assumes that there is a main coroutine, which
creates tasks and waits for them. Let's remove this restriction by using a
CoQueue. The code becomes clearer and the interface more obvious.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <fam@euphon.net>
CC: Juan Quintela <quintela@redhat.com>
CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
CC: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
CC: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 block/aio_task.c | 21 ++++++---------------
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/block/aio_task.c b/block/aio_task.c
index 88989fa248..cf62e5c58b 100644
--- a/block/aio_task.c
+++ b/block/aio_task.c
@@ -27,11 +27,10 @@
 #include "block/aio_task.h"
 
 struct AioTaskPool {
-    Coroutine *main_co;
     int status;
     int max_busy_tasks;
     int busy_tasks;
-    bool waiting;
+    CoQueue waiters;
 };
 
 static void coroutine_fn aio_task_co(void *opaque)
@@ -52,31 +51,23 @@ static void coroutine_fn aio_task_co(void *opaque)
 
     g_free(task);
 
-    if (pool->waiting) {
-        pool->waiting = false;
-        aio_co_wake(pool->main_co);
-    }
+    qemu_co_queue_restart_all(&pool->waiters);
 }
 
 void coroutine_fn aio_task_pool_wait_one(AioTaskPool *pool)
 {
     assert(pool->busy_tasks > 0);
-    assert(qemu_coroutine_self() == pool->main_co);
 
-    pool->waiting = true;
-    qemu_coroutine_yield();
+    qemu_co_queue_wait(&pool->waiters, NULL);
 
-    assert(!pool->waiting);
     assert(pool->busy_tasks < pool->max_busy_tasks);
 }
 
 void coroutine_fn aio_task_pool_wait_slot(AioTaskPool *pool)
 {
-    if (pool->busy_tasks < pool->max_busy_tasks) {
-        return;
+    while (pool->busy_tasks >= pool->max_busy_tasks) {
+        aio_task_pool_wait_one(pool);
     }
-
-    aio_task_pool_wait_one(pool);
 }
 
 void coroutine_fn aio_task_pool_wait_all(AioTaskPool *pool)
@@ -98,8 +89,8 @@ AioTaskPool *coroutine_fn aio_task_pool_new(int max_busy_tasks)
 {
     AioTaskPool *pool = g_new0(AioTaskPool, 1);
 
-    pool->main_co = qemu_coroutine_self();
     pool->max_busy_tasks = max_busy_tasks;
+    qemu_co_queue_init(&pool->waiters);
 
     return pool;
 }
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/6] block/aio_task: drop aio_task_pool_wait_one() helper
  2020-06-19 10:07 [PATCH v5 0/6] block: seriously improve savevm performance Denis V. Lunev
  2020-06-19 10:07 ` [PATCH 1/6] migration/savevm: respect qemu_fclose() error code in save_snapshot() Denis V. Lunev
  2020-06-19 10:07 ` [PATCH 2/6] block/aio_task: allow start/wait task from any coroutine Denis V. Lunev
@ 2020-06-19 10:07 ` Denis V. Lunev
  2020-06-19 10:07 ` [PATCH 4/6] block/block-backend: remove always true check from blk_save_vmstate Denis V. Lunev
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Denis V. Lunev @ 2020-06-19 10:07 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Juan Quintela, Dr. David Alan Gilbert,
	Max Reitz, Denis Plotnikov, Stefan Hajnoczi, Denis V. Lunev

It is not used outside the module.

Actually, there are two kinds of waiters:
- for a slot, and
- for all tasks to finish.
This patch limits the external API to the listed types.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Suggested-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <fam@euphon.net>
CC: Juan Quintela <quintela@redhat.com>
CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
CC: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 block/aio_task.c         | 13 ++-----------
 include/block/aio_task.h |  1 -
 2 files changed, 2 insertions(+), 12 deletions(-)

diff --git a/block/aio_task.c b/block/aio_task.c
index cf62e5c58b..7ba15ff41f 100644
--- a/block/aio_task.c
+++ b/block/aio_task.c
@@ -54,26 +54,17 @@ static void coroutine_fn aio_task_co(void *opaque)
     qemu_co_queue_restart_all(&pool->waiters);
 }
 
-void coroutine_fn aio_task_pool_wait_one(AioTaskPool *pool)
-{
-    assert(pool->busy_tasks > 0);
-
-    qemu_co_queue_wait(&pool->waiters, NULL);
-
-    assert(pool->busy_tasks < pool->max_busy_tasks);
-}
-
 void coroutine_fn aio_task_pool_wait_slot(AioTaskPool *pool)
 {
     while (pool->busy_tasks >= pool->max_busy_tasks) {
-        aio_task_pool_wait_one(pool);
+        qemu_co_queue_wait(&pool->waiters, NULL);
     }
 }
 
 void coroutine_fn aio_task_pool_wait_all(AioTaskPool *pool)
 {
     while (pool->busy_tasks > 0) {
-        aio_task_pool_wait_one(pool);
+        qemu_co_queue_wait(&pool->waiters, NULL);
     }
 }
 
diff --git a/include/block/aio_task.h b/include/block/aio_task.h
index 50bc1e1817..50b1c036c5 100644
--- a/include/block/aio_task.h
+++ b/include/block/aio_task.h
@@ -48,7 +48,6 @@ bool aio_task_pool_empty(AioTaskPool *pool);
 void coroutine_fn aio_task_pool_start_task(AioTaskPool *pool, AioTask *task);
 
 void coroutine_fn aio_task_pool_wait_slot(AioTaskPool *pool);
-void coroutine_fn aio_task_pool_wait_one(AioTaskPool *pool);
 void coroutine_fn aio_task_pool_wait_all(AioTaskPool *pool);
 
 #endif /* BLOCK_AIO_TASK_H */
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/6] block/block-backend: remove always true check from blk_save_vmstate
  2020-06-19 10:07 [PATCH v5 0/6] block: seriously improve savevm performance Denis V. Lunev
                   ` (2 preceding siblings ...)
  2020-06-19 10:07 ` [PATCH 3/6] block/aio_task: drop aio_task_pool_wait_one() helper Denis V. Lunev
@ 2020-06-19 10:07 ` Denis V. Lunev
  2020-06-20  3:06   ` Vladimir Sementsov-Ogievskiy
  2020-06-19 10:07 ` [PATCH 5/6] block, migration: add bdrv_finalize_vmstate helper Denis V. Lunev
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Denis V. Lunev @ 2020-06-19 10:07 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Juan Quintela, Dr. David Alan Gilbert, Max Reitz, Denis Plotnikov,
	Stefan Hajnoczi, Denis V. Lunev

bdrv_save_vmstate() returns either an error with a negative return value
or the size. Thus this check is useless.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Suggested-by: Eric Blake <eblake@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <fam@euphon.net>
CC: Juan Quintela <quintela@redhat.com>
CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
CC: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
CC: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 block/block-backend.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 6936b25c83..1c6e53bbde 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2188,7 +2188,7 @@ int blk_save_vmstate(BlockBackend *blk, const uint8_t *buf,
         return ret;
     }
 
-    if (ret == size && !blk->enable_write_cache) {
+    if (!blk->enable_write_cache) {
         ret = bdrv_flush(blk_bs(blk));
     }
 
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 5/6] block, migration: add bdrv_finalize_vmstate helper
  2020-06-19 10:07 [PATCH v5 0/6] block: seriously improve savevm performance Denis V. Lunev
                   ` (3 preceding siblings ...)
  2020-06-19 10:07 ` [PATCH 4/6] block/block-backend: remove always true check from blk_save_vmstate Denis V. Lunev
@ 2020-06-19 10:07 ` Denis V. Lunev
  2020-06-19 10:07 ` [PATCH 6/6] block/io: improve savevm performance Denis V. Lunev
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Denis V. Lunev @ 2020-06-19 10:07 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Juan Quintela, Dr. David Alan Gilbert,
	Max Reitz, Denis Plotnikov, Stefan Hajnoczi, Denis V. Lunev

Right now bdrv_fclose() just calls bdrv_flush().

The problem is that the migration code works inefficiently in block layer
terms and frequently issues writes of very small pieces of unaligned data.
The block layer is capable of working this way, but it is very slow.

This patch is a preparation for the introduction of an intermediate buffer
in the block driver state. It would be beneficial to separate the
conventional bdrv_flush() from closing the QEMU file in the migration code.

The patch also forces the bdrv_finalize_vmstate() operation inside the
synchronous blk_save_vmstate() operation. This helper is used only from
qemu-io.
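
As a rough usage sketch of the contract introduced here (illustrative only,
not code from the series; save_vmstate_example is a hypothetical caller):
after one or more bdrv_save_vmstate() calls, the caller must invoke
bdrv_finalize_vmstate() to wait for and commit any asynchronously started
writes, and must check both return codes:

static int save_vmstate_example(BlockDriverState *bs,
                                const uint8_t *buf, int64_t pos, int size)
{
    int ret, ret2;

    ret = bdrv_save_vmstate(bs, buf, pos, size);  /* may only queue data */
    ret2 = bdrv_finalize_vmstate(bs);             /* commit pending writes */
    if (ret < 0) {
        return ret;
    }
    return ret2 < 0 ? ret2 : ret;
}

This mirrors the blk_save_vmstate() change in the hunk below.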

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <fam@euphon.net>
CC: Juan Quintela <quintela@redhat.com>
CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
CC: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 block/block-backend.c |  6 +++++-
 block/io.c            | 15 +++++++++++++++
 include/block/block.h |  5 +++++
 migration/savevm.c    |  4 ++++
 4 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 1c6e53bbde..5bb11c8e01 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -2177,16 +2177,20 @@ int blk_truncate(BlockBackend *blk, int64_t offset, bool exact,
 int blk_save_vmstate(BlockBackend *blk, const uint8_t *buf,
                      int64_t pos, int size)
 {
-    int ret;
+    int ret, ret2;
 
     if (!blk_is_available(blk)) {
         return -ENOMEDIUM;
     }
 
     ret = bdrv_save_vmstate(blk_bs(blk), buf, pos, size);
+    ret2 = bdrv_finalize_vmstate(blk_bs(blk));
     if (ret < 0) {
         return ret;
     }
+    if (ret2 < 0) {
+        return ret2;
+    }
 
     if (!blk->enable_write_cache) {
         ret = bdrv_flush(blk_bs(blk));
diff --git a/block/io.c b/block/io.c
index df8f2a98d4..1f69268361 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2724,6 +2724,21 @@ int bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
     return bdrv_rw_vmstate(bs, qiov, pos, true);
 }
 
+static int coroutine_fn bdrv_co_finalize_vmstate(BlockDriverState *bs)
+{
+    return 0;
+}
+
+static int coroutine_fn bdrv_finalize_vmstate_co_entry(void *opaque)
+{
+    return bdrv_co_finalize_vmstate(opaque);
+}
+
+int bdrv_finalize_vmstate(BlockDriverState *bs)
+{
+    return bdrv_run_co(bs, bdrv_finalize_vmstate_co_entry, bs);
+}
+
 /**************************************************************/
 /* async I/Os */
 
diff --git a/include/block/block.h b/include/block/block.h
index 25e299605e..ab2c962094 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -572,6 +572,11 @@ int bdrv_save_vmstate(BlockDriverState *bs, const uint8_t *buf,
 
 int bdrv_load_vmstate(BlockDriverState *bs, uint8_t *buf,
                       int64_t pos, int size);
+/*
+ * bdrv_finalize_vmstate() is mandatory to commit vmstate changes if
+ * bdrv_save_vmstate() was ever called.
+ */
+int bdrv_finalize_vmstate(BlockDriverState *bs);
 
 void bdrv_img_create(const char *filename, const char *fmt,
                      const char *base_filename, const char *base_fmt,
diff --git a/migration/savevm.c b/migration/savevm.c
index da3dead4e9..798a4cb402 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -150,6 +150,10 @@ static ssize_t block_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
 
 static int bdrv_fclose(void *opaque, Error **errp)
 {
+    int err = bdrv_finalize_vmstate(opaque);
+    if (err < 0) {
+        return err;
+    }
     return bdrv_flush(opaque);
 }
 
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 6/6] block/io: improve savevm performance
  2020-06-19 10:07 [PATCH v5 0/6] block: seriously improve savevm performance Denis V. Lunev
                   ` (4 preceding siblings ...)
  2020-06-19 10:07 ` [PATCH 5/6] block, migration: add bdrv_finalize_vmstate helper Denis V. Lunev
@ 2020-06-19 10:07 ` Denis V. Lunev
  2020-06-19 11:02 ` [PATCH v5 0/6] block: seriously " no-reply
  2020-06-22  8:33 ` [PATCH 7/6] block/io: improve loadvm performance Denis V. Lunev
  7 siblings, 0 replies; 14+ messages in thread
From: Denis V. Lunev @ 2020-06-19 10:07 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Juan Quintela, Dr. David Alan Gilbert,
	Max Reitz, Denis Plotnikov, Stefan Hajnoczi, Denis V. Lunev

This patch does two standard things:
- it creates an intermediate buffer for all writes from the QEMU migration
  code to the block driver,
- this buffer is sent to disk asynchronously, allowing several writes to
  run in parallel.

Thus bdrv_vmstate_write() becomes asynchronous. Completion of all pending
operations is performed in the newly introduced bdrv_finalize_vmstate().

In general, the migration code is fantastically inefficient (by
observation): buffers are not aligned and are sent in arbitrary pieces,
often less than 100 bytes per chunk, which results in read-modify-write
operations if the target file descriptor is opened with O_DIRECT. It should
also be noted that all operations are performed into unallocated image
blocks, which also suffer due to partial writes to such new clusters even
on cached file descriptors.

Snapshot creation time (2 GB Fedora 31 VM running over NVMe storage):
                original     fixed
cached:          1.79s       1.27s
non-cached:      3.29s       0.81s

The difference over HDD would be more significant :)

Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <fam@euphon.net>
CC: Juan Quintela <quintela@redhat.com>
CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
CC: Denis Plotnikov <dplotnikov@virtuozzo.com>
---
 block/io.c                | 126 +++++++++++++++++++++++++++++++++++++-
 include/block/block_int.h |   8 +++
 2 files changed, 132 insertions(+), 2 deletions(-)

diff --git a/block/io.c b/block/io.c
index 1f69268361..71a696deb7 100644
--- a/block/io.c
+++ b/block/io.c
@@ -26,6 +26,7 @@
 #include "trace.h"
 #include "sysemu/block-backend.h"
 #include "block/aio-wait.h"
+#include "block/aio_task.h"
 #include "block/blockjob.h"
 #include "block/blockjob_int.h"
 #include "block/block_int.h"
@@ -33,6 +34,7 @@
 #include "qapi/error.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
+#include "qemu/units.h"
 #include "sysemu/replay.h"
 
 /* Maximum bounce buffer for copy-on-read and write zeroes, in bytes */
@@ -2640,6 +2642,103 @@ typedef struct BdrvVmstateCo {
     bool                is_read;
 } BdrvVmstateCo;
 
+typedef struct BdrvVMStateTask {
+    AioTask task;
+
+    BlockDriverState *bs;
+    int64_t offset;
+    void *buf;
+    size_t bytes;
+} BdrvVMStateTask;
+
+typedef struct BdrvSaveVMState {
+    AioTaskPool *pool;
+    BdrvVMStateTask *t;
+} BdrvSaveVMState;
+
+
+static coroutine_fn int bdrv_co_vmstate_save_task_entry(AioTask *task)
+{
+    int err = 0;
+    BdrvVMStateTask *t = container_of(task, BdrvVMStateTask, task);
+
+    if (t->bytes != 0) {
+        QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, t->buf, t->bytes);
+
+        bdrv_inc_in_flight(t->bs);
+        err = t->bs->drv->bdrv_save_vmstate(t->bs, &qiov, t->offset);
+        bdrv_dec_in_flight(t->bs);
+    }
+
+    qemu_vfree(t->buf);
+    return err;
+}
+
+static BdrvVMStateTask *bdrv_vmstate_task_create(BlockDriverState *bs,
+                                                 int64_t pos, size_t size)
+{
+    BdrvVMStateTask *t = g_new(BdrvVMStateTask, 1);
+
+    *t = (BdrvVMStateTask) {
+        .task.func = bdrv_co_vmstate_save_task_entry,
+        .buf = qemu_blockalign(bs, size),
+        .offset = pos,
+        .bs = bs,
+    };
+
+    return t;
+}
+
+static int bdrv_co_do_save_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
+                                   int64_t pos)
+{
+    BdrvSaveVMState *state = bs->savevm_state;
+    BdrvVMStateTask *t;
+    size_t buf_size = MAX(bdrv_get_cluster_size(bs), 1 * MiB);
+    size_t to_copy, off;
+
+    if (state == NULL) {
+        state = g_new(BdrvSaveVMState, 1);
+        *state = (BdrvSaveVMState) {
+            .pool = aio_task_pool_new(BDRV_VMSTATE_WORKERS_MAX),
+            .t = bdrv_vmstate_task_create(bs, pos, buf_size),
+        };
+
+        bs->savevm_state = state;
+    }
+
+    if (aio_task_pool_status(state->pool) < 0) {
+        /*
+         * The operation as a whole is unsuccessful. Prohibit all further
+         * operations. If we clean here, new useless ops will come again.
+         * Thus we rely on caller for cleanup here.
+         */
+        return aio_task_pool_status(state->pool);
+    }
+
+    t = state->t;
+    if (t->offset + t->bytes != pos) {
+        /* Normally this branch is not reachable from migration */
+        return bs->drv->bdrv_save_vmstate(bs, qiov, pos);
+    }
+
+    off = 0;
+    while (1) {
+        to_copy = MIN(qiov->size - off, buf_size - t->bytes);
+        qemu_iovec_to_buf(qiov, off, t->buf + t->bytes, to_copy);
+        t->bytes += to_copy;
+        if (t->bytes < buf_size) {
+            return 0;
+        }
+
+        aio_task_pool_start_task(state->pool, &t->task);
+
+        pos += to_copy;
+        off += to_copy;
+        state->t = t = bdrv_vmstate_task_create(bs, pos, buf_size);
+    }
+}
+
 static int coroutine_fn
 bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
                    bool is_read)
@@ -2655,7 +2754,7 @@ bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
         if (is_read) {
             ret = drv->bdrv_load_vmstate(bs, qiov, pos);
         } else {
-            ret = drv->bdrv_save_vmstate(bs, qiov, pos);
+            ret = bdrv_co_do_save_vmstate(bs, qiov, pos);
         }
     } else if (bs->file) {
         ret = bdrv_co_rw_vmstate(bs->file->bs, qiov, pos, is_read);
@@ -2726,7 +2825,30 @@ int bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
 
 static int coroutine_fn bdrv_co_finalize_vmstate(BlockDriverState *bs)
 {
-    return 0;
+    int err;
+    BdrvSaveVMState *state = bs->savevm_state;
+
+    if (bs->drv->bdrv_save_vmstate == NULL && bs->file != NULL) {
+        return bdrv_co_finalize_vmstate(bs->file->bs);
+    }
+    if (state == NULL) {
+        return 0;
+    }
+
+    if (aio_task_pool_status(state->pool) >= 0) {
+        /* We are on success path, commit last chunk if possible */
+        aio_task_pool_start_task(state->pool, &state->t->task);
+    }
+
+    aio_task_pool_wait_all(state->pool);
+    err = aio_task_pool_status(state->pool);
+
+    aio_task_pool_free(state->pool);
+    g_free(state);
+
+    bs->savevm_state = NULL;
+
+    return err;
 }
 
 static int coroutine_fn bdrv_finalize_vmstate_co_entry(void *opaque)
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 791de6a59c..f90f0e8b6a 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -61,6 +61,8 @@
 
 #define BLOCK_PROBE_BUF_SIZE        512
 
+#define BDRV_VMSTATE_WORKERS_MAX    8
+
 enum BdrvTrackedRequestType {
     BDRV_TRACKED_READ,
     BDRV_TRACKED_WRITE,
@@ -784,6 +786,9 @@ struct BdrvChild {
     QLIST_ENTRY(BdrvChild) next_parent;
 };
 
+
+typedef struct BdrvSaveVMState BdrvSaveVMState;
+
 /*
  * Note: the function bdrv_append() copies and swaps contents of
  * BlockDriverStates, so if you add new fields to this struct, please
@@ -947,6 +952,9 @@ struct BlockDriverState {
 
     /* BdrvChild links to this node may never be frozen */
     bool never_freeze;
+
+    /* Intermediate buffer for VM state saving from snapshot creation code */
+    BdrvSaveVMState *savevm_state;
 };
 
 struct BlockBackendRootState {
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 0/6] block: seriously improve savevm performance
  2020-06-19 10:07 [PATCH v5 0/6] block: seriously improve savevm performance Denis V. Lunev
                   ` (5 preceding siblings ...)
  2020-06-19 10:07 ` [PATCH 6/6] block/io: improve savevm performance Denis V. Lunev
@ 2020-06-19 11:02 ` no-reply
  2020-06-22  8:34   ` Denis V. Lunev
  2020-06-22  8:33 ` [PATCH 7/6] block/io: improve loadvm performance Denis V. Lunev
  7 siblings, 1 reply; 14+ messages in thread
From: no-reply @ 2020-06-19 11:02 UTC (permalink / raw)
  To: den
  Cc: kwolf, fam, vsementsov, qemu-block, quintela, qemu-devel,
	dgilbert, dplotnikov, stefanha, den, mreitz

Patchew URL: https://patchew.org/QEMU/20200619100708.30440-1-den@openvz.org/



Hi,

This series failed the asan build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1
=== TEST SCRIPT END ===

  GEN     docs/interop/qemu-qmp-ref.html
  GEN     docs/interop/qemu-qmp-ref.txt
  GEN     docs/interop/qemu-qmp-ref.7
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `  CC      qga/commands.o
  CC      qga/guest-agent-command-state.o
  CC      qga/main.o
  CC      qga/commands-posix.o
---
  LINK    elf2dmp
  CC      qemu-img.o
  AR      libvhost-user.a
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  AS      pc-bios/optionrom/linuxboot.o
  CC      pc-bios/optionrom/linuxboot_dma.o
  AS      pc-bios/optionrom/kvmvapic.o
---
  SIGN    pc-bios/optionrom/pvh.bin
  LINK    qemu-ga
  LINK    qemu-keymap
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    ivshmem-client
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    ivshmem-server
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    qemu-nbd
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    qemu-storage-daemon
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    qemu-img
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    qemu-io
  LINK    qemu-edid
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    fsdev/virtfs-proxy-helper
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    scsi/qemu-pr-helper
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    qemu-bridge-helper
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    virtiofsd
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  LINK    vhost-user-input
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
  GEN     x86_64-softmmu/hmp-commands.h
  GEN     x86_64-softmmu/config-devices.h
  GEN     x86_64-softmmu/hmp-commands-info.h
---
  CC      x86_64-softmmu/gdbstub-xml.o
  CC      x86_64-softmmu/trace/generated-helpers.o
  LINK    x86_64-softmmu/qemu-system-x86_64
/usr/bin/ld: /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors_vfork.S.o): warning: common of `__interception::real_vfork' overridden by definition from /usr/lib64/clang/10.0.0/lib/linux/libclang_rt.asan-x86_64.a(asan_interceptors.cpp.o)
common.rc: line 50: test: check: binary operator expected
(printf '#define QEMU_PKGVERSION ""\n'; printf '#define QEMU_FULL_VERSION "5.0.50"\n'; ) > qemu-version.h.tmp
make -C /tmp/qemu-test/src/slirp BUILD_DIR="/tmp/qemu-test/build/slirp" PKG_CONFIG="pkg-config" CC="clang" AR="ar"      LD="ld" RANLIB="ranlib" CFLAGS="-I/usr/include/pixman-1   -Werror -fsanitize=undefined -fsanitize=address  -pthread -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include  -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99  -Wold-style-definition -Wtype-limits -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wempty-body -Wnested-externs -Wendif-labels -Wexpansion-to-defined -Wno-initializer-overrides -Wno-missing-include-dirs -Wno-shift-negative-value -Wno-string-plus-int -Wno-typedef-redefinition -Wno-tautological-type-limit-compare -fstack-protector-strong   -I/usr/include/p11-kit-1   -DSTRUCT_IOVEC_DEFINED  -I/usr/include/libpng16  -I/usr/include/spice-1 -I/usr/include/spice-server -I/usr/include/cacard -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/nss3 -I/usr/include/nspr4 -pthread -I/usr/include/libmount -I/usr/include/blkid -I/usr/include/pixman-1   -I/tmp/qemu-test/src/tests -I/tmp/qemu-test/src/tests/qtest -g " LDFLAGS="-Wl,--warn-common -fsanitize=undefined -fsanitize=address -Wl,-z,relro -Wl,-z,now -pie -m64  -fstack-protector-strong"
---
clang -iquote /tmp/qemu-test/build/tests -iquote tests -iquote /tmp/qemu-test/src/tcg/i386 -isystem /tmp/qemu-test/src/linux-headers -isystem /tmp/qemu-test/build/linux-headers -iquote . -iquote /tmp/qemu-test/src -iquote /tmp/qemu-test/src/accel/tcg -iquote /tmp/qemu-test/src/include -iquote /tmp/qemu-test/src/disas/libvixl -I/usr/include/pixman-1   -Werror -fsanitize=undefined -fsanitize=address  -pthread -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include  -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99  -Wold-style-definition -Wtype-limits -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wempty-body -Wnested-externs -Wendif-labels -Wexpansion-to-defined -Wno-initializer-overrides -Wno-missing-include-dirs -Wno-shift-negative-value -Wno-string-plus-int -Wno-typedef-redefinition -Wno-tautological-type-limit-compare -fstack-protector-strong   -I/usr/include/p11-kit-1   -DSTRUCT_IOVEC_DEFINED  -I/usr/include/libpng16  -I/usr/include/spice-1 -I/usr/include/spice-server -I/usr/include/cacard -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/nss3 -I/usr/include/nspr4 -pthread -I/usr/include/libmount -I/usr/include/blkid -I/usr/include/pixman-1   -I/tmp/qemu-test/src/tests -I/tmp/qemu-test/src/tests/qtest -MMD -MP -MT tests/test-io-task.o -MF tests/test-io-task.d -g   -c -o tests/test-io-task.o /tmp/qemu-test/src/tests/test-io-task.c
clang -iquote /tmp/qemu-test/build/tests -iquote tests -iquote /tmp/qemu-test/src/tcg/i386 -isystem /tmp/qemu-test/src/linux-headers -isystem /tmp/qemu-test/build/linux-headers -iquote . -iquote /tmp/qemu-test/src -iquote /tmp/qemu-test/src/accel/tcg -iquote /tmp/qemu-test/src/include -iquote /tmp/qemu-test/src/disas/libvixl -I/usr/include/pixman-1   -Werror -fsanitize=undefined -fsanitize=address  -pthread -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include  -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99  -Wold-style-definition -Wtype-limits -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wempty-body -Wnested-externs -Wendif-labels -Wexpansion-to-defined -Wno-initializer-overrides -Wno-missing-include-dirs -Wno-shift-negative-value -Wno-string-plus-int -Wno-typedef-redefinition -Wno-tautological-type-limit-compare -fstack-protector-strong   -I/usr/include/p11-kit-1   -DSTRUCT_IOVEC_DEFINED  -I/usr/include/libpng16  -I/usr/include/spice-1 -I/usr/include/spice-server -I/usr/include/cacard -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/nss3 -I/usr/include/nspr4 -pthread -I/usr/include/libmount -I/usr/include/blkid -I/usr/include/pixman-1   -I/tmp/qemu-test/src/tests -I/tmp/qemu-test/src/tests/qtest -MMD -MP -MT tests/test-io-channel-socket.o -MF tests/test-io-channel-socket.d -g   -c -o tests/test-io-channel-socket.o /tmp/qemu-test/src/tests/test-io-channel-socket.c
clang -iquote /tmp/qemu-test/build/tests -iquote tests -iquote /tmp/qemu-test/src/tcg/i386 -isystem /tmp/qemu-test/src/linux-headers -isystem /tmp/qemu-test/build/linux-headers -iquote . -iquote /tmp/qemu-test/src -iquote /tmp/qemu-test/src/accel/tcg -iquote /tmp/qemu-test/src/include -iquote /tmp/qemu-test/src/disas/libvixl -I/usr/include/pixman-1   -Werror -fsanitize=undefined -fsanitize=address  -pthread -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include  -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99  -Wold-style-definition -Wtype-limits -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wempty-body -Wnested-externs -Wendif-labels -Wexpansion-to-defined -Wno-initializer-overrides -Wno-missing-include-dirs -Wno-shift-negative-value -Wno-string-plus-int -Wno-typedef-redefinition -Wno-tautological-type-limit-compare -fstack-protector-strong   -I/usr/include/p11-kit-1   -DSTRUCT_IOVEC_DEFINED  -I/usr/include/libpng16  -I/usr/include/spice-1 -I/usr/include/spice-server -I/usr/include/cacard -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/nss3 -I/usr/include/nspr4 -pthread -I/usr/include/libmount -I/usr/include/blkid -I/usr/include/pixman-1   -I/tmp/qemu-test/src/tests -I/tmp/qemu-test/src/tests/qtest -MMD -MP -MT tests/io-channel-helpers.o -MF tests/io-channel-helpers.d -g   -c -o tests/io-channel-helpers.o /tmp/qemu-test/src/tests/io-channel-helpers.c
/tmp/qemu-test/src/tests/qht-bench.c:287:29: error: implicit conversion from 'unsigned long' to 'double' changes value from 18446744073709551615 to 18446744073709551616 [-Werror,-Wimplicit-int-float-conversion]
        *threshold = rate * UINT64_MAX;
                          ~ ^~~~~~~~~~
/usr/include/stdint.h:130:23: note: expanded from macro 'UINT64_MAX'
---
18446744073709551615UL
^~~~~~~~~~~~~~~~~~~~~~
1 error generated.
make: *** [/tmp/qemu-test/src/rules.mak:69: tests/qht-bench.o] Error 1
make: *** Waiting for unfinished jobs....
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 669, in <module>
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=c67c2e4d508c4d55a517b3da9a7e6c35', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=x86_64-softmmu', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-k4xm9z3a/src/docker-src.2020-06-19-06.55.27.1715:/var/tmp/qemu:z,ro', 'qemu:fedora', '/var/tmp/qemu/run', 'test-debug']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=c67c2e4d508c4d55a517b3da9a7e6c35
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-k4xm9z3a/src'
make: *** [docker-run-test-debug@fedora] Error 2

real    7m2.468s
user    0m9.043s


The full log is available at
http://patchew.org/logs/20200619100708.30440-1-den@openvz.org/testing.asan/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 4/6] block/block-backend: remove always true check from blk_save_vmstate
  2020-06-19 10:07 ` [PATCH 4/6] block/block-backend: remove always true check from blk_save_vmstate Denis V. Lunev
@ 2020-06-20  3:06   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-06-20  3:06 UTC (permalink / raw)
  To: Denis V. Lunev, qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Juan Quintela, Dr. David Alan Gilbert,
	Max Reitz, Denis Plotnikov, Stefan Hajnoczi

19.06.2020 13:07, Denis V. Lunev wrote:
> bdrv_save_vmstate() returns either error with negative return value or
> size. Thus this check is useless.
> 
> Signed-off-by: Denis V. Lunev<den@openvz.org>
> Suggested-by: Eric Blake<eblake@redhat.com>
> CC: Kevin Wolf<kwolf@redhat.com>
> CC: Max Reitz<mreitz@redhat.com>
> CC: Stefan Hajnoczi<stefanha@redhat.com>
> CC: Fam Zheng<fam@euphon.net>
> CC: Juan Quintela<quintela@redhat.com>
> CC: "Dr. David Alan Gilbert"<dgilbert@redhat.com>
> CC: Vladimir Sementsov-Ogievskiy<vsementsov@virtuozzo.com>
> CC: Denis Plotnikov<dplotnikov@virtuozzo.com>

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 7/6] block/io: improve loadvm performance
  2020-06-19 10:07 [PATCH v5 0/6] block: seriously improve savevm performance Denis V. Lunev
                   ` (6 preceding siblings ...)
  2020-06-19 11:02 ` [PATCH v5 0/6] block: seriously " no-reply
@ 2020-06-22  8:33 ` Denis V. Lunev
  2020-06-22 17:02   ` Vladimir Sementsov-Ogievskiy
  7 siblings, 1 reply; 14+ messages in thread
From: Denis V. Lunev @ 2020-06-22  8:33 UTC (permalink / raw)
  To: qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Juan Quintela, Dr. David Alan Gilbert, Max Reitz, Denis Plotnikov,
	Stefan Hajnoczi, Denis V. Lunev

This patch creates an intermediate buffer for reading from the block driver
state and performs read-ahead into this buffer. The snapshot code performs
reads sequentially, so we know which offsets will be required and when they
are no longer needed.

Results are fantastic. Switch-to-snapshot times of a 2 GB Fedora 31 VM
over NVMe storage are the following:
                original     fixed
cached:          1.84s       1.16s
non-cached:     12.74s       1.27s

The difference over HDD would be more significant :)

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <fam@euphon.net>
CC: Juan Quintela <quintela@redhat.com>
CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
CC: Denis Plotnikov <dplotnikov@virtuozzo.com>
---

 block/io.c                | 225 +++++++++++++++++++++++++++++++++++++-
 include/block/block_int.h |   3 +
 2 files changed, 225 insertions(+), 3 deletions(-)

diff --git a/block/io.c b/block/io.c
index 71a696deb7..bb06f750d8 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2739,6 +2739,180 @@ static int bdrv_co_do_save_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
     }
 }
 
+
+typedef struct BdrvLoadVMChunk {
+    void *buf;
+    uint64_t offset;
+    ssize_t bytes;
+
+    QLIST_ENTRY(BdrvLoadVMChunk) list;
+} BdrvLoadVMChunk;
+
+typedef struct BdrvLoadVMState {
+    AioTaskPool *pool;
+
+    int64_t offset;
+    int64_t last_loaded;
+
+    int chunk_count;
+    QLIST_HEAD(, BdrvLoadVMChunk) chunks;
+    QLIST_HEAD(, BdrvLoadVMChunk) loading;
+    CoMutex lock;
+    CoQueue waiters;
+} BdrvLoadVMState;
+
+typedef struct BdrvLoadVMStateTask {
+    AioTask task;
+
+    BlockDriverState *bs;
+    BdrvLoadVMChunk *chunk;
+} BdrvLoadVMStateTask;
+
+static BdrvLoadVMChunk *bdrv_co_find_loadvmstate_chunk(int64_t pos,
+                                                       BdrvLoadVMChunk *c)
+{
+    for (; c != NULL; c = QLIST_NEXT(c, list)) {
+        if (c->offset <= pos && c->offset + c->bytes > pos) {
+            return c;
+        }
+    }
+
+    return NULL;
+}
+
+static void bdrv_free_loadvm_chunk(BdrvLoadVMChunk *c)
+{
+    qemu_vfree(c->buf);
+    g_free(c);
+}
+
+static coroutine_fn int bdrv_co_vmstate_load_task_entry(AioTask *task)
+{
+    int err = 0;
+    BdrvLoadVMStateTask *t = container_of(task, BdrvLoadVMStateTask, task);
+    BdrvLoadVMChunk *c = t->chunk;
+    BdrvLoadVMState *state = t->bs->loadvm_state;
+    QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, c->buf, c->bytes);
+
+    bdrv_inc_in_flight(t->bs);
+    err = t->bs->drv->bdrv_load_vmstate(t->bs, &qiov, c->offset);
+    bdrv_dec_in_flight(t->bs);
+
+    qemu_co_mutex_lock(&state->lock);
+    QLIST_REMOVE(c, list);
+    if (err == 0) {
+        QLIST_INSERT_HEAD(&state->chunks, c, list);
+    } else {
+        bdrv_free_loadvm_chunk(c);
+    }
+    qemu_co_mutex_unlock(&state->lock);
+    qemu_co_queue_restart_all(&state->waiters);
+
+    return err;
+}
+
+static void bdrv_co_start_loadvmstate(BlockDriverState *bs,
+                                      BdrvLoadVMState *state)
+{
+    int i;
+    size_t buf_size = MAX(bdrv_get_cluster_size(bs), 1 * MiB);
+
+    qemu_co_mutex_assert_locked(&state->lock);
+    for (i = state->chunk_count; i < BDRV_VMSTATE_WORKERS_MAX; i++) {
+        BdrvLoadVMStateTask *t = g_new(BdrvLoadVMStateTask, 1);
+
+        *t = (BdrvLoadVMStateTask) {
+            .task.func = bdrv_co_vmstate_load_task_entry,
+            .bs = bs,
+            .chunk = g_new(BdrvLoadVMChunk, 1),
+        };
+
+        *t->chunk = (BdrvLoadVMChunk) {
+            .buf = qemu_blockalign(bs, buf_size),
+            .offset = state->last_loaded,
+            .bytes = buf_size,
+        };
+        /* FIXME: tail of stream */
+
+        QLIST_INSERT_HEAD(&state->loading, t->chunk, list);
+        state->chunk_count++;
+        state->last_loaded += buf_size;
+
+        qemu_co_mutex_unlock(&state->lock);
+        aio_task_pool_start_task(state->pool, &t->task);
+        qemu_co_mutex_lock(&state->lock);
+    }
+}
+
+static int bdrv_co_do_load_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
+                                   int64_t pos)
+{
+    BdrvLoadVMState *state = bs->loadvm_state;
+    BdrvLoadVMChunk *c;
+    size_t off;
+
+    if (state == NULL) {
+        if (pos != 0) {
+            /* Normally this branch is not reachable from migration */
+            return bs->drv->bdrv_load_vmstate(bs, qiov, pos);
+        }
+
+        state = g_new(BdrvLoadVMState, 1);
+        *state = (BdrvLoadVMState) {
+            .pool = aio_task_pool_new(BDRV_VMSTATE_WORKERS_MAX),
+            .chunks = QLIST_HEAD_INITIALIZER(state->chunks),
+            .loading = QLIST_HEAD_INITIALIZER(state->loading),
+        };
+        qemu_co_mutex_init(&state->lock);
+        qemu_co_queue_init(&state->waiters);
+
+        bs->loadvm_state = state;
+    }
+
+    if (state->offset != pos) {
+        /* Normally this branch is not reachable from migration */
+        return bs->drv->bdrv_load_vmstate(bs, qiov, pos);
+    }
+
+    off = 0;
+    qemu_co_mutex_lock(&state->lock);
+    bdrv_co_start_loadvmstate(bs, state);
+
+    while (off < qiov->size && aio_task_pool_status(state->pool) == 0) {
+        c = bdrv_co_find_loadvmstate_chunk(pos, QLIST_FIRST(&state->chunks));
+        if (c != NULL) {
+            ssize_t chunk_off = pos - c->offset;
+            ssize_t to_copy = MIN(qiov->size - off, c->bytes - chunk_off);
+
+            qemu_iovec_from_buf(qiov, off, c->buf + chunk_off, to_copy);
+
+            off += to_copy;
+            pos += to_copy;
+
+            if (pos == c->offset + c->bytes) {
+                state->chunk_count--;
+                /* End of buffer, discard it from the list */
+                QLIST_REMOVE(c, list);
+                bdrv_free_loadvm_chunk(c);
+            }
+
+            state->offset += to_copy;
+            continue;
+        }
+
+        c = bdrv_co_find_loadvmstate_chunk(pos, QLIST_FIRST(&state->loading));
+        if (c != NULL) {
+            qemu_co_queue_wait(&state->waiters, &state->lock);
+            continue;
+        }
+
+        bdrv_co_start_loadvmstate(bs, state);
+    }
+    qemu_co_mutex_unlock(&state->lock);
+
+    return aio_task_pool_status(state->pool);
+}
+
 static int coroutine_fn
 bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
                    bool is_read)
@@ -2752,7 +2926,7 @@ bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
         ret = -ENOMEDIUM;
     } else if (drv->bdrv_load_vmstate) {
         if (is_read) {
-            ret = drv->bdrv_load_vmstate(bs, qiov, pos);
+            ret = bdrv_co_do_load_vmstate(bs, qiov, pos);
         } else {
             ret = bdrv_co_do_save_vmstate(bs, qiov, pos);
         }
@@ -2823,13 +2997,13 @@ int bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
     return bdrv_rw_vmstate(bs, qiov, pos, true);
 }
 
-static int coroutine_fn bdrv_co_finalize_vmstate(BlockDriverState *bs)
+static int coroutine_fn bdrv_co_finalize_save_vmstate(BlockDriverState *bs)
 {
     int err;
     BdrvSaveVMState *state = bs->savevm_state;
 
     if (bs->drv->bdrv_save_vmstate == NULL && bs->file != NULL) {
-        return bdrv_co_finalize_vmstate(bs->file->bs);
+        return bdrv_co_finalize_save_vmstate(bs->file->bs);
     }
     if (state == NULL) {
         return 0;
@@ -2851,6 +3025,51 @@ static int coroutine_fn bdrv_co_finalize_vmstate(BlockDriverState *bs)
     return err;
 }
 
+static int coroutine_fn bdrv_co_finalize_load_vmstate(BlockDriverState *bs)
+{
+    int err;
+    BdrvLoadVMState *state = bs->loadvm_state;
+    BdrvLoadVMChunk *c, *tmp;
+
+    if (bs->drv->bdrv_load_vmstate == NULL && bs->file != NULL) {
+        return bdrv_co_finalize_load_vmstate(bs->file->bs);
+    }
+    if (state == NULL) {
+        return 0;
+    }
+
+    aio_task_pool_wait_all(state->pool);
+    err = aio_task_pool_status(state->pool);
+    aio_task_pool_free(state->pool);
+
+    QLIST_FOREACH(c, &state->loading, list) {
+        assert(1);  /* this list must be empty as all tasks are committed */
+    }
+    QLIST_FOREACH_SAFE(c, &state->chunks, list, tmp) {
+        QLIST_REMOVE(c, list);
+        bdrv_free_loadvm_chunk(c);
+    }
+
+    g_free(state);
+
+    bs->loadvm_state = NULL;
+
+    return err;
+}
+
+static int coroutine_fn bdrv_co_finalize_vmstate(BlockDriverState *bs)
+{
+    int err1 = bdrv_co_finalize_save_vmstate(bs);
+    int err2 = bdrv_co_finalize_load_vmstate(bs);
+    if (err1 < 0) {
+        return err1;
+    }
+    if (err2 < 0) {
+        return err2;
+    }
+    return 0;
+}
+
 static int coroutine_fn bdrv_finalize_vmstate_co_entry(void *opaque)
 {
     return bdrv_co_finalize_vmstate(opaque);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index f90f0e8b6a..0942578a74 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -788,6 +788,7 @@ struct BdrvChild {
 
 
 typedef struct BdrvSaveVMState BdrvSaveVMState;
+typedef struct BdrvLoadVMState BdrvLoadVMState;
 
 /*
  * Note: the function bdrv_append() copies and swaps contents of
@@ -955,6 +956,8 @@ struct BlockDriverState {
 
     /* Intermediate buffer for VM state saving from snapshot creation code */
     BdrvSaveVMState *savevm_state;
+    /* Intermediate buffer for VM state loading */
+    BdrvLoadVMState *loadvm_state;
 };
 
 struct BlockBackendRootState {
-- 
2.17.1



^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v5 0/6] block: seriously improve savevm performance
  2020-06-19 11:02 ` [PATCH v5 0/6] block: seriously " no-reply
@ 2020-06-22  8:34   ` Denis V. Lunev
  0 siblings, 0 replies; 14+ messages in thread
From: Denis V. Lunev @ 2020-06-22  8:34 UTC (permalink / raw)
  To: qemu-devel
  Cc: kwolf, fam, vsementsov, qemu-block, quintela, dgilbert, mreitz,
	dplotnikov, stefanha

On 6/19/20 2:02 PM, no-reply@patchew.org wrote:
> Patchew URL: https://patchew.org/QEMU/20200619100708.30440-1-den@openvz.org/
>
>
>
> Hi,
>
> This series failed the asan build test. Please find the testing commands and
> their output below. If you have Docker installed, you can probably reproduce it
> locally.
>
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
> export ARCH=x86_64
> make docker-image-fedora V=1 NETWORK=1
> time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1
> === TEST SCRIPT END ===
>
> [...]
> ---
> clang -iquote /tmp/qemu-test/build/tests -iquote tests -iquote /tmp/qemu-test/src/tcg/i386 -isystem /tmp/qemu-test/src/linux-headers -isystem /tmp/qemu-test/build/linux-headers -iquote . -iquote /tmp/qemu-test/src -iquote /tmp/qemu-test/src/accel/tcg -iquote /tmp/qemu-test/src/include -iquote /tmp/qemu-test/src/disas/libvixl -I/usr/include/pixman-1   -Werror -fsanitize=undefined -fsanitize=address  -pthread -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include  -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99  -Wold-style-definition -Wtype-limits -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wempty-body -Wnested-externs -Wendif-labels -Wexpansion-to-defined -Wno-initializer-overrides -Wno-missing-include-dirs -Wno-shift-negative-value -Wno-string-plus-int -Wno-typedef-redefinition -Wno-tautological-type-limit-compare -fstack-protector-strong   -I/usr/include/p11-kit-1   -DSTRUCT_IOVEC_DEFINED  -I/usr/include/libpng16  -I/usr/include/spice-1 -I/usr/include/spice-server -I/usr/include/cacard -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/nss3 -I/usr/include/nspr4 -pthread -I/usr/include/libmount -I/usr/include/blkid -I/usr/include/pixman-1   -I/tmp/qemu-test/src/tests -I/tmp/qemu-test/src/tests/qtest -MMD -MP -MT tests/test-io-task.o -MF tests/test-io-task.d -g   -c -o tests/test-io-task.o /tmp/qemu-test/src/tests/test-io-task.c
> clang -iquote /tmp/qemu-test/build/tests -iquote tests -iquote /tmp/qemu-test/src/tcg/i386 -isystem /tmp/qemu-test/src/linux-headers -isystem /tmp/qemu-test/build/linux-headers -iquote . -iquote /tmp/qemu-test/src -iquote /tmp/qemu-test/src/accel/tcg -iquote /tmp/qemu-test/src/include -iquote /tmp/qemu-test/src/disas/libvixl -I/usr/include/pixman-1   -Werror -fsanitize=undefined -fsanitize=address  -pthread -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include  -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99  -Wold-style-definition -Wtype-limits -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wempty-body -Wnested-externs -Wendif-labels -Wexpansion-to-defined -Wno-initializer-overrides -Wno-missing-include-dirs -Wno-shift-negative-value -Wno-string-plus-int -Wno-typedef-redefinition -Wno-tautological-type-limit-compare -fstack-protector-strong   -I/usr/include/p11-kit-1   -DSTRUCT_IOVEC_DEFINED  -I/usr/include/libpng16  -I/usr/include/spice-1 -I/usr/include/spice-server -I/usr/include/cacard -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/nss3 -I/usr/include/nspr4 -pthread -I/usr/include/libmount -I/usr/include/blkid -I/usr/include/pixman-1   -I/tmp/qemu-test/src/tests -I/tmp/qemu-test/src/tests/qtest -MMD -MP -MT tests/test-io-channel-socket.o -MF tests/test-io-channel-socket.d -g   -c -o tests/test-io-channel-socket.o /tmp/qemu-test/src/tests/test-io-channel-socket.c
> clang -iquote /tmp/qemu-test/build/tests -iquote tests -iquote /tmp/qemu-test/src/tcg/i386 -isystem /tmp/qemu-test/src/linux-headers -isystem /tmp/qemu-test/build/linux-headers -iquote . -iquote /tmp/qemu-test/src -iquote /tmp/qemu-test/src/accel/tcg -iquote /tmp/qemu-test/src/include -iquote /tmp/qemu-test/src/disas/libvixl -I/usr/include/pixman-1   -Werror -fsanitize=undefined -fsanitize=address  -pthread -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include  -fPIE -DPIE -m64 -mcx16 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -std=gnu99  -Wold-style-definition -Wtype-limits -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wempty-body -Wnested-externs -Wendif-labels -Wexpansion-to-defined -Wno-initializer-overrides -Wno-missing-include-dirs -Wno-shift-negative-value -Wno-string-plus-int -Wno-typedef-redefinition -Wno-tautological-type-limit-compare -fstack-protector-strong   -I/usr/include/p11-kit-1   -DSTRUCT_IOVEC_DEFINED  -I/usr/include/libpng16  -I/usr/include/spice-1 -I/usr/include/spice-server -I/usr/include/cacard -I/usr/include/glib-2.0 -I/usr/lib64/glib-2.0/include -I/usr/include/nss3 -I/usr/include/nspr4 -pthread -I/usr/include/libmount -I/usr/include/blkid -I/usr/include/pixman-1   -I/tmp/qemu-test/src/tests -I/tmp/qemu-test/src/tests/qtest -MMD -MP -MT tests/io-channel-helpers.o -MF tests/io-channel-helpers.d -g   -c -o tests/io-channel-helpers.o /tmp/qemu-test/src/tests/io-channel-helpers.c
> /tmp/qemu-test/src/tests/qht-bench.c:287:29: error: implicit conversion from 'unsigned long' to 'double' changes value from 18446744073709551615 to 18446744073709551616 [-Werror,-Wimplicit-int-float-conversion]
>         *threshold = rate * UINT64_MAX;
>                           ~ ^~~~~~~~~~
> /usr/include/stdint.h:130:23: note: expanded from macro 'UINT64_MAX'
> ---
> 18446744073709551615UL
> ^~~~~~~~~~~~~~~~~~~~~~
> 1 error generated.
> make: *** [/tmp/qemu-test/src/rules.mak:69: tests/qht-bench.o] Error 1
> make: *** Waiting for unfinished jobs....
> Traceback (most recent call last):
>   File "./tests/docker/docker.py", line 669, in <module>
> ---
>     raise CalledProcessError(retcode, cmd)
> subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=c67c2e4d508c4d55a517b3da9a7e6c35', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=x86_64-softmmu', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-k4xm9z3a/src/docker-src.2020-06-19-06.55.27.1715:/var/tmp/qemu:z,ro', 'qemu:fedora', '/var/tmp/qemu/run', 'test-debug']' returned non-zero exit status 2.
> filter=--filter=label=com.qemu.instance.uuid=c67c2e4d508c4d55a517b3da9a7e6c35
> make[1]: *** [docker-run] Error 1
> make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-k4xm9z3a/src'
> make: *** [docker-run-test-debug@fedora] Error 2
>
> real    7m2.468s
> user    0m9.043s
>
>
> The full log is available at
> http://patchew.org/logs/20200619100708.30440-1-den@openvz.org/testing.asan/?type=message.
> ---
> Email generated automatically by Patchew [https://patchew.org/].
> Please send your feedback to patchew-devel@redhat.com
The reported problems do not look patchset-related :(



^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 7/6] block/io: improve loadvm performance
  2020-06-22  8:33 ` [PATCH 7/6] block/io: improve loadvm performance Denis V. Lunev
@ 2020-06-22 17:02   ` Vladimir Sementsov-Ogievskiy
  2020-06-22 17:17     ` Denis V. Lunev
  0 siblings, 1 reply; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-06-22 17:02 UTC (permalink / raw)
  To: Denis V. Lunev, qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Juan Quintela, Dr. David Alan Gilbert,
	Max Reitz, Denis Plotnikov, Stefan Hajnoczi

22.06.2020 11:33, Denis V. Lunev wrote:
> This patch creates intermediate buffer for reading from block driver
> state and performs read-ahead to this buffer. Snapshot code performs
> reads sequentially and thus we know what offsets will be required
> and when they will become not needed.
> 
> Results are fantastic. Switch to snapshot times of 2GB Fedora 31 VM
> over NVME storage are the following:
>                  original     fixed
> cached:          1.84s       1.16s
> non-cached:     12.74s       1.27s
> 
> The difference over HDD would be more significant :)
> 
> Signed-off-by: Denis V. Lunev <den@openvz.org>
> CC: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> CC: Kevin Wolf <kwolf@redhat.com>
> CC: Max Reitz <mreitz@redhat.com>
> CC: Stefan Hajnoczi <stefanha@redhat.com>
> CC: Fam Zheng <fam@euphon.net>
> CC: Juan Quintela <quintela@redhat.com>
> CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> CC: Denis Plotnikov <dplotnikov@virtuozzo.com>
> ---
> 
>   block/io.c                | 225 +++++++++++++++++++++++++++++++++++++-
>   include/block/block_int.h |   3 +
>   2 files changed, 225 insertions(+), 3 deletions(-)
> 
> diff --git a/block/io.c b/block/io.c
> index 71a696deb7..bb06f750d8 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2739,6 +2739,180 @@ static int bdrv_co_do_save_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
>       }
>   }
>   
> +
> +typedef struct BdrvLoadVMChunk {
> +    void *buf;
> +    uint64_t offset;
> +    ssize_t bytes;
> +
> +    QLIST_ENTRY(BdrvLoadVMChunk) list;
> +} BdrvLoadVMChunk;
> +
> +typedef struct BdrvLoadVMState {
> +    AioTaskPool *pool;
> +
> +    int64_t offset;
> +    int64_t last_loaded;
> +
> +    int chunk_count;
> +    QLIST_HEAD(, BdrvLoadVMChunk) chunks;
> +    QLIST_HEAD(, BdrvLoadVMChunk) loading;
> +    CoMutex lock;
> +    CoQueue waiters;
> +} BdrvLoadVMState;
> +
> +typedef struct BdrvLoadVMStateTask {
> +    AioTask task;
> +
> +    BlockDriverState *bs;
> +    BdrvLoadVMChunk *chunk;
> +} BdrvLoadVMStateTask;
> +
> +static BdrvLoadVMChunk *bdrv_co_find_loadvmstate_chunk(int64_t pos,
> +                                                       BdrvLoadVMChunk *c)
> +{
> +    for (; c != NULL; c = QLIST_NEXT(c, list)) {
> +        if (c->offset <= pos && c->offset + c->bytes > pos) {
> +            return c;
> +        }
> +    }
> +
> +    return NULL;
> +}
> +
> +static void bdrv_free_loadvm_chunk(BdrvLoadVMChunk *c)
> +{
> +    qemu_vfree(c->buf);
> +    g_free(c);
> +}
> +
> +static coroutine_fn int bdrv_co_vmstate_load_task_entry(AioTask *task)
> +{
> +    int err = 0;
> +    BdrvLoadVMStateTask *t = container_of(task, BdrvLoadVMStateTask, task);
> +    BdrvLoadVMChunk *c = t->chunk;
> +    BdrvLoadVMState *state = t->bs->loadvm_state;
> +    QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, c->buf, c->bytes);
> +
> +    bdrv_inc_in_flight(t->bs);
> +    err = t->bs->drv->bdrv_load_vmstate(t->bs, &qiov, c->offset);
> +    bdrv_dec_in_flight(t->bs);
> +
> +    qemu_co_mutex_lock(&state->lock);
> +    QLIST_REMOVE(c, list);
> +    if (err == 0) {
> +        QLIST_INSERT_HEAD(&state->chunks, c, list);
> +    } else {
> +        bdrv_free_loadvm_chunk(c);
> +    }
> +    qemu_co_mutex_unlock(&state->lock);
> +    qemu_co_queue_restart_all(&state->waiters);
> +
> +    return err;
> +}
> +
> +static void bdrv_co_start_loadvmstate(BlockDriverState *bs,
> +                                      BdrvLoadVMState *state)
> +{
> +    int i;
> +    size_t buf_size = MAX(bdrv_get_cluster_size(bs), 1 * MiB);
> +
> +    qemu_co_mutex_assert_locked(&state->lock);
> +    for (i = state->chunk_count; i < BDRV_VMSTATE_WORKERS_MAX; i++) {
> +        BdrvLoadVMStateTask *t = g_new(BdrvLoadVMStateTask, 1);
> +
> +        *t = (BdrvLoadVMStateTask) {
> +            .task.func = bdrv_co_vmstate_load_task_entry,
> +            .bs = bs,
> +            .chunk = g_new(BdrvLoadVMChunk, 1),
> +        };
> +
> +        *t->chunk = (BdrvLoadVMChunk) {
> +            .buf = qemu_blockalign(bs, buf_size),
> +            .offset = state->last_loaded,
> +            .bytes = buf_size,
> +        };
> +        /* FIXME: tail of stream */
> +
> +        QLIST_INSERT_HEAD(&state->loading, t->chunk, list);
> +        state->chunk_count++;
> +        state->last_loaded += buf_size;
> +
> +        qemu_co_mutex_unlock(&state->lock);
> +        aio_task_pool_start_task(state->pool, &t->task);

Hmm, so, we unlock the mutex here. Am I missing something, or, if there are
two load-state requests running in parallel (yes, this shouldn't happen with
the current migration code, but if we don't care about that at all, the mutex
is not needed either), may we get two such loops running in parallel, which
will create more than BDRV_VMSTATE_WORKERS_MAX chunks, as each has its own
local "i" variable?
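
Just to illustrate what I mean (an untested sketch on top of the quoted code,
not a proposal to take as is): driving the loop by the shared counter instead
of the local "i" would keep the limit even if two callers race here:

    /* sketch: re-evaluate the shared counter each time the lock is re-taken,
     * so concurrent callers cannot jointly exceed BDRV_VMSTATE_WORKERS_MAX */
    qemu_co_mutex_assert_locked(&state->lock);
    while (state->chunk_count < BDRV_VMSTATE_WORKERS_MAX) {
        BdrvLoadVMStateTask *t = g_new(BdrvLoadVMStateTask, 1);

        *t = (BdrvLoadVMStateTask) {
            .task.func = bdrv_co_vmstate_load_task_entry,
            .bs = bs,
            .chunk = g_new(BdrvLoadVMChunk, 1),
        };
        *t->chunk = (BdrvLoadVMChunk) {
            .buf = qemu_blockalign(bs, buf_size),
            .offset = state->last_loaded,
            .bytes = buf_size,
        };

        QLIST_INSERT_HEAD(&state->loading, t->chunk, list);
        state->chunk_count++;
        state->last_loaded += buf_size;

        qemu_co_mutex_unlock(&state->lock);
        aio_task_pool_start_task(state->pool, &t->task);
        qemu_co_mutex_lock(&state->lock);
        /* chunk_count may have changed while the lock was dropped; the
         * while condition re-reads it under the lock */
    }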

> +        qemu_co_mutex_lock(&state->lock);
> +    }
> +}
> +
> +static int bdrv_co_do_load_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
> +                                   int64_t pos)
> +{
> +    BdrvLoadVMState *state = bs->loadvm_state;
> +    BdrvLoadVMChunk *c;
> +    size_t off;
> +
> +    if (state == NULL) {
> +        if (pos != 0) {
> +            /* Normally this branch is not reachable from migration */
> +            return bs->drv->bdrv_load_vmstate(bs, qiov, pos);
> +        }
> +
> +        state = g_new(BdrvLoadVMState, 1);
> +        *state = (BdrvLoadVMState) {
> +            .pool = aio_task_pool_new(BDRV_VMSTATE_WORKERS_MAX),
> +            .chunks = QLIST_HEAD_INITIALIZER(state->chunks),
> +            .loading = QLIST_HEAD_INITIALIZER(state->loading),
> +        };
> +        qemu_co_mutex_init(&state->lock);
> +        qemu_co_queue_init(&state->waiters);
> +
> +        bs->loadvm_state = state;
> +    }
> +
> +    if (state->offset != pos) {
> +        /* Normally this branch is not reachable from migration */
> +        return bs->drv->bdrv_load_vmstate(bs, qiov, pos);
> +    }
> +
> +    off = 0;
> +    qemu_co_mutex_lock(&state->lock);
> +    bdrv_co_start_loadvmstate(bs, state);
> +
> +    while (off < qiov->size && aio_task_pool_status(state->pool) == 0) {
> +        c = bdrv_co_find_loadvmstate_chunk(pos, QLIST_FIRST(&state->chunks));
> +        if (c != NULL) {
> +            ssize_t chunk_off = pos - c->offset;
> +            ssize_t to_copy = MIN(qiov->size - off, c->bytes - chunk_off);
> +
> +            qemu_iovec_from_buf(qiov, off, c->buf + chunk_off, to_copy);
> +
> +            off += to_copy;
> +            pos += to_copy;
> +
> +            if (pos == c->offset + c->bytes) {
> +                state->chunk_count--;
> +                /* End of buffer, discard it from the list */
> +                QLIST_REMOVE(c, list);
> +                bdrv_free_loadvm_chunk(c);
> +            }
> +
> +            state->offset += to_copy;
> +            continue;
> +        }
> +
> +        c = bdrv_co_find_loadvmstate_chunk(pos, QLIST_FIRST(&state->loading));
> +        if (c != NULL) {
> +            qemu_co_queue_wait(&state->waiters, &state->lock);
> +            continue;
> +        }
> +
> +        bdrv_co_start_loadvmstate(bs, state);

Hmm, so, you assume that if we are in the loop, we'll definitely find our chunk
at some moment. It should work with a normal migration stream, reading
sequentially chunk by chunk. But if not, it may hang:

Consider two reads at the same position, running in parallel: they both pass the
"state->offset != pos" check and wait on the mutex. Then the first one enters
the critical section, finds and removes the corresponding chunk. Then the second
one enters the critical section and loops forever, because the corresponding
chunk will never be generated again.

So, it seems, we should check (state->offset == pos) on each iteration, or
something like this.
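
Something like this, for example (only a sketch of the idea, the error code is
picked arbitrarily), at the top of the existing while () loop in
bdrv_co_do_load_vmstate():

    if (state->offset != pos) {
        /* another reader has already consumed this range; the matching chunk
         * will never be produced again, so fail instead of looping forever */
        qemu_co_mutex_unlock(&state->lock);
        return -EINVAL;
    }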

====

Another idea that came to my mind: you create new requests only when we can't
find our chunk. It may be a bit more optimal to create new requests as soon as
we get a free slot, i.e. right after QLIST_REMOVE(..) (a sketch follows below).

And, if we do so, we don't need the "loading" list: if there is no corresponding
chunk in "chunks", we will just wait in the "waiters" queue.
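
Roughly like this (again only a sketch on top of the quoted hunk):

            if (pos == c->offset + c->bytes) {
                state->chunk_count--;
                /* End of buffer, discard it from the list */
                QLIST_REMOVE(c, list);
                bdrv_free_loadvm_chunk(c);
                /* a worker slot just became free: start the next read-ahead
                 * right away instead of waiting for the next cache miss */
                bdrv_co_start_loadvmstate(bs, state);
            }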



> +    }
> +    qemu_co_mutex_unlock(&state->lock);
> +
> +    return aio_task_pool_status(state->pool);
> +}
> +
>   static int coroutine_fn
>   bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
>                      bool is_read)
> @@ -2752,7 +2926,7 @@ bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos,
>           ret = -ENOMEDIUM;
>       } else if (drv->bdrv_load_vmstate) {
>           if (is_read) {
> -            ret = drv->bdrv_load_vmstate(bs, qiov, pos);
> +            ret = bdrv_co_do_load_vmstate(bs, qiov, pos);
>           } else {
>               ret = bdrv_co_do_save_vmstate(bs, qiov, pos);
>           }
> @@ -2823,13 +2997,13 @@ int bdrv_readv_vmstate(BlockDriverState *bs, QEMUIOVector *qiov, int64_t pos)
>       return bdrv_rw_vmstate(bs, qiov, pos, true);
>   }
>   
> -static int coroutine_fn bdrv_co_finalize_vmstate(BlockDriverState *bs)
> +static int coroutine_fn bdrv_co_finalize_save_vmstate(BlockDriverState *bs)
>   {
>       int err;
>       BdrvSaveVMState *state = bs->savevm_state;
>   
>       if (bs->drv->bdrv_save_vmstate == NULL && bs->file != NULL) {
> -        return bdrv_co_finalize_vmstate(bs->file->bs);
> +        return bdrv_co_finalize_save_vmstate(bs->file->bs);
>       }
>       if (state == NULL) {
>           return 0;
> @@ -2851,6 +3025,51 @@ static int coroutine_fn bdrv_co_finalize_vmstate(BlockDriverState *bs)
>       return err;
>   }
>   
> +static int coroutine_fn bdrv_co_finalize_load_vmstate(BlockDriverState *bs)
> +{
> +    int err;
> +    BdrvLoadVMState *state = bs->loadvm_state;
> +    BdrvLoadVMChunk *c, *tmp;
> +
> +    if (bs->drv->bdrv_load_vmstate == NULL && bs->file != NULL) {
> +        return bdrv_co_finalize_load_vmstate(bs->file->bs);
> +    }
> +    if (state == NULL) {
> +        return 0;
> +    }
> +
> +    aio_task_pool_wait_all(state->pool);
> +    err = aio_task_pool_status(state->pool);
> +    aio_task_pool_free(state->pool);
> +
> +    QLIST_FOREACH(c, &state->loading, list) {
> +        assert(1);  /* this list must be empty as all tasks are committed */

assert(1) never fails; you mean abort()..

or, better, assert(QLIST_EMPTY(&state->loading))

> +    }
> +    QLIST_FOREACH_SAFE(c, &state->chunks, list, tmp) {
> +        QLIST_REMOVE(c, list);
> +        bdrv_free_loadvm_chunk(c);
> +    }
> +
> +    g_free(state);
> +
> +    bs->loadvm_state = NULL;
> +
> +    return err;
> +}
> +
> +static int coroutine_fn bdrv_co_finalize_vmstate(BlockDriverState *bs)
> +{
> +    int err1 = bdrv_co_finalize_save_vmstate(bs);
> +    int err2 = bdrv_co_finalize_load_vmstate(bs);
> +    if (err1 < 0) {
> +        return err1;
> +    }
> +    if (err2 < 0) {
> +        return err2;
> +    }
> +    return 0;
> +}
> +
>   static int coroutine_fn bdrv_finalize_vmstate_co_entry(void *opaque)
>   {
>       return bdrv_co_finalize_vmstate(opaque);
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index f90f0e8b6a..0942578a74 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -788,6 +788,7 @@ struct BdrvChild {
>   
>   
>   typedef struct BdrvSaveVMState BdrvSaveVMState;
> +typedef struct BdrvLoadVMState BdrvLoadVMState;
>   
>   /*
>    * Note: the function bdrv_append() copies and swaps contents of
> @@ -955,6 +956,8 @@ struct BlockDriverState {
>   
>       /* Intermediate buffer for VM state saving from snapshot creation code */
>       BdrvSaveVMState *savevm_state;
> +    /* Intermediate buffer for VM state loading */
> +    BdrvLoadVMState *loadvm_state;
>   };
>   
>   struct BlockBackendRootState {
> 

Also, we need to call finalize from blk_load_vmstate(), like it's done for blk_save_vmstate().
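
For example something along these lines (an untested sketch that simply mirrors
the blk_save_vmstate() change from this series; whether finalize really belongs
in every blk_load_vmstate() call or only once at the end of loading is up to
you):

    int blk_load_vmstate(BlockBackend *blk, uint8_t *buf, int64_t pos,
                         size_t size)
    {
        int ret, ret2;

        if (!blk_is_available(blk)) {
            return -ENOMEDIUM;
        }

        ret = bdrv_load_vmstate(blk_bs(blk), buf, pos, size);
        /* drain the read-ahead state, as blk_save_vmstate() does for saving */
        ret2 = bdrv_finalize_vmstate(blk_bs(blk));

        return ret < 0 ? ret : ret2;
    }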

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 7/6] block/io: improve loadvm performance
  2020-06-22 17:02   ` Vladimir Sementsov-Ogievskiy
@ 2020-06-22 17:17     ` Denis V. Lunev
  2020-06-22 18:13       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 14+ messages in thread
From: Denis V. Lunev @ 2020-06-22 17:17 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Juan Quintela, Dr. David Alan Gilbert,
	Max Reitz, Denis Plotnikov, Stefan Hajnoczi

On 6/22/20 8:02 PM, Vladimir Sementsov-Ogievskiy wrote:
> 22.06.2020 11:33, Denis V. Lunev wrote:
>> This patch creates intermediate buffer for reading from block driver
>> state and performs read-ahead to this buffer. Snapshot code performs
>> reads sequentially and thus we know what offsets will be required
>> and when they will become not needed.
>>
>> Results are fantastic. Switch to snapshot times of 2GB Fedora 31 VM
>> over NVME storage are the following:
>>                  original     fixed
>> cached:          1.84s       1.16s
>> non-cached:     12.74s       1.27s
>>
>> The difference over HDD would be more significant :)
>>
>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>> CC: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> CC: Kevin Wolf <kwolf@redhat.com>
>> CC: Max Reitz <mreitz@redhat.com>
>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>> CC: Fam Zheng <fam@euphon.net>
>> CC: Juan Quintela <quintela@redhat.com>
>> CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>> CC: Denis Plotnikov <dplotnikov@virtuozzo.com>
>> ---
>>
>>   block/io.c                | 225 +++++++++++++++++++++++++++++++++++++-
>>   include/block/block_int.h |   3 +
>>   2 files changed, 225 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/io.c b/block/io.c
>> index 71a696deb7..bb06f750d8 100644
>> --- a/block/io.c
>> +++ b/block/io.c
>> @@ -2739,6 +2739,180 @@ static int
>> bdrv_co_do_save_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
>>       }
>>   }
>>   +
>> +typedef struct BdrvLoadVMChunk {
>> +    void *buf;
>> +    uint64_t offset;
>> +    ssize_t bytes;
>> +
>> +    QLIST_ENTRY(BdrvLoadVMChunk) list;
>> +} BdrvLoadVMChunk;
>> +
>> +typedef struct BdrvLoadVMState {
>> +    AioTaskPool *pool;
>> +
>> +    int64_t offset;
>> +    int64_t last_loaded;
>> +
>> +    int chunk_count;
>> +    QLIST_HEAD(, BdrvLoadVMChunk) chunks;
>> +    QLIST_HEAD(, BdrvLoadVMChunk) loading;
>> +    CoMutex lock;
>> +    CoQueue waiters;
>> +} BdrvLoadVMState;
>> +
>> +typedef struct BdrvLoadVMStateTask {
>> +    AioTask task;
>> +
>> +    BlockDriverState *bs;
>> +    BdrvLoadVMChunk *chunk;
>> +} BdrvLoadVMStateTask;
>> +
>> +static BdrvLoadVMChunk *bdrv_co_find_loadvmstate_chunk(int64_t pos,
>> +                                                      
>> BdrvLoadVMChunk *c)
>> +{
>> +    for (; c != NULL; c = QLIST_NEXT(c, list)) {
>> +        if (c->offset <= pos && c->offset + c->bytes > pos) {
>> +            return c;
>> +        }
>> +    }
>> +
>> +    return NULL;
>> +}
>> +
>> +static void bdrv_free_loadvm_chunk(BdrvLoadVMChunk *c)
>> +{
>> +    qemu_vfree(c->buf);
>> +    g_free(c);
>> +}
>> +
>> +static coroutine_fn int bdrv_co_vmstate_load_task_entry(AioTask *task)
>> +{
>> +    int err = 0;
>> +    BdrvLoadVMStateTask *t = container_of(task, BdrvLoadVMStateTask,
>> task);
>> +    BdrvLoadVMChunk *c = t->chunk;
>> +    BdrvLoadVMState *state = t->bs->loadvm_state;
>> +    QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, c->buf, c->bytes);
>> +
>> +    bdrv_inc_in_flight(t->bs);
>> +    err = t->bs->drv->bdrv_load_vmstate(t->bs, &qiov, c->offset);
>> +    bdrv_dec_in_flight(t->bs);
>> +
>> +    qemu_co_mutex_lock(&state->lock);
>> +    QLIST_REMOVE(c, list);
>> +    if (err == 0) {
>> +        QLIST_INSERT_HEAD(&state->chunks, c, list);
>> +    } else {
>> +        bdrv_free_loadvm_chunk(c);
>> +    }
>> +    qemu_co_mutex_unlock(&state->lock);
>> +    qemu_co_queue_restart_all(&state->waiters);
>> +
>> +    return err;
>> +}
>> +
>> +static void bdrv_co_start_loadvmstate(BlockDriverState *bs,
>> +                                      BdrvLoadVMState *state)
>> +{
>> +    int i;
>> +    size_t buf_size = MAX(bdrv_get_cluster_size(bs), 1 * MiB);
>> +
>> +    qemu_co_mutex_assert_locked(&state->lock);
>> +    for (i = state->chunk_count; i < BDRV_VMSTATE_WORKERS_MAX; i++) {
>> +        BdrvLoadVMStateTask *t = g_new(BdrvLoadVMStateTask, 1);
>> +
>> +        *t = (BdrvLoadVMStateTask) {
>> +            .task.func = bdrv_co_vmstate_load_task_entry,
>> +            .bs = bs,
>> +            .chunk = g_new(BdrvLoadVMChunk, 1),
>> +        };
>> +
>> +        *t->chunk = (BdrvLoadVMChunk) {
>> +            .buf = qemu_blockalign(bs, buf_size),
>> +            .offset = state->last_loaded,
>> +            .bytes = buf_size,
>> +        };
>> +        /* FIXME: tail of stream */
>> +
>> +        QLIST_INSERT_HEAD(&state->loading, t->chunk, list);
>> +        state->chunk_count++;
>> +        state->last_loaded += buf_size;
>> +
>> +        qemu_co_mutex_unlock(&state->lock);
>> +        aio_task_pool_start_task(state->pool, &t->task);
>
> Hmm, so, we unlock the mutex here. Am I missing something, or if there
> are
> two parallel load-state requests in parallel (yes, this shouldn't happen
> with current migration code, but if don't care at all the mutex is not
> needed),
>
> we may have two such loops in parallel, which will create more than
> BDRV_VMSTATE_WORKERS_MAX chunks, as each has own local "i" variable?

We could (technically) stall waiting for a free slot in the pool, while the
pool completion needs the lock to make progress. This should NOT happen with
the current arch, but it would be better to do things correctly, i.e. drop
the lock.

>
>> +        qemu_co_mutex_lock(&state->lock);
>> +    }
>> +}
>> +
>> +static int bdrv_co_do_load_vmstate(BlockDriverState *bs,
>> QEMUIOVector *qiov,
>> +                                   int64_t pos)
>> +{
>> +    BdrvLoadVMState *state = bs->loadvm_state;
>> +    BdrvLoadVMChunk *c;
>> +    size_t off;
>> +
>> +    if (state == NULL) {
>> +        if (pos != 0) {
>> +            /* Normally this branch is not reachable from migration */
>> +            return bs->drv->bdrv_load_vmstate(bs, qiov, pos);
>> +        }
>> +
>> +        state = g_new(BdrvLoadVMState, 1);
>> +        *state = (BdrvLoadVMState) {
>> +            .pool = aio_task_pool_new(BDRV_VMSTATE_WORKERS_MAX),
>> +            .chunks = QLIST_HEAD_INITIALIZER(state->chunks),
>> +            .loading = QLIST_HEAD_INITIALIZER(state->loading),
>> +        };
>> +        qemu_co_mutex_init(&state->lock);
>> +        qemu_co_queue_init(&state->waiters);
>> +
>> +        bs->loadvm_state = state;
>> +    }
>> +
>> +    if (state->offset != pos) {
>> +        /* Normally this branch is not reachable from migration */
>> +        return bs->drv->bdrv_load_vmstate(bs, qiov, pos);
>> +    }
>> +
>> +    off = 0;
>> +    qemu_co_mutex_lock(&state->lock);
>> +    bdrv_co_start_loadvmstate(bs, state);
>> +
>> +    while (off < qiov->size && aio_task_pool_status(state->pool) ==
>> 0) {
>> +        c = bdrv_co_find_loadvmstate_chunk(pos,
>> QLIST_FIRST(&state->chunks));
>> +        if (c != NULL) {
>> +            ssize_t chunk_off = pos - c->offset;
>> +            ssize_t to_copy = MIN(qiov->size - off, c->bytes -
>> chunk_off);
>> +
>> +            qemu_iovec_from_buf(qiov, off, c->buf + chunk_off,
>> to_copy);
>> +
>> +            off += to_copy;
>> +            pos += to_copy;
>> +
>> +            if (pos == c->offset + c->bytes) {
>> +                state->chunk_count--;
>> +                /* End of buffer, discard it from the list */
>> +                QLIST_REMOVE(c, list);
>> +                bdrv_free_loadvm_chunk(c);
>> +            }
>> +
>> +            state->offset += to_copy;
>> +            continue;
>> +        }
>> +
>> +        c = bdrv_co_find_loadvmstate_chunk(pos,
>> QLIST_FIRST(&state->loading));
>> +        if (c != NULL) {
>> +            qemu_co_queue_wait(&state->waiters, &state->lock);
>> +            continue;
>> +        }
>> +
>> +        bdrv_co_start_loadvmstate(bs, state);
>
> Hmm, so, you assume that if we are in a loop, we'll definitely find
> our chunk at some moment.
> It should work with normal migration stream, sequentially reading
> chunk by chunk. But if not,
> it may hang:
>
> Consider two reads at same position, running in parallel: they both
> pass "state->offset != pos"
> check and wait on mutex. Then the first one will go into critical
> section, find and remove
> corresponding chunk. Then the second will go into critical section and
> loop forever, because
> corresponding chunk will never be generated again.
>
> So, seems, we should check (state->offset == pos) on each iteration,
> or something like this.
>
We can. But the reader is synchronous and cannot issue 2 requests in
parallel. Not a problem.


> ====
>
> Another idea came into my mind is that you create new requests when we
> can't find our chunk.
> It may be a bit more optimal to create new requests as soon as we get
> a free slot after
> QLIST_REMOVE(..)
>
> And, if we do so, we don't need "loading" list: if there is no
> corresponding chunk in chunks,
> we will just wait in "waiters" queue.
>
No. I would prefer to keep all requests on some list. This would save a ton
of time if something goes wrong.

But the idea of refilling near the REMOVE is fine to work with.

>
>
>> +    }
>> +    qemu_co_mutex_unlock(&state->lock);
>> +
>> +    return aio_task_pool_status(state->pool);
>> +}
>> +
>>   static int coroutine_fn
>>   bdrv_co_rw_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
>> int64_t pos,
>>                      bool is_read)
>> @@ -2752,7 +2926,7 @@ bdrv_co_rw_vmstate(BlockDriverState *bs,
>> QEMUIOVector *qiov, int64_t pos,
>>           ret = -ENOMEDIUM;
>>       } else if (drv->bdrv_load_vmstate) {
>>           if (is_read) {
>> -            ret = drv->bdrv_load_vmstate(bs, qiov, pos);
>> +            ret = bdrv_co_do_load_vmstate(bs, qiov, pos);
>>           } else {
>>               ret = bdrv_co_do_save_vmstate(bs, qiov, pos);
>>           }
>> @@ -2823,13 +2997,13 @@ int bdrv_readv_vmstate(BlockDriverState *bs,
>> QEMUIOVector *qiov, int64_t pos)
>>       return bdrv_rw_vmstate(bs, qiov, pos, true);
>>   }
>>   -static int coroutine_fn bdrv_co_finalize_vmstate(BlockDriverState
>> *bs)
>> +static int coroutine_fn
>> bdrv_co_finalize_save_vmstate(BlockDriverState *bs)
>>   {
>>       int err;
>>       BdrvSaveVMState *state = bs->savevm_state;
>>         if (bs->drv->bdrv_save_vmstate == NULL && bs->file != NULL) {
>> -        return bdrv_co_finalize_vmstate(bs->file->bs);
>> +        return bdrv_co_finalize_save_vmstate(bs->file->bs);
>>       }
>>       if (state == NULL) {
>>           return 0;
>> @@ -2851,6 +3025,51 @@ static int coroutine_fn
>> bdrv_co_finalize_vmstate(BlockDriverState *bs)
>>       return err;
>>   }
>>   +static int coroutine_fn
>> bdrv_co_finalize_load_vmstate(BlockDriverState *bs)
>> +{
>> +    int err;
>> +    BdrvLoadVMState *state = bs->loadvm_state;
>> +    BdrvLoadVMChunk *c, *tmp;
>> +
>> +    if (bs->drv->bdrv_load_vmstate == NULL && bs->file != NULL) {
>> +        return bdrv_co_finalize_load_vmstate(bs->file->bs);
>> +    }
>> +    if (state == NULL) {
>> +        return 0;
>> +    }
>> +
>> +    aio_task_pool_wait_all(state->pool);
>> +    err = aio_task_pool_status(state->pool);
>> +    aio_task_pool_free(state->pool);
>> +
>> +    QLIST_FOREACH(c, &state->loading, list) {
>> +        assert(1);  /* this list must be empty as all tasks are
>> committed */
>
> assert(1) never fail, abort() you mean..
>
> or better assert(QLIST_EMPTY(&state->loading))
>
yep!

>> +    }
>> +    QLIST_FOREACH_SAFE(c, &state->chunks, list, tmp) {
>> +        QLIST_REMOVE(c, list);
>> +        bdrv_free_loadvm_chunk(c);
>> +    }
>> +
>> +    g_free(state);
>> +
>> +    bs->loadvm_state = NULL;
>> +
>> +    return err;
>> +}
>> +
>> +static int coroutine_fn bdrv_co_finalize_vmstate(BlockDriverState *bs)
>> +{
>> +    int err1 = bdrv_co_finalize_save_vmstate(bs);
>> +    int err2 = bdrv_co_finalize_load_vmstate(bs);
>> +    if (err1 < 0) {
>> +        return err1;
>> +    }
>> +    if (err2 < 0) {
>> +        return err2;
>> +    }
>> +    return 0;
>> +}
>> +
>>   static int coroutine_fn bdrv_finalize_vmstate_co_entry(void *opaque)
>>   {
>>       return bdrv_co_finalize_vmstate(opaque);
>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>> index f90f0e8b6a..0942578a74 100644
>> --- a/include/block/block_int.h
>> +++ b/include/block/block_int.h
>> @@ -788,6 +788,7 @@ struct BdrvChild {
>>       typedef struct BdrvSaveVMState BdrvSaveVMState;
>> +typedef struct BdrvLoadVMState BdrvLoadVMState;
>>     /*
>>    * Note: the function bdrv_append() copies and swaps contents of
>> @@ -955,6 +956,8 @@ struct BlockDriverState {
>>         /* Intermediate buffer for VM state saving from snapshot
>> creation code */
>>       BdrvSaveVMState *savevm_state;
>> +    /* Intermediate buffer for VM state loading */
>> +    BdrvLoadVMState *loadvm_state;
>>   };
>>     struct BlockBackendRootState {
>>
>
> Also, need call finalize from blk_load_vmstate(), like it's done for
> blk_save_vmstate().
>
good catch!


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 7/6] block/io: improve loadvm performance
  2020-06-22 17:17     ` Denis V. Lunev
@ 2020-06-22 18:13       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 14+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2020-06-22 18:13 UTC (permalink / raw)
  To: Denis V. Lunev, qemu-block, qemu-devel
  Cc: Kevin Wolf, Fam Zheng, Juan Quintela, Dr. David Alan Gilbert,
	Max Reitz, Denis Plotnikov, Stefan Hajnoczi

22.06.2020 20:17, Denis V. Lunev wrote:
> On 6/22/20 8:02 PM, Vladimir Sementsov-Ogievskiy wrote:
>> 22.06.2020 11:33, Denis V. Lunev wrote:
>>> This patch creates intermediate buffer for reading from block driver
>>> state and performs read-ahead to this buffer. Snapshot code performs
>>> reads sequentially and thus we know what offsets will be required
>>> and when they will become not needed.
>>>
>>> Results are fantastic. Switch to snapshot times of 2GB Fedora 31 VM
>>> over NVME storage are the following:
>>>                   original     fixed
>>> cached:          1.84s       1.16s
>>> non-cached:     12.74s       1.27s
>>>
>>> The difference over HDD would be more significant :)
>>>
>>> Signed-off-by: Denis V. Lunev <den@openvz.org>
>>> CC: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>> CC: Kevin Wolf <kwolf@redhat.com>
>>> CC: Max Reitz <mreitz@redhat.com>
>>> CC: Stefan Hajnoczi <stefanha@redhat.com>
>>> CC: Fam Zheng <fam@euphon.net>
>>> CC: Juan Quintela <quintela@redhat.com>
>>> CC: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
>>> CC: Denis Plotnikov <dplotnikov@virtuozzo.com>
>>> ---
>>>
>>>    block/io.c                | 225 +++++++++++++++++++++++++++++++++++++-
>>>    include/block/block_int.h |   3 +
>>>    2 files changed, 225 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/block/io.c b/block/io.c
>>> index 71a696deb7..bb06f750d8 100644
>>> --- a/block/io.c
>>> +++ b/block/io.c
>>> @@ -2739,6 +2739,180 @@ static int
>>> bdrv_co_do_save_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
>>>        }
>>>    }
>>>    +
>>> +typedef struct BdrvLoadVMChunk {
>>> +    void *buf;
>>> +    uint64_t offset;
>>> +    ssize_t bytes;
>>> +
>>> +    QLIST_ENTRY(BdrvLoadVMChunk) list;
>>> +} BdrvLoadVMChunk;
>>> +
>>> +typedef struct BdrvLoadVMState {
>>> +    AioTaskPool *pool;
>>> +
>>> +    int64_t offset;
>>> +    int64_t last_loaded;
>>> +
>>> +    int chunk_count;
>>> +    QLIST_HEAD(, BdrvLoadVMChunk) chunks;
>>> +    QLIST_HEAD(, BdrvLoadVMChunk) loading;
>>> +    CoMutex lock;
>>> +    CoQueue waiters;
>>> +} BdrvLoadVMState;
>>> +
>>> +typedef struct BdrvLoadVMStateTask {
>>> +    AioTask task;
>>> +
>>> +    BlockDriverState *bs;
>>> +    BdrvLoadVMChunk *chunk;
>>> +} BdrvLoadVMStateTask;
>>> +
>>> +static BdrvLoadVMChunk *bdrv_co_find_loadvmstate_chunk(int64_t pos,
>>> +
>>> BdrvLoadVMChunk *c)
>>> +{
>>> +    for (; c != NULL; c = QLIST_NEXT(c, list)) {
>>> +        if (c->offset <= pos && c->offset + c->bytes > pos) {
>>> +            return c;
>>> +        }
>>> +    }
>>> +
>>> +    return NULL;
>>> +}
>>> +
>>> +static void bdrv_free_loadvm_chunk(BdrvLoadVMChunk *c)
>>> +{
>>> +    qemu_vfree(c->buf);
>>> +    g_free(c);
>>> +}
>>> +
>>> +static coroutine_fn int bdrv_co_vmstate_load_task_entry(AioTask *task)
>>> +{
>>> +    int err = 0;
>>> +    BdrvLoadVMStateTask *t = container_of(task, BdrvLoadVMStateTask,
>>> task);
>>> +    BdrvLoadVMChunk *c = t->chunk;
>>> +    BdrvLoadVMState *state = t->bs->loadvm_state;
>>> +    QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, c->buf, c->bytes);
>>> +
>>> +    bdrv_inc_in_flight(t->bs);
>>> +    err = t->bs->drv->bdrv_load_vmstate(t->bs, &qiov, c->offset);
>>> +    bdrv_dec_in_flight(t->bs);
>>> +
>>> +    qemu_co_mutex_lock(&state->lock);
>>> +    QLIST_REMOVE(c, list);
>>> +    if (err == 0) {
>>> +        QLIST_INSERT_HEAD(&state->chunks, c, list);
>>> +    } else {
>>> +        bdrv_free_loadvm_chunk(c);
>>> +    }
>>> +    qemu_co_mutex_unlock(&state->lock);
>>> +    qemu_co_queue_restart_all(&state->waiters);
>>> +
>>> +    return err;
>>> +}
>>> +
>>> +static void bdrv_co_start_loadvmstate(BlockDriverState *bs,
>>> +                                      BdrvLoadVMState *state)
>>> +{
>>> +    int i;
>>> +    size_t buf_size = MAX(bdrv_get_cluster_size(bs), 1 * MiB);
>>> +
>>> +    qemu_co_mutex_assert_locked(&state->lock);
>>> +    for (i = state->chunk_count; i < BDRV_VMSTATE_WORKERS_MAX; i++) {
>>> +        BdrvLoadVMStateTask *t = g_new(BdrvLoadVMStateTask, 1);
>>> +
>>> +        *t = (BdrvLoadVMStateTask) {
>>> +            .task.func = bdrv_co_vmstate_load_task_entry,
>>> +            .bs = bs,
>>> +            .chunk = g_new(BdrvLoadVMChunk, 1),
>>> +        };
>>> +
>>> +        *t->chunk = (BdrvLoadVMChunk) {
>>> +            .buf = qemu_blockalign(bs, buf_size),
>>> +            .offset = state->last_loaded,
>>> +            .bytes = buf_size,
>>> +        };
>>> +        /* FIXME: tail of stream */
>>> +
>>> +        QLIST_INSERT_HEAD(&state->loading, t->chunk, list);
>>> +        state->chunk_count++;
>>> +        state->last_loaded += buf_size;
>>> +
>>> +        qemu_co_mutex_unlock(&state->lock);
>>> +        aio_task_pool_start_task(state->pool, &t->task);
>>
>> Hmm, so, we unlock the mutex here. Am I missing something, or if there
>> are
>> two parallel load-state requests in parallel (yes, this shouldn't happen
>> with current migration code, but if don't care at all the mutex is not
>> needed),
>>
>> we may have two such loops in parallel, which will create more than
>> BDRV_VMSTATE_WORKERS_MAX chunks, as each has own local "i" variable?
> 
> I could stall on waiting free slot in the pool (technically)
> while pool completion will need the lock. This should
> NOT happen with the current arch, but would be better
> to do things correctly, i.e. drop the lock.
> 
>>
>>> +        qemu_co_mutex_lock(&state->lock);
>>> +    }
>>> +}
>>> +
>>> +static int bdrv_co_do_load_vmstate(BlockDriverState *bs,
>>> QEMUIOVector *qiov,
>>> +                                   int64_t pos)
>>> +{
>>> +    BdrvLoadVMState *state = bs->loadvm_state;
>>> +    BdrvLoadVMChunk *c;
>>> +    size_t off;
>>> +
>>> +    if (state == NULL) {
>>> +        if (pos != 0) {
>>> +            /* Normally this branch is not reachable from migration */
>>> +            return bs->drv->bdrv_load_vmstate(bs, qiov, pos);
>>> +        }
>>> +
>>> +        state = g_new(BdrvLoadVMState, 1);
>>> +        *state = (BdrvLoadVMState) {
>>> +            .pool = aio_task_pool_new(BDRV_VMSTATE_WORKERS_MAX),
>>> +            .chunks = QLIST_HEAD_INITIALIZER(state->chunks),
>>> +            .loading = QLIST_HEAD_INITIALIZER(state->loading),
>>> +        };
>>> +        qemu_co_mutex_init(&state->lock);
>>> +        qemu_co_queue_init(&state->waiters);
>>> +
>>> +        bs->loadvm_state = state;
>>> +    }
>>> +
>>> +    if (state->offset != pos) {
>>> +        /* Normally this branch is not reachable from migration */
>>> +        return bs->drv->bdrv_load_vmstate(bs, qiov, pos);
>>> +    }
>>> +
>>> +    off = 0;
>>> +    qemu_co_mutex_lock(&state->lock);
>>> +    bdrv_co_start_loadvmstate(bs, state);
>>> +
>>> +    while (off < qiov->size && aio_task_pool_status(state->pool) ==
>>> 0) {
>>> +        c = bdrv_co_find_loadvmstate_chunk(pos,
>>> QLIST_FIRST(&state->chunks));
>>> +        if (c != NULL) {
>>> +            ssize_t chunk_off = pos - c->offset;
>>> +            ssize_t to_copy = MIN(qiov->size - off, c->bytes -
>>> chunk_off);
>>> +
>>> +            qemu_iovec_from_buf(qiov, off, c->buf + chunk_off,
>>> to_copy);
>>> +
>>> +            off += to_copy;
>>> +            pos += to_copy;
>>> +
>>> +            if (pos == c->offset + c->bytes) {
>>> +                state->chunk_count--;
>>> +                /* End of buffer, discard it from the list */
>>> +                QLIST_REMOVE(c, list);
>>> +                bdrv_free_loadvm_chunk(c);
>>> +            }
>>> +
>>> +            state->offset += to_copy;
>>> +            continue;
>>> +        }
>>> +
>>> +        c = bdrv_co_find_loadvmstate_chunk(pos,
>>> QLIST_FIRST(&state->loading));
>>> +        if (c != NULL) {
>>> +            qemu_co_queue_wait(&state->waiters, &state->lock);
>>> +            continue;
>>> +        }
>>> +
>>> +        bdrv_co_start_loadvmstate(bs, state);
>>
>> Hmm, so, you assume that if we are in a loop, we'll definitely find
>> our chunk at some moment.
>> It should work with normal migration stream, sequentially reading
>> chunk by chunk. But if not,
>> it may hang:
>>
>> Consider two reads at same position, running in parallel: they both
>> pass "state->offset != pos"
>> check and wait on mutex. Then the first one will go into critical
>> section, find and remove
>> corresponding chunk. Then the second will go into critical section and
>> loop forever, because
>> corresponding chunk will never be generated again.
>>
>> So, seems, we should check (state->offset == pos) on each iteration,
>> or something like this.
>>
> we can. But the reader is synchronous and could not
> issue 2 requests. Not a problem.
> 

I think it's OK to rely on the current one-synchronous-reader arch, but some
related comments/assertions won't hurt.
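
E.g. something as simple as this inside the loop of bdrv_co_do_load_vmstate()
(a sketch):

    /* document the single-synchronous-reader assumption explicitly */
    assert(state->offset == pos);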


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2020-06-22 18:14 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-19 10:07 [PATCH v5 0/6] block: seriously improve savevm performance Denis V. Lunev
2020-06-19 10:07 ` [PATCH 1/6] migration/savevm: respect qemu_fclose() error code in save_snapshot() Denis V. Lunev
2020-06-19 10:07 ` [PATCH 2/6] block/aio_task: allow start/wait task from any coroutine Denis V. Lunev
2020-06-19 10:07 ` [PATCH 3/6] block/aio_task: drop aio_task_pool_wait_one() helper Denis V. Lunev
2020-06-19 10:07 ` [PATCH 4/6] block/block-backend: remove always true check from blk_save_vmstate Denis V. Lunev
2020-06-20  3:06   ` Vladimir Sementsov-Ogievskiy
2020-06-19 10:07 ` [PATCH 5/6] block, migration: add bdrv_finalize_vmstate helper Denis V. Lunev
2020-06-19 10:07 ` [PATCH 6/6] block/io: improve savevm performance Denis V. Lunev
2020-06-19 11:02 ` [PATCH v5 0/6] block: seriously " no-reply
2020-06-22  8:34   ` Denis V. Lunev
2020-06-22  8:33 ` [PATCH 7/6] block/io: improve loadvm performance Denis V. Lunev
2020-06-22 17:02   ` Vladimir Sementsov-Ogievskiy
2020-06-22 17:17     ` Denis V. Lunev
2020-06-22 18:13       ` Vladimir Sementsov-Ogievskiy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).