qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/6] tests/qtest: make migration-test massively faster
@ 2023-04-21 17:14 Daniel P. Berrangé
  2023-04-21 17:14 ` [PATCH v2 1/6] tests/qtest: replace qmp_discard_response with qtest_qmp_assert_success Daniel P. Berrangé
                   ` (5 more replies)
  0 siblings, 6 replies; 26+ messages in thread
From: Daniel P. Berrangé @ 2023-04-21 17:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Paolo Bonzini, Thomas Huth, John Snow, Li Zhijian,
	Juan Quintela, Stefan Hajnoczi, Zhang Chen, Laurent Vivier,
	Daniel P. Berrangé

This makes migration-test faster by observing that most of the pre-copy
tests don't need to be doing a live migration. They get sufficient code
coverage with the guest CPUs paused.

On my machine this cuts the overall execution time of migration-test
by 50% from 15 minutes, down to 8 minutes, without sacrificing any
noticeable code coverage.

This is still quite slow though.

The tests which do still run in live mode all want to guarantee there
are at least 2 iterations of migration, to exercise the dirty page
handling code. This is achieved by running the 1 iteration with an
incredibly small bandwidth and max downtime to prevent convergance,
and watching query-migrate for the reported iteration to increment.
This guarantees that all the tests take at least 30 seconds to run.

Watching for the iteration counter to flip is inefficient and not
actually needed. Instead we merely need to prove that some amount
of already transferred data has been made dirty again. This in turn
will guarantee that a further iteration is required beyond the current
one. This proof is easy to achieve by monitoring the values at two
distinct addresses in guest RAM, and can cut the 30 second duration
down to 1 second.

After this optimization, and Juan's patch to disable autoconverge
testnig, the migration test runs in merely 1 minute which I think
it pretty impressive given the number of scenarios we're testing.

Daniel P. Berrangé (5):
  tests/qtest: replace qmp_discard_response with
    qtest_qmp_assert_success
  tests/qtests: remove migration test iterations config
  tests/qtest: capture RESUME events during migration
  tests/qtest: make more migration pre-copy scenarios run non-live
  tests/qtest: massively speed up migration-tet

Juan Quintela (1):
  tests/migration: Only run auto_converge in slow mode

 tests/qtest/ahci-test.c              |  31 ++--
 tests/qtest/boot-order-test.c        |   5 +-
 tests/qtest/fdc-test.c               |  15 +-
 tests/qtest/ide-test.c               |   5 +-
 tests/qtest/migration-helpers.c      |  12 +-
 tests/qtest/migration-helpers.h      |   1 +
 tests/qtest/migration-test.c         | 252 +++++++++++++++++++--------
 tests/qtest/test-filter-mirror.c     |   5 +-
 tests/qtest/test-filter-redirector.c |   7 +-
 tests/qtest/virtio-blk-test.c        |  24 +--
 10 files changed, 227 insertions(+), 130 deletions(-)

-- 
2.40.0



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 1/6] tests/qtest: replace qmp_discard_response with qtest_qmp_assert_success
  2023-04-21 17:14 [PATCH v2 0/6] tests/qtest: make migration-test massively faster Daniel P. Berrangé
@ 2023-04-21 17:14 ` Daniel P. Berrangé
  2023-04-21 21:52   ` Juan Quintela
  2023-04-23  2:22   ` Zhang, Chen
  2023-04-21 17:14 ` [PATCH v2 2/6] tests/qtests: remove migration test iterations config Daniel P. Berrangé
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 26+ messages in thread
From: Daniel P. Berrangé @ 2023-04-21 17:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Paolo Bonzini, Thomas Huth, John Snow, Li Zhijian,
	Juan Quintela, Stefan Hajnoczi, Zhang Chen, Laurent Vivier,
	Daniel P. Berrangé

The qmp_discard_response method simply ignores the result of the QMP
command, merely unref'ing the object. This is a bad idea for tests
as it leaves no trace if the QMP command unexpectedly failed. The
qtest_qmp_assert_success method will validate that the QMP command
returned without error, and if errors occur, it will print a message
on the console aiding debugging.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/qtest/ahci-test.c              | 31 ++++++++++++++--------------
 tests/qtest/boot-order-test.c        |  5 +----
 tests/qtest/fdc-test.c               | 15 +++++++-------
 tests/qtest/ide-test.c               |  5 +----
 tests/qtest/migration-test.c         |  5 +----
 tests/qtest/test-filter-mirror.c     |  5 +----
 tests/qtest/test-filter-redirector.c |  7 ++-----
 tests/qtest/virtio-blk-test.c        | 24 ++++++++++-----------
 8 files changed, 40 insertions(+), 57 deletions(-)

diff --git a/tests/qtest/ahci-test.c b/tests/qtest/ahci-test.c
index 1967cd5898..abab761c26 100644
--- a/tests/qtest/ahci-test.c
+++ b/tests/qtest/ahci-test.c
@@ -36,9 +36,6 @@
 #include "hw/pci/pci_ids.h"
 #include "hw/pci/pci_regs.h"
 
-/* TODO actually test the results and get rid of this */
-#define qmp_discard_response(s, ...) qobject_unref(qtest_qmp(s, __VA_ARGS__))
-
 /* Test images sizes in MB */
 #define TEST_IMAGE_SIZE_MB_LARGE (200 * 1024)
 #define TEST_IMAGE_SIZE_MB_SMALL 64
@@ -1595,9 +1592,9 @@ static void test_atapi_tray(void)
     rsp = qtest_qmp_receive(ahci->parent->qts);
     qobject_unref(rsp);
 
-    qmp_discard_response(ahci->parent->qts,
-                         "{'execute': 'blockdev-remove-medium', "
-                         "'arguments': {'id': 'cd0'}}");
+    qtest_qmp_assert_success(ahci->parent->qts,
+                             "{'execute': 'blockdev-remove-medium', "
+                             "'arguments': {'id': 'cd0'}}");
 
     /* Test the tray without a medium */
     ahci_atapi_load(ahci, port);
@@ -1607,16 +1604,18 @@ static void test_atapi_tray(void)
     atapi_wait_tray(ahci, true);
 
     /* Re-insert media */
-    qmp_discard_response(ahci->parent->qts,
-                         "{'execute': 'blockdev-add', "
-                         "'arguments': {'node-name': 'node0', "
-                                        "'driver': 'raw', "
-                                        "'file': { 'driver': 'file', "
-                                                  "'filename': %s }}}", iso);
-    qmp_discard_response(ahci->parent->qts,
-                         "{'execute': 'blockdev-insert-medium',"
-                         "'arguments': { 'id': 'cd0', "
-                                         "'node-name': 'node0' }}");
+    qtest_qmp_assert_success(
+        ahci->parent->qts,
+        "{'execute': 'blockdev-add', "
+        "'arguments': {'node-name': 'node0', "
+                      "'driver': 'raw', "
+                      "'file': { 'driver': 'file', "
+                                "'filename': %s }}}", iso);
+    qtest_qmp_assert_success(
+        ahci->parent->qts,
+        "{'execute': 'blockdev-insert-medium',"
+        "'arguments': { 'id': 'cd0', "
+                       "'node-name': 'node0' }}");
 
     /* Again, the event shows up first */
     qtest_qmp_send(ahci->parent->qts, "{'execute': 'blockdev-close-tray', "
diff --git a/tests/qtest/boot-order-test.c b/tests/qtest/boot-order-test.c
index 0680d79d6d..8f2b6ef05a 100644
--- a/tests/qtest/boot-order-test.c
+++ b/tests/qtest/boot-order-test.c
@@ -16,9 +16,6 @@
 #include "qapi/qmp/qdict.h"
 #include "standard-headers/linux/qemu_fw_cfg.h"
 
-/* TODO actually test the results and get rid of this */
-#define qmp_discard_response(qs, ...) qobject_unref(qtest_qmp(qs, __VA_ARGS__))
-
 typedef struct {
     const char *args;
     uint64_t expected_boot;
@@ -43,7 +40,7 @@ static void test_a_boot_order(const char *machine,
                       machine ?: "", test_args);
     actual = read_boot_order(qts);
     g_assert_cmphex(actual, ==, expected_boot);
-    qmp_discard_response(qts, "{ 'execute': 'system_reset' }");
+    qtest_qmp_assert_success(qts, "{ 'execute': 'system_reset' }");
     /*
      * system_reset only requests reset.  We get a RESET event after
      * the actual reset completes.  Need to wait for that.
diff --git a/tests/qtest/fdc-test.c b/tests/qtest/fdc-test.c
index 1f9b99ad6d..5e8fbda9df 100644
--- a/tests/qtest/fdc-test.c
+++ b/tests/qtest/fdc-test.c
@@ -28,9 +28,6 @@
 #include "libqtest-single.h"
 #include "qapi/qmp/qdict.h"
 
-/* TODO actually test the results and get rid of this */
-#define qmp_discard_response(...) qobject_unref(qmp(__VA_ARGS__))
-
 #define DRIVE_FLOPPY_BLANK \
     "-drive if=floppy,file=null-co://,file.read-zeroes=on,format=raw,size=1440k"
 
@@ -304,9 +301,10 @@ static void test_media_insert(void)
 
     /* Insert media in drive. DSKCHK should not be reset until a step pulse
      * is sent. */
-    qmp_discard_response("{'execute':'blockdev-change-medium', 'arguments':{"
-                         " 'id':'floppy0', 'filename': %s, 'format': 'raw' }}",
-                         test_image);
+    qtest_qmp_assert_success(global_qtest,
+                             "{'execute':'blockdev-change-medium', 'arguments':{"
+                             " 'id':'floppy0', 'filename': %s, 'format': 'raw' }}",
+                             test_image);
 
     dir = inb(FLOPPY_BASE + reg_dir);
     assert_bit_set(dir, DSKCHG);
@@ -335,8 +333,9 @@ static void test_media_change(void)
 
     /* Eject the floppy and check that DSKCHG is set. Reading it out doesn't
      * reset the bit. */
-    qmp_discard_response("{'execute':'eject', 'arguments':{"
-                         " 'id':'floppy0' }}");
+    qtest_qmp_assert_success(global_qtest,
+                             "{'execute':'eject', 'arguments':{"
+                             " 'id':'floppy0' }}");
 
     dir = inb(FLOPPY_BASE + reg_dir);
     assert_bit_set(dir, DSKCHG);
diff --git a/tests/qtest/ide-test.c b/tests/qtest/ide-test.c
index dcb050bf9b..d6b4f6e36a 100644
--- a/tests/qtest/ide-test.c
+++ b/tests/qtest/ide-test.c
@@ -34,9 +34,6 @@
 #include "hw/pci/pci_ids.h"
 #include "hw/pci/pci_regs.h"
 
-/* TODO actually test the results and get rid of this */
-#define qmp_discard_response(q, ...) qobject_unref(qtest_qmp(q, __VA_ARGS__))
-
 #define TEST_IMAGE_SIZE 64 * 1024 * 1024
 
 #define IDE_PCI_DEV     1
@@ -766,7 +763,7 @@ static void test_pci_retry_flush(void)
     qtest_qmp_eventwait(qts, "STOP");
 
     /* Complete the command */
-    qmp_discard_response(qts, "{'execute':'cont' }");
+    qtest_qmp_assert_success(qts, "{'execute':'cont' }");
 
     /* Check registers */
     data = qpci_io_readb(dev, ide_bar, reg_device);
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 3b615b0da9..ac2e8ecac6 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -40,9 +40,6 @@
 #include "linux/kvm.h"
 #endif
 
-/* TODO actually test the results and get rid of this */
-#define qtest_qmp_discard_response(...) qobject_unref(qtest_qmp(__VA_ARGS__))
-
 unsigned start_address;
 unsigned end_address;
 static bool uffd_feature_thread_id;
@@ -731,7 +728,7 @@ static void test_migrate_end(QTestState *from, QTestState *to, bool test_dest)
             usleep(1000 * 10);
         } while (dest_byte_a == dest_byte_b);
 
-        qtest_qmp_discard_response(to, "{ 'execute' : 'stop'}");
+        qtest_qmp_assert_success(to, "{ 'execute' : 'stop'}");
 
         /* With it stopped, check nothing changes */
         qtest_memread(to, start_address, &dest_byte_c, 1);
diff --git a/tests/qtest/test-filter-mirror.c b/tests/qtest/test-filter-mirror.c
index 248fc88699..adeada3eb8 100644
--- a/tests/qtest/test-filter-mirror.c
+++ b/tests/qtest/test-filter-mirror.c
@@ -16,9 +16,6 @@
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
 
-/* TODO actually test the results and get rid of this */
-#define qmp_discard_response(qs, ...) qobject_unref(qtest_qmp(qs, __VA_ARGS__))
-
 static void test_mirror(void)
 {
     int send_sock[2], recv_sock[2];
@@ -52,7 +49,7 @@ static void test_mirror(void)
     };
 
     /* send a qmp command to guarantee that 'connected' is setting to true. */
-    qmp_discard_response(qts, "{ 'execute' : 'query-status'}");
+    qtest_qmp_assert_success(qts, "{ 'execute' : 'query-status'}");
     ret = iov_send(send_sock[0], iov, 2, 0, sizeof(size) + sizeof(send_buf));
     g_assert_cmpint(ret, ==, sizeof(send_buf) + sizeof(size));
     close(send_sock[0]);
diff --git a/tests/qtest/test-filter-redirector.c b/tests/qtest/test-filter-redirector.c
index 24ca9280f8..e72e3b7873 100644
--- a/tests/qtest/test-filter-redirector.c
+++ b/tests/qtest/test-filter-redirector.c
@@ -58,9 +58,6 @@
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
 
-/* TODO actually test the results and get rid of this */
-#define qmp_discard_response(qs, ...) qobject_unref(qtest_qmp(qs, __VA_ARGS__))
-
 static void test_redirector_tx(void)
 {
     int backend_sock[2], recv_sock;
@@ -98,7 +95,7 @@ static void test_redirector_tx(void)
     g_assert_cmpint(recv_sock, !=, -1);
 
     /* send a qmp command to guarantee that 'connected' is setting to true. */
-    qmp_discard_response(qts, "{ 'execute' : 'query-status'}");
+    qtest_qmp_assert_success(qts, "{ 'execute' : 'query-status'}");
 
     struct iovec iov[] = {
         {
@@ -176,7 +173,7 @@ static void test_redirector_rx(void)
     send_sock = unix_connect(sock_path1, NULL);
     g_assert_cmpint(send_sock, !=, -1);
     /* send a qmp command to guarantee that 'connected' is setting to true. */
-    qmp_discard_response(qts, "{ 'execute' : 'query-status'}");
+    qtest_qmp_assert_success(qts, "{ 'execute' : 'query-status'}");
 
     ret = iov_send(send_sock, iov, 2, 0, sizeof(size) + sizeof(send_buf));
     g_assert_cmpint(ret, ==, sizeof(send_buf) + sizeof(size));
diff --git a/tests/qtest/virtio-blk-test.c b/tests/qtest/virtio-blk-test.c
index 19c01f808b..98c906ebb4 100644
--- a/tests/qtest/virtio-blk-test.c
+++ b/tests/qtest/virtio-blk-test.c
@@ -17,9 +17,6 @@
 #include "libqos/qgraph.h"
 #include "libqos/virtio-blk.h"
 
-/* TODO actually test the results and get rid of this */
-#define qmp_discard_response(...) qobject_unref(qmp(__VA_ARGS__))
-
 #define TEST_IMAGE_SIZE         (64 * 1024 * 1024)
 #define QVIRTIO_BLK_TIMEOUT_US  (30 * 1000 * 1000)
 #define PCI_SLOT_HP             0x06
@@ -453,9 +450,10 @@ static void config(void *obj, void *data, QGuestAllocator *t_alloc)
 
     qvirtio_set_driver_ok(dev);
 
-    qmp_discard_response("{ 'execute': 'block_resize', "
-                         " 'arguments': { 'device': 'drive0', "
-                         " 'size': %d } }", n_size);
+    qtest_qmp_assert_success(global_qtest,
+                             "{ 'execute': 'block_resize', "
+                             " 'arguments': { 'device': 'drive0', "
+                             " 'size': %d } }", n_size);
     qvirtio_wait_config_isr(dev, QVIRTIO_BLK_TIMEOUT_US);
 
     capacity = qvirtio_config_readq(dev, 0);
@@ -502,9 +500,10 @@ static void msix(void *obj, void *u_data, QGuestAllocator *t_alloc)
 
     qvirtio_set_driver_ok(dev);
 
-    qmp_discard_response("{ 'execute': 'block_resize', "
-                         " 'arguments': { 'device': 'drive0', "
-                         " 'size': %d } }", n_size);
+    qtest_qmp_assert_success(global_qtest,
+                             "{ 'execute': 'block_resize', "
+                             " 'arguments': { 'device': 'drive0', "
+                             " 'size': %d } }", n_size);
 
     qvirtio_wait_config_isr(dev, QVIRTIO_BLK_TIMEOUT_US);
 
@@ -758,9 +757,10 @@ static void resize(void *obj, void *data, QGuestAllocator *t_alloc)
 
     vq = test_basic(dev, t_alloc);
 
-    qmp_discard_response("{ 'execute': 'block_resize', "
-                         " 'arguments': { 'device': 'drive0', "
-                         " 'size': %d } }", n_size);
+    qtest_qmp_assert_success(global_qtest,
+                             "{ 'execute': 'block_resize', "
+                             " 'arguments': { 'device': 'drive0', "
+                             " 'size': %d } }", n_size);
 
     qvirtio_wait_queue_isr(qts, dev, vq, QVIRTIO_BLK_TIMEOUT_US);
 
-- 
2.40.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 2/6] tests/qtests: remove migration test iterations config
  2023-04-21 17:14 [PATCH v2 0/6] tests/qtest: make migration-test massively faster Daniel P. Berrangé
  2023-04-21 17:14 ` [PATCH v2 1/6] tests/qtest: replace qmp_discard_response with qtest_qmp_assert_success Daniel P. Berrangé
@ 2023-04-21 17:14 ` Daniel P. Berrangé
  2023-04-21 21:54   ` Juan Quintela
  2023-04-21 17:14 ` [PATCH v2 3/6] tests/qtest: capture RESUME events during migration Daniel P. Berrangé
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Daniel P. Berrangé @ 2023-04-21 17:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Paolo Bonzini, Thomas Huth, John Snow, Li Zhijian,
	Juan Quintela, Stefan Hajnoczi, Zhang Chen, Laurent Vivier,
	Daniel P. Berrangé

The 'unsigned int interations' config for migration is somewhat
overkill. Most tests don't set it, and a value of '0' is treated
as equivalent to '1'. The only test that does set it, xbzrle,
used a value of '2'.

This setting, however, only relates to the migration iterations
that take place prior to allowing convergence. IOW, on top of
this iteration count, there is always at least 1 further migration
iteration done to deal with pages that are dirtied during the
previous iteration(s).

IOW, even with iterations==1, the xbzrle test will be running for
a minimum of 2 iterations. With this in mind we can simplify the
code and just get rid of the special case.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/qtest/migration-test.c | 13 +------------
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index ac2e8ecac6..e16120ff30 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -568,9 +568,6 @@ typedef struct {
         MIG_TEST_FAIL_DEST_QUIT_ERR,
     } result;
 
-    /* Optional: set number of migration passes to wait for */
-    unsigned int iterations;
-
     /* Postcopy specific fields */
     void *postcopy_data;
     bool postcopy_preempt;
@@ -1354,13 +1351,7 @@ static void test_precopy_common(MigrateCommon *args)
             qtest_set_expected_status(to, EXIT_FAILURE);
         }
     } else {
-        if (args->iterations) {
-            while (args->iterations--) {
-                wait_for_migration_pass(from);
-            }
-        } else {
-            wait_for_migration_pass(from);
-        }
+        wait_for_migration_pass(from);
 
         migrate_ensure_converge(from);
 
@@ -1514,8 +1505,6 @@ static void test_precopy_unix_xbzrle(void)
         .listen_uri = uri,
 
         .start_hook = test_migrate_xbzrle_start,
-
-        .iterations = 2,
     };
 
     test_precopy_common(&args);
-- 
2.40.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 3/6] tests/qtest: capture RESUME events during migration
  2023-04-21 17:14 [PATCH v2 0/6] tests/qtest: make migration-test massively faster Daniel P. Berrangé
  2023-04-21 17:14 ` [PATCH v2 1/6] tests/qtest: replace qmp_discard_response with qtest_qmp_assert_success Daniel P. Berrangé
  2023-04-21 17:14 ` [PATCH v2 2/6] tests/qtests: remove migration test iterations config Daniel P. Berrangé
@ 2023-04-21 17:14 ` Daniel P. Berrangé
  2023-04-21 21:59   ` Juan Quintela
  2023-04-21 17:14 ` [PATCH v2 4/6] tests/qtest: make more migration pre-copy scenarios run non-live Daniel P. Berrangé
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: Daniel P. Berrangé @ 2023-04-21 17:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Paolo Bonzini, Thomas Huth, John Snow, Li Zhijian,
	Juan Quintela, Stefan Hajnoczi, Zhang Chen, Laurent Vivier,
	Daniel P. Berrangé

When running migration tests we monitor for a STOP event so we can skip
redundant waits. This will be needed for the RESUME event too shortly.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/qtest/migration-helpers.c | 12 +++++++++---
 tests/qtest/migration-helpers.h |  1 +
 tests/qtest/migration-test.c    |  1 +
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
index f6f3c6680f..61396335cc 100644
--- a/tests/qtest/migration-helpers.c
+++ b/tests/qtest/migration-helpers.c
@@ -24,14 +24,20 @@
 #define MIGRATION_STATUS_WAIT_TIMEOUT 120
 
 bool got_stop;
+bool got_resume;
 
-static void check_stop_event(QTestState *who)
+static void check_events(QTestState *who)
 {
     QDict *event = qtest_qmp_event_ref(who, "STOP");
     if (event) {
         got_stop = true;
         qobject_unref(event);
     }
+    event = qtest_qmp_event_ref(who, "RESUME");
+    if (event) {
+        got_resume = true;
+        qobject_unref(event);
+    }
 }
 
 #ifndef _WIN32
@@ -48,7 +54,7 @@ QDict *wait_command_fd(QTestState *who, int fd, const char *command, ...)
     va_end(ap);
 
     resp = qtest_qmp_receive(who);
-    check_stop_event(who);
+    check_events(who);
 
     g_assert(!qdict_haskey(resp, "error"));
     g_assert(qdict_haskey(resp, "return"));
@@ -73,7 +79,7 @@ QDict *wait_command(QTestState *who, const char *command, ...)
     resp = qtest_vqmp(who, command, ap);
     va_end(ap);
 
-    check_stop_event(who);
+    check_events(who);
 
     g_assert(!qdict_haskey(resp, "error"));
     g_assert(qdict_haskey(resp, "return"));
diff --git a/tests/qtest/migration-helpers.h b/tests/qtest/migration-helpers.h
index a188b62787..726a66cfc1 100644
--- a/tests/qtest/migration-helpers.h
+++ b/tests/qtest/migration-helpers.h
@@ -16,6 +16,7 @@
 #include "libqtest.h"
 
 extern bool got_stop;
+extern bool got_resume;
 
 #ifndef _WIN32
 G_GNUC_PRINTF(3, 4)
diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index e16120ff30..6492ffa7fe 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -596,6 +596,7 @@ static int test_migrate_start(QTestState **from, QTestState **to,
     }
 
     got_stop = false;
+    got_resume = false;
     bootpath = g_strdup_printf("%s/bootsect", tmpfs);
     if (strcmp(arch, "i386") == 0 || strcmp(arch, "x86_64") == 0) {
         /* the assembled x86 boot sector should be exactly one sector large */
-- 
2.40.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 4/6] tests/qtest: make more migration pre-copy scenarios run non-live
  2023-04-21 17:14 [PATCH v2 0/6] tests/qtest: make migration-test massively faster Daniel P. Berrangé
                   ` (2 preceding siblings ...)
  2023-04-21 17:14 ` [PATCH v2 3/6] tests/qtest: capture RESUME events during migration Daniel P. Berrangé
@ 2023-04-21 17:14 ` Daniel P. Berrangé
  2023-04-21 22:06   ` Juan Quintela
  2023-04-24 21:01   ` Fabiano Rosas
  2023-04-21 17:14 ` [PATCH v2 5/6] tests/qtest: massively speed up migration-tet Daniel P. Berrangé
  2023-04-21 17:14 ` [PATCH v2 6/6] tests/migration: Only run auto_converge in slow mode Daniel P. Berrangé
  5 siblings, 2 replies; 26+ messages in thread
From: Daniel P. Berrangé @ 2023-04-21 17:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Paolo Bonzini, Thomas Huth, John Snow, Li Zhijian,
	Juan Quintela, Stefan Hajnoczi, Zhang Chen, Laurent Vivier,
	Daniel P. Berrangé

There are 27 pre-copy live migration scenarios being tested. In all of
these we force non-convergance and run for one iteration, then let it
converge and wait for completion during the second (or following)
iterations. At 3 mbps bandwidth limit the first iteration takes a very
long time (~30 seconds).

While it is important to test the migration passes and convergance
logic, it is overkill to do this for all 27 pre-copy scenarios. The
TLS migration scenarios in particular are merely exercising different
code paths during connection establishment.

To optimize time taken, switch most of the test scenarios to run
non-live (ie guest CPUs paused) with no bandwidth limits. This gives
a massive speed up for most of the test scenarios.

For test coverage the following scenarios are unchanged

 * Precopy with UNIX sockets
 * Precopy with UNIX sockets and dirty ring tracking
 * Precopy with XBZRLE
 * Precopy with multifd

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/qtest/migration-test.c | 60 ++++++++++++++++++++++++++++++------
 1 file changed, 50 insertions(+), 10 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 6492ffa7fe..40d0f75480 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -568,6 +568,9 @@ typedef struct {
         MIG_TEST_FAIL_DEST_QUIT_ERR,
     } result;
 
+    /* Whether the guest CPUs should be running during migration */
+    bool live;
+
     /* Postcopy specific fields */
     void *postcopy_data;
     bool postcopy_preempt;
@@ -1324,8 +1327,6 @@ static void test_precopy_common(MigrateCommon *args)
         return;
     }
 
-    migrate_ensure_non_converge(from);
-
     if (args->start_hook) {
         data_hook = args->start_hook(from, to);
     }
@@ -1335,6 +1336,31 @@ static void test_precopy_common(MigrateCommon *args)
         wait_for_serial("src_serial");
     }
 
+    if (args->live) {
+        /*
+         * Testing live migration, we want to ensure that some
+         * memory is re-dirtied after being transferred, so that
+         * we exercise logic for dirty page handling. We achieve
+         * this with a ridiculosly low bandwidth that guarantees
+         * non-convergance.
+         */
+        migrate_ensure_non_converge(from);
+    } else {
+        /*
+         * Testing non-live migration, we allow it to run at
+         * full speed to ensure short test case duration.
+         * For tests expected to fail, we don't need to
+         * change anything.
+         */
+        if (args->result == MIG_TEST_SUCCEED) {
+            qtest_qmp_assert_success(from, "{ 'execute' : 'stop'}");
+            if (!got_stop) {
+                qtest_qmp_eventwait(from, "STOP");
+            }
+            migrate_ensure_converge(from);
+        }
+    }
+
     if (!args->connect_uri) {
         g_autofree char *local_connect_uri =
             migrate_get_socket_address(to, "socket-address");
@@ -1352,19 +1378,29 @@ static void test_precopy_common(MigrateCommon *args)
             qtest_set_expected_status(to, EXIT_FAILURE);
         }
     } else {
-        wait_for_migration_pass(from);
+        if (args->live) {
+            wait_for_migration_pass(from);
 
-        migrate_ensure_converge(from);
+            migrate_ensure_converge(from);
 
-        /* We do this first, as it has a timeout to stop us
-         * hanging forever if migration didn't converge */
-        wait_for_migration_complete(from);
+            /*
+             * We do this first, as it has a timeout to stop us
+             * hanging forever if migration didn't converge
+             */
+            wait_for_migration_complete(from);
+
+            if (!got_stop) {
+                qtest_qmp_eventwait(from, "STOP");
+            }
+        } else {
+            wait_for_migration_complete(from);
 
-        if (!got_stop) {
-            qtest_qmp_eventwait(from, "STOP");
+            qtest_qmp_assert_success(to, "{ 'execute' : 'cont'}");
         }
 
-        qtest_qmp_eventwait(to, "RESUME");
+        if (!got_resume) {
+            qtest_qmp_eventwait(to, "RESUME");
+        }
 
         wait_for_serial("dest_serial");
     }
@@ -1382,6 +1418,7 @@ static void test_precopy_unix_plain(void)
     MigrateCommon args = {
         .listen_uri = uri,
         .connect_uri = uri,
+        .live = true,
     };
 
     test_precopy_common(&args);
@@ -1397,6 +1434,7 @@ static void test_precopy_unix_dirty_ring(void)
         },
         .listen_uri = uri,
         .connect_uri = uri,
+        .live = true,
     };
 
     test_precopy_common(&args);
@@ -1506,6 +1544,7 @@ static void test_precopy_unix_xbzrle(void)
         .listen_uri = uri,
 
         .start_hook = test_migrate_xbzrle_start,
+        .live = true,
     };
 
     test_precopy_common(&args);
@@ -1906,6 +1945,7 @@ static void test_multifd_tcp_none(void)
     MigrateCommon args = {
         .listen_uri = "defer",
         .start_hook = test_migrate_precopy_tcp_multifd_start,
+        .live = true,
     };
     test_precopy_common(&args);
 }
-- 
2.40.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 5/6] tests/qtest: massively speed up migration-tet
  2023-04-21 17:14 [PATCH v2 0/6] tests/qtest: make migration-test massively faster Daniel P. Berrangé
                   ` (3 preceding siblings ...)
  2023-04-21 17:14 ` [PATCH v2 4/6] tests/qtest: make more migration pre-copy scenarios run non-live Daniel P. Berrangé
@ 2023-04-21 17:14 ` Daniel P. Berrangé
  2023-04-21 22:15   ` Juan Quintela
  2023-04-21 17:14 ` [PATCH v2 6/6] tests/migration: Only run auto_converge in slow mode Daniel P. Berrangé
  5 siblings, 1 reply; 26+ messages in thread
From: Daniel P. Berrangé @ 2023-04-21 17:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Paolo Bonzini, Thomas Huth, John Snow, Li Zhijian,
	Juan Quintela, Stefan Hajnoczi, Zhang Chen, Laurent Vivier,
	Daniel P. Berrangé

The migration test cases that actually exercise live migration want to
ensure there is a minimum of two iterations of pre-copy, in order to
exercise the dirty tracking code.

Historically we've queried the migration status, looking for the
'dirty-sync-count' value to increment to track iterations. This was
not entirely reliable because often all the data would get transferred
quickly enough that the migration would finish before we wanted it
to. So we massively dropped the bandwidth and max downtime to
guarantee non-convergance. This had the unfortunate side effect
that every migration took at least 30 seconds to run (100 MB of
dirty pages / 3 MB/sec).

This optimization takes a different approach to ensuring that a
mimimum of two iterations. Rather than waiting for dirty-sync-count
to increment, directly look for an indication that the source VM
has dirtied RAM that has already been transferred.

On the source VM a magic marker is written just after the 3 MB
offset. The destination VM is now montiored to detect when the
magic marker is transferred. This gives a guarantee that the
first 3 MB of memory have been transferred. Now the source VM
memory is monitored at exactly the 3MB offset until we observe
a flip in its value. This gives us a guaranteed that the guest
workload has dirtied a byte that has already been transferred.

Since we're looking at a place that is only 3 MB from the start
of memory, with the 3 MB/sec bandwidth, this test should complete
in 1 second, instead of 30 seconds.

Once we've proved there is some dirty memory, migration can be
set back to full speed for the remainder of the 1st iteration,
and the entire of the second iteration at which point migration
should be complete.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
---
 tests/qtest/migration-test.c | 170 +++++++++++++++++++++++------------
 1 file changed, 114 insertions(+), 56 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 40d0f75480..63bd8a1893 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -44,6 +44,20 @@ unsigned start_address;
 unsigned end_address;
 static bool uffd_feature_thread_id;
 
+/*
+ * An initial 3 MB offset is used as that corresponds
+ * to ~1 sec of data transfer with our bandwidth setting.
+ */
+#define MAGIC_OFFSET_BASE (3 * 1024 * 1024)
+/*
+ * A further 1k is added to ensure we're not a multiple
+ * of TEST_MEM_PAGE_SIZE, thus avoid clash with writes
+ * from the migration guest workload.
+ */
+#define MAGIC_OFFSET_SHUFFLE 1024
+#define MAGIC_OFFSET (MAGIC_OFFSET_BASE + MAGIC_OFFSET_SHUFFLE)
+#define MAGIC_MARKER 0xFEED12345678CAFEULL
+
 /*
  * Dirtylimit stop working if dirty page rate error
  * value less than DIRTYLIMIT_TOLERANCE_RANGE
@@ -172,28 +186,6 @@ static void wait_for_serial(const char *side)
     } while (true);
 }
 
-/*
- * It's tricky to use qemu's migration event capability with qtest,
- * events suddenly appearing confuse the qmp()/hmp() responses.
- */
-
-static int64_t read_ram_property_int(QTestState *who, const char *property)
-{
-    QDict *rsp_return, *rsp_ram;
-    int64_t result;
-
-    rsp_return = migrate_query_not_failed(who);
-    if (!qdict_haskey(rsp_return, "ram")) {
-        /* Still in setup */
-        result = 0;
-    } else {
-        rsp_ram = qdict_get_qdict(rsp_return, "ram");
-        result = qdict_get_try_int(rsp_ram, property, 0);
-    }
-    qobject_unref(rsp_return);
-    return result;
-}
-
 static int64_t read_migrate_property_int(QTestState *who, const char *property)
 {
     QDict *rsp_return;
@@ -205,10 +197,6 @@ static int64_t read_migrate_property_int(QTestState *who, const char *property)
     return result;
 }
 
-static uint64_t get_migration_pass(QTestState *who)
-{
-    return read_ram_property_int(who, "dirty-sync-count");
-}
 
 static void read_blocktime(QTestState *who)
 {
@@ -219,23 +207,6 @@ static void read_blocktime(QTestState *who)
     qobject_unref(rsp_return);
 }
 
-static void wait_for_migration_pass(QTestState *who)
-{
-    uint64_t initial_pass = get_migration_pass(who);
-    uint64_t pass;
-
-    /* Wait for the 1st sync */
-    while (!got_stop && !initial_pass) {
-        usleep(1000);
-        initial_pass = get_migration_pass(who);
-    }
-
-    do {
-        usleep(1000);
-        pass = get_migration_pass(who);
-    } while (pass == initial_pass && !got_stop);
-}
-
 static void check_guests_ram(QTestState *who)
 {
     /* Our ASM test will have been incrementing one byte from each page from
@@ -417,6 +388,91 @@ static void migrate_ensure_converge(QTestState *who)
     migrate_set_parameter_int(who, "downtime-limit", 30 * 1000);
 }
 
+/*
+ * Our goal is to ensure that we run a single full migration
+ * iteration, and also dirty memory, ensuring that at least
+ * one further iteration is required.
+ *
+ * We can't directly synchronize with the start of a migration
+ * so we have to apply some tricks monitoring memory that is
+ * transferred.
+ *
+ * Initially we set the migration bandwidth to an insanely
+ * low value, with tiny max downtime too. This basically
+ * guarantees migration will never complete.
+ *
+ * This will result in a test that is unacceptably slow though,
+ * so we can't let the entire migration pass run at this speed.
+ * Our intent is to let it run just long enough that we can
+ * prove data prior to the marker has been transferred *AND*
+ * also prove this transferred data is dirty again.
+ *
+ * Before migration starts, we write a 64-bit magic marker
+ * into a fixed location in the src VM RAM.
+ *
+ * Then watch dst memory until the marker appears. This is
+ * proof that start_address -> MAGIC_OFFSET_BASE has been
+ * transferred.
+ *
+ * Finally we go back to the source and read a byte just
+ * before the marker untill we see it flip in value. This
+ * is proof that start_address -> MAGIC_OFFSET_BASE
+ * is now dirty again.
+ *
+ * IOW, we're guaranteed at least a 2nd migration pass
+ * at this point.
+ *
+ * We can now let migration run at full speed to finish
+ * the test
+ */
+static void migrate_prepare_for_dirty_mem(QTestState *from)
+{
+    /*
+     * The guest workflow iterates from start_address to
+     * end_address, writing 1 byte every TEST_MEM_PAGE_SIZE
+     * bytes.
+     *
+     * IOW, if we write to mem at a point which is NOT
+     * a multiple of TEST_MEM_PAGE_SIZE, our write won't
+     * conflict with the migration workflow.
+     *
+     * We put in a marker here, that we'll use to determine
+     * when the data has been transferred to the dst.
+     */
+    qtest_writeq(from, start_address + MAGIC_OFFSET, MAGIC_MARKER);
+}
+
+static void migrate_wait_for_dirty_mem(QTestState *from,
+                                       QTestState *to)
+{
+    uint64_t watch_address = start_address + MAGIC_OFFSET_BASE;
+    uint64_t marker_address = start_address + MAGIC_OFFSET;
+    uint8_t watch_byte;
+
+    /*
+     * Wait for the MAGIC_MARKER to get transferred, as an
+     * indicator that a migration pass has made some known
+     * amount of progress.
+     */
+    do {
+        usleep(1000 * 10);
+    } while (qtest_readq(to, marker_address) != MAGIC_MARKER);
+
+    /*
+     * Now ensure that already transferred bytes are
+     * dirty again from the guest workload. Note the
+     * guest byte value will wrap around and by chance
+     * match the original watch_byte. This is harmless
+     * as we'll eventually see a different value if we
+     * keep watching
+     */
+    watch_byte = qtest_readb(from, watch_address);
+    do {
+        usleep(1000 * 10);
+    } while (qtest_readb(from, watch_address) == watch_byte);
+}
+
+
 static void migrate_pause(QTestState *who)
 {
     QDict *rsp;
@@ -1116,12 +1172,14 @@ static int migrate_postcopy_prepare(QTestState **from_ptr,
 
     migrate_ensure_non_converge(from);
 
+    migrate_prepare_for_dirty_mem(from);
+
     /* Wait for the first serial output from the source */
     wait_for_serial("src_serial");
 
     migrate_qmp(from, uri, "{}");
 
-    wait_for_migration_pass(from);
+    migrate_wait_for_dirty_mem(from, to);
 
     *from_ptr = from;
     *to_ptr = to;
@@ -1337,14 +1395,8 @@ static void test_precopy_common(MigrateCommon *args)
     }
 
     if (args->live) {
-        /*
-         * Testing live migration, we want to ensure that some
-         * memory is re-dirtied after being transferred, so that
-         * we exercise logic for dirty page handling. We achieve
-         * this with a ridiculosly low bandwidth that guarantees
-         * non-convergance.
-         */
         migrate_ensure_non_converge(from);
+        migrate_prepare_for_dirty_mem(from);
     } else {
         /*
          * Testing non-live migration, we allow it to run at
@@ -1379,7 +1431,7 @@ static void test_precopy_common(MigrateCommon *args)
         }
     } else {
         if (args->live) {
-            wait_for_migration_pass(from);
+            migrate_wait_for_dirty_mem(from, to);
 
             migrate_ensure_converge(from);
 
@@ -1498,6 +1550,9 @@ static void test_ignore_shared(void)
         return;
     }
 
+    migrate_ensure_non_converge(from);
+    migrate_prepare_for_dirty_mem(from);
+
     migrate_set_capability(from, "x-ignore-shared", true);
     migrate_set_capability(to, "x-ignore-shared", true);
 
@@ -1506,7 +1561,7 @@ static void test_ignore_shared(void)
 
     migrate_qmp(from, uri, "{}");
 
-    wait_for_migration_pass(from);
+    migrate_wait_for_dirty_mem(from, to);
 
     if (!got_stop) {
         qtest_qmp_eventwait(from, "STOP");
@@ -2152,6 +2207,7 @@ static void test_multifd_tcp_cancel(void)
     }
 
     migrate_ensure_non_converge(from);
+    migrate_prepare_for_dirty_mem(from);
 
     migrate_set_parameter_int(from, "multifd-channels", 16);
     migrate_set_parameter_int(to, "multifd-channels", 16);
@@ -2171,7 +2227,7 @@ static void test_multifd_tcp_cancel(void)
 
     migrate_qmp(from, uri, "{}");
 
-    wait_for_migration_pass(from);
+    migrate_wait_for_dirty_mem(from, to);
 
     migrate_cancel(from);
 
@@ -2201,11 +2257,13 @@ static void test_multifd_tcp_cancel(void)
 
     wait_for_migration_status(from, "cancelled", NULL);
 
-    migrate_ensure_converge(from);
+    migrate_ensure_non_converge(from);
 
     migrate_qmp(from, uri, "{}");
 
-    wait_for_migration_pass(from);
+    migrate_wait_for_dirty_mem(from, to);
+
+    migrate_ensure_converge(from);
 
     if (!got_stop) {
         qtest_qmp_eventwait(from, "STOP");
-- 
2.40.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 6/6] tests/migration: Only run auto_converge in slow mode
  2023-04-21 17:14 [PATCH v2 0/6] tests/qtest: make migration-test massively faster Daniel P. Berrangé
                   ` (4 preceding siblings ...)
  2023-04-21 17:14 ` [PATCH v2 5/6] tests/qtest: massively speed up migration-tet Daniel P. Berrangé
@ 2023-04-21 17:14 ` Daniel P. Berrangé
  2023-04-23  2:41   ` Zhang, Chen
  2023-04-24  8:06   ` Zhang, Chen
  5 siblings, 2 replies; 26+ messages in thread
From: Daniel P. Berrangé @ 2023-04-21 17:14 UTC (permalink / raw)
  To: qemu-devel
  Cc: qemu-block, Paolo Bonzini, Thomas Huth, John Snow, Li Zhijian,
	Juan Quintela, Stefan Hajnoczi, Zhang Chen, Laurent Vivier

From: Juan Quintela <quintela@redhat.com>

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 tests/qtest/migration-test.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index 63bd8a1893..9ed178aa03 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -1877,6 +1877,21 @@ static void test_validate_uuid_dst_not_set(void)
     do_test_validate_uuid(&args, false);
 }
 
+/*
+ * The way auto_converge works, we need to do too many passes to
+ * run this test.  Auto_converge logic is only run once every
+ * three iterations, so:
+ *
+ * - 3 iterations without auto_converge enabled
+ * - 3 iterations with pct = 5
+ * - 3 iterations with pct = 30
+ * - 3 iterations with pct = 55
+ * - 3 iterations with pct = 80
+ * - 3 iterations with pct = 95 (max(95, 80 + 25))
+ *
+ * To make things even worse, we need to run the initial stage at
+ * 3MB/s so we enter autoconverge even when host is (over)loaded.
+ */
 static void test_migrate_auto_converge(void)
 {
     g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
@@ -2660,8 +2675,12 @@ int main(int argc, char **argv)
                    test_validate_uuid_src_not_set);
     qtest_add_func("/migration/validate_uuid_dst_not_set",
                    test_validate_uuid_dst_not_set);
-
-    qtest_add_func("/migration/auto_converge", test_migrate_auto_converge);
+    /*
+     * See explanation why this test is slow on function definition
+     */
+    if (g_test_slow()) {
+        qtest_add_func("/migration/auto_converge", test_migrate_auto_converge);
+    }
     qtest_add_func("/migration/multifd/tcp/plain/none",
                    test_multifd_tcp_none);
     /*
-- 
2.40.0



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/6] tests/qtest: replace qmp_discard_response with qtest_qmp_assert_success
  2023-04-21 17:14 ` [PATCH v2 1/6] tests/qtest: replace qmp_discard_response with qtest_qmp_assert_success Daniel P. Berrangé
@ 2023-04-21 21:52   ` Juan Quintela
  2023-04-23  2:22   ` Zhang, Chen
  1 sibling, 0 replies; 26+ messages in thread
From: Juan Quintela @ 2023-04-21 21:52 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, qemu-block, Paolo Bonzini, Thomas Huth, John Snow,
	Li Zhijian, Stefan Hajnoczi, Zhang Chen, Laurent Vivier

Daniel P. Berrangé <berrange@redhat.com> wrote:
> The qmp_discard_response method simply ignores the result of the QMP
> command, merely unref'ing the object. This is a bad idea for tests
> as it leaves no trace if the QMP command unexpectedly failed. The
> qtest_qmp_assert_success method will validate that the QMP command
> returned without error, and if errors occur, it will print a message
> on the console aiding debugging.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  tests/qtest/ahci-test.c              | 31 ++++++++++++++--------------
>  tests/qtest/boot-order-test.c        |  5 +----
>  tests/qtest/fdc-test.c               | 15 +++++++-------
>  tests/qtest/ide-test.c               |  5 +----
>  tests/qtest/migration-test.c         |  5 +----
>  tests/qtest/test-filter-mirror.c     |  5 +----
>  tests/qtest/test-filter-redirector.c |  7 ++-----
>  tests/qtest/virtio-blk-test.c        | 24 ++++++++++-----------
>  8 files changed, 40 insertions(+), 57 deletions(-)

Reviewed-by: Juan Quintela <quintela@redhat.com>

> -/* TODO actually test the results and get rid of this */
> -#define qmp_discard_response(s, ...) qobject_unref(qtest_qmp(s, __VA_ARGS__))

As it couldn't be otherwise, all bad patterns are copied.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 2/6] tests/qtests: remove migration test iterations config
  2023-04-21 17:14 ` [PATCH v2 2/6] tests/qtests: remove migration test iterations config Daniel P. Berrangé
@ 2023-04-21 21:54   ` Juan Quintela
  2023-04-26  9:07     ` Daniel P. Berrangé
  0 siblings, 1 reply; 26+ messages in thread
From: Juan Quintela @ 2023-04-21 21:54 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, qemu-block, Paolo Bonzini, Thomas Huth, John Snow,
	Li Zhijian, Stefan Hajnoczi, Zhang Chen, Laurent Vivier

Daniel P. Berrangé <berrange@redhat.com> wrote:
> The 'unsigned int interations' config for migration is somewhat
> overkill. Most tests don't set it, and a value of '0' is treated
> as equivalent to '1'. The only test that does set it, xbzrle,
> used a value of '2'.
>
> This setting, however, only relates to the migration iterations
> that take place prior to allowing convergence. IOW, on top of
> this iteration count, there is always at least 1 further migration
> iteration done to deal with pages that are dirtied during the
> previous iteration(s).
>
> IOW, even with iterations==1, the xbzrle test will be running for
> a minimum of 2 iterations. With this in mind we can simplify the
> code and just get rid of the special case.

Perhaps the old code was already wrong, but we need at least three
iterations for the xbzrle test:
- 1st iteration: xbzrle is not used, nothing is on cache.
- 2nd iteration: pages are put into cache, no xbzrle is used because
  there is no previous page.
- 3rd iteration: We really use xbzrle now against the copy of the
  previous iterations.

And yes, this should be commented somewhere.

Later, Juan.





^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 3/6] tests/qtest: capture RESUME events during migration
  2023-04-21 17:14 ` [PATCH v2 3/6] tests/qtest: capture RESUME events during migration Daniel P. Berrangé
@ 2023-04-21 21:59   ` Juan Quintela
  2023-04-24  9:53     ` Daniel P. Berrangé
  0 siblings, 1 reply; 26+ messages in thread
From: Juan Quintela @ 2023-04-21 21:59 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, qemu-block, Paolo Bonzini, Thomas Huth, John Snow,
	Li Zhijian, Stefan Hajnoczi, Zhang Chen, Laurent Vivier

Daniel P. Berrangé <berrange@redhat.com> wrote:
> When running migration tests we monitor for a STOP event so we can skip
> redundant waits. This will be needed for the RESUME event too shortly.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

i.e. it is better that what we have now.

But


> diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
> index f6f3c6680f..61396335cc 100644
> --- a/tests/qtest/migration-helpers.c
> +++ b/tests/qtest/migration-helpers.c
> @@ -24,14 +24,20 @@
>  #define MIGRATION_STATUS_WAIT_TIMEOUT 120
>  
>  bool got_stop;
> +bool got_resume;
>  
> -static void check_stop_event(QTestState *who)
> +static void check_events(QTestState *who)
>  {
>      QDict *event = qtest_qmp_event_ref(who, "STOP");
>      if (event) {
>          got_stop = true;
>          qobject_unref(event);
>      }
> +    event = qtest_qmp_event_ref(who, "RESUME");
> +    if (event) {
> +        got_resume = true;
> +        qobject_unref(event);
> +    }
>  }

What happens if we receive the events in the order RESUME/STOP (I mean
in the big scheme of things, not that it makes sense in this particular
case).

QDict *qtest_qmp_event_ref(QTestState *s, const char *event)
{
    while (s->pending_events) {

        GList *first = s->pending_events;
        QDict *response = (QDict *)first->data;

        s->pending_events = g_list_delete_link(s->pending_events, first);

        if (!strcmp(qdict_get_str(response, "event"), event)) {
            return response;
        }
        qobject_unref(response);
    }
    return NULL;
}

if we don't found the event that we are searching for, we just drop it.
Does this makes sense if we are searching only for more than one event?

Later, Juan.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 4/6] tests/qtest: make more migration pre-copy scenarios run non-live
  2023-04-21 17:14 ` [PATCH v2 4/6] tests/qtest: make more migration pre-copy scenarios run non-live Daniel P. Berrangé
@ 2023-04-21 22:06   ` Juan Quintela
  2023-04-24 21:01   ` Fabiano Rosas
  1 sibling, 0 replies; 26+ messages in thread
From: Juan Quintela @ 2023-04-21 22:06 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, qemu-block, Paolo Bonzini, Thomas Huth, John Snow,
	Li Zhijian, Stefan Hajnoczi, Zhang Chen, Laurent Vivier

Daniel P. Berrangé <berrange@redhat.com> wrote:
> There are 27 pre-copy live migration scenarios being tested. In all of
> these we force non-convergance and run for one iteration, then let it
> converge and wait for completion during the second (or following)
> iterations. At 3 mbps bandwidth limit the first iteration takes a very
> long time (~30 seconds).
>
> While it is important to test the migration passes and convergance
> logic, it is overkill to do this for all 27 pre-copy scenarios. The
> TLS migration scenarios in particular are merely exercising different
> code paths during connection establishment.
>
> To optimize time taken, switch most of the test scenarios to run
> non-live (ie guest CPUs paused) with no bandwidth limits. This gives
> a massive speed up for most of the test scenarios.
>
> For test coverage the following scenarios are unchanged
>
>  * Precopy with UNIX sockets
>  * Precopy with UNIX sockets and dirty ring tracking
>  * Precopy with XBZRLE
>  * Precopy with multifd
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

It is "infinitely" better that what we have.

But I wonder if we can do better.  We could just add a migration
parameter that says _don't_ complete, continue running.  We have
(almost) all of the functionality that we need for colo, just not an
easy way to set it up.

Just food for thought.

Later, Juan.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 5/6] tests/qtest: massively speed up migration-tet
  2023-04-21 17:14 ` [PATCH v2 5/6] tests/qtest: massively speed up migration-tet Daniel P. Berrangé
@ 2023-04-21 22:15   ` Juan Quintela
  0 siblings, 0 replies; 26+ messages in thread
From: Juan Quintela @ 2023-04-21 22:15 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, qemu-block, Paolo Bonzini, Thomas Huth, John Snow,
	Li Zhijian, Stefan Hajnoczi, Zhang Chen, Laurent Vivier

Daniel P. Berrangé <berrange@redhat.com> wrote:
> The migration test cases that actually exercise live migration want to
> ensure there is a minimum of two iterations of pre-copy, in order to
> exercise the dirty tracking code.
>
> Historically we've queried the migration status, looking for the
> 'dirty-sync-count' value to increment to track iterations. This was
> not entirely reliable because often all the data would get transferred
> quickly enough that the migration would finish before we wanted it
> to. So we massively dropped the bandwidth and max downtime to
> guarantee non-convergance. This had the unfortunate side effect
> that every migration took at least 30 seconds to run (100 MB of
> dirty pages / 3 MB/sec).
>
> This optimization takes a different approach to ensuring that a
> mimimum of two iterations. Rather than waiting for dirty-sync-count
> to increment, directly look for an indication that the source VM
> has dirtied RAM that has already been transferred.
>
> On the source VM a magic marker is written just after the 3 MB
> offset. The destination VM is now montiored to detect when the
> magic marker is transferred. This gives a guarantee that the
> first 3 MB of memory have been transferred. Now the source VM
> memory is monitored at exactly the 3MB offset until we observe
> a flip in its value. This gives us a guaranteed that the guest
> workload has dirtied a byte that has already been transferred.
>
> Since we're looking at a place that is only 3 MB from the start
> of memory, with the 3 MB/sec bandwidth, this test should complete
> in 1 second, instead of 30 seconds.
>
> Once we've proved there is some dirty memory, migration can be
> set back to full speed for the remainder of the 1st iteration,
> and the entire of the second iteration at which point migration
> should be complete.
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

Hi

I think this is not enough.  As said before:
- xbzrle needs 3 iterations
- auto converge needs around 12 iterations (forgot) the exact number,
  but it is a lot.
- for (almost) all the rest of the tests, we don't really care, we just
  need the migration to finish.

One easy way to "test" it is: Change the "meaning" of ZERO downtime to
mean that we don't want to enter the completion stage, just continue
sending data.

Changig this in qemu:

modified   migration/migration.c
@@ -2726,6 +2726,9 @@ static MigIterateState migration_iteration_run(MigrationState *s)
 
     trace_migrate_pending_estimate(pending_size, must_precopy, can_postcopy);
 
+    if (s->threshold_size == 0) {
+        return MIG_ITERATE_RESUME;
+    }
     if (must_precopy <= s->threshold_size) {
         qemu_savevm_state_pending_exact(&must_precopy, &can_postcopy);
         pending_size = must_precopy + can_postcopy;

And just setting the downtime to zero should be enough.

It is too late, so before I start with this, what do you think?

Later, Juan.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH v2 1/6] tests/qtest: replace qmp_discard_response with qtest_qmp_assert_success
  2023-04-21 17:14 ` [PATCH v2 1/6] tests/qtest: replace qmp_discard_response with qtest_qmp_assert_success Daniel P. Berrangé
  2023-04-21 21:52   ` Juan Quintela
@ 2023-04-23  2:22   ` Zhang, Chen
  1 sibling, 0 replies; 26+ messages in thread
From: Zhang, Chen @ 2023-04-23  2:22 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel@nongnu.org
  Cc: qemu-block@nongnu.org, Paolo Bonzini, Thomas Huth, John Snow,
	Li Zhijian, Juan Quintela, Stefan Hajnoczi, Laurent Vivier



> -----Original Message-----
> From: Daniel P. Berrangé <berrange@redhat.com>
> Sent: Saturday, April 22, 2023 1:14 AM
> To: qemu-devel@nongnu.org
> Cc: qemu-block@nongnu.org; Paolo Bonzini <pbonzini@redhat.com>;
> Thomas Huth <thuth@redhat.com>; John Snow <jsnow@redhat.com>; Li
> Zhijian <lizhijian@fujitsu.com>; Juan Quintela <quintela@redhat.com>;
> Stefan Hajnoczi <stefanha@redhat.com>; Zhang, Chen
> <chen.zhang@intel.com>; Laurent Vivier <lvivier@redhat.com>; Daniel P.
> Berrangé <berrange@redhat.com>
> Subject: [PATCH v2 1/6] tests/qtest: replace qmp_discard_response with
> qtest_qmp_assert_success
> 
> The qmp_discard_response method simply ignores the result of the QMP
> command, merely unref'ing the object. This is a bad idea for tests as it leaves
> no trace if the QMP command unexpectedly failed. The
> qtest_qmp_assert_success method will validate that the QMP command
> returned without error, and if errors occur, it will print a message on the
> console aiding debugging.
> 
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>

Reviewed-by: Zhang Chen <chen.zhang@intel.com>

> ---
>  tests/qtest/ahci-test.c              | 31 ++++++++++++++--------------
>  tests/qtest/boot-order-test.c        |  5 +----
>  tests/qtest/fdc-test.c               | 15 +++++++-------
>  tests/qtest/ide-test.c               |  5 +----
>  tests/qtest/migration-test.c         |  5 +----
>  tests/qtest/test-filter-mirror.c     |  5 +----
>  tests/qtest/test-filter-redirector.c |  7 ++-----
>  tests/qtest/virtio-blk-test.c        | 24 ++++++++++-----------
>  8 files changed, 40 insertions(+), 57 deletions(-)
> 
> diff --git a/tests/qtest/ahci-test.c b/tests/qtest/ahci-test.c index
> 1967cd5898..abab761c26 100644
> --- a/tests/qtest/ahci-test.c
> +++ b/tests/qtest/ahci-test.c
> @@ -36,9 +36,6 @@
>  #include "hw/pci/pci_ids.h"
>  #include "hw/pci/pci_regs.h"
> 
> -/* TODO actually test the results and get rid of this */ -#define
> qmp_discard_response(s, ...) qobject_unref(qtest_qmp(s, __VA_ARGS__))
> -
>  /* Test images sizes in MB */
>  #define TEST_IMAGE_SIZE_MB_LARGE (200 * 1024)  #define
> TEST_IMAGE_SIZE_MB_SMALL 64 @@ -1595,9 +1592,9 @@ static void
> test_atapi_tray(void)
>      rsp = qtest_qmp_receive(ahci->parent->qts);
>      qobject_unref(rsp);
> 
> -    qmp_discard_response(ahci->parent->qts,
> -                         "{'execute': 'blockdev-remove-medium', "
> -                         "'arguments': {'id': 'cd0'}}");
> +    qtest_qmp_assert_success(ahci->parent->qts,
> +                             "{'execute': 'blockdev-remove-medium', "
> +                             "'arguments': {'id': 'cd0'}}");
> 
>      /* Test the tray without a medium */
>      ahci_atapi_load(ahci, port);
> @@ -1607,16 +1604,18 @@ static void test_atapi_tray(void)
>      atapi_wait_tray(ahci, true);
> 
>      /* Re-insert media */
> -    qmp_discard_response(ahci->parent->qts,
> -                         "{'execute': 'blockdev-add', "
> -                         "'arguments': {'node-name': 'node0', "
> -                                        "'driver': 'raw', "
> -                                        "'file': { 'driver': 'file', "
> -                                                  "'filename': %s }}}", iso);
> -    qmp_discard_response(ahci->parent->qts,
> -                         "{'execute': 'blockdev-insert-medium',"
> -                         "'arguments': { 'id': 'cd0', "
> -                                         "'node-name': 'node0' }}");
> +    qtest_qmp_assert_success(
> +        ahci->parent->qts,
> +        "{'execute': 'blockdev-add', "
> +        "'arguments': {'node-name': 'node0', "
> +                      "'driver': 'raw', "
> +                      "'file': { 'driver': 'file', "
> +                                "'filename': %s }}}", iso);
> +    qtest_qmp_assert_success(
> +        ahci->parent->qts,
> +        "{'execute': 'blockdev-insert-medium',"
> +        "'arguments': { 'id': 'cd0', "
> +                       "'node-name': 'node0' }}");
> 
>      /* Again, the event shows up first */
>      qtest_qmp_send(ahci->parent->qts, "{'execute': 'blockdev-close-tray', "
> diff --git a/tests/qtest/boot-order-test.c b/tests/qtest/boot-order-test.c
> index 0680d79d6d..8f2b6ef05a 100644
> --- a/tests/qtest/boot-order-test.c
> +++ b/tests/qtest/boot-order-test.c
> @@ -16,9 +16,6 @@
>  #include "qapi/qmp/qdict.h"
>  #include "standard-headers/linux/qemu_fw_cfg.h"
> 
> -/* TODO actually test the results and get rid of this */ -#define
> qmp_discard_response(qs, ...) qobject_unref(qtest_qmp(qs, __VA_ARGS__))
> -
>  typedef struct {
>      const char *args;
>      uint64_t expected_boot;
> @@ -43,7 +40,7 @@ static void test_a_boot_order(const char *machine,
>                        machine ?: "", test_args);
>      actual = read_boot_order(qts);
>      g_assert_cmphex(actual, ==, expected_boot);
> -    qmp_discard_response(qts, "{ 'execute': 'system_reset' }");
> +    qtest_qmp_assert_success(qts, "{ 'execute': 'system_reset' }");
>      /*
>       * system_reset only requests reset.  We get a RESET event after
>       * the actual reset completes.  Need to wait for that.
> diff --git a/tests/qtest/fdc-test.c b/tests/qtest/fdc-test.c index
> 1f9b99ad6d..5e8fbda9df 100644
> --- a/tests/qtest/fdc-test.c
> +++ b/tests/qtest/fdc-test.c
> @@ -28,9 +28,6 @@
>  #include "libqtest-single.h"
>  #include "qapi/qmp/qdict.h"
> 
> -/* TODO actually test the results and get rid of this */ -#define
> qmp_discard_response(...) qobject_unref(qmp(__VA_ARGS__))
> -
>  #define DRIVE_FLOPPY_BLANK \
>      "-drive if=floppy,file=null-co://,file.read-
> zeroes=on,format=raw,size=1440k"
> 
> @@ -304,9 +301,10 @@ static void test_media_insert(void)
> 
>      /* Insert media in drive. DSKCHK should not be reset until a step pulse
>       * is sent. */
> -    qmp_discard_response("{'execute':'blockdev-change-medium',
> 'arguments':{"
> -                         " 'id':'floppy0', 'filename': %s, 'format': 'raw' }}",
> -                         test_image);
> +    qtest_qmp_assert_success(global_qtest,
> +                             "{'execute':'blockdev-change-medium', 'arguments':{"
> +                             " 'id':'floppy0', 'filename': %s, 'format': 'raw' }}",
> +                             test_image);
> 
>      dir = inb(FLOPPY_BASE + reg_dir);
>      assert_bit_set(dir, DSKCHG);
> @@ -335,8 +333,9 @@ static void test_media_change(void)
> 
>      /* Eject the floppy and check that DSKCHG is set. Reading it out doesn't
>       * reset the bit. */
> -    qmp_discard_response("{'execute':'eject', 'arguments':{"
> -                         " 'id':'floppy0' }}");
> +    qtest_qmp_assert_success(global_qtest,
> +                             "{'execute':'eject', 'arguments':{"
> +                             " 'id':'floppy0' }}");
> 
>      dir = inb(FLOPPY_BASE + reg_dir);
>      assert_bit_set(dir, DSKCHG);
> diff --git a/tests/qtest/ide-test.c b/tests/qtest/ide-test.c index
> dcb050bf9b..d6b4f6e36a 100644
> --- a/tests/qtest/ide-test.c
> +++ b/tests/qtest/ide-test.c
> @@ -34,9 +34,6 @@
>  #include "hw/pci/pci_ids.h"
>  #include "hw/pci/pci_regs.h"
> 
> -/* TODO actually test the results and get rid of this */ -#define
> qmp_discard_response(q, ...) qobject_unref(qtest_qmp(q, __VA_ARGS__))
> -
>  #define TEST_IMAGE_SIZE 64 * 1024 * 1024
> 
>  #define IDE_PCI_DEV     1
> @@ -766,7 +763,7 @@ static void test_pci_retry_flush(void)
>      qtest_qmp_eventwait(qts, "STOP");
> 
>      /* Complete the command */
> -    qmp_discard_response(qts, "{'execute':'cont' }");
> +    qtest_qmp_assert_success(qts, "{'execute':'cont' }");
> 
>      /* Check registers */
>      data = qpci_io_readb(dev, ide_bar, reg_device); diff --git
> a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index
> 3b615b0da9..ac2e8ecac6 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -40,9 +40,6 @@
>  #include "linux/kvm.h"
>  #endif
> 
> -/* TODO actually test the results and get rid of this */ -#define
> qtest_qmp_discard_response(...) qobject_unref(qtest_qmp(__VA_ARGS__))
> -
>  unsigned start_address;
>  unsigned end_address;
>  static bool uffd_feature_thread_id;
> @@ -731,7 +728,7 @@ static void test_migrate_end(QTestState *from,
> QTestState *to, bool test_dest)
>              usleep(1000 * 10);
>          } while (dest_byte_a == dest_byte_b);
> 
> -        qtest_qmp_discard_response(to, "{ 'execute' : 'stop'}");
> +        qtest_qmp_assert_success(to, "{ 'execute' : 'stop'}");
> 
>          /* With it stopped, check nothing changes */
>          qtest_memread(to, start_address, &dest_byte_c, 1); diff --git
> a/tests/qtest/test-filter-mirror.c b/tests/qtest/test-filter-mirror.c
> index 248fc88699..adeada3eb8 100644
> --- a/tests/qtest/test-filter-mirror.c
> +++ b/tests/qtest/test-filter-mirror.c
> @@ -16,9 +16,6 @@
>  #include "qemu/error-report.h"
>  #include "qemu/main-loop.h"
> 
> -/* TODO actually test the results and get rid of this */ -#define
> qmp_discard_response(qs, ...) qobject_unref(qtest_qmp(qs, __VA_ARGS__))
> -
>  static void test_mirror(void)
>  {
>      int send_sock[2], recv_sock[2];
> @@ -52,7 +49,7 @@ static void test_mirror(void)
>      };
> 
>      /* send a qmp command to guarantee that 'connected' is setting to true.
> */
> -    qmp_discard_response(qts, "{ 'execute' : 'query-status'}");
> +    qtest_qmp_assert_success(qts, "{ 'execute' : 'query-status'}");
>      ret = iov_send(send_sock[0], iov, 2, 0, sizeof(size) + sizeof(send_buf));
>      g_assert_cmpint(ret, ==, sizeof(send_buf) + sizeof(size));
>      close(send_sock[0]);
> diff --git a/tests/qtest/test-filter-redirector.c b/tests/qtest/test-filter-
> redirector.c
> index 24ca9280f8..e72e3b7873 100644
> --- a/tests/qtest/test-filter-redirector.c
> +++ b/tests/qtest/test-filter-redirector.c
> @@ -58,9 +58,6 @@
>  #include "qemu/error-report.h"
>  #include "qemu/main-loop.h"
> 
> -/* TODO actually test the results and get rid of this */ -#define
> qmp_discard_response(qs, ...) qobject_unref(qtest_qmp(qs, __VA_ARGS__))
> -
>  static void test_redirector_tx(void)
>  {
>      int backend_sock[2], recv_sock;
> @@ -98,7 +95,7 @@ static void test_redirector_tx(void)
>      g_assert_cmpint(recv_sock, !=, -1);
> 
>      /* send a qmp command to guarantee that 'connected' is setting to true.
> */
> -    qmp_discard_response(qts, "{ 'execute' : 'query-status'}");
> +    qtest_qmp_assert_success(qts, "{ 'execute' : 'query-status'}");
> 
>      struct iovec iov[] = {
>          {
> @@ -176,7 +173,7 @@ static void test_redirector_rx(void)
>      send_sock = unix_connect(sock_path1, NULL);
>      g_assert_cmpint(send_sock, !=, -1);
>      /* send a qmp command to guarantee that 'connected' is setting to true.
> */
> -    qmp_discard_response(qts, "{ 'execute' : 'query-status'}");
> +    qtest_qmp_assert_success(qts, "{ 'execute' : 'query-status'}");
> 
>      ret = iov_send(send_sock, iov, 2, 0, sizeof(size) + sizeof(send_buf));
>      g_assert_cmpint(ret, ==, sizeof(send_buf) + sizeof(size)); diff --git
> a/tests/qtest/virtio-blk-test.c b/tests/qtest/virtio-blk-test.c index
> 19c01f808b..98c906ebb4 100644
> --- a/tests/qtest/virtio-blk-test.c
> +++ b/tests/qtest/virtio-blk-test.c
> @@ -17,9 +17,6 @@
>  #include "libqos/qgraph.h"
>  #include "libqos/virtio-blk.h"
> 
> -/* TODO actually test the results and get rid of this */ -#define
> qmp_discard_response(...) qobject_unref(qmp(__VA_ARGS__))
> -
>  #define TEST_IMAGE_SIZE         (64 * 1024 * 1024)
>  #define QVIRTIO_BLK_TIMEOUT_US  (30 * 1000 * 1000)
>  #define PCI_SLOT_HP             0x06
> @@ -453,9 +450,10 @@ static void config(void *obj, void *data,
> QGuestAllocator *t_alloc)
> 
>      qvirtio_set_driver_ok(dev);
> 
> -    qmp_discard_response("{ 'execute': 'block_resize', "
> -                         " 'arguments': { 'device': 'drive0', "
> -                         " 'size': %d } }", n_size);
> +    qtest_qmp_assert_success(global_qtest,
> +                             "{ 'execute': 'block_resize', "
> +                             " 'arguments': { 'device': 'drive0', "
> +                             " 'size': %d } }", n_size);
>      qvirtio_wait_config_isr(dev, QVIRTIO_BLK_TIMEOUT_US);
> 
>      capacity = qvirtio_config_readq(dev, 0); @@ -502,9 +500,10 @@ static
> void msix(void *obj, void *u_data, QGuestAllocator *t_alloc)
> 
>      qvirtio_set_driver_ok(dev);
> 
> -    qmp_discard_response("{ 'execute': 'block_resize', "
> -                         " 'arguments': { 'device': 'drive0', "
> -                         " 'size': %d } }", n_size);
> +    qtest_qmp_assert_success(global_qtest,
> +                             "{ 'execute': 'block_resize', "
> +                             " 'arguments': { 'device': 'drive0', "
> +                             " 'size': %d } }", n_size);
> 
>      qvirtio_wait_config_isr(dev, QVIRTIO_BLK_TIMEOUT_US);
> 
> @@ -758,9 +757,10 @@ static void resize(void *obj, void *data,
> QGuestAllocator *t_alloc)
> 
>      vq = test_basic(dev, t_alloc);
> 
> -    qmp_discard_response("{ 'execute': 'block_resize', "
> -                         " 'arguments': { 'device': 'drive0', "
> -                         " 'size': %d } }", n_size);
> +    qtest_qmp_assert_success(global_qtest,
> +                             "{ 'execute': 'block_resize', "
> +                             " 'arguments': { 'device': 'drive0', "
> +                             " 'size': %d } }", n_size);
> 
>      qvirtio_wait_queue_isr(qts, dev, vq, QVIRTIO_BLK_TIMEOUT_US);
> 
> --
> 2.40.0


^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH v2 6/6] tests/migration: Only run auto_converge in slow mode
  2023-04-21 17:14 ` [PATCH v2 6/6] tests/migration: Only run auto_converge in slow mode Daniel P. Berrangé
@ 2023-04-23  2:41   ` Zhang, Chen
  2023-04-24  5:58     ` Juan Quintela
  2023-04-24  8:06   ` Zhang, Chen
  1 sibling, 1 reply; 26+ messages in thread
From: Zhang, Chen @ 2023-04-23  2:41 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel@nongnu.org
  Cc: qemu-block@nongnu.org, Paolo Bonzini, Thomas Huth, John Snow,
	Li Zhijian, Juan Quintela, Stefan Hajnoczi, Laurent Vivier



> -----Original Message-----
> From: Daniel P. Berrangé <berrange@redhat.com>
> Sent: Saturday, April 22, 2023 1:14 AM
> To: qemu-devel@nongnu.org
> Cc: qemu-block@nongnu.org; Paolo Bonzini <pbonzini@redhat.com>;
> Thomas Huth <thuth@redhat.com>; John Snow <jsnow@redhat.com>; Li
> Zhijian <lizhijian@fujitsu.com>; Juan Quintela <quintela@redhat.com>;
> Stefan Hajnoczi <stefanha@redhat.com>; Zhang, Chen
> <chen.zhang@intel.com>; Laurent Vivier <lvivier@redhat.com>
> Subject: [PATCH v2 6/6] tests/migration: Only run auto_converge in slow
> mode
> 

What kind of scenario will the qtest open this g_test_init() -m slow to trigger the slow mode?

Thanks
Chen

> From: Juan Quintela <quintela@redhat.com>
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>
> ---
>  tests/qtest/migration-test.c | 23 +++++++++++++++++++++--
>  1 file changed, 21 insertions(+), 2 deletions(-)
> 
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c index
> 63bd8a1893..9ed178aa03 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -1877,6 +1877,21 @@ static void test_validate_uuid_dst_not_set(void)
>      do_test_validate_uuid(&args, false);  }
> 
> +/*
> + * The way auto_converge works, we need to do too many passes to
> + * run this test.  Auto_converge logic is only run once every
> + * three iterations, so:
> + *
> + * - 3 iterations without auto_converge enabled
> + * - 3 iterations with pct = 5
> + * - 3 iterations with pct = 30
> + * - 3 iterations with pct = 55
> + * - 3 iterations with pct = 80
> + * - 3 iterations with pct = 95 (max(95, 80 + 25))
> + *
> + * To make things even worse, we need to run the initial stage at
> + * 3MB/s so we enter autoconverge even when host is (over)loaded.
> + */
>  static void test_migrate_auto_converge(void)  {
>      g_autofree char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs); @@ -
> 2660,8 +2675,12 @@ int main(int argc, char **argv)
>                     test_validate_uuid_src_not_set);
>      qtest_add_func("/migration/validate_uuid_dst_not_set",
>                     test_validate_uuid_dst_not_set);
> -
> -    qtest_add_func("/migration/auto_converge",
> test_migrate_auto_converge);
> +    /*
> +     * See explanation why this test is slow on function definition
> +     */
> +    if (g_test_slow()) {
> +        qtest_add_func("/migration/auto_converge",
> test_migrate_auto_converge);
> +    }
>      qtest_add_func("/migration/multifd/tcp/plain/none",
>                     test_multifd_tcp_none);
>      /*
> --
> 2.40.0



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 6/6] tests/migration: Only run auto_converge in slow mode
  2023-04-23  2:41   ` Zhang, Chen
@ 2023-04-24  5:58     ` Juan Quintela
  2023-04-24  6:56       ` Thomas Huth
  0 siblings, 1 reply; 26+ messages in thread
From: Juan Quintela @ 2023-04-24  5:58 UTC (permalink / raw)
  To: Zhang, Chen
  Cc: Daniel P. Berrangé, qemu-devel@nongnu.org,
	qemu-block@nongnu.org, Paolo Bonzini, Thomas Huth, John Snow,
	Li Zhijian, Stefan Hajnoczi, Laurent Vivier

"Zhang, Chen" <chen.zhang@intel.com> wrote:
>> -----Original Message-----
>> From: Daniel P. Berrangé <berrange@redhat.com>
>> Sent: Saturday, April 22, 2023 1:14 AM
>> To: qemu-devel@nongnu.org
>> Cc: qemu-block@nongnu.org; Paolo Bonzini <pbonzini@redhat.com>;
>> Thomas Huth <thuth@redhat.com>; John Snow <jsnow@redhat.com>; Li
>> Zhijian <lizhijian@fujitsu.com>; Juan Quintela <quintela@redhat.com>;
>> Stefan Hajnoczi <stefanha@redhat.com>; Zhang, Chen
>> <chen.zhang@intel.com>; Laurent Vivier <lvivier@redhat.com>
>> Subject: [PATCH v2 6/6] tests/migration: Only run auto_converge in slow
>> mode
>> 
>
> What kind of scenario will the qtest open this g_test_init() -m slow to trigger the slow mode?

The only way that I know is:

export G_TEST_SLOW=1
make check (or whatever individual test that you want)

Later, Juan.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 6/6] tests/migration: Only run auto_converge in slow mode
  2023-04-24  5:58     ` Juan Quintela
@ 2023-04-24  6:56       ` Thomas Huth
  2023-04-24  8:05         ` Zhang, Chen
  0 siblings, 1 reply; 26+ messages in thread
From: Thomas Huth @ 2023-04-24  6:56 UTC (permalink / raw)
  To: quintela, Zhang, Chen
  Cc: Daniel P. Berrangé, qemu-devel@nongnu.org,
	qemu-block@nongnu.org, Paolo Bonzini, John Snow, Li Zhijian,
	Stefan Hajnoczi, Laurent Vivier

On 24/04/2023 07.58, Juan Quintela wrote:
> "Zhang, Chen" <chen.zhang@intel.com> wrote:
>>> -----Original Message-----
>>> From: Daniel P. Berrangé <berrange@redhat.com>
>>> Sent: Saturday, April 22, 2023 1:14 AM
>>> To: qemu-devel@nongnu.org
>>> Cc: qemu-block@nongnu.org; Paolo Bonzini <pbonzini@redhat.com>;
>>> Thomas Huth <thuth@redhat.com>; John Snow <jsnow@redhat.com>; Li
>>> Zhijian <lizhijian@fujitsu.com>; Juan Quintela <quintela@redhat.com>;
>>> Stefan Hajnoczi <stefanha@redhat.com>; Zhang, Chen
>>> <chen.zhang@intel.com>; Laurent Vivier <lvivier@redhat.com>
>>> Subject: [PATCH v2 6/6] tests/migration: Only run auto_converge in slow
>>> mode
>>>
>>
>> What kind of scenario will the qtest open this g_test_init() -m slow to trigger the slow mode?
> 
> The only way that I know is:
> 
> export G_TEST_SLOW=1
> make check (or whatever individual test that you want)

Or even simpler:

  make check SPEED=slow

Or if you want to run the test manually:

  QTEST_QEMU_BINARY=./qemu-system-x86_64 \
  tests/qtest/migration-test -m slow

HTH,
  Thomas



^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH v2 6/6] tests/migration: Only run auto_converge in slow mode
  2023-04-24  6:56       ` Thomas Huth
@ 2023-04-24  8:05         ` Zhang, Chen
  0 siblings, 0 replies; 26+ messages in thread
From: Zhang, Chen @ 2023-04-24  8:05 UTC (permalink / raw)
  To: Thomas Huth, quintela@redhat.com
  Cc: Daniel P. Berrangé, qemu-devel@nongnu.org,
	qemu-block@nongnu.org, Paolo Bonzini, John Snow, Li Zhijian,
	Stefan Hajnoczi, Laurent Vivier



> -----Original Message-----
> From: Thomas Huth <thuth@redhat.com>
> Sent: Monday, April 24, 2023 2:56 PM
> To: quintela@redhat.com; Zhang, Chen <chen.zhang@intel.com>
> Cc: Daniel P. Berrangé <berrange@redhat.com>; qemu-devel@nongnu.org;
> qemu-block@nongnu.org; Paolo Bonzini <pbonzini@redhat.com>; John
> Snow <jsnow@redhat.com>; Li Zhijian <lizhijian@fujitsu.com>; Stefan
> Hajnoczi <stefanha@redhat.com>; Laurent Vivier <lvivier@redhat.com>
> Subject: Re: [PATCH v2 6/6] tests/migration: Only run auto_converge in slow
> mode
> 
> On 24/04/2023 07.58, Juan Quintela wrote:
> > "Zhang, Chen" <chen.zhang@intel.com> wrote:
> >>> -----Original Message-----
> >>> From: Daniel P. Berrangé <berrange@redhat.com>
> >>> Sent: Saturday, April 22, 2023 1:14 AM
> >>> To: qemu-devel@nongnu.org
> >>> Cc: qemu-block@nongnu.org; Paolo Bonzini <pbonzini@redhat.com>;
> >>> Thomas Huth <thuth@redhat.com>; John Snow <jsnow@redhat.com>; Li
> >>> Zhijian <lizhijian@fujitsu.com>; Juan Quintela
> >>> <quintela@redhat.com>; Stefan Hajnoczi <stefanha@redhat.com>;
> Zhang,
> >>> Chen <chen.zhang@intel.com>; Laurent Vivier <lvivier@redhat.com>
> >>> Subject: [PATCH v2 6/6] tests/migration: Only run auto_converge in
> >>> slow mode
> >>>
> >>
> >> What kind of scenario will the qtest open this g_test_init() -m slow to
> trigger the slow mode?
> >
> > The only way that I know is:
> >
> > export G_TEST_SLOW=1
> > make check (or whatever individual test that you want)
> 
> Or even simpler:
> 
>   make check SPEED=slow
> 
> Or if you want to run the test manually:
> 
>   QTEST_QEMU_BINARY=./qemu-system-x86_64 \
>   tests/qtest/migration-test -m slow
> 

Thanks for the explanation.

Thanks
Chen

> HTH,
>   Thomas


^ permalink raw reply	[flat|nested] 26+ messages in thread

* RE: [PATCH v2 6/6] tests/migration: Only run auto_converge in slow mode
  2023-04-21 17:14 ` [PATCH v2 6/6] tests/migration: Only run auto_converge in slow mode Daniel P. Berrangé
  2023-04-23  2:41   ` Zhang, Chen
@ 2023-04-24  8:06   ` Zhang, Chen
  1 sibling, 0 replies; 26+ messages in thread
From: Zhang, Chen @ 2023-04-24  8:06 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel@nongnu.org
  Cc: qemu-block@nongnu.org, Paolo Bonzini, Thomas Huth, John Snow,
	Li Zhijian, Juan Quintela, Stefan Hajnoczi, Laurent Vivier



> -----Original Message-----
> From: Daniel P. Berrangé <berrange@redhat.com>
> Sent: Saturday, April 22, 2023 1:14 AM
> To: qemu-devel@nongnu.org
> Cc: qemu-block@nongnu.org; Paolo Bonzini <pbonzini@redhat.com>;
> Thomas Huth <thuth@redhat.com>; John Snow <jsnow@redhat.com>; Li
> Zhijian <lizhijian@fujitsu.com>; Juan Quintela <quintela@redhat.com>;
> Stefan Hajnoczi <stefanha@redhat.com>; Zhang, Chen
> <chen.zhang@intel.com>; Laurent Vivier <lvivier@redhat.com>
> Subject: [PATCH v2 6/6] tests/migration: Only run auto_converge in slow
> mode
> 
> From: Juan Quintela <quintela@redhat.com>
> 
> Signed-off-by: Juan Quintela <quintela@redhat.com>

LGTM.
Reviewed-by: Zhang Chen <chen.zhang@intel.com>

> ---
>  tests/qtest/migration-test.c | 23 +++++++++++++++++++++--
>  1 file changed, 21 insertions(+), 2 deletions(-)



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 3/6] tests/qtest: capture RESUME events during migration
  2023-04-21 21:59   ` Juan Quintela
@ 2023-04-24  9:53     ` Daniel P. Berrangé
  2023-05-26 11:56       ` Daniel P. Berrangé
  0 siblings, 1 reply; 26+ messages in thread
From: Daniel P. Berrangé @ 2023-04-24  9:53 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, qemu-block, Paolo Bonzini, Thomas Huth, John Snow,
	Li Zhijian, Stefan Hajnoczi, Zhang Chen, Laurent Vivier

On Fri, Apr 21, 2023 at 11:59:25PM +0200, Juan Quintela wrote:
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> > When running migration tests we monitor for a STOP event so we can skip
> > redundant waits. This will be needed for the RESUME event too shortly.
> >
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> 
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> 
> i.e. it is better that what we have now.
> 
> But
> 
> 
> > diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
> > index f6f3c6680f..61396335cc 100644
> > --- a/tests/qtest/migration-helpers.c
> > +++ b/tests/qtest/migration-helpers.c
> > @@ -24,14 +24,20 @@
> >  #define MIGRATION_STATUS_WAIT_TIMEOUT 120
> >  
> >  bool got_stop;
> > +bool got_resume;
> >  
> > -static void check_stop_event(QTestState *who)
> > +static void check_events(QTestState *who)
> >  {
> >      QDict *event = qtest_qmp_event_ref(who, "STOP");
> >      if (event) {
> >          got_stop = true;
> >          qobject_unref(event);
> >      }
> > +    event = qtest_qmp_event_ref(who, "RESUME");
> > +    if (event) {
> > +        got_resume = true;
> > +        qobject_unref(event);
> > +    }
> >  }
> 
> What happens if we receive the events in the order RESUME/STOP (I mean
> in the big scheme of things, not that it makes sense in this particular
> case).
> 
> QDict *qtest_qmp_event_ref(QTestState *s, const char *event)
> {
>     while (s->pending_events) {
> 
>         GList *first = s->pending_events;
>         QDict *response = (QDict *)first->data;
> 
>         s->pending_events = g_list_delete_link(s->pending_events, first);
> 
>         if (!strcmp(qdict_get_str(response, "event"), event)) {
>             return response;
>         }
>         qobject_unref(response);
>     }
>     return NULL;
> }
> 
> if we don't found the event that we are searching for, we just drop it.
> Does this makes sense if we are searching only for more than one event?

You are right about this code being broken in general for multiple events.

In this particular series though we're looking at STOP on the src host and
RESUME on the dst host, so there's no ordering problem to worry about.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 4/6] tests/qtest: make more migration pre-copy scenarios run non-live
  2023-04-21 17:14 ` [PATCH v2 4/6] tests/qtest: make more migration pre-copy scenarios run non-live Daniel P. Berrangé
  2023-04-21 22:06   ` Juan Quintela
@ 2023-04-24 21:01   ` Fabiano Rosas
  2023-05-26 17:58     ` Daniel P. Berrangé
  1 sibling, 1 reply; 26+ messages in thread
From: Fabiano Rosas @ 2023-04-24 21:01 UTC (permalink / raw)
  To: Daniel P. Berrangé, qemu-devel
  Cc: qemu-block, Paolo Bonzini, Thomas Huth, John Snow, Li Zhijian,
	Juan Quintela, Stefan Hajnoczi, Zhang Chen, Laurent Vivier,
	Daniel P. Berrangé

Daniel P. Berrangé <berrange@redhat.com> writes:

> There are 27 pre-copy live migration scenarios being tested. In all of
> these we force non-convergance and run for one iteration, then let it
> converge and wait for completion during the second (or following)
> iterations. At 3 mbps bandwidth limit the first iteration takes a very
> long time (~30 seconds).
>
> While it is important to test the migration passes and convergance
> logic, it is overkill to do this for all 27 pre-copy scenarios. The
> TLS migration scenarios in particular are merely exercising different
> code paths during connection establishment.
>
> To optimize time taken, switch most of the test scenarios to run
> non-live (ie guest CPUs paused) with no bandwidth limits. This gives
> a massive speed up for most of the test scenarios.
>
> For test coverage the following scenarios are unchanged
>
>  * Precopy with UNIX sockets
>  * Precopy with UNIX sockets and dirty ring tracking
>  * Precopy with XBZRLE
>  * Precopy with multifd
>
> Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> ---
>  tests/qtest/migration-test.c | 60 ++++++++++++++++++++++++++++++------
>  1 file changed, 50 insertions(+), 10 deletions(-)
>
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 6492ffa7fe..40d0f75480 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -568,6 +568,9 @@ typedef struct {
>          MIG_TEST_FAIL_DEST_QUIT_ERR,
>      } result;
>  
> +    /* Whether the guest CPUs should be running during migration */
> +    bool live;
> +
>      /* Postcopy specific fields */
>      void *postcopy_data;
>      bool postcopy_preempt;
> @@ -1324,8 +1327,6 @@ static void test_precopy_common(MigrateCommon *args)
>          return;
>      }
>  
> -    migrate_ensure_non_converge(from);
> -
>      if (args->start_hook) {
>          data_hook = args->start_hook(from, to);
>      }
> @@ -1335,6 +1336,31 @@ static void test_precopy_common(MigrateCommon *args)
>          wait_for_serial("src_serial");
>      }
>  
> +    if (args->live) {
> +        /*
> +         * Testing live migration, we want to ensure that some
> +         * memory is re-dirtied after being transferred, so that
> +         * we exercise logic for dirty page handling. We achieve
> +         * this with a ridiculosly low bandwidth that guarantees
> +         * non-convergance.
> +         */
> +        migrate_ensure_non_converge(from);
> +    } else {
> +        /*
> +         * Testing non-live migration, we allow it to run at
> +         * full speed to ensure short test case duration.
> +         * For tests expected to fail, we don't need to
> +         * change anything.
> +         */
> +        if (args->result == MIG_TEST_SUCCEED) {
> +            qtest_qmp_assert_success(from, "{ 'execute' : 'stop'}");
> +            if (!got_stop) {
> +                qtest_qmp_eventwait(from, "STOP");
> +            }
> +            migrate_ensure_converge(from);
> +        }
> +    }
> +
>      if (!args->connect_uri) {
>          g_autofree char *local_connect_uri =
>              migrate_get_socket_address(to, "socket-address");
> @@ -1352,19 +1378,29 @@ static void test_precopy_common(MigrateCommon *args)
>              qtest_set_expected_status(to, EXIT_FAILURE);
>          }
>      } else {
> -        wait_for_migration_pass(from);
> +        if (args->live) {
> +            wait_for_migration_pass(from);
>  
> -        migrate_ensure_converge(from);
> +            migrate_ensure_converge(from);
>  
> -        /* We do this first, as it has a timeout to stop us
> -         * hanging forever if migration didn't converge */
> -        wait_for_migration_complete(from);
> +            /*
> +             * We do this first, as it has a timeout to stop us
> +             * hanging forever if migration didn't converge
> +             */
> +            wait_for_migration_complete(from);
> +
> +            if (!got_stop) {
> +                qtest_qmp_eventwait(from, "STOP");
> +            }
> +        } else {
> +            wait_for_migration_complete(from);
>  
> -        if (!got_stop) {
> -            qtest_qmp_eventwait(from, "STOP");
> +            qtest_qmp_assert_success(to, "{ 'execute' : 'cont'}");

I retested and the problem still persists. The issue is with this wait +
cont sequence:

wait_for_migration_complete(from);
qtest_qmp_assert_success(to, "{ 'execute' : 'cont'}");

We wait for the source to finish but by the time qmp_cont executes, the
dst is still INMIGRATE, autostart gets set and I never see the RESUME
event.

When the dst migration finishes the VM gets put in RUN_STATE_PAUSED (at
process_incoming_migration_bh):

    if (!global_state_received() ||
        global_state_get_runstate() == RUN_STATE_RUNNING) {
        if (autostart) {
            vm_start();
        } else {
            runstate_set(RUN_STATE_PAUSED);
        }
    } else if (migration_incoming_colo_enabled()) {
        migration_incoming_disable_colo();
        vm_start();
    } else {
        runstate_set(global_state_get_runstate());  <-- HERE
    }

Do we need to add something to that routine like this?

    if (autostart &&
        global_state_get_runstate() != RUN_STATE_RUNNING) {
        vm_start();
    }

Otherwise it seems we'll just ignore a 'cont' that was received when the
migration is still ongoing.

>          }
>  
> -        qtest_qmp_eventwait(to, "RESUME");
> +        if (!got_resume) {
> +            qtest_qmp_eventwait(to, "RESUME");
> +        }
>  
>          wait_for_serial("dest_serial");
>      }


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 2/6] tests/qtests: remove migration test iterations config
  2023-04-21 21:54   ` Juan Quintela
@ 2023-04-26  9:07     ` Daniel P. Berrangé
  2023-04-26  9:42       ` Juan Quintela
  0 siblings, 1 reply; 26+ messages in thread
From: Daniel P. Berrangé @ 2023-04-26  9:07 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, qemu-block, Paolo Bonzini, Thomas Huth, John Snow,
	Li Zhijian, Stefan Hajnoczi, Zhang Chen, Laurent Vivier

On Fri, Apr 21, 2023 at 11:54:55PM +0200, Juan Quintela wrote:
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> > The 'unsigned int interations' config for migration is somewhat
> > overkill. Most tests don't set it, and a value of '0' is treated
> > as equivalent to '1'. The only test that does set it, xbzrle,
> > used a value of '2'.
> >
> > This setting, however, only relates to the migration iterations
> > that take place prior to allowing convergence. IOW, on top of
> > this iteration count, there is always at least 1 further migration
> > iteration done to deal with pages that are dirtied during the
> > previous iteration(s).
> >
> > IOW, even with iterations==1, the xbzrle test will be running for
> > a minimum of 2 iterations. With this in mind we can simplify the
> > code and just get rid of the special case.
> 
> Perhaps the old code was already wrong, but we need at least three
> iterations for the xbzrle test:
> - 1st iteration: xbzrle is not used, nothing is on cache.

Are you sure about this ?  I see ram_save_page() calling
save_xbzrle_page() and unless I'm mis-understanding the
code, it doesn't appear to skip anything on the 1st
iteration.

IIUC save_xbzrle_page will add pages into the cache on
the first iteration, so the second iteration will get
cache hits

> - 2nd iteration: pages are put into cache, no xbzrle is used because
>   there is no previous page.
> - 3rd iteration: We really use xbzrle now against the copy of the
>   previous iterations.
> 
> And yes, this should be commented somewhere.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 2/6] tests/qtests: remove migration test iterations config
  2023-04-26  9:07     ` Daniel P. Berrangé
@ 2023-04-26  9:42       ` Juan Quintela
  2023-04-26 10:15         ` Daniel P. Berrangé
  0 siblings, 1 reply; 26+ messages in thread
From: Juan Quintela @ 2023-04-26  9:42 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: qemu-devel, qemu-block, Paolo Bonzini, Thomas Huth, John Snow,
	Li Zhijian, Stefan Hajnoczi, Zhang Chen, Laurent Vivier

Daniel P. Berrangé <berrange@redhat.com> wrote:
> On Fri, Apr 21, 2023 at 11:54:55PM +0200, Juan Quintela wrote:
>> Daniel P. Berrangé <berrange@redhat.com> wrote:
>> > The 'unsigned int interations' config for migration is somewhat
>> > overkill. Most tests don't set it, and a value of '0' is treated
>> > as equivalent to '1'. The only test that does set it, xbzrle,
>> > used a value of '2'.
>> >
>> > This setting, however, only relates to the migration iterations
>> > that take place prior to allowing convergence. IOW, on top of
>> > this iteration count, there is always at least 1 further migration
>> > iteration done to deal with pages that are dirtied during the
>> > previous iteration(s).
>> >
>> > IOW, even with iterations==1, the xbzrle test will be running for
>> > a minimum of 2 iterations. With this in mind we can simplify the
>> > code and just get rid of the special case.
>> 
>> Perhaps the old code was already wrong, but we need at least three
>> iterations for the xbzrle test:
>> - 1st iteration: xbzrle is not used, nothing is on cache.
>
> Are you sure about this ?  I see ram_save_page() calling
> save_xbzrle_page() and unless I'm mis-understanding the
> code, it doesn't appear to skip anything on the 1st
> iteration.

I will admit that code is convoluted as hell.
And I confuse myself a lot here O:-)

struct RAM_STATE {
    ...
    /* Start using XBZRLE (e.g., after the first round). */
    bool xbzrle_enabled;
}

I.e. xbzrle_enabled() and m->xbzrle_enabled are two completely different things.

static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
{
    ...
    if (rs->xbzrle_enabled && !migration_in_postcopy()) {
        pages = save_xbzrle_page(rs, pss, &p, current_addr,
                                 block, offset);
        ....
    }
    ....
}

and

static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
{
    /* Update pss->page for the next dirty bit in ramblock */
    pss_find_next_dirty(pss);

    if (pss->complete_round && pss->block == rs->last_seen_block &&
        ...
        return PAGE_ALL_CLEAN;
    }
    if (!offset_in_ramblock(pss->block,
                            ((ram_addr_t)pss->page) << TARGET_PAGE_BITS)) {
        ....
        if (!pss->block) {
            ....
            if (migrate_use_xbzrle()) {
                rs->xbzrle_enabled = true;
            }
        }
        ...
    } else {
        /* We've found something */
        return PAGE_DIRTY_FOUND;
    }
}



> IIUC save_xbzrle_page will add pages into the cache on
> the first iteration, so the second iteration will get
> cache hits
>
>> - 2nd iteration: pages are put into cache, no xbzrle is used because
>>   there is no previous page.
>> - 3rd iteration: We really use xbzrle now against the copy of the
>>   previous iterations.
>> 
>> And yes, this should be commented somewhere.

Seeing that it has been able to confuse you, a single comment will not
make the trick O:-)

Later, Juan.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 2/6] tests/qtests: remove migration test iterations config
  2023-04-26  9:42       ` Juan Quintela
@ 2023-04-26 10:15         ` Daniel P. Berrangé
  0 siblings, 0 replies; 26+ messages in thread
From: Daniel P. Berrangé @ 2023-04-26 10:15 UTC (permalink / raw)
  To: Juan Quintela
  Cc: qemu-devel, qemu-block, Paolo Bonzini, Thomas Huth, John Snow,
	Li Zhijian, Stefan Hajnoczi, Zhang Chen, Laurent Vivier

On Wed, Apr 26, 2023 at 11:42:51AM +0200, Juan Quintela wrote:
> Daniel P. Berrangé <berrange@redhat.com> wrote:
> > On Fri, Apr 21, 2023 at 11:54:55PM +0200, Juan Quintela wrote:
> >> Daniel P. Berrangé <berrange@redhat.com> wrote:
> >> > The 'unsigned int interations' config for migration is somewhat
> >> > overkill. Most tests don't set it, and a value of '0' is treated
> >> > as equivalent to '1'. The only test that does set it, xbzrle,
> >> > used a value of '2'.
> >> >
> >> > This setting, however, only relates to the migration iterations
> >> > that take place prior to allowing convergence. IOW, on top of
> >> > this iteration count, there is always at least 1 further migration
> >> > iteration done to deal with pages that are dirtied during the
> >> > previous iteration(s).
> >> >
> >> > IOW, even with iterations==1, the xbzrle test will be running for
> >> > a minimum of 2 iterations. With this in mind we can simplify the
> >> > code and just get rid of the special case.
> >> 
> >> Perhaps the old code was already wrong, but we need at least three
> >> iterations for the xbzrle test:
> >> - 1st iteration: xbzrle is not used, nothing is on cache.
> >
> > Are you sure about this ?  I see ram_save_page() calling
> > save_xbzrle_page() and unless I'm mis-understanding the
> > code, it doesn't appear to skip anything on the 1st
> > iteration.
> 
> I will admit that code is convoluted as hell.
> And I confuse myself a lot here O:-)
> 
> struct RAM_STATE {
>     ...
>     /* Start using XBZRLE (e.g., after the first round). */
>     bool xbzrle_enabled;
> }
> 
> I.e. xbzrle_enabled() and m->xbzrle_enabled are two completely different things.

Aieeeee !  That's confusing indeed :-)

Lets rename that struct field to 'xbzrle_started', to better
distinguish active state from enabled state.


> static int ram_save_page(RAMState *rs, PageSearchStatus *pss)
> {
>     ...
>     if (rs->xbzrle_enabled && !migration_in_postcopy()) {
>         pages = save_xbzrle_page(rs, pss, &p, current_addr,
>                                  block, offset);
>         ....
>     }
>     ....
> }
> 
> and
> 
> static int find_dirty_block(RAMState *rs, PageSearchStatus *pss)
> {
>     /* Update pss->page for the next dirty bit in ramblock */
>     pss_find_next_dirty(pss);
> 
>     if (pss->complete_round && pss->block == rs->last_seen_block &&
>         ...
>         return PAGE_ALL_CLEAN;
>     }
>     if (!offset_in_ramblock(pss->block,
>                             ((ram_addr_t)pss->page) << TARGET_PAGE_BITS)) {
>         ....
>         if (!pss->block) {
>             ....
>             if (migrate_use_xbzrle()) {
>                 rs->xbzrle_enabled = true;
>             }
>         }
>         ...
>     } else {
>         /* We've found something */
>         return PAGE_DIRTY_FOUND;
>     }
> }
> 
> 
> 
> > IIUC save_xbzrle_page will add pages into the cache on
> > the first iteration, so the second iteration will get
> > cache hits
> >
> >> - 2nd iteration: pages are put into cache, no xbzrle is used because
> >>   there is no previous page.
> >> - 3rd iteration: We really use xbzrle now against the copy of the
> >>   previous iterations.
> >> 
> >> And yes, this should be commented somewhere.
> 
> Seeing that it has been able to confuse you, a single comment will not
> make the trick O:-)
> 
> Later, Juan.
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 3/6] tests/qtest: capture RESUME events during migration
  2023-04-24  9:53     ` Daniel P. Berrangé
@ 2023-05-26 11:56       ` Daniel P. Berrangé
  0 siblings, 0 replies; 26+ messages in thread
From: Daniel P. Berrangé @ 2023-05-26 11:56 UTC (permalink / raw)
  To: Juan Quintela, qemu-devel, qemu-block, Paolo Bonzini, Thomas Huth,
	John Snow, Li Zhijian, Stefan Hajnoczi, Zhang Chen,
	Laurent Vivier

On Mon, Apr 24, 2023 at 10:53:36AM +0100, Daniel P. Berrangé wrote:
> On Fri, Apr 21, 2023 at 11:59:25PM +0200, Juan Quintela wrote:
> > Daniel P. Berrangé <berrange@redhat.com> wrote:
> > > When running migration tests we monitor for a STOP event so we can skip
> > > redundant waits. This will be needed for the RESUME event too shortly.
> > >
> > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > 
> > Reviewed-by: Juan Quintela <quintela@redhat.com>
> > 
> > i.e. it is better that what we have now.
> > 
> > But
> > 
> > 
> > > diff --git a/tests/qtest/migration-helpers.c b/tests/qtest/migration-helpers.c
> > > index f6f3c6680f..61396335cc 100644
> > > --- a/tests/qtest/migration-helpers.c
> > > +++ b/tests/qtest/migration-helpers.c
> > > @@ -24,14 +24,20 @@
> > >  #define MIGRATION_STATUS_WAIT_TIMEOUT 120
> > >  
> > >  bool got_stop;
> > > +bool got_resume;
> > >  
> > > -static void check_stop_event(QTestState *who)
> > > +static void check_events(QTestState *who)
> > >  {
> > >      QDict *event = qtest_qmp_event_ref(who, "STOP");
> > >      if (event) {
> > >          got_stop = true;
> > >          qobject_unref(event);
> > >      }
> > > +    event = qtest_qmp_event_ref(who, "RESUME");
> > > +    if (event) {
> > > +        got_resume = true;
> > > +        qobject_unref(event);
> > > +    }
> > >  }
> > 
> > What happens if we receive the events in the order RESUME/STOP (I mean
> > in the big scheme of things, not that it makes sense in this particular
> > case).
> > 
> > QDict *qtest_qmp_event_ref(QTestState *s, const char *event)
> > {
> >     while (s->pending_events) {
> > 
> >         GList *first = s->pending_events;
> >         QDict *response = (QDict *)first->data;
> > 
> >         s->pending_events = g_list_delete_link(s->pending_events, first);
> > 
> >         if (!strcmp(qdict_get_str(response, "event"), event)) {
> >             return response;
> >         }
> >         qobject_unref(response);
> >     }
> >     return NULL;
> > }
> > 
> > if we don't found the event that we are searching for, we just drop it.
> > Does this makes sense if we are searching only for more than one event?
> 
> You are right about this code being broken in general for multiple events.
> 
> In this particular series though we're looking at STOP on the src host and
> RESUME on the dst host, so there's no ordering problem to worry about.

What I wrote is nonsense. This is broken, because on the dst host,
check_events will look for STOP and discard all remaining events,
so we'll never find the RESUME that we are looking for.

qtest_qmp_event_ref is essentially broken by design right now and
I'll need to fix it.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 4/6] tests/qtest: make more migration pre-copy scenarios run non-live
  2023-04-24 21:01   ` Fabiano Rosas
@ 2023-05-26 17:58     ` Daniel P. Berrangé
  2023-05-31 12:15       ` Daniel P. Berrangé
  0 siblings, 1 reply; 26+ messages in thread
From: Daniel P. Berrangé @ 2023-05-26 17:58 UTC (permalink / raw)
  To: Fabiano Rosas
  Cc: qemu-devel, qemu-block, Paolo Bonzini, Thomas Huth, John Snow,
	Li Zhijian, Juan Quintela, Stefan Hajnoczi, Zhang Chen,
	Laurent Vivier

On Mon, Apr 24, 2023 at 06:01:36PM -0300, Fabiano Rosas wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > There are 27 pre-copy live migration scenarios being tested. In all of
> > these we force non-convergance and run for one iteration, then let it
> > converge and wait for completion during the second (or following)
> > iterations. At 3 mbps bandwidth limit the first iteration takes a very
> > long time (~30 seconds).
> >
> > While it is important to test the migration passes and convergance
> > logic, it is overkill to do this for all 27 pre-copy scenarios. The
> > TLS migration scenarios in particular are merely exercising different
> > code paths during connection establishment.
> >
> > To optimize time taken, switch most of the test scenarios to run
> > non-live (ie guest CPUs paused) with no bandwidth limits. This gives
> > a massive speed up for most of the test scenarios.
> >
> > For test coverage the following scenarios are unchanged
> >
> >  * Precopy with UNIX sockets
> >  * Precopy with UNIX sockets and dirty ring tracking
> >  * Precopy with XBZRLE
> >  * Precopy with multifd
> >
> > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > ---
> >  tests/qtest/migration-test.c | 60 ++++++++++++++++++++++++++++++------
> >  1 file changed, 50 insertions(+), 10 deletions(-)
> >
> > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> > index 6492ffa7fe..40d0f75480 100644
> > --- a/tests/qtest/migration-test.c
> > +++ b/tests/qtest/migration-test.c
> > @@ -568,6 +568,9 @@ typedef struct {
> >          MIG_TEST_FAIL_DEST_QUIT_ERR,
> >      } result;
> >  
> > +    /* Whether the guest CPUs should be running during migration */
> > +    bool live;
> > +
> >      /* Postcopy specific fields */
> >      void *postcopy_data;
> >      bool postcopy_preempt;
> > @@ -1324,8 +1327,6 @@ static void test_precopy_common(MigrateCommon *args)
> >          return;
> >      }
> >  
> > -    migrate_ensure_non_converge(from);
> > -
> >      if (args->start_hook) {
> >          data_hook = args->start_hook(from, to);
> >      }
> > @@ -1335,6 +1336,31 @@ static void test_precopy_common(MigrateCommon *args)
> >          wait_for_serial("src_serial");
> >      }
> >  
> > +    if (args->live) {
> > +        /*
> > +         * Testing live migration, we want to ensure that some
> > +         * memory is re-dirtied after being transferred, so that
> > +         * we exercise logic for dirty page handling. We achieve
> > +         * this with a ridiculosly low bandwidth that guarantees
> > +         * non-convergance.
> > +         */
> > +        migrate_ensure_non_converge(from);
> > +    } else {
> > +        /*
> > +         * Testing non-live migration, we allow it to run at
> > +         * full speed to ensure short test case duration.
> > +         * For tests expected to fail, we don't need to
> > +         * change anything.
> > +         */
> > +        if (args->result == MIG_TEST_SUCCEED) {
> > +            qtest_qmp_assert_success(from, "{ 'execute' : 'stop'}");
> > +            if (!got_stop) {
> > +                qtest_qmp_eventwait(from, "STOP");
> > +            }
> > +            migrate_ensure_converge(from);
> > +        }
> > +    }
> > +
> >      if (!args->connect_uri) {
> >          g_autofree char *local_connect_uri =
> >              migrate_get_socket_address(to, "socket-address");
> > @@ -1352,19 +1378,29 @@ static void test_precopy_common(MigrateCommon *args)
> >              qtest_set_expected_status(to, EXIT_FAILURE);
> >          }
> >      } else {
> > -        wait_for_migration_pass(from);
> > +        if (args->live) {
> > +            wait_for_migration_pass(from);
> >  
> > -        migrate_ensure_converge(from);
> > +            migrate_ensure_converge(from);
> >  
> > -        /* We do this first, as it has a timeout to stop us
> > -         * hanging forever if migration didn't converge */
> > -        wait_for_migration_complete(from);
> > +            /*
> > +             * We do this first, as it has a timeout to stop us
> > +             * hanging forever if migration didn't converge
> > +             */
> > +            wait_for_migration_complete(from);
> > +
> > +            if (!got_stop) {
> > +                qtest_qmp_eventwait(from, "STOP");
> > +            }
> > +        } else {
> > +            wait_for_migration_complete(from);
> >  
> > -        if (!got_stop) {
> > -            qtest_qmp_eventwait(from, "STOP");
> > +            qtest_qmp_assert_success(to, "{ 'execute' : 'cont'}");
> 
> I retested and the problem still persists. The issue is with this wait +
> cont sequence:
> 
> wait_for_migration_complete(from);
> qtest_qmp_assert_success(to, "{ 'execute' : 'cont'}");
> 
> We wait for the source to finish but by the time qmp_cont executes, the
> dst is still INMIGRATE, autostart gets set and I never see the RESUME
> event.

This is ultimately caused by the broken logic in the previous
patch 3 that looked for RESUME. The loooking for the STOP would
discard all non-STOP events, which includes the RESUME event
we were just about to look for. I've had to completely change
the event handling in migration-helpers and libqtest to fix this.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 4/6] tests/qtest: make more migration pre-copy scenarios run non-live
  2023-05-26 17:58     ` Daniel P. Berrangé
@ 2023-05-31 12:15       ` Daniel P. Berrangé
  0 siblings, 0 replies; 26+ messages in thread
From: Daniel P. Berrangé @ 2023-05-31 12:15 UTC (permalink / raw)
  To: Fabiano Rosas, qemu-devel, qemu-block, Paolo Bonzini, Thomas Huth,
	John Snow, Li Zhijian, Juan Quintela, Stefan Hajnoczi, Zhang Chen,
	Laurent Vivier

On Fri, May 26, 2023 at 06:58:45PM +0100, Daniel P. Berrangé wrote:
> On Mon, Apr 24, 2023 at 06:01:36PM -0300, Fabiano Rosas wrote:
> > Daniel P. Berrangé <berrange@redhat.com> writes:
> > 
> > > There are 27 pre-copy live migration scenarios being tested. In all of
> > > these we force non-convergance and run for one iteration, then let it
> > > converge and wait for completion during the second (or following)
> > > iterations. At 3 mbps bandwidth limit the first iteration takes a very
> > > long time (~30 seconds).
> > >
> > > While it is important to test the migration passes and convergance
> > > logic, it is overkill to do this for all 27 pre-copy scenarios. The
> > > TLS migration scenarios in particular are merely exercising different
> > > code paths during connection establishment.
> > >
> > > To optimize time taken, switch most of the test scenarios to run
> > > non-live (ie guest CPUs paused) with no bandwidth limits. This gives
> > > a massive speed up for most of the test scenarios.
> > >
> > > For test coverage the following scenarios are unchanged
> > >
> > >  * Precopy with UNIX sockets
> > >  * Precopy with UNIX sockets and dirty ring tracking
> > >  * Precopy with XBZRLE
> > >  * Precopy with multifd
> > >
> > > Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
> > > ---
> > >  tests/qtest/migration-test.c | 60 ++++++++++++++++++++++++++++++------
> > >  1 file changed, 50 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> > > index 6492ffa7fe..40d0f75480 100644
> > > --- a/tests/qtest/migration-test.c
> > > +++ b/tests/qtest/migration-test.c
> > > @@ -568,6 +568,9 @@ typedef struct {
> > >          MIG_TEST_FAIL_DEST_QUIT_ERR,
> > >      } result;
> > >  
> > > +    /* Whether the guest CPUs should be running during migration */
> > > +    bool live;
> > > +
> > >      /* Postcopy specific fields */
> > >      void *postcopy_data;
> > >      bool postcopy_preempt;
> > > @@ -1324,8 +1327,6 @@ static void test_precopy_common(MigrateCommon *args)
> > >          return;
> > >      }
> > >  
> > > -    migrate_ensure_non_converge(from);
> > > -
> > >      if (args->start_hook) {
> > >          data_hook = args->start_hook(from, to);
> > >      }
> > > @@ -1335,6 +1336,31 @@ static void test_precopy_common(MigrateCommon *args)
> > >          wait_for_serial("src_serial");
> > >      }
> > >  
> > > +    if (args->live) {
> > > +        /*
> > > +         * Testing live migration, we want to ensure that some
> > > +         * memory is re-dirtied after being transferred, so that
> > > +         * we exercise logic for dirty page handling. We achieve
> > > +         * this with a ridiculosly low bandwidth that guarantees
> > > +         * non-convergance.
> > > +         */
> > > +        migrate_ensure_non_converge(from);
> > > +    } else {
> > > +        /*
> > > +         * Testing non-live migration, we allow it to run at
> > > +         * full speed to ensure short test case duration.
> > > +         * For tests expected to fail, we don't need to
> > > +         * change anything.
> > > +         */
> > > +        if (args->result == MIG_TEST_SUCCEED) {
> > > +            qtest_qmp_assert_success(from, "{ 'execute' : 'stop'}");
> > > +            if (!got_stop) {
> > > +                qtest_qmp_eventwait(from, "STOP");
> > > +            }
> > > +            migrate_ensure_converge(from);
> > > +        }
> > > +    }
> > > +
> > >      if (!args->connect_uri) {
> > >          g_autofree char *local_connect_uri =
> > >              migrate_get_socket_address(to, "socket-address");
> > > @@ -1352,19 +1378,29 @@ static void test_precopy_common(MigrateCommon *args)
> > >              qtest_set_expected_status(to, EXIT_FAILURE);
> > >          }
> > >      } else {
> > > -        wait_for_migration_pass(from);
> > > +        if (args->live) {
> > > +            wait_for_migration_pass(from);
> > >  
> > > -        migrate_ensure_converge(from);
> > > +            migrate_ensure_converge(from);
> > >  
> > > -        /* We do this first, as it has a timeout to stop us
> > > -         * hanging forever if migration didn't converge */
> > > -        wait_for_migration_complete(from);
> > > +            /*
> > > +             * We do this first, as it has a timeout to stop us
> > > +             * hanging forever if migration didn't converge
> > > +             */
> > > +            wait_for_migration_complete(from);
> > > +
> > > +            if (!got_stop) {
> > > +                qtest_qmp_eventwait(from, "STOP");
> > > +            }
> > > +        } else {
> > > +            wait_for_migration_complete(from);
> > >  
> > > -        if (!got_stop) {
> > > -            qtest_qmp_eventwait(from, "STOP");
> > > +            qtest_qmp_assert_success(to, "{ 'execute' : 'cont'}");
> > 
> > I retested and the problem still persists. The issue is with this wait +
> > cont sequence:
> > 
> > wait_for_migration_complete(from);
> > qtest_qmp_assert_success(to, "{ 'execute' : 'cont'}");
> > 
> > We wait for the source to finish but by the time qmp_cont executes, the
> > dst is still INMIGRATE, autostart gets set and I never see the RESUME
> > event.
> 
> This is ultimately caused by the broken logic in the previous
> patch 3 that looked for RESUME. The loooking for the STOP would
> discard all non-STOP events, which includes the RESUME event
> we were just about to look for. I've had to completely change
> the event handling in migration-helpers and libqtest to fix this.

Actually, no it is not. The broken logic wouldn't help, but the root
cause was indeed a race condition that Fabiano points out. 

We are issuing the 'cont' before tgt QEMU has finished reading data
from the source.  The solution is actually quite simple - we must
call 'query-migrate' on dst to check its status. ie the code needs
to be:

 wait_for_migration_complete(from);
 wait_for_migration_complete(to);
 qtest_qmp_assert_success(to, "{ 'execute' : 'cont'}");

this matches what libvirt does, and libvirt has a comment saying
it was not permitted to issue 'cont' before 'query-migrate' on
the dst indicated completion.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2023-05-31 12:16 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-04-21 17:14 [PATCH v2 0/6] tests/qtest: make migration-test massively faster Daniel P. Berrangé
2023-04-21 17:14 ` [PATCH v2 1/6] tests/qtest: replace qmp_discard_response with qtest_qmp_assert_success Daniel P. Berrangé
2023-04-21 21:52   ` Juan Quintela
2023-04-23  2:22   ` Zhang, Chen
2023-04-21 17:14 ` [PATCH v2 2/6] tests/qtests: remove migration test iterations config Daniel P. Berrangé
2023-04-21 21:54   ` Juan Quintela
2023-04-26  9:07     ` Daniel P. Berrangé
2023-04-26  9:42       ` Juan Quintela
2023-04-26 10:15         ` Daniel P. Berrangé
2023-04-21 17:14 ` [PATCH v2 3/6] tests/qtest: capture RESUME events during migration Daniel P. Berrangé
2023-04-21 21:59   ` Juan Quintela
2023-04-24  9:53     ` Daniel P. Berrangé
2023-05-26 11:56       ` Daniel P. Berrangé
2023-04-21 17:14 ` [PATCH v2 4/6] tests/qtest: make more migration pre-copy scenarios run non-live Daniel P. Berrangé
2023-04-21 22:06   ` Juan Quintela
2023-04-24 21:01   ` Fabiano Rosas
2023-05-26 17:58     ` Daniel P. Berrangé
2023-05-31 12:15       ` Daniel P. Berrangé
2023-04-21 17:14 ` [PATCH v2 5/6] tests/qtest: massively speed up migration-tet Daniel P. Berrangé
2023-04-21 22:15   ` Juan Quintela
2023-04-21 17:14 ` [PATCH v2 6/6] tests/migration: Only run auto_converge in slow mode Daniel P. Berrangé
2023-04-23  2:41   ` Zhang, Chen
2023-04-24  5:58     ` Juan Quintela
2023-04-24  6:56       ` Thomas Huth
2023-04-24  8:05         ` Zhang, Chen
2023-04-24  8:06   ` Zhang, Chen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).