* [Qemu-devel] [PATCH for-3.0 v2 00/10] migration: postcopy recovery unit test, bug fixes
@ 2018-07-10 9:18 Peter Xu
2018-07-10 9:18 ` [Qemu-devel] [PATCH for-3.0 v2 01/10] migration: simplify check to use qemu file buffer Peter Xu
` (10 more replies)
0 siblings, 11 replies; 15+ messages in thread
From: Peter Xu @ 2018-07-10 9:18 UTC (permalink / raw)
To: qemu-devel; +Cc: peterx, Juan Quintela, Dr . David Alan Gilbert, Balamuruhan S
Based-on: <20180627132246.5576-1-peterx@redhat.com>
Based on the series to unbreak postcopy:
Subject: [PATCH v3 0/4] migation: unbreak postcopy recovery
Message-Id: <20180627132246.5576-1-peterx@redhat.com>
v2:
- collect r-bs, and t-b for Balamuruhan
- rename matching_target_page_size to matches_target_page_size [Dave]
- fixup the race that Dave reported: the 1st patch was added as
a new patch (patch 4); the 2nd patch was squashed into the unit test
patch (patch 9)
Please review. Thanks,
Peter Xu (10):
migration: simplify check to use qemu file buffer
migration: loosen recovery check when load vm
migration: fix incorrect bitmap size calculation
migration: show pause/recover state on dst host
tests: introduce migrate_postcopy_* helpers
tests: allow migrate() to take extra flags
tests: introduce migrate_query*() helpers
tests: introduce wait_for_migration_status()
tests: add postcopy recovery test
tests: hide stderr for postcopy recovery test
migration/migration.c | 2 +
migration/ram.c | 21 +++--
migration/savevm.c | 16 ++--
tests/migration-test.c | 205 ++++++++++++++++++++++++++++++++---------
4 files changed, 185 insertions(+), 59 deletions(-)
--
2.17.1
* [Qemu-devel] [PATCH for-3.0 v2 01/10] migration: simplify check to use qemu file buffer
2018-07-10 9:18 [Qemu-devel] [PATCH for-3.0 v2 00/10] migration: postcopy recovery unit test, bug fixes Peter Xu
@ 2018-07-10 9:18 ` Peter Xu
2018-07-10 9:18 ` [Qemu-devel] [PATCH for-3.0 v2 02/10] migration: loosen recovery check when load vm Peter Xu
` (9 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: Peter Xu @ 2018-07-10 9:18 UTC (permalink / raw)
To: qemu-devel; +Cc: peterx, Juan Quintela, Dr . David Alan Gilbert, Balamuruhan S
First, rename the old matching_page_sizes variable to
matches_target_page_size, which better describes what it does (it only
checks against the target page size rather than multiple page sizes).
Meanwhile, simplify the check logic a bit and improve the comments.
There should be no functional change.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/ram.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/migration/ram.c b/migration/ram.c
index 23cea47090..49068e86d3 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3580,7 +3580,7 @@ static int ram_load_postcopy(QEMUFile *f)
{
int flags = 0, ret = 0;
bool place_needed = false;
- bool matching_page_sizes = false;
+ bool matches_target_page_size = false;
MigrationIncomingState *mis = migration_incoming_get_current();
/* Temporary page that is later 'placed' */
void *postcopy_host_page = postcopy_get_tmp_page(mis);
@@ -3620,7 +3620,7 @@ static int ram_load_postcopy(QEMUFile *f)
ret = -EINVAL;
break;
}
- matching_page_sizes = block->page_size == TARGET_PAGE_SIZE;
+ matches_target_page_size = block->page_size == TARGET_PAGE_SIZE;
/*
* Postcopy requires that we place whole host pages atomically;
* these may be huge pages for RAMBlocks that are backed by
@@ -3668,12 +3668,17 @@ static int ram_load_postcopy(QEMUFile *f)
case RAM_SAVE_FLAG_PAGE:
all_zero = false;
- if (!place_needed || !matching_page_sizes) {
+ if (!matches_target_page_size) {
+ /* For huge pages, we always use temporary buffer */
qemu_get_buffer(f, page_buffer, TARGET_PAGE_SIZE);
} else {
- /* Avoids the qemu_file copy during postcopy, which is
- * going to do a copy later; can only do it when we
- * do this read in one go (matching page sizes)
+ /*
+ * For small pages that matches target page size, we
+ * avoid the qemu_file copy. Instead we directly use
+ * the buffer of QEMUFile to place the page. Note: we
+ * cannot do any QEMUFile operation before using that
+ * buffer to make sure the buffer is valid when
+ * placing the page.
*/
qemu_get_buffer_in_place(f, (uint8_t **)&place_source,
TARGET_PAGE_SIZE);
--
2.17.1
* [Qemu-devel] [PATCH for-3.0 v2 02/10] migration: loosen recovery check when load vm
2018-07-10 9:18 [Qemu-devel] [PATCH for-3.0 v2 00/10] migration: postcopy recovery unit test, bug fixes Peter Xu
2018-07-10 9:18 ` [Qemu-devel] [PATCH for-3.0 v2 01/10] migration: simplify check to use qemu file buffer Peter Xu
@ 2018-07-10 9:18 ` Peter Xu
2018-07-10 9:18 ` [Qemu-devel] [PATCH for-3.0 v2 03/10] migration: fix incorrect bitmap size calculation Peter Xu
` (8 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: Peter Xu @ 2018-07-10 9:18 UTC (permalink / raw)
To: qemu-devel; +Cc: peterx, Juan Quintela, Dr . David Alan Gilbert, Balamuruhan S
We were checking against -EIO, assuming that it would cover all IO
failures. It does not. For example, qemu_loadvm_section_start_full()
has many places that return -EINVAL even when the error is caused by
an IO failure on the network.
Let's loosen the recovery check to cover all error cases by removing
the explicit check against -EIO. After all, we won't lose anything
here if some other failure happened.
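As a rough illustration of the change in the condition (a standalone
sketch, not the savevm.c code): the enum and both function names below
are hypothetical, and the real check additionally requires
postcopy_pause_incoming() to succeed, which is left out here.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical two-state model of the incoming postcopy state. */
enum sketch_postcopy_state {
    SKETCH_INCOMING_LISTENING,  /* still receiving device states */
    SKETCH_INCOMING_RUNNING     /* VM already running on destination */
};

/* Old rule: pause only when postcopy is running AND the error is -EIO. */
bool should_pause_old(enum sketch_postcopy_state state, int ret)
{
    return state == SKETCH_INCOMING_RUNNING && ret == -EIO;
}

/* New rule: pause for any load error once postcopy is running. */
bool should_pause_new(enum sketch_postcopy_state state, int ret)
{
    (void)ret;  /* the error code no longer matters */
    return state == SKETCH_INCOMING_RUNNING;
}
```

With ret = -EINVAL during a running postcopy, the old rule bails out
while the new rule pauses; neither rule pauses in the listening stage.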
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/savevm.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/migration/savevm.c b/migration/savevm.c
index 851d74e8b6..efcc795071 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2276,18 +2276,14 @@ out:
qemu_file_set_error(f, ret);
/*
- * Detect whether it is:
- *
- * 1. postcopy running (after receiving all device data, which
- * must be in POSTCOPY_INCOMING_RUNNING state. Note that
- * POSTCOPY_INCOMING_LISTENING is still not enough, it's
- * still receiving device states).
- * 2. network failure (-EIO)
- *
- * If so, we try to wait for a recovery.
+ * If we are during an active postcopy, then we pause instead
+ * of bail out to at least keep the VM's dirty data. Note
+ * that POSTCOPY_INCOMING_LISTENING stage is still not enough,
+ * during which we're still receiving device states and we
+ * still haven't yet started the VM on destination.
*/
if (postcopy_state_get() == POSTCOPY_INCOMING_RUNNING &&
- ret == -EIO && postcopy_pause_incoming(mis)) {
+ postcopy_pause_incoming(mis)) {
/* Reset f to point to the newly created channel */
f = mis->from_src_file;
goto retry;
--
2.17.1
* [Qemu-devel] [PATCH for-3.0 v2 03/10] migration: fix incorrect bitmap size calculation
2018-07-10 9:18 [Qemu-devel] [PATCH for-3.0 v2 00/10] migration: postcopy recovery unit test, bug fixes Peter Xu
2018-07-10 9:18 ` [Qemu-devel] [PATCH for-3.0 v2 01/10] migration: simplify check to use qemu file buffer Peter Xu
2018-07-10 9:18 ` [Qemu-devel] [PATCH for-3.0 v2 02/10] migration: loosen recovery check when load vm Peter Xu
@ 2018-07-10 9:18 ` Peter Xu
2018-07-10 9:18 ` [Qemu-devel] [PATCH for-3.0 v2 04/10] migration: show pause/recover state on dst host Peter Xu
` (7 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: Peter Xu @ 2018-07-10 9:18 UTC (permalink / raw)
To: qemu-devel; +Cc: peterx, Juan Quintela, Dr . David Alan Gilbert, Balamuruhan S
The calculation of the received bitmap size is incorrect for postcopy
recovery. The size must cover all the valid bits in the bitmap, so we
should use DIV_ROUND_UP() instead of a plain division.
For example, a RAMBlock with size=4K (which contains a single 4K page)
will have nbits=1, then nbits/8=0, so the real bitmap won't be sent to
the source at all.
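To make the rounding issue concrete, here is a minimal standalone
sketch (not the ram.c code). The two function names are hypothetical,
and DIV_ROUND_UP() is redefined locally with the usual round-up idiom
so that the snippet compiles on its own.

```c
#include <stdint.h>

/* Local copy of the usual round-up division idiom (an assumption made
 * so that this sketch is self-contained). */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Old calculation: truncating division drops a trailing partial byte. */
uint64_t bitmap_bytes_truncated(uint64_t nbits)
{
    return nbits / 8;
}

/* Fixed calculation: every valid bit is covered by some byte. */
uint64_t bitmap_bytes_rounded(uint64_t nbits)
{
    return DIV_ROUND_UP(nbits, 8);
}
```

For nbits=1 the truncating version gives 0 bytes and the rounded one
gives 1; the two only agree when nbits is a multiple of 8.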
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/ram.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/migration/ram.c b/migration/ram.c
index 49068e86d3..52dd678092 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -235,7 +235,7 @@ int64_t ramblock_recv_bitmap_send(QEMUFile *file,
bitmap_to_le(le_bitmap, block->receivedmap, nbits);
/* Size of the bitmap, in bytes */
- size = nbits / 8;
+ size = DIV_ROUND_UP(nbits, 8);
/*
* size is always aligned to 8 bytes for 64bit machines, but it
@@ -3944,7 +3944,7 @@ int ram_dirty_bitmap_reload(MigrationState *s, RAMBlock *block)
int ret = -EINVAL;
QEMUFile *file = s->rp_state.from_dst_file;
unsigned long *le_bitmap, nbits = block->used_length >> TARGET_PAGE_BITS;
- uint64_t local_size = nbits / 8;
+ uint64_t local_size = DIV_ROUND_UP(nbits, 8);
uint64_t size, end_mark;
trace_ram_dirty_bitmap_reload_begin(block->idstr);
--
2.17.1
* [Qemu-devel] [PATCH for-3.0 v2 04/10] migration: show pause/recover state on dst host
2018-07-10 9:18 [Qemu-devel] [PATCH for-3.0 v2 00/10] migration: postcopy recovery unit test, bug fixes Peter Xu
` (2 preceding siblings ...)
2018-07-10 9:18 ` [Qemu-devel] [PATCH for-3.0 v2 03/10] migration: fix incorrect bitmap size calculation Peter Xu
@ 2018-07-10 9:18 ` Peter Xu
2018-07-10 11:20 ` Dr. David Alan Gilbert
2018-07-10 9:18 ` [Qemu-devel] [PATCH for-3.0 v2 05/10] tests: introduce migrate_postcopy_* helpers Peter Xu
` (6 subsequent siblings)
10 siblings, 1 reply; 15+ messages in thread
From: Peter Xu @ 2018-07-10 9:18 UTC (permalink / raw)
To: qemu-devel; +Cc: peterx, Juan Quintela, Dr . David Alan Gilbert, Balamuruhan S
These two states are missing when running "query-migrate" on the
destination VM. Add them so that the query returns the expected
results.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
migration/migration.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/migration/migration.c b/migration/migration.c
index 0404c53215..8d56d56930 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -911,6 +911,8 @@ static void fill_destination_migration_info(MigrationInfo *info)
case MIGRATION_STATUS_CANCELLED:
case MIGRATION_STATUS_ACTIVE:
case MIGRATION_STATUS_POSTCOPY_ACTIVE:
+ case MIGRATION_STATUS_POSTCOPY_PAUSED:
+ case MIGRATION_STATUS_POSTCOPY_RECOVER:
case MIGRATION_STATUS_FAILED:
case MIGRATION_STATUS_COLO:
info->has_status = true;
--
2.17.1
* [Qemu-devel] [PATCH for-3.0 v2 05/10] tests: introduce migrate_postcopy_* helpers
2018-07-10 9:18 [Qemu-devel] [PATCH for-3.0 v2 00/10] migration: postcopy recovery unit test, bug fixes Peter Xu
` (3 preceding siblings ...)
2018-07-10 9:18 ` [Qemu-devel] [PATCH for-3.0 v2 04/10] migration: show pause/recover state on dst host Peter Xu
@ 2018-07-10 9:18 ` Peter Xu
2018-07-10 9:18 ` [Qemu-devel] [PATCH for-3.0 v2 06/10] tests: allow migrate() to take extra flags Peter Xu
` (5 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: Peter Xu @ 2018-07-10 9:18 UTC (permalink / raw)
To: qemu-devel; +Cc: peterx, Juan Quintela, Dr . David Alan Gilbert, Balamuruhan S
Separate the old postcopy UNIX socket test into three steps, and provide
a helper for each step. With these helpers, we can do more complicated
tests like postcopy recovery while keeping the code shared.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
tests/migration-test.c | 38 +++++++++++++++++++++++++++-----------
1 file changed, 27 insertions(+), 11 deletions(-)
diff --git a/tests/migration-test.c b/tests/migration-test.c
index 3a85446f95..2155869b96 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -351,13 +351,19 @@ static void migrate(QTestState *who, const char *uri)
qobject_unref(rsp);
}
-static void migrate_start_postcopy(QTestState *who)
+static void migrate_postcopy_start(QTestState *from, QTestState *to)
{
QDict *rsp;
- rsp = wait_command(who, "{ 'execute': 'migrate-start-postcopy' }");
+ rsp = wait_command(from, "{ 'execute': 'migrate-start-postcopy' }");
g_assert(qdict_haskey(rsp, "return"));
qobject_unref(rsp);
+
+ if (!got_stop) {
+ qtest_qmp_eventwait(from, "STOP");
+ }
+
+ qtest_qmp_eventwait(to, "RESUME");
}
static void test_migrate_start(QTestState **from, QTestState **to,
@@ -505,7 +511,8 @@ static void test_deprecated(void)
qtest_quit(from);
}
-static void test_postcopy(void)
+static void migrate_postcopy_prepare(QTestState **from_ptr,
+ QTestState **to_ptr)
{
char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
QTestState *from, *to;
@@ -527,28 +534,37 @@ static void test_postcopy(void)
wait_for_serial("src_serial");
migrate(from, uri);
+ g_free(uri);
wait_for_migration_pass(from);
- migrate_start_postcopy(from);
-
- if (!got_stop) {
- qtest_qmp_eventwait(from, "STOP");
- }
+ *from_ptr = from;
+ *to_ptr = to;
+}
- qtest_qmp_eventwait(to, "RESUME");
+static void migrate_postcopy_complete(QTestState *from, QTestState *to)
+{
+ wait_for_migration_complete(from);
+ /* Make sure we get at least one "B" on destination */
wait_for_serial("dest_serial");
- wait_for_migration_complete(from);
if (uffd_feature_thread_id) {
read_blocktime(to);
}
- g_free(uri);
test_migrate_end(from, to, true);
}
+static void test_postcopy(void)
+{
+ QTestState *from, *to;
+
+ migrate_postcopy_prepare(&from, &to);
+ migrate_postcopy_start(from, to);
+ migrate_postcopy_complete(from, to);
+}
+
static void test_baddest(void)
{
QTestState *from, *to;
--
2.17.1
* [Qemu-devel] [PATCH for-3.0 v2 06/10] tests: allow migrate() to take extra flags
2018-07-10 9:18 [Qemu-devel] [PATCH for-3.0 v2 00/10] migration: postcopy recovery unit test, bug fixes Peter Xu
` (4 preceding siblings ...)
2018-07-10 9:18 ` [Qemu-devel] [PATCH for-3.0 v2 05/10] tests: introduce migrate_postcopy_* helpers Peter Xu
@ 2018-07-10 9:18 ` Peter Xu
2018-07-10 9:18 ` [Qemu-devel] [PATCH for-3.0 v2 07/10] tests: introduce migrate_query*() helpers Peter Xu
` (4 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: Peter Xu @ 2018-07-10 9:18 UTC (permalink / raw)
To: qemu-devel; +Cc: peterx, Juan Quintela, Dr . David Alan Gilbert, Balamuruhan S
For example, we can pass in '"resume": true' to resume a migration.
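A minimal sketch of how the extra string ends up in the command,
assuming the same format string as in the patch below. The helper name
build_migrate_cmd() is hypothetical, and plain snprintf() is used
instead of g_strdup_printf() only to keep the snippet free of GLib.

```c
#include <stdio.h>

/*
 * Hypothetical helper: format the 'migrate' QMP command into buf.
 * 'extra' is pasted verbatim after the uri argument, so it has to
 * carry its own leading comma; NULL means no extra arguments.
 * Returns whatever snprintf() returns.
 */
int build_migrate_cmd(char *buf, size_t len,
                      const char *uri, const char *extra)
{
    return snprintf(buf, len,
                    "{ 'execute': 'migrate',"
                    " 'arguments': { 'uri': '%s' %s } }",
                    uri, extra ? extra : "");
}
```

Passing ", 'resume': true" as extra yields a command that asks for a
resume; passing NULL yields the plain migrate command.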
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
tests/migration-test.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/tests/migration-test.c b/tests/migration-test.c
index 2155869b96..af82a04789 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -337,14 +337,14 @@ static void migrate_set_capability(QTestState *who, const char *capability,
qobject_unref(rsp);
}
-static void migrate(QTestState *who, const char *uri)
+static void migrate(QTestState *who, const char *uri, const char *extra)
{
QDict *rsp;
gchar *cmd;
cmd = g_strdup_printf("{ 'execute': 'migrate',"
- "'arguments': { 'uri': '%s' } }",
- uri);
+ " 'arguments': { 'uri': '%s' %s } }",
+ uri, extra ? extra : "");
rsp = qtest_qmp(who, cmd);
g_free(cmd);
g_assert(qdict_haskey(rsp, "return"));
@@ -533,7 +533,7 @@ static void migrate_postcopy_prepare(QTestState **from_ptr,
/* Wait for the first serial output from the source */
wait_for_serial("src_serial");
- migrate(from, uri);
+ migrate(from, uri, NULL);
g_free(uri);
wait_for_migration_pass(from);
@@ -573,7 +573,7 @@ static void test_baddest(void)
bool failed;
test_migrate_start(&from, &to, "tcp:0:0", true);
- migrate(from, "tcp:0:0");
+ migrate(from, "tcp:0:0", NULL);
do {
rsp = wait_command(from, "{ 'execute': 'query-migrate' }");
rsp_return = qdict_get_qdict(rsp, "return");
@@ -615,7 +615,7 @@ static void test_precopy_unix(void)
/* Wait for the first serial output from the source */
wait_for_serial("src_serial");
- migrate(from, uri);
+ migrate(from, uri, NULL);
wait_for_migration_pass(from);
--
2.17.1
* [Qemu-devel] [PATCH for-3.0 v2 07/10] tests: introduce migrate_query*() helpers
2018-07-10 9:18 [Qemu-devel] [PATCH for-3.0 v2 00/10] migration: postcopy recovery unit test, bug fixes Peter Xu
` (5 preceding siblings ...)
2018-07-10 9:18 ` [Qemu-devel] [PATCH for-3.0 v2 06/10] tests: allow migrate() to take extra flags Peter Xu
@ 2018-07-10 9:18 ` Peter Xu
2018-07-10 9:19 ` [Qemu-devel] [PATCH for-3.0 v2 08/10] tests: introduce wait_for_migration_status() Peter Xu
` (3 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: Peter Xu @ 2018-07-10 9:18 UTC (permalink / raw)
To: qemu-devel; +Cc: peterx, Juan Quintela, Dr . David Alan Gilbert, Balamuruhan S
Introduce helpers to query migration states and use them.
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
tests/migration-test.c | 64 ++++++++++++++++++++++++++++--------------
1 file changed, 43 insertions(+), 21 deletions(-)
diff --git a/tests/migration-test.c b/tests/migration-test.c
index af82a04789..1d85ccbef1 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -168,6 +168,37 @@ static QDict *wait_command(QTestState *who, const char *command)
return response;
}
+/*
+ * Note: caller is responsible to free the returned object via
+ * qobject_unref() after use
+ */
+static QDict *migrate_query(QTestState *who)
+{
+ QDict *rsp, *rsp_return;
+
+ rsp = wait_command(who, "{ 'execute': 'query-migrate' }");
+ rsp_return = qdict_get_qdict(rsp, "return");
+ g_assert(rsp_return);
+ qobject_ref(rsp_return);
+ qobject_unref(rsp);
+
+ return rsp_return;
+}
+
+/*
+ * Note: caller is responsible to free the returned object via
+ * g_free() after use
+ */
+static gchar *migrate_query_status(QTestState *who)
+{
+ QDict *rsp_return = migrate_query(who);
+ gchar *status = g_strdup(qdict_get_str(rsp_return, "status"));
+
+ g_assert(status);
+ qobject_unref(rsp_return);
+
+ return status;
+}
/*
* It's tricky to use qemu's migration event capability with qtest,
@@ -176,11 +207,10 @@ static QDict *wait_command(QTestState *who, const char *command)
static uint64_t get_migration_pass(QTestState *who)
{
- QDict *rsp, *rsp_return, *rsp_ram;
+ QDict *rsp_return, *rsp_ram;
uint64_t result;
- rsp = wait_command(who, "{ 'execute': 'query-migrate' }");
- rsp_return = qdict_get_qdict(rsp, "return");
+ rsp_return = migrate_query(who);
if (!qdict_haskey(rsp_return, "ram")) {
/* Still in setup */
result = 0;
@@ -188,33 +218,29 @@ static uint64_t get_migration_pass(QTestState *who)
rsp_ram = qdict_get_qdict(rsp_return, "ram");
result = qdict_get_try_int(rsp_ram, "dirty-sync-count", 0);
}
- qobject_unref(rsp);
+ qobject_unref(rsp_return);
return result;
}
static void read_blocktime(QTestState *who)
{
- QDict *rsp, *rsp_return;
+ QDict *rsp_return;
- rsp = wait_command(who, "{ 'execute': 'query-migrate' }");
- rsp_return = qdict_get_qdict(rsp, "return");
+ rsp_return = migrate_query(who);
g_assert(qdict_haskey(rsp_return, "postcopy-blocktime"));
- qobject_unref(rsp);
+ qobject_unref(rsp_return);
}
static void wait_for_migration_complete(QTestState *who)
{
while (true) {
- QDict *rsp, *rsp_return;
bool completed;
- const char *status;
+ char *status;
- rsp = wait_command(who, "{ 'execute': 'query-migrate' }");
- rsp_return = qdict_get_qdict(rsp, "return");
- status = qdict_get_str(rsp_return, "status");
+ status = migrate_query_status(who);
completed = strcmp(status, "completed") == 0;
g_assert_cmpstr(status, !=, "failed");
- qobject_unref(rsp);
+ g_free(status);
if (completed) {
return;
}
@@ -569,20 +595,16 @@ static void test_baddest(void)
{
QTestState *from, *to;
QDict *rsp, *rsp_return;
- const char *status;
+ char *status;
bool failed;
test_migrate_start(&from, &to, "tcp:0:0", true);
migrate(from, "tcp:0:0", NULL);
do {
- rsp = wait_command(from, "{ 'execute': 'query-migrate' }");
- rsp_return = qdict_get_qdict(rsp, "return");
-
- status = qdict_get_str(rsp_return, "status");
-
+ status = migrate_query_status(from);
g_assert(!strcmp(status, "setup") || !(strcmp(status, "failed")));
failed = !strcmp(status, "failed");
- qobject_unref(rsp);
+ g_free(status);
} while (!failed);
/* Is the machine currently running? */
--
2.17.1
* [Qemu-devel] [PATCH for-3.0 v2 08/10] tests: introduce wait_for_migration_status()
2018-07-10 9:18 [Qemu-devel] [PATCH for-3.0 v2 00/10] migration: postcopy recovery unit test, bug fixes Peter Xu
` (6 preceding siblings ...)
2018-07-10 9:18 ` [Qemu-devel] [PATCH for-3.0 v2 07/10] tests: introduce migrate_query*() helpers Peter Xu
@ 2018-07-10 9:19 ` Peter Xu
2018-07-10 9:19 ` [Qemu-devel] [PATCH for-3.0 v2 09/10] tests: add postcopy recovery test Peter Xu
` (2 subsequent siblings)
10 siblings, 0 replies; 15+ messages in thread
From: Peter Xu @ 2018-07-10 9:19 UTC (permalink / raw)
To: qemu-devel; +Cc: peterx, Juan Quintela, Dr . David Alan Gilbert, Balamuruhan S
It's generalized from wait_for_migration_complete() to allow us to wait
for any migration status besides failure.
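The polling pattern can be sketched standalone as follows. This is
only an illustration: fake_query_status() is a hypothetical stand-in
for migrate_query_status() that replays a scripted list of states, and
the sketch returns -1 on "failed" where the real helper asserts.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Scripted states replayed by the fake query (an assumption for the demo). */
static const char *const fake_states[] = {
    "setup", "active", "postcopy-active", "postcopy-paused", "completed"
};
static size_t fake_pos;

/* Hypothetical stand-in for migrate_query_status(); sticks at the last state. */
const char *fake_query_status(void)
{
    const char *status = fake_states[fake_pos];

    if (fake_pos + 1 < sizeof(fake_states) / sizeof(fake_states[0])) {
        fake_pos++;
    }
    return status;
}

/*
 * Poll until the status equals 'goal'. Returns the number of polls it
 * took, or -1 if "failed" is seen first. Like the real helper, this
 * loops forever if the goal is never reached.
 */
int wait_for_status(const char *goal)
{
    int polls = 0;

    while (true) {
        const char *status = fake_query_status();

        polls++;
        if (strcmp(status, "failed") == 0) {
            return -1;
        }
        if (strcmp(status, goal) == 0) {
            return polls;
        }
    }
}
```

Waiting for "completed" is then just one particular goal, which is how
wait_for_migration_complete() is kept as a thin wrapper.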
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
tests/migration-test.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/tests/migration-test.c b/tests/migration-test.c
index 1d85ccbef1..761bf62ffe 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -231,14 +231,15 @@ static void read_blocktime(QTestState *who)
qobject_unref(rsp_return);
}
-static void wait_for_migration_complete(QTestState *who)
+static void wait_for_migration_status(QTestState *who,
+ const char *goal)
{
while (true) {
bool completed;
char *status;
status = migrate_query_status(who);
- completed = strcmp(status, "completed") == 0;
+ completed = strcmp(status, goal) == 0;
g_assert_cmpstr(status, !=, "failed");
g_free(status);
if (completed) {
@@ -248,6 +249,11 @@ static void wait_for_migration_complete(QTestState *who)
}
}
+static void wait_for_migration_complete(QTestState *who)
+{
+ wait_for_migration_status(who, "completed");
+}
+
static void wait_for_migration_pass(QTestState *who)
{
uint64_t initial_pass = get_migration_pass(who);
--
2.17.1
* [Qemu-devel] [PATCH for-3.0 v2 09/10] tests: add postcopy recovery test
2018-07-10 9:18 [Qemu-devel] [PATCH for-3.0 v2 00/10] migration: postcopy recovery unit test, bug fixes Peter Xu
` (7 preceding siblings ...)
2018-07-10 9:19 ` [Qemu-devel] [PATCH for-3.0 v2 08/10] tests: introduce wait_for_migration_status() Peter Xu
@ 2018-07-10 9:19 ` Peter Xu
2018-07-10 11:25 ` Dr. David Alan Gilbert
2018-07-10 12:23 ` Juan Quintela
2018-07-10 9:19 ` [Qemu-devel] [PATCH for-3.0 v2 10/10] tests: hide stderr for " Peter Xu
2018-07-10 12:18 ` [Qemu-devel] [PATCH for-3.0 v2 00/10] migration: postcopy recovery unit test, bug fixes Dr. David Alan Gilbert
10 siblings, 2 replies; 15+ messages in thread
From: Peter Xu @ 2018-07-10 9:19 UTC (permalink / raw)
To: qemu-devel; +Cc: peterx, Juan Quintela, Dr . David Alan Gilbert, Balamuruhan S
Test the postcopy recovery procedure by emulating a network failure
using the migrate-pause command.
Tested-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
tests/migration-test.c | 78 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 78 insertions(+)
diff --git a/tests/migration-test.c b/tests/migration-test.c
index 761bf62ffe..e952d94529 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -352,6 +352,29 @@ static void migrate_set_parameter(QTestState *who, const char *parameter,
migrate_check_parameter(who, parameter, value);
}
+static void migrate_pause(QTestState *who)
+{
+ QDict *rsp;
+
+ rsp = wait_command(who, "{ 'execute': 'migrate-pause' }");
+ g_assert(qdict_haskey(rsp, "return"));
+ qobject_unref(rsp);
+}
+
+static void migrate_recover(QTestState *who, const char *uri)
+{
+ QDict *rsp;
+ gchar *cmd = g_strdup_printf(
+ "{ 'execute': 'migrate-recover', "
+ " 'id': 'recover-cmd', "
+ " 'arguments': { 'uri': '%s' } }", uri);
+
+ rsp = wait_command(who, cmd);
+ g_assert(qdict_haskey(rsp, "return"));
+ g_free(cmd);
+ qobject_unref(rsp);
+}
+
static void migrate_set_capability(QTestState *who, const char *capability,
const char *value)
{
@@ -597,6 +620,60 @@ static void test_postcopy(void)
migrate_postcopy_complete(from, to);
}
+static void test_postcopy_recovery(void)
+{
+ QTestState *from, *to;
+ char *uri;
+
+ migrate_postcopy_prepare(&from, &to);
+
+ /* Turn postcopy speed down, 4K/s is slow enough on any machines */
+ migrate_set_parameter(from, "max-postcopy-bandwidth", "4096");
+
+ /* Now we start the postcopy */
+ migrate_postcopy_start(from, to);
+
+ /*
+ * Wait until postcopy is really started; we can only run the
+ * migrate-pause command during a postcopy
+ */
+ wait_for_migration_status(from, "postcopy-active");
+
+ /*
+ * Manually stop the postcopy migration. This emulates a network
+ * failure with the migration socket
+ */
+ migrate_pause(from);
+
+ /*
+ * Wait for destination side to reach postcopy-paused state. The
+ * migrate-recover command can only succeed if destination machine
+ * is in the paused state
+ */
+ wait_for_migration_status(to, "postcopy-paused");
+
+ /*
+ * Create a new socket to emulate a new channel that is different
+ * from the broken migration channel; tell the destination to
+ * listen to the new port
+ */
+ uri = g_strdup_printf("unix:%s/migsocket-recover", tmpfs);
+ migrate_recover(to, uri);
+
+ /*
+ * Try to rebuild the migration channel using the resume flag and
+ * the newly created channel
+ */
+ wait_for_migration_status(from, "postcopy-paused");
+ migrate(from, uri, ", 'resume': true");
+ g_free(uri);
+
+ /* Restore the postcopy bandwidth to unlimited */
+ migrate_set_parameter(from, "max-postcopy-bandwidth", "0");
+
+ migrate_postcopy_complete(from, to);
+}
+
static void test_baddest(void)
{
QTestState *from, *to;
@@ -683,6 +760,7 @@ int main(int argc, char **argv)
module_call_init(MODULE_INIT_QOM);
qtest_add_func("/migration/postcopy/unix", test_postcopy);
+ qtest_add_func("/migration/postcopy/recovery", test_postcopy_recovery);
qtest_add_func("/migration/deprecated", test_deprecated);
qtest_add_func("/migration/bad_dest", test_baddest);
qtest_add_func("/migration/precopy/unix", test_precopy_unix);
--
2.17.1
* [Qemu-devel] [PATCH for-3.0 v2 10/10] tests: hide stderr for postcopy recovery test
2018-07-10 9:18 [Qemu-devel] [PATCH for-3.0 v2 00/10] migration: postcopy recovery unit test, bug fixes Peter Xu
` (8 preceding siblings ...)
2018-07-10 9:19 ` [Qemu-devel] [PATCH for-3.0 v2 09/10] tests: add postcopy recovery test Peter Xu
@ 2018-07-10 9:19 ` Peter Xu
2018-07-10 12:18 ` [Qemu-devel] [PATCH for-3.0 v2 00/10] migration: postcopy recovery unit test, bug fixes Dr. David Alan Gilbert
10 siblings, 0 replies; 15+ messages in thread
From: Peter Xu @ 2018-07-10 9:19 UTC (permalink / raw)
To: qemu-devel; +Cc: peterx, Juan Quintela, Dr . David Alan Gilbert, Balamuruhan S
We dump some messages when a network failure happens. We should avoid
dumping those messages when running the tests:
$ ./tests/migration-test -p /x86_64/migration/postcopy/recovery
/x86_64/migration/postcopy/recovery: qemu-system-x86_64: check_section_footer: Read section footer failed: -5
qemu-system-x86_64: Detected IO failure for postcopy. Migration paused.
qemu-system-x86_64: Detected IO failure for postcopy. Migration paused.
OK
After the patch:
$ ./tests/migration-test -p /x86_64/migration/postcopy/recovery
/x86_64/migration/postcopy/recovery: OK
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
tests/migration-test.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/tests/migration-test.c b/tests/migration-test.c
index e952d94529..45558446f1 100644
--- a/tests/migration-test.c
+++ b/tests/migration-test.c
@@ -567,12 +567,13 @@ static void test_deprecated(void)
}
static void migrate_postcopy_prepare(QTestState **from_ptr,
- QTestState **to_ptr)
+ QTestState **to_ptr,
+ bool hide_error)
{
char *uri = g_strdup_printf("unix:%s/migsocket", tmpfs);
QTestState *from, *to;
- test_migrate_start(&from, &to, uri, false);
+ test_migrate_start(&from, &to, uri, hide_error);
migrate_set_capability(from, "postcopy-ram", "true");
migrate_set_capability(to, "postcopy-ram", "true");
@@ -615,7 +616,7 @@ static void test_postcopy(void)
{
QTestState *from, *to;
- migrate_postcopy_prepare(&from, &to);
+ migrate_postcopy_prepare(&from, &to, false);
migrate_postcopy_start(from, to);
migrate_postcopy_complete(from, to);
}
@@ -625,7 +626,7 @@ static void test_postcopy_recovery(void)
QTestState *from, *to;
char *uri;
- migrate_postcopy_prepare(&from, &to);
+ migrate_postcopy_prepare(&from, &to, true);
/* Turn postcopy speed down, 4K/s is slow enough on any machines */
migrate_set_parameter(from, "max-postcopy-bandwidth", "4096");
--
2.17.1
* Re: [Qemu-devel] [PATCH for-3.0 v2 04/10] migration: show pause/recover state on dst host
2018-07-10 9:18 ` [Qemu-devel] [PATCH for-3.0 v2 04/10] migration: show pause/recover state on dst host Peter Xu
@ 2018-07-10 11:20 ` Dr. David Alan Gilbert
0 siblings, 0 replies; 15+ messages in thread
From: Dr. David Alan Gilbert @ 2018-07-10 11:20 UTC (permalink / raw)
To: Peter Xu; +Cc: qemu-devel, Juan Quintela, Balamuruhan S
* Peter Xu (peterx@redhat.com) wrote:
> These two states will be missing when doing "query-migrate" on
> destination VM. Add these states so that we can get the query results
> as expected.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
> migration/migration.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 0404c53215..8d56d56930 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -911,6 +911,8 @@ static void fill_destination_migration_info(MigrationInfo *info)
> case MIGRATION_STATUS_CANCELLED:
> case MIGRATION_STATUS_ACTIVE:
> case MIGRATION_STATUS_POSTCOPY_ACTIVE:
> + case MIGRATION_STATUS_POSTCOPY_PAUSED:
> + case MIGRATION_STATUS_POSTCOPY_RECOVER:
> case MIGRATION_STATUS_FAILED:
> case MIGRATION_STATUS_COLO:
> info->has_status = true;
> --
> 2.17.1
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [Qemu-devel] [PATCH for-3.0 v2 09/10] tests: add postcopy recovery test
2018-07-10 9:19 ` [Qemu-devel] [PATCH for-3.0 v2 09/10] tests: add postcopy recovery test Peter Xu
@ 2018-07-10 11:25 ` Dr. David Alan Gilbert
2018-07-10 12:23 ` Juan Quintela
1 sibling, 0 replies; 15+ messages in thread
From: Dr. David Alan Gilbert @ 2018-07-10 11:25 UTC (permalink / raw)
To: Peter Xu; +Cc: qemu-devel, Juan Quintela, Balamuruhan S
* Peter Xu (peterx@redhat.com) wrote:
> Test the postcopy recovery procedure by emulating a network failure
> using migrate-pause command.
>
> Tested-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
> tests/migration-test.c | 78 ++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 78 insertions(+)
>
> diff --git a/tests/migration-test.c b/tests/migration-test.c
> index 761bf62ffe..e952d94529 100644
> --- a/tests/migration-test.c
> +++ b/tests/migration-test.c
> @@ -352,6 +352,29 @@ static void migrate_set_parameter(QTestState *who, const char *parameter,
> migrate_check_parameter(who, parameter, value);
> }
>
> +static void migrate_pause(QTestState *who)
> +{
> + QDict *rsp;
> +
> + rsp = wait_command(who, "{ 'execute': 'migrate-pause' }");
> + g_assert(qdict_haskey(rsp, "return"));
> + qobject_unref(rsp);
> +}
> +
> +static void migrate_recover(QTestState *who, const char *uri)
> +{
> + QDict *rsp;
> + gchar *cmd = g_strdup_printf(
> + "{ 'execute': 'migrate-recover', "
> + " 'id': 'recover-cmd', "
> + " 'arguments': { 'uri': '%s' } }", uri);
> +
> + rsp = wait_command(who, cmd);
> + g_assert(qdict_haskey(rsp, "return"));
> + g_free(cmd);
> + qobject_unref(rsp);
> +}
> +
> static void migrate_set_capability(QTestState *who, const char *capability,
> const char *value)
> {
> @@ -597,6 +620,60 @@ static void test_postcopy(void)
> migrate_postcopy_complete(from, to);
> }
>
> +static void test_postcopy_recovery(void)
> +{
> + QTestState *from, *to;
> + char *uri;
> +
> + migrate_postcopy_prepare(&from, &to);
> +
> + /* Turn postcopy speed down; 4K/s is slow enough on any machine */
> + migrate_set_parameter(from, "max-postcopy-bandwidth", "4096");
> +
> + /* Now we start the postcopy */
> + migrate_postcopy_start(from, to);
> +
> + /*
> + * Wait until postcopy is really started; we can only run the
> + * migrate-pause command during a postcopy
> + */
> + wait_for_migration_status(from, "postcopy-active");
> +
> + /*
> + * Manually stop the postcopy migration. This emulates a network
> + * failure with the migration socket
> + */
> + migrate_pause(from);
> +
> + /*
> + * Wait for destination side to reach postcopy-paused state. The
> + * migrate-recover command can only succeed if destination machine
> + * is in the paused state
> + */
> + wait_for_migration_status(to, "postcopy-paused");
> +
> + /*
> + * Create a new socket to emulate a new channel that is different
> + * from the broken migration channel; tell the destination to
> + * listen to the new port
> + */
> + uri = g_strdup_printf("unix:%s/migsocket-recover", tmpfs);
> + migrate_recover(to, uri);
> +
> + /*
> + * Try to rebuild the migration channel using the resume flag and
> + * the newly created channel
> + */
> + wait_for_migration_status(from, "postcopy-paused");
> + migrate(from, uri, ", 'resume': true");
> + g_free(uri);
> +
> + /* Restore the postcopy bandwidth to unlimited */
> + migrate_set_parameter(from, "max-postcopy-bandwidth", "0");
> +
> + migrate_postcopy_complete(from, to);
> +}
> +
> static void test_baddest(void)
> {
> QTestState *from, *to;
> @@ -683,6 +760,7 @@ int main(int argc, char **argv)
> module_call_init(MODULE_INIT_QOM);
>
> qtest_add_func("/migration/postcopy/unix", test_postcopy);
> + qtest_add_func("/migration/postcopy/recovery", test_postcopy_recovery);
> qtest_add_func("/migration/deprecated", test_deprecated);
> qtest_add_func("/migration/bad_dest", test_baddest);
> qtest_add_func("/migration/precopy/unix", test_precopy_unix);
> --
> 2.17.1
>
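The comments in test_postcopy_recovery() describe three ordering constraints: migrate-pause is only valid during postcopy, migrate-recover only succeeds once the destination is paused, and the resume is only attempted after the source is paused and the destination has a new channel. A toy model of those constraints is sketched below. All type and function names are hypothetical, the transitions are synchronous (the real ones are not, which is why the test polls with wait_for_migration_status()), and the intermediate postcopy-recover state is folded into the resume step for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical two-state model; not QEMU's MigrationStatus. */
typedef enum {
    SKETCH_POSTCOPY_ACTIVE,
    SKETCH_POSTCOPY_PAUSED
} SketchState;

typedef struct {
    SketchState src;     /* source side state */
    SketchState dst;     /* destination side state */
    bool dst_listening;  /* destination has a fresh recovery channel */
} SketchMigration;

/* Models migrate-pause: only valid while postcopy is active. */
bool sketch_pause(SketchMigration *m)
{
    assert(m != NULL);
    if (m->src != SKETCH_POSTCOPY_ACTIVE) {
        return false;
    }
    /* Simplification: both sides notice the broken channel at once. */
    m->src = SKETCH_POSTCOPY_PAUSED;
    m->dst = SKETCH_POSTCOPY_PAUSED;
    m->dst_listening = false;
    return true;
}

/* Models migrate-recover: destination must already be paused. */
bool sketch_recover(SketchMigration *m)
{
    assert(m != NULL);
    if (m->dst != SKETCH_POSTCOPY_PAUSED) {
        return false;
    }
    m->dst_listening = true;
    return true;
}

/* Models migrate with 'resume': true on the source side. */
bool sketch_resume(SketchMigration *m)
{
    assert(m != NULL);
    if (m->src != SKETCH_POSTCOPY_PAUSED || !m->dst_listening) {
        return false;
    }
    m->src = SKETCH_POSTCOPY_ACTIVE;
    m->dst = SKETCH_POSTCOPY_ACTIVE;
    m->dst_listening = false;
    return true;
}
```

Calling the three functions in the order pause, recover, resume returns the model to the active state on both sides; any other order is rejected, which mirrors why the test waits for "postcopy-paused" on each side before issuing the next command.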
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [Qemu-devel] [PATCH for-3.0 v2 00/10] migration: postcopy recovery unit test, bug fixes
2018-07-10 9:18 [Qemu-devel] [PATCH for-3.0 v2 00/10] migration: postcopy recovery unit test, bug fixes Peter Xu
` (9 preceding siblings ...)
2018-07-10 9:19 ` [Qemu-devel] [PATCH for-3.0 v2 10/10] tests: hide stderr for " Peter Xu
@ 2018-07-10 12:18 ` Dr. David Alan Gilbert
10 siblings, 0 replies; 15+ messages in thread
From: Dr. David Alan Gilbert @ 2018-07-10 12:18 UTC (permalink / raw)
To: Peter Xu; +Cc: qemu-devel, Juan Quintela, Balamuruhan S
* Peter Xu (peterx@redhat.com) wrote:
> Based-on: <20180627132246.5576-1-peterx@redhat.com>
>
> Based on the series to unbreak postcopy:
> Subject: [PATCH v3 0/4] migation: unbreak postcopy recovery
> Message-Id: <20180627132246.5576-1-peterx@redhat.com>
Queued
(Took a bit of manual merging relative to my skip-tcg ppc patch)
> v2:
> - collect r-bs, and t-b for Balamuruhan
> - rename matching_target_page_size to matches_target_page_size [Dave]
> - fixup the race that Dave reported: the 1st patch was added as
> a new patch (patch 4); the 2nd patch was squashed into the unit test
> patch (patch 9)
>
> Please review. Thanks,
>
> Peter Xu (10):
> migration: simplify check to use qemu file buffer
> migration: loosen recovery check when load vm
> migration: fix incorrect bitmap size calculation
> migration: show pause/recover state on dst host
> tests: introduce migrate_postcopy_* helpers
> tests: allow migrate() to take extra flags
> tests: introduce migrate_query*() helpers
> tests: introduce wait_for_migration_status()
> tests: add postcopy recovery test
> tests: hide stderr for postcopy recovery test
>
> migration/migration.c | 2 +
> migration/ram.c | 21 +++--
> migration/savevm.c | 16 ++--
> tests/migration-test.c | 205 ++++++++++++++++++++++++++++++++---------
> 4 files changed, 185 insertions(+), 59 deletions(-)
>
> --
> 2.17.1
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Re: [Qemu-devel] [PATCH for-3.0 v2 09/10] tests: add postcopy recovery test
2018-07-10 9:19 ` [Qemu-devel] [PATCH for-3.0 v2 09/10] tests: add postcopy recovery test Peter Xu
2018-07-10 11:25 ` Dr. David Alan Gilbert
@ 2018-07-10 12:23 ` Juan Quintela
1 sibling, 0 replies; 15+ messages in thread
From: Juan Quintela @ 2018-07-10 12:23 UTC (permalink / raw)
To: Peter Xu; +Cc: qemu-devel, Dr . David Alan Gilbert, Balamuruhan S
Peter Xu <peterx@redhat.com> wrote:
> Test the postcopy recovery procedure by emulating a network failure
> using the migrate-pause command.
>
> Tested-by: Balamuruhan S <bala24@linux.vnet.ibm.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>