qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: qemu-devel@nongnu.org
Cc: Thomas Huth <thuth@redhat.com>,
	Markus Armbruster <armbru@redhat.com>,
	Laurent Vivier <lvivier@redhat.com>,
	Fabiano Rosas <farosas@suse.de>, Eric Blake <eblake@redhat.com>,
	Prasad Pandit <ppandit@redhat.com>,
	peterx@redhat.com, Jiri Denemark <jdenemar@redhat.com>,
	Bandan Das <bdas@redhat.com>
Subject: [PATCH v2 10/10] tests/migration-tests: Cover postcopy failure on reconnect
Date: Mon, 17 Jun 2024 14:15:34 -0400	[thread overview]
Message-ID: <20240617181534.1425179-11-peterx@redhat.com> (raw)
In-Reply-To: <20240617181534.1425179-1-peterx@redhat.com>

Make sure there will be an event for postcopy recovery, irrelevant of
whether the reconnect will success, or when the failure happens.

The added new case is to fail early in postcopy recovery, in which case it
didn't even reach RECOVER stage on src (and in real life it'll be the same
to dest, but the test case is just slightly more involved due to the dual
socketpair setup).

To do that, rename the postcopy_recovery_test_fail to reflect either stage
to fail, instead of a boolean.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 tests/qtest/migration-test.c | 89 ++++++++++++++++++++++++++++++------
 1 file changed, 74 insertions(+), 15 deletions(-)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index a16b1a4824..3e237a1499 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -70,6 +70,17 @@ static QTestMigrationState dst_state;
 #define QEMU_ENV_SRC "QTEST_QEMU_BINARY_SRC"
 #define QEMU_ENV_DST "QTEST_QEMU_BINARY_DST"
 
+typedef enum PostcopyRecoveryFailStage {
+    /*
+     * "no failure" must be 0 as it's the default.  OTOH, real failure
+     * cases must be >0 to make sure they trigger by a "if" test.
+     */
+    POSTCOPY_FAIL_NONE = 0,
+    POSTCOPY_FAIL_CHANNEL_ESTABLISH,
+    POSTCOPY_FAIL_RECOVERY,
+    POSTCOPY_FAIL_MAX
+} PostcopyRecoveryFailStage;
+
 #if defined(__linux__)
 #include <sys/syscall.h>
 #include <sys/vfs.h>
@@ -680,7 +691,7 @@ typedef struct {
     /* Postcopy specific fields */
     void *postcopy_data;
     bool postcopy_preempt;
-    bool postcopy_recovery_test_fail;
+    PostcopyRecoveryFailStage postcopy_recovery_fail_stage;
 } MigrateCommon;
 
 static int test_migrate_start(QTestState **from, QTestState **to,
@@ -1360,12 +1371,16 @@ static void wait_for_postcopy_status(QTestState *one, const char *status)
                                                   "completed", NULL });
 }
 
-static void postcopy_recover_fail(QTestState *from, QTestState *to)
+static void postcopy_recover_fail(QTestState *from, QTestState *to,
+                                  PostcopyRecoveryFailStage stage)
 {
 #ifndef _WIN32
+    bool fail_early = (stage == POSTCOPY_FAIL_CHANNEL_ESTABLISH);
     int ret, pair1[2], pair2[2];
     char c;
 
+    g_assert(stage > POSTCOPY_FAIL_NONE && stage < POSTCOPY_FAIL_MAX);
+
     /* Create two unrelated socketpairs */
     ret = qemu_socketpair(PF_LOCAL, SOCK_STREAM, 0, pair1);
     g_assert_cmpint(ret, ==, 0);
@@ -1399,6 +1414,14 @@ static void postcopy_recover_fail(QTestState *from, QTestState *to)
     ret = send(pair2[1], &c, 1, 0);
     g_assert_cmpint(ret, ==, 1);
 
+    if (stage == POSTCOPY_FAIL_CHANNEL_ESTABLISH) {
+        /*
+         * This will make src QEMU to fail at an early stage when trying to
+         * resume later, where it shouldn't reach RECOVER stage at all.
+         */
+        close(pair1[1]);
+    }
+
     migrate_recover(to, "fd:fd-mig");
     migrate_qmp(from, to, "fd:fd-mig", NULL, "{'resume': true}");
 
@@ -1408,28 +1431,53 @@ static void postcopy_recover_fail(QTestState *from, QTestState *to)
      */
     migration_event_wait(from, "postcopy-recover-setup");
 
+    if (fail_early) {
+        /*
+         * When fails at reconnection, src QEMU will automatically goes
+         * back to PAUSED state.  Making sure there is an event in this
+         * case: Libvirt relies on this to detect early reconnection
+         * errors.
+         */
+        migration_event_wait(from, "postcopy-paused");
+    } else {
+        /*
+         * We want to test "fail later" at RECOVER stage here.  Make sure
+         * both QEMU instances will go into RECOVER stage first, then test
+         * kicking them out using migrate-pause.
+         *
+         * Explicitly check the RECOVER event on src, that's what Libvirt
+         * relies on, rather than polling.
+         */
+        migration_event_wait(from, "postcopy-recover");
+        wait_for_postcopy_status(from, "postcopy-recover");
+
+        /* Need an explicit kick on src QEMU in this case */
+        migrate_pause(from);
+    }
+
     /*
-     * Make sure both QEMU instances will go into RECOVER stage, then test
-     * kicking them out using migrate-pause.
+     * For all failure cases, we'll reach such states on both sides now.
+     * Check them.
      */
-    wait_for_postcopy_status(from, "postcopy-recover");
+    wait_for_postcopy_status(from, "postcopy-paused");
     wait_for_postcopy_status(to, "postcopy-recover");
 
     /*
-     * This would be issued by the admin upon noticing the hang, we should
-     * make sure we're able to kick this out.
+     * Kick dest QEMU out too. This is normally not needed in reality
+     * because when the channel is shutdown it should also happens on src.
+     * However here we used separate socket pairs so we need to do that
+     * explicitly.
      */
-    migrate_pause(from);
-    wait_for_postcopy_status(from, "postcopy-paused");
-
-    /* Do the same test on dest */
     migrate_pause(to);
     wait_for_postcopy_status(to, "postcopy-paused");
 
     close(pair1[0]);
-    close(pair1[1]);
     close(pair2[0]);
     close(pair2[1]);
+
+    if (stage != POSTCOPY_FAIL_CHANNEL_ESTABLISH) {
+        close(pair1[1]);
+    }
 #endif
 }
 
@@ -1471,12 +1519,12 @@ static void test_postcopy_recovery_common(MigrateCommon *args)
     wait_for_postcopy_status(to, "postcopy-paused");
     wait_for_postcopy_status(from, "postcopy-paused");
 
-    if (args->postcopy_recovery_test_fail) {
+    if (args->postcopy_recovery_fail_stage) {
         /*
          * Test when a wrong socket specified for recover, and then the
          * ability to kick it out, and continue with a correct socket.
          */
-        postcopy_recover_fail(from, to);
+        postcopy_recover_fail(from, to, args->postcopy_recovery_fail_stage);
         /* continue with a good recovery */
     }
 
@@ -1510,7 +1558,16 @@ static void test_postcopy_recovery(void)
 static void test_postcopy_recovery_double_fail(void)
 {
     MigrateCommon args = {
-        .postcopy_recovery_test_fail = true,
+        .postcopy_recovery_fail_stage = POSTCOPY_FAIL_RECOVERY,
+    };
+
+    test_postcopy_recovery_common(&args);
+}
+
+static void test_postcopy_recovery_channel_reconnect(void)
+{
+    MigrateCommon args = {
+        .postcopy_recovery_fail_stage = POSTCOPY_FAIL_CHANNEL_ESTABLISH,
     };
 
     test_postcopy_recovery_common(&args);
@@ -3497,6 +3554,8 @@ int main(int argc, char **argv)
                            test_postcopy_preempt_recovery);
         migration_test_add("/migration/postcopy/recovery/double-failures",
                            test_postcopy_recovery_double_fail);
+        migration_test_add("/migration/postcopy/recovery/channel-reconnect",
+                           test_postcopy_recovery_channel_reconnect);
         if (is_x86) {
             migration_test_add("/migration/postcopy/suspend",
                                test_postcopy_suspend);
-- 
2.45.0



  parent reply	other threads:[~2024-06-17 18:17 UTC|newest]

Thread overview: 24+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-17 18:15 [PATCH v2 00/10] migration: New postcopy state, and some cleanups Peter Xu
2024-06-17 18:15 ` [PATCH v2 01/10] migration/multifd: Avoid the final FLUSH in complete() Peter Xu
2024-06-17 18:15 ` [PATCH v2 02/10] migration: Rename thread debug names Peter Xu
2024-06-19  1:05   ` Zhijian Li (Fujitsu) via
2024-06-17 18:15 ` [PATCH v2 03/10] migration: Use MigrationStatus instead of int Peter Xu
2024-06-17 19:38   ` Fabiano Rosas
2024-06-17 18:15 ` [PATCH v2 04/10] migration: Cleanup incoming migration setup state change Peter Xu
2024-06-17 19:41   ` Fabiano Rosas
2024-06-17 18:15 ` [PATCH v2 05/10] migration/postcopy: Add postcopy-recover-setup phase Peter Xu
2024-06-17 19:45   ` Fabiano Rosas
2024-06-17 18:15 ` [PATCH v2 06/10] migration/docs: Update postcopy recover session for SETUP phase Peter Xu
2024-06-17 19:47   ` Fabiano Rosas
2024-06-17 18:15 ` [PATCH v2 07/10] tests/migration-tests: Drop most WIN32 ifdefs for postcopy failure tests Peter Xu
2024-06-17 19:49   ` Fabiano Rosas
2024-06-17 18:15 ` [PATCH v2 08/10] tests/migration-tests: Always enable migration events Peter Xu
2024-06-17 19:51   ` Fabiano Rosas
2024-06-17 21:23     ` Peter Xu
2024-06-19 20:39       ` Peter Xu
2024-06-17 18:15 ` [PATCH v2 09/10] tests/migration-tests: Verify postcopy-recover-setup status Peter Xu
2024-06-17 19:53   ` Fabiano Rosas
2024-06-17 18:15 ` Peter Xu [this message]
2024-06-17 20:07   ` [PATCH v2 10/10] tests/migration-tests: Cover postcopy failure on reconnect Fabiano Rosas
2024-06-17 19:34 ` [PATCH v2 00/10] migration: New postcopy state, and some cleanups Peter Xu
2024-06-17 20:12   ` Fabiano Rosas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240617181534.1425179-11-peterx@redhat.com \
    --to=peterx@redhat.com \
    --cc=armbru@redhat.com \
    --cc=bdas@redhat.com \
    --cc=eblake@redhat.com \
    --cc=farosas@suse.de \
    --cc=jdenemar@redhat.com \
    --cc=lvivier@redhat.com \
    --cc=ppandit@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=thuth@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).