qemu-devel.nongnu.org archive mirror
From: Peter Xu <peterx@redhat.com>
To: "Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com>
Cc: Fabiano Rosas <farosas@suse.de>,
	Li Zhijian via <qemu-devel@nongnu.org>,
	Laurent Vivier <lvivier@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Subject: Re: [PATCH 2/2] [NOT-FOR-MERGE] Add qtest for migration over RDMA
Date: Wed, 19 Feb 2025 07:47:44 -0500
Message-ID: <Z7XS8JmtxivALM92@x1.local>
In-Reply-To: <ea265434-7842-4556-9a99-98ce42b6c1f1@fujitsu.com>

On Wed, Feb 19, 2025 at 05:33:26AM +0000, Zhijian Li (Fujitsu) wrote:
> 
> 
> On 19/02/2025 06:40, Peter Xu wrote:
> > On Tue, Feb 18, 2025 at 06:03:48PM -0300, Fabiano Rosas wrote:
> >> Li Zhijian via <qemu-devel@nongnu.org> writes:
> >>
> >>> This qtest requires an RXE link on the host.
> >>>
> >>> Here is an example to show how to add this RXE link:
> >>> $ ./new-rdma-link.sh
> >>> 192.168.22.93
> >>>
> >>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> >>> ---
> >>> The RDMA migration was broken again...due to lack of sufficient test/qtest.
> >>>
> >>> It's ugly to add and execute a script to establish an RDMA link in
> >>> the C program. If anyone has a better suggestion, please let me know.
> >>>
> >>> $ cat ./new-rdma-link.sh
> >>> get_ipv4_addr() {
> >>>          ip -4 -o addr show dev "$1" |
> >>>                  sed -n 's/.*[[:blank:]]inet[[:blank:]]*\([^[:blank:]/]*\).*/\1/p'
> >>> }
> >>>
> >>> has_soft_rdma() {
> >>>          rdma link | grep -q " netdev $1[[:blank:]]*\$"
> >>> }
> >>>
> >>> start_soft_rdma() {
> >>>          local type
> >>>
> >>>          modprobe rdma_rxe || return $?
> >>>          type=rxe
> >>>          (
> >>>                  cd /sys/class/net &&
> >>>                          for i in *; do
> >>>                                  [ -e "$i" ] || continue
> >>>                                  [ "$i" = "lo" ] && continue
> >>>                                  [ "$(<"$i/addr_len")" = 6 ] || continue
> >>>                                  [ "$(<"$i/carrier")" = 1 ] || continue
> >>>                                  has_soft_rdma "$i" && break
> >>>                                  rdma link add "${i}_$type" type $type netdev "$i" && break
> >>>                          done
> >>>                  has_soft_rdma "$i" && echo $i
> >>>          )
> >>>
> >>> }
> >>>
> >>> rxe_link=$(start_soft_rdma)
> >>> [[ "$rxe_link" ]] && get_ipv4_addr $rxe_link
> >>>
> >>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> >>> ---
> >>>   tests/qtest/migration/new-rdma-link.sh |  34 ++++++++
> >>>   tests/qtest/migration/precopy-tests.c  | 103 +++++++++++++++++++++++++
> >>>   2 files changed, 137 insertions(+)
> >>>   create mode 100644 tests/qtest/migration/new-rdma-link.sh
> >>>
> >>> diff --git a/tests/qtest/migration/new-rdma-link.sh b/tests/qtest/migration/new-rdma-link.sh
> >>> new file mode 100644
> >>> index 00000000000..ca20594eaae
> >>> --- /dev/null
> >>> +++ b/tests/qtest/migration/new-rdma-link.sh
> >>> @@ -0,0 +1,34 @@
> >>> +#!/bin/bash
> >>> +
> >>> +# Copied from blktests
> >>> +get_ipv4_addr() {
> >>> +	ip -4 -o addr show dev "$1" |
> >>> +		sed -n 's/.*[[:blank:]]inet[[:blank:]]*\([^[:blank:]/]*\).*/\1/p'
> >>> +}
> >>> +
> >>> +has_soft_rdma() {
> >>> +	rdma link | grep -q " netdev $1[[:blank:]]*\$"
> >>> +}
> >>> +
> >>> +start_soft_rdma() {
> >>> +	local type
> >>> +
> >>> +	modprobe rdma_rxe || return $?
> >>> +	type=rxe
> >>> +	(
> >>> +		cd /sys/class/net &&
> >>> +			for i in *; do
> >>> +				[ -e "$i" ] || continue
> >>> +				[ "$i" = "lo" ] && continue
> >>> +				[ "$(<"$i/addr_len")" = 6 ] || continue
> >>> +				[ "$(<"$i/carrier")" = 1 ] || continue
> >>> +				has_soft_rdma "$i" && break
> >>> +				rdma link add "${i}_$type" type $type netdev "$i" && break
> >>> +			done
> >>> +		has_soft_rdma "$i" && echo $i
> >>> +	)
> >>> +
> >>> +}
> >>> +
> >>> +rxe_link=$(start_soft_rdma)
> >>> +[[ "$rxe_link" ]] && get_ipv4_addr $rxe_link
> >>> diff --git a/tests/qtest/migration/precopy-tests.c b/tests/qtest/migration/precopy-tests.c
> >>> index 162fa695318..d2a1c9c9438 100644
> >>> --- a/tests/qtest/migration/precopy-tests.c
> >>> +++ b/tests/qtest/migration/precopy-tests.c
> >>> @@ -98,6 +98,105 @@ static void test_precopy_unix_dirty_ring(void)
> >>>       test_precopy_common(&args);
> >>>   }
> >>>   
> >>> +static int new_rdma_link(char *buffer) {
> >>> +    // Copied from blktests
> >>> +    const char *script =
> >>> +        "#!/bin/bash\n"
> >>> +        "\n"
> >>> +        "get_ipv4_addr() {\n"
> >>> +        "    ip -4 -o addr show dev \"$1\" |\n"
> >>> +        "    sed -n 's/.*[[:blank:]]inet[[:blank:]]*\\([^[:blank:]/]*\\).*/\\1/p'\n"
> >>> +        "}\n"
> >>> +        "\n"
> >>> +        "has_soft_rdma() {\n"
> >>> +        "    rdma link | grep -q \" netdev $1[[:blank:]]*\\$\"\n"
> >>> +        "}\n"
> >>> +        "\n"
> >>> +        "start_soft_rdma() {\n"
> >>> +        "    local type\n"
> >>> +        "\n"
> >>> +        "    modprobe rdma_rxe || return $?\n"
> >>> +        "    type=rxe\n"
> >>> +        "    (\n"
> >>> +        "        cd /sys/class/net &&\n"
> >>> +        "        for i in *; do\n"
> >>> +        "            [ -e \"$i\" ] || continue\n"
> >>> +        "            [ \"$i\" = \"lo\" ] && continue\n"
> >>> +        "            [ \"$(<$i/addr_len)\" = 6 ] || continue\n"
> >>> +        "            [ \"$(<$i/carrier)\" = 1 ] || continue\n"
> >>> +        "            has_soft_rdma \"$i\" && break\n"
> >>> +        "            rdma link add \"${i}_$type\" type $type netdev \"$i\" && break\n"
> >>> +        "        done\n"
> >>> +        "        has_soft_rdma \"$i\" && echo $i\n"
> >>> +        "    )\n"
> >>> +        "}\n"
> >>> +        "\n"
> >>> +        "rxe_link=$(start_soft_rdma)\n"
> >>> +        "[[ \"$rxe_link\" ]] && get_ipv4_addr $rxe_link\n";
> >>> +
> >>> +    char script_filename[] = "/tmp/temp_scriptXXXXXX";
> >>> +    int fd = mkstemp(script_filename);
> >>> +    if (fd == -1) {
> >>> +        perror("Failed to create temporary file");
> >>> +        return 1;
> >>> +    }
> >>> +
> >>> +    FILE *fp = fdopen(fd, "w");
> >>> +    if (fp == NULL) {
> >>> +        perror("Failed to open file stream");
> >>> +        close(fd);
> >>> +        return 1;
> >>> +    }
> >>> +    fprintf(fp, "%s", script);
> >>> +    fclose(fp);
> >>> +
> >>> +    if (chmod(script_filename, 0700) == -1) {
> >>> +        perror("Failed to set execute permission");
> >>> +        return 1;
> >>> +    }
> >>> +
> >>> +    FILE *pipe = popen(script_filename, "r");
> >>> +    if (pipe == NULL) {
> >>> +        perror("Failed to run script");
> >>> +        return 1;
> >>> +    }
> >>> +
> >>> +    int idx = 0;
> >>> +    while (idx < 127 && fgets(buffer + idx, 128 - idx, pipe) != NULL) {
> >>> +        idx += strlen(buffer + idx);
> >>> +    }
> >>> +    if (idx > 0 && buffer[idx - 1] == '\n')
> >>> +        buffer[idx - 1] = 0;
> >>> +
> >>> +    int status = pclose(pipe);
> >>> +    if (status == -1) {
> >>> +        perror("Error reported by pclose()");
> >>> +    } else if (!WIFEXITED(status)) {
> >>> +        printf("Script did not terminate normally\n");
> >>> +    }
> >>> +
> >>> +    remove(script_filename);
> > 
> > The script can be put in a separate file instead of being hard-coded here, right?
> 
> 
> Sure. If so, I wonder how the migration-test program would know where this script is?
> 
> 
> > 
> >>> +
> >>> +    return 0;
> >>> +}
> >>> +
> >>> +static void test_precopy_rdma_plain(void)
> >>> +{
> >>> +    char buffer[128] = {};
> >>> +
> >>> +    if (new_rdma_link(buffer))
> >>> +        return;
> >>> +
> >>> +    g_autofree char *uri = g_strdup_printf("rdma:%s:7777", buffer);
> >>> +
> >>> +    MigrateCommon args = {
> >>> +        .listen_uri = uri,
> >>> +        .connect_uri = uri,
> >>> +    };
> >>> +
> >>> +    test_precopy_common(&args);
> >>> +}
> >>> +
> >>>   static void test_precopy_tcp_plain(void)
> >>>   {
> >>>       MigrateCommon args = {
> >>> @@ -968,6 +1067,10 @@ static void migration_test_add_precopy_smoke(MigrationTestEnv *env)
> >>>                          test_multifd_tcp_uri_none);
> >>>       migration_test_add("/migration/multifd/tcp/plain/cancel",
> >>>                          test_multifd_tcp_cancel);
> >>> +#ifdef CONFIG_RDMA
> >>> +    migration_test_add("/migration/precopy/rdma/plain",
> >>> +                       test_precopy_rdma_plain);
> >>> +#endif
> >>>   }
> >>>   
> >>>   void migration_test_add_precopy(MigrationTestEnv *env)
> >>
> >> Thanks, that's definitely better than nothing. I'll experiment with this
> >> locally, see if I can at least run it before sending a pull request.
> > 
> > With your newly added --full, IIUC we can add whatever we want there.
> > E.g. we can add --rdma and iff specified, migration-test adds the rdma test.
> > 
> > Or.. skip the test when the rdma link isn't available.
> > 
> > If we could separate the script into a file, it'll be better.  We could
> > create scripts/migration dir and put all migration scripts over there,
> 
> Do we have any other existing scripts? I didn't find any in the current QEMU tree.

We have a few that I'm aware of:

  - analyze-migration.py
  - vmstate-static-checker.py
  - userfaultfd-wrlat.py

> 
> 
> > then
> > in the test it tries to detect rdma link and fetch the ip only
> 
> It should work without root permission if we just *detect* and *fetch ip*.
> 
> Do you also mean we can split new-rdma-link.sh into 2 separate scripts?
> - add-rdma-link.sh # optional, executed by the user before the test (requires root permission)
> - detect-fetch-rdma.sh # executed from the migration-test

Hmm, indeed, we still need a script to scan over all the ports.
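
For reference, the detection half boils down to matching the trailing
"netdev <name>" field of "rdma link" output, which is what has_soft_rdma()
in the script greps for.  A minimal sketch of that match, assuming output
lines of the form "link rxe0/1 state ACTIVE physical_state LINK_UP netdev
eth0"; rdma_link_netdev() is a made-up name, not an existing helper:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract the netdev name from one line of
 * "rdma link" output.  The line layout is an assumption, e.g.
 *   link rxe0/1 state ACTIVE physical_state LINK_UP netdev eth0
 * Returns false when the line has no netdev field or the name
 * does not fit in the caller's buffer.
 */
static bool rdma_link_netdev(const char *line, char *netdev, size_t len)
{
    const char *key = " netdev ";
    const char *p = strstr(line, key);
    size_t n;

    if (!p) {
        return false;
    }
    p += strlen(key);
    /* The name ends at the first blank or newline. */
    n = strcspn(p, " \t\n");
    if (n == 0 || n >= len) {
        return false;
    }
    memcpy(netdev, p, n);
    netdev[n] = '\0';
    return true;
}
```

This only covers the matching; finding the IPv4 address of the netdev
would still need the "ip -4 -o addr show" step from the script.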

If having --rdma is a good idea, maybe we can further make it a parameter
to --rdma?

  $ migration-test --rdma $RDMA_IP

Or:

  $ migration-test --rdma-ip $RDMA_IP

Then maybe migration-test can directly take that IP and run the tests,
assuming the admin has set up the rdma link.  Then we keep that one script.
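
A rough sketch of the --rdma-ip variant, only to illustrate the idea.
parse_rdma_ip() and build_rdma_uri() are hypothetical names, and how the
option would actually be wired into migration-test is left open; the URI
format and port 7777 are taken from the patch above:

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: look for "--rdma-ip <addr>" in argv and copy the
 * address into ip.  Returns true only when the option and a non-empty
 * value are present and the value fits in the buffer.
 */
static bool parse_rdma_ip(int argc, char **argv, char *ip, size_t len)
{
    for (int i = 1; i + 1 < argc; i++) {
        if (strcmp(argv[i], "--rdma-ip") == 0) {
            int n = snprintf(ip, len, "%s", argv[i + 1]);
            return n > 0 && (size_t)n < len;
        }
    }
    return false;
}

/* Build the URI the same way the patch does: "rdma:<ip>:7777". */
static bool build_rdma_uri(const char *ip, char *uri, size_t len)
{
    int n = snprintf(uri, len, "rdma:%s:7777", ip);
    return n > 0 && (size_t)n < len;
}
```

With something like this the rdma test would be registered only when the
option is given, and the test itself would need no root permission.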

Or I assume it's still OK for the test to require root only for --rdma and
then invoke the script directly in the test.  If so, we'd better also remove
the rdma link after the test finishes, so that the test leaves no side
effects (modprobe is probably fine).
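
For the cleanup, the script names the link "<netdev>_rxe", so removing it
afterwards would be "rdma link delete <netdev>_rxe".  A sketch of building
that command, where format_rdma_link_delete() is a hypothetical helper and
not something that exists in the tree:

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: format the command that removes the soft-RoCE
 * link created by "rdma link add <netdev>_rxe type rxe netdev <netdev>".
 * Returns false if the command does not fit in the buffer.
 */
static bool format_rdma_link_delete(const char *netdev, char *cmd, size_t len)
{
    int n = snprintf(cmd, len, "rdma link delete %s_rxe", netdev);
    return n > 0 && (size_t)n < len;
}
```

The test should only run this when it created the link itself, so that a
link the admin set up beforehand is left alone.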

We can wait and see how far Fabiano went with this, and also his opinion.

-- 
Peter Xu


