qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/8] virtio-net: live-TAP local migration
@ 2025-09-03 13:36 Vladimir Sementsov-Ogievskiy
  2025-09-03 13:36 ` [PATCH v2 1/8] net/tap: add some trace points Vladimir Sementsov-Ogievskiy
                   ` (8 more replies)
  0 siblings, 9 replies; 26+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-09-03 13:36 UTC (permalink / raw)
  To: jasowang
  Cc: qemu-devel, vsementsov, leiyang, steven.sistare, yc-core, peterx,
	mst, farosas, eblake, armbru, thuth, philmd, berrange

Hi all!

Here is a  new migration capability "local-tap", which
allows local migration of TAP device, including its properties
and open fds.

With this new option, management software doesn't need to
initialize new TAP and do a switch to it. Nothing should be
done around virtio-net in local migration: it just migrates
and continues to use same TAP device. So we avoid extra logic
in management software, extra allocations in kenel (for new TAP),
and corresponding extra delay in migration downtime.

Note that patch 07 is reused from
[PATCH 00/33] vhost-user-blk: live-backend local migration
to not create extra dependency on a big series.

Based on [PATCH v3 00/19] TAP initialization refactoring, or in other
words:
Based-on: <20250903124934.1169899-1-vsementsov@yandex-team.ru>

Vladimir Sementsov-Ogievskiy (8):
  net/tap: add some trace points
  net/tap: keep exit notifier only when downscript set
  net/tap: refactor net_tap_setup_vhost()
  qapi: add interface for local TAP migration
  net/tap: implement interfaces for local migration
  virtio-net: support local tap migration
  test/functional: exec_command_and_wait_for_pattern: add vm arg
  tests/functional: add test_x86_64_tap_fd_migration

 hw/net/virtio-net.c                           | 100 ++++-
 include/hw/virtio/virtio-net.h                |   2 +
 include/net/tap.h                             |   4 +
 migration/options.c                           |   7 +
 migration/options.h                           |   1 +
 net/tap.c                                     | 323 ++++++++++++----
 net/trace-events                              |   7 +
 qapi/migration.json                           |   9 +-
 qapi/net.json                                 |  12 +-
 tests/functional/qemu_test/cmd.py             |   7 +-
 .../test_x86_64_tap_fd_migration.py           | 347 ++++++++++++++++++
 11 files changed, 734 insertions(+), 85 deletions(-)
 create mode 100644 tests/functional/test_x86_64_tap_fd_migration.py

-- 
2.48.1



^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 1/8] net/tap: add some trace points
  2025-09-03 13:36 [PATCH v2 0/8] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
@ 2025-09-03 13:36 ` Vladimir Sementsov-Ogievskiy
  2025-09-03 14:11   ` Daniel P. Berrangé
  2025-09-03 13:36 ` [PATCH v2 2/8] net/tap: keep exit notifier only when downscript set Vladimir Sementsov-Ogievskiy
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-09-03 13:36 UTC (permalink / raw)
  To: jasowang
  Cc: qemu-devel, vsementsov, leiyang, steven.sistare, yc-core, peterx,
	mst, farosas, eblake, armbru, thuth, philmd, berrange

Add trace points to simplify debugging migration.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
 net/tap.c        | 58 ++++++++++++++++++++++++++++++++++++++++++++++++
 net/trace-events |  7 ++++++
 2 files changed, 65 insertions(+)

diff --git a/net/tap.c b/net/tap.c
index 2530627a9a..224fb37310 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -43,6 +43,7 @@
 #include "qemu/main-loop.h"
 #include "qemu/sockets.h"
 #include "hw/virtio/vhost.h"
+#include "trace.h"
 
 #include "net/tap.h"
 
@@ -148,6 +149,9 @@ static ssize_t tap_receive_iov(NetClientState *nc, const struct iovec *iov,
     g_autofree struct iovec *iov_copy = NULL;
     struct virtio_net_hdr hdr = { };
 
+    trace_tap_receive_iov(s->using_vnet_hdr, s->host_vnet_hdr_len, iovcnt,
+                          iov->iov_len);
+
     if (s->host_vnet_hdr_len && !s->using_vnet_hdr) {
         iov_copy = g_new(struct iovec, iovcnt + 1);
         iov_copy[0].iov_base = &hdr;
@@ -183,6 +187,49 @@ static void tap_send_completed(NetClientState *nc, ssize_t len)
     tap_read_poll(s, true);
 }
 
+static char *tap_dump_packet(const uint8_t *buf, int size)
+{
+    int i, j;
+    char hex_line[80];  /* Enough space for hex pairs + spaces */
+    char ascii_line[17]; /* 16 + 1 for null terminator */
+    GString *dump_str = g_string_new(NULL);
+
+    g_string_append_printf(dump_str, "Packet dump (%d bytes):\n", size);
+
+    for (i = 0; i < size; i += 16) {
+        memset(hex_line, 0, sizeof(hex_line));
+        memset(ascii_line, 0, sizeof(ascii_line));
+
+        /* Build hex line in groups of 2 bytes (4 hex chars) */
+        int hex_pos = 0;
+        for (j = 0; j < 16 && (i + j) < size; j += 2) {
+            if (i + j + 1 < size) {
+                /* Two bytes available */
+                hex_pos += snprintf(hex_line + hex_pos,
+                                    sizeof(hex_line) - hex_pos,
+                                    "%02x%02x ", buf[i + j], buf[i + j + 1]);
+            } else {
+                /* Only one byte left */
+                hex_pos += snprintf(hex_line + hex_pos,
+                                    sizeof(hex_line) - hex_pos,
+                                    "%02x   ", buf[i + j]);
+            }
+        }
+
+        /* Build ASCII line */
+        for (j = 0; j < 16 && (i + j) < size; j++) {
+            uint8_t byte = buf[i + j];
+            ascii_line[j] = (byte >= 32 && byte <= 126) ? byte : '.';
+        }
+
+        /* Add the line in tcpdump-like format */
+        g_string_append_printf(dump_str, "\t0x%04x:  %-40s %s\n",
+                               i, hex_line, ascii_line);
+    }
+
+    return g_string_free(dump_str, false);
+}
+
 static void tap_send(void *opaque)
 {
     TAPState *s = opaque;
@@ -199,6 +246,13 @@ static void tap_send(void *opaque)
             break;
         }
 
+        if (trace_event_get_state_backends(TRACE_TAP_PACKET_DUMP)) {
+            g_autofree char *dump = tap_dump_packet(s->buf, size);
+            trace_tap_packet_dump(dump);
+        }
+
+        trace_tap_send_packet(s->using_vnet_hdr, s->host_vnet_hdr_len, size);
+
         if (s->host_vnet_hdr_len && size <= s->host_vnet_hdr_len) {
             /* Invalid packet */
             break;
@@ -980,6 +1034,8 @@ int tap_enable(NetClientState *nc)
     TAPState *s = DO_UPCAST(TAPState, nc, nc);
     int ret;
 
+    trace_tap_enable();
+
     if (s->enabled) {
         return 0;
     } else {
@@ -997,6 +1053,8 @@ int tap_disable(NetClientState *nc)
     TAPState *s = DO_UPCAST(TAPState, nc, nc);
     int ret;
 
+    trace_tap_disable();
+
     if (s->enabled == 0) {
         return 0;
     } else {
diff --git a/net/trace-events b/net/trace-events
index cda960f42b..b51427f539 100644
--- a/net/trace-events
+++ b/net/trace-events
@@ -29,3 +29,10 @@ vhost_vdpa_set_address_space_id(void *v, unsigned vq_group, unsigned asid_num) "
 vhost_vdpa_net_load_cmd(void *s, uint8_t class, uint8_t cmd, int data_num, int data_size) "vdpa state: %p class: %u cmd: %u sg_num: %d size: %d"
 vhost_vdpa_net_load_cmd_retval(void *s, uint8_t class, uint8_t cmd, int r) "vdpa state: %p class: %u cmd: %u retval: %d"
 vhost_vdpa_net_load_mq(void *s, int ncurqps) "vdpa state: %p current_qpairs: %d"
+
+# tap.c
+tap_receive_iov(bool using_vnet_hdr, uint32_t host_vnet_hdr_len, int iovcnt, size_t iov_len) "using_vnet_hdr:%d host_vnet_hdr_len:%u iovcnt:%d iov_len:%zu"
+tap_send_packet(bool using_vnet_hdr, uint32_t host_vnet_hdr_len, int size) "using_vnet_hdr:%d host_vnet_hdr_len:%u size:%d"
+tap_enable(void) "tap enabled"
+tap_disable(void) "tap disabled"
+tap_packet_dump(const char *dump_str) "%s"
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 2/8] net/tap: keep exit notifier only when downscript set
  2025-09-03 13:36 [PATCH v2 0/8] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
  2025-09-03 13:36 ` [PATCH v2 1/8] net/tap: add some trace points Vladimir Sementsov-Ogievskiy
@ 2025-09-03 13:36 ` Vladimir Sementsov-Ogievskiy
  2025-09-03 13:37 ` [PATCH v2 3/8] net/tap: refactor net_tap_setup_vhost() Vladimir Sementsov-Ogievskiy
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-09-03 13:36 UTC (permalink / raw)
  To: jasowang
  Cc: qemu-devel, vsementsov, leiyang, steven.sistare, yc-core, peterx,
	mst, farosas, eblake, armbru, thuth, philmd, berrange

The notifier is used only to call the downscript. Avoid extra
notifier for other cases.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
 net/tap.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index 224fb37310..7dc8af1e57 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -384,8 +384,10 @@ static void tap_cleanup(NetClientState *nc)
 
     qemu_purge_queued_packets(nc);
 
-    tap_exit_notify(&s->exit, NULL);
-    qemu_remove_exit_notifier(&s->exit);
+    if (s->exit.notify) {
+        tap_exit_notify(&s->exit, NULL);
+        qemu_remove_exit_notifier(&s->exit);
+    }
 
     tap_read_poll(s, false);
     tap_write_poll(s, false);
@@ -775,9 +777,7 @@ static int net_tap_fd_init_common(const Netdev *netdev, NetClientState *peer,
     }
     tap_read_poll(s, true);
     s->vhost_net = NULL;
-
-    s->exit.notify = tap_exit_notify;
-    qemu_add_exit_notifier(&s->exit);
+    s->exit.notify = NULL;
 
     if (netdev->type == NET_CLIENT_DRIVER_BRIDGE) {
         const NetdevBridgeOptions *bridge = &netdev->u.bridge;
@@ -805,6 +805,8 @@ static int net_tap_fd_init_common(const Netdev *netdev, NetClientState *peer,
             snprintf(s->down_script, sizeof(s->down_script), "%s", downscript);
             snprintf(s->down_script_arg, sizeof(s->down_script_arg),
                      "%s", ifname);
+            s->exit.notify = tap_exit_notify;
+            qemu_add_exit_notifier(&s->exit);
         }
     }
 
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 3/8] net/tap: refactor net_tap_setup_vhost()
  2025-09-03 13:36 [PATCH v2 0/8] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
  2025-09-03 13:36 ` [PATCH v2 1/8] net/tap: add some trace points Vladimir Sementsov-Ogievskiy
  2025-09-03 13:36 ` [PATCH v2 2/8] net/tap: keep exit notifier only when downscript set Vladimir Sementsov-Ogievskiy
@ 2025-09-03 13:37 ` Vladimir Sementsov-Ogievskiy
  2025-09-03 13:37 ` [PATCH v2 4/8] qapi: add interface for local TAP migration Vladimir Sementsov-Ogievskiy
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-09-03 13:37 UTC (permalink / raw)
  To: jasowang
  Cc: qemu-devel, vsementsov, leiyang, steven.sistare, yc-core, peterx,
	mst, farosas, eblake, armbru, thuth, philmd, berrange

1. Move function higher. (we anyway want to split it into two functions,
and change the code a lot, so take an opportunity to get rid of extra
declaration).

2. Split into _set_options() and _init_vhost(): we'll need it later
to implement TAP local migration feature.

3. Split requires to store options somewhere, and to be able to call
_init_vhost() later (from migriation _load() in later commit), store
options in the TAPState.

4. So, take an opportunity to zero-initialize options:
 - be safe (avoid uninitialized options)
 - don't care to initialize half of the options to zero by hand

5. Also, don't worry too much about poll_us: absent option should be
zero-initialized anyway.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
 net/tap.c | 123 +++++++++++++++++++++++++++++-------------------------
 1 file changed, 66 insertions(+), 57 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index 7dc8af1e57..a9d955ac5f 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -78,6 +78,7 @@ typedef struct TAPState {
     bool has_ufo;
     bool has_uso;
     bool enabled;
+    VhostNetOptions *vhost_options;
     VHostNetState *vhost_net;
     unsigned host_vnet_hdr_len;
     Notifier exit;
@@ -95,8 +96,6 @@ static int net_tap_fd_init_common(const Netdev *netdev, NetClientState *peer,
                                   const char *downscript,
                                   const char *vhostfdname,
                                   int vnet_hdr, int fd, Error **errp);
-static int net_tap_setup_vhost(TAPState *s, const NetdevTapOptions *tap,
-                               const char *vhostfdname, Error **errp);
 
 static void tap_update_fd_handler(TAPState *s)
 {
@@ -382,6 +381,8 @@ static void tap_cleanup(NetClientState *nc)
         s->vhost_net = NULL;
     }
 
+    g_free(s->vhost_options);
+
     qemu_purge_queued_packets(nc);
 
     if (s->exit.notify) {
@@ -745,6 +746,63 @@ static int net_tap_open_one(const Netdev *netdev,
 
 #define MAX_TAP_QUEUES 1024
 
+static int net_tap_set_vhost_options(TAPState *s, const NetdevTapOptions *tap,
+                                     const char *vhostfdname, Error **errp)
+{
+    int vhostfd;
+    bool vhost_on = tap->has_vhost ? tap->vhost :
+        vhostfdname || (tap->has_vhostforce && tap->vhostforce);
+
+    if (!vhost_on) {
+        return 0;
+    }
+
+    if (vhostfdname) {
+        vhostfd = monitor_fd_param(monitor_cur(), vhostfdname, errp);
+        if (vhostfd == -1) {
+            return -1;
+        }
+    } else {
+        vhostfd = open("/dev/vhost-net", O_RDWR);
+        if (vhostfd < 0) {
+            error_setg_errno(errp, errno,
+                             "tap: open vhost char device failed");
+            return -1;
+        }
+    }
+
+    if (!qemu_set_blocking(vhostfd, false, errp)) {
+        return -1;
+    }
+
+    s->vhost_options = g_new(VhostNetOptions, 1);
+    *s->vhost_options = (VhostNetOptions) {
+        .backend_type = VHOST_BACKEND_TYPE_KERNEL,
+        .net_backend = &s->nc,
+        .busyloop_timeout = tap->poll_us,
+        .opaque = (void *)(uintptr_t)vhostfd,
+        .nvqs = 2,
+        .feature_bits = kernel_feature_bits,
+    };
+
+    return 0;
+}
+
+static int net_tap_init_vhost(TAPState *s, Error **errp) {
+    if (!s->vhost_options) {
+        return 0;
+    }
+
+    s->vhost_net = vhost_net_init(s->vhost_options);
+    if (!s->vhost_net) {
+        error_setg(errp,
+                   "vhost-net requested but could not be initialized");
+        return -1;
+    }
+
+    return 0;
+}
+
 static int net_tap_fd_init_common(const Netdev *netdev, NetClientState *peer,
                                   const char *model, const char *name,
                                   const char *ifname, const char *script,
@@ -810,7 +868,12 @@ static int net_tap_fd_init_common(const Netdev *netdev, NetClientState *peer,
         }
     }
 
-    ret = net_tap_setup_vhost(s, tap, vhostfdname, errp);
+    ret = net_tap_set_vhost_options(s, tap, vhostfdname, errp);
+    if (ret < 0) {
+        goto failed;
+    }
+
+    ret = net_tap_init_vhost(s, errp);
     if (ret < 0) {
         goto failed;
     }
@@ -822,60 +885,6 @@ failed:
     return -1;
 }
 
-static int net_tap_setup_vhost(TAPState *s, const NetdevTapOptions *tap,
-                               const char *vhostfdname, Error **errp)
-{
-    if (tap->has_vhost ? tap->vhost :
-        vhostfdname || (tap->has_vhostforce && tap->vhostforce)) {
-        VhostNetOptions options;
-        int vhostfd;
-
-        options.backend_type = VHOST_BACKEND_TYPE_KERNEL;
-        options.net_backend = &s->nc;
-        if (tap->has_poll_us) {
-            options.busyloop_timeout = tap->poll_us;
-        } else {
-            options.busyloop_timeout = 0;
-        }
-
-        if (vhostfdname) {
-            vhostfd = monitor_fd_param(monitor_cur(), vhostfdname, errp);
-            if (vhostfd == -1) {
-                return -1;
-            }
-            if (!qemu_set_blocking(vhostfd, false, errp)) {
-                return -1;
-            }
-        } else {
-            vhostfd = open("/dev/vhost-net", O_RDWR);
-            if (vhostfd < 0) {
-                error_setg_errno(errp, errno,
-                                 "tap: open vhost char device failed");
-                return -1;
-            }
-            if (!qemu_set_blocking(vhostfd, false, errp)) {
-                return -1;
-            }
-        }
-        options.opaque = (void *)(uintptr_t)vhostfd;
-        options.nvqs = 2;
-        options.feature_bits = kernel_feature_bits;
-        options.get_acked_features = NULL;
-        options.save_acked_features = NULL;
-        options.max_tx_queue_size = 0;
-        options.is_vhost_user = false;
-
-        s->vhost_net = vhost_net_init(&options);
-        if (!s->vhost_net) {
-            error_setg(errp,
-                       "vhost-net requested but could not be initialized");
-            return -1;
-        }
-    }
-
-    return 0;
-}
-
 static int net_tap_from_monitor_fd(const Netdev *netdev, NetClientState *peer,
                                    const char *name, const char *vhostfdname,
                                    int *pvnet_hdr, const char *fdname,
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 4/8] qapi: add interface for local TAP migration
  2025-09-03 13:36 [PATCH v2 0/8] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2025-09-03 13:37 ` [PATCH v2 3/8] net/tap: refactor net_tap_setup_vhost() Vladimir Sementsov-Ogievskiy
@ 2025-09-03 13:37 ` Vladimir Sementsov-Ogievskiy
  2025-09-10  6:28   ` Markus Armbruster
  2025-09-03 13:37 ` [PATCH v2 5/8] net/tap: implement interfaces for local migration Vladimir Sementsov-Ogievskiy
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-09-03 13:37 UTC (permalink / raw)
  To: jasowang
  Cc: qemu-devel, vsementsov, leiyang, steven.sistare, yc-core, peterx,
	mst, farosas, eblake, armbru, thuth, philmd, berrange

To migrate TAP device (including open fds) locally, user should:

1. enable local-tap migration capability both on source and target
2. use additional local-incoming=true option for tap on target

Why capability is not enough? We need an option to modify early
initialization of TAP, to avoid opening new fds. The capability
may not be set at the moment of netdev initialization.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
 migration/options.c |  7 +++++++
 migration/options.h |  1 +
 qapi/migration.json |  9 ++++++++-
 qapi/net.json       | 12 +++++++++++-
 4 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/migration/options.c b/migration/options.c
index 4e923a2e07..6f5af52a5c 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -262,6 +262,13 @@ bool migrate_mapped_ram(void)
     return s->capabilities[MIGRATION_CAPABILITY_MAPPED_RAM];
 }
 
+bool migrate_local_tap(void)
+{
+    MigrationState *s = migrate_get_current();
+
+    return s->capabilities[MIGRATION_CAPABILITY_LOCAL_TAP];
+}
+
 bool migrate_ignore_shared(void)
 {
     MigrationState *s = migrate_get_current();
diff --git a/migration/options.h b/migration/options.h
index 82d839709e..08c0606072 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -30,6 +30,7 @@ bool migrate_colo(void);
 bool migrate_dirty_bitmaps(void);
 bool migrate_events(void);
 bool migrate_mapped_ram(void);
+bool migrate_local_tap(void);
 bool migrate_ignore_shared(void);
 bool migrate_late_block_activate(void);
 bool migrate_multifd(void);
diff --git a/qapi/migration.json b/qapi/migration.json
index 2387c21e9c..992a5b1e2b 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -517,6 +517,12 @@
 #     each RAM page.  Requires a migration URI that supports seeking,
 #     such as a file.  (since 9.0)
 #
+# @local-tap: Migrate TAPs locally, keeping backend alive. Open file
+#     descriptors and TAP-related state are migrated. Only may be
+#     used when migration channel is unix socket. For target device
+#     also @local-incoming option must be specified (since 10.2)
+#     (since 10.2)
+#
 # Features:
 #
 # @unstable: Members @x-colo and @x-ignore-shared are experimental.
@@ -536,7 +542,8 @@
            { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
            'validate-uuid', 'background-snapshot',
            'zero-copy-send', 'postcopy-preempt', 'switchover-ack',
-           'dirty-limit', 'mapped-ram'] }
+           'dirty-limit', 'mapped-ram',
+           { 'name': 'local-tap', 'features': [ 'unstable' ] } ] }
 
 ##
 # @MigrationCapabilityStatus:
diff --git a/qapi/net.json b/qapi/net.json
index 78bcc9871e..8f53549d58 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -353,6 +353,15 @@
 # @poll-us: maximum number of microseconds that could be spent on busy
 #     polling for tap (since 2.7)
 #
+# @local-incoming: Do load open file descriptor for that TAP
+#     on incoming migration. May be used only if QEMU is started
+#     for incoming migration. Will work only together with local-tap
+#     migration capability enabled (default: false) (Since: 10.2)
+#
+# Features:
+#
+# @unstable: Member @local-incoming is experimental
+#
 # Since: 1.2
 ##
 { 'struct': 'NetdevTapOptions',
@@ -371,7 +380,8 @@
     '*vhostfds':   'str',
     '*vhostforce': 'bool',
     '*queues':     'uint32',
-    '*poll-us':    'uint32'} }
+    '*poll-us':    'uint32',
+    '*local-incoming': { 'type': 'bool', 'features': [ 'unstable' ] } } }
 
 ##
 # @NetdevSocketOptions:
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 5/8] net/tap: implement interfaces for local migration
  2025-09-03 13:36 [PATCH v2 0/8] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
                   ` (3 preceding siblings ...)
  2025-09-03 13:37 ` [PATCH v2 4/8] qapi: add interface for local TAP migration Vladimir Sementsov-Ogievskiy
@ 2025-09-03 13:37 ` Vladimir Sementsov-Ogievskiy
  2025-09-03 14:34   ` Daniel P. Berrangé
  2025-09-03 13:37 ` [PATCH v2 6/8] virtio-net: support local tap migration Vladimir Sementsov-Ogievskiy
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 26+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-09-03 13:37 UTC (permalink / raw)
  To: jasowang
  Cc: qemu-devel, vsementsov, leiyang, steven.sistare, yc-core, peterx,
	mst, farosas, eblake, armbru, thuth, philmd, berrange

Handle local-incoming option:

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
 include/net/tap.h |   4 ++
 net/tap.c         | 136 +++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 119 insertions(+), 21 deletions(-)

diff --git a/include/net/tap.h b/include/net/tap.h
index 6f34f13eae..3ef2e2dbae 100644
--- a/include/net/tap.h
+++ b/include/net/tap.h
@@ -30,7 +30,11 @@
 
 int tap_enable(NetClientState *nc);
 int tap_disable(NetClientState *nc);
+bool tap_local_incoming(NetClientState *nc);
 
 int tap_get_fd(NetClientState *nc);
 
+int tap_load(NetClientState *nc, QEMUFile *f);
+int tap_save(NetClientState *nc, QEMUFile *f);
+
 #endif /* QEMU_NET_TAP_H */
diff --git a/net/tap.c b/net/tap.c
index a9d955ac5f..499db756ea 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -35,6 +35,8 @@
 #include "net/eth.h"
 #include "net/net.h"
 #include "clients.h"
+#include "migration/migration.h"
+#include "migration/qemu-file.h"
 #include "monitor/monitor.h"
 #include "system/system.h"
 #include "qapi/error.h"
@@ -82,6 +84,7 @@ typedef struct TAPState {
     VHostNetState *vhost_net;
     unsigned host_vnet_hdr_len;
     Notifier exit;
+    bool local_incoming;
 } TAPState;
 
 static void launch_script(const char *setup_script, const char *ifname,
@@ -803,6 +806,40 @@ static int net_tap_init_vhost(TAPState *s, Error **errp) {
     return 0;
 }
 
+int tap_save(NetClientState *nc, QEMUFile *f)
+{
+    TAPState *s = DO_UPCAST(TAPState, nc, nc);
+
+    qemu_file_put_fd(f, s->fd);
+    qemu_put_byte(f, s->using_vnet_hdr);
+    qemu_put_byte(f, s->has_ufo);
+    qemu_put_byte(f, s->has_uso);
+    qemu_put_byte(f, s->enabled);
+    qemu_put_be32(f, s->host_vnet_hdr_len);
+
+    return 0;
+}
+
+int tap_load(NetClientState *nc, QEMUFile *f)
+{
+    TAPState *s = DO_UPCAST(TAPState, nc, nc);
+
+    s->fd = qemu_file_get_fd(f);
+    if (s->fd < 0) {
+        return -1;
+    }
+
+    s->using_vnet_hdr = qemu_get_byte(f);
+    s->has_ufo = qemu_get_byte(f);
+    s->has_uso = qemu_get_byte(f);
+    s->enabled = qemu_get_byte(f);
+    qemu_get_be32s(f, &s->host_vnet_hdr_len);
+
+    tap_read_poll(s, true);
+
+    return net_tap_init_vhost(s, NULL);
+}
+
 static int net_tap_fd_init_common(const Netdev *netdev, NetClientState *peer,
                                   const char *model, const char *name,
                                   const char *ifname, const char *script,
@@ -810,30 +847,40 @@ static int net_tap_fd_init_common(const Netdev *netdev, NetClientState *peer,
                                   const char *vhostfdname,
                                   int vnet_hdr, int fd, Error **errp)
 {
-    const NetdevTapOptions *tap;
+    const NetdevTapOptions *tap = NULL;
     int ret;
     NetClientState *nc;
     TAPState *s;
+    bool local_incoming = false;
+
+    if (netdev->type == NET_CLIENT_DRIVER_TAP) {
+        tap = &netdev->u.tap;
+        local_incoming = tap->local_incoming;
+    }
 
     nc = qemu_new_net_client(&net_tap_info, peer, model, name);
 
     s = DO_UPCAST(TAPState, nc, nc);
-
-    s->fd = fd;
-    s->host_vnet_hdr_len = vnet_hdr ? sizeof(struct virtio_net_hdr) : 0;
-    s->using_vnet_hdr = false;
-    s->has_ufo = tap_probe_has_ufo(s->fd);
-    s->has_uso = tap_probe_has_uso(s->fd);
-    s->enabled = true;
-    tap_set_offload(&s->nc, 0, 0, 0, 0, 0, 0, 0);
-    /*
-     * Make sure host header length is set correctly in tap:
-     * it might have been modified by another instance of qemu.
-     */
-    if (vnet_hdr) {
-        tap_fd_set_vnet_hdr_len(s->fd, s->host_vnet_hdr_len);
+    s->local_incoming = local_incoming;
+
+    if (!local_incoming) {
+        s->fd = fd;
+        s->host_vnet_hdr_len = vnet_hdr ? sizeof(struct virtio_net_hdr) : 0;
+        s->using_vnet_hdr = false;
+        s->has_ufo = tap_probe_has_ufo(s->fd);
+        s->has_uso = tap_probe_has_uso(s->fd);
+        s->enabled = true;
+        tap_set_offload(&s->nc, 0, 0, 0, 0, 0, 0, 0);
+        /*
+         * Make sure host header length is set correctly in tap:
+         * it might have been modified by another instance of qemu.
+         */
+        if (vnet_hdr) {
+            tap_fd_set_vnet_hdr_len(s->fd, s->host_vnet_hdr_len);
+        }
+        tap_read_poll(s, true);
     }
-    tap_read_poll(s, true);
+
     s->vhost_net = NULL;
     s->exit.notify = NULL;
 
@@ -845,9 +892,8 @@ static int net_tap_fd_init_common(const Netdev *netdev, NetClientState *peer,
     }
 
     assert(netdev->type == NET_CLIENT_DRIVER_TAP);
-    tap = &netdev->u.tap;
 
-    if (tap_set_sndbuf(s->fd, tap, errp) < 0) {
+    if (!local_incoming && tap_set_sndbuf(s->fd, tap, errp) < 0) {
         goto failed;
     }
 
@@ -873,9 +919,11 @@ static int net_tap_fd_init_common(const Netdev *netdev, NetClientState *peer,
         goto failed;
     }
 
-    ret = net_tap_init_vhost(s, errp);
-    if (ret < 0) {
-        goto failed;
+    if (!local_incoming) {
+        ret = net_tap_init_vhost(s, errp);
+        if (ret < 0) {
+            goto failed;
+        }
     }
 
     return 0;
@@ -942,6 +990,38 @@ static int net_tap_open(const Netdev *netdev,
     return 0;
 }
 
+static int net_init_local_incoming(const Netdev *netdev,
+                                   const char *name,
+                                   NetClientState *peer,
+                                   Error **errp)
+{
+    const NetdevTapOptions *tap = &netdev->u.tap;
+    const char *downscript = tap->downscript;
+    int queues = tap->has_queues ? tap->queues : 1;
+    g_autofree char *default_downscript = NULL;
+    int i, ret;
+
+    assert(netdev->type == NET_CLIENT_DRIVER_TAP);
+    assert(!tap->script);
+
+    if (!downscript) {
+        downscript = default_downscript =
+                             get_relocated_path(DEFAULT_NETWORK_DOWN_SCRIPT);
+    }
+
+    for (i = 0; i < queues; i++) {
+        ret = net_tap_fd_init_common(netdev, peer, "tap", name,
+                                     tap->ifname ?: "",
+                                     "no", downscript,
+                                     tap->vhostfd, -1, -1, errp);
+        if (ret < 0) {
+            return -1;
+        }
+    }
+
+    return 0;
+}
+
 static int net_init_tap_fds(const Netdev *netdev, const char *name,
                             NetClientState *peer, Error **errp)
 {
@@ -1030,6 +1110,13 @@ int net_init_tap(const Netdev *netdev, const char *name,
         }
 
         return net_init_bridge(netdev, name, peer, errp);
+    } else if (tap->local_incoming) {
+        if (tap->script) {
+            error_setg(errp, "script= is invalid with local-incoming");
+            return -1;
+        }
+
+        return net_init_local_incoming(netdev, name, peer, errp);
     }
 
     if (tap->vhostfds) {
@@ -1078,3 +1165,10 @@ int tap_disable(NetClientState *nc)
         return ret;
     }
 }
+
+bool tap_local_incoming(NetClientState *nc)
+{
+    TAPState *s = DO_UPCAST(TAPState, nc, nc);
+
+    return s->local_incoming && runstate_check(RUN_STATE_INMIGRATE);
+}
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 6/8] virtio-net: support local tap migration
  2025-09-03 13:36 [PATCH v2 0/8] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
                   ` (4 preceding siblings ...)
  2025-09-03 13:37 ` [PATCH v2 5/8] net/tap: implement interfaces for local migration Vladimir Sementsov-Ogievskiy
@ 2025-09-03 13:37 ` Vladimir Sementsov-Ogievskiy
  2025-09-03 13:37 ` [PATCH v2 7/8] test/functional: exec_command_and_wait_for_pattern: add vm arg Vladimir Sementsov-Ogievskiy
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 26+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-09-03 13:37 UTC (permalink / raw)
  To: jasowang
  Cc: qemu-devel, vsementsov, leiyang, steven.sistare, yc-core, peterx,
	mst, farosas, eblake, armbru, thuth, philmd, berrange

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
 hw/net/virtio-net.c            | 100 ++++++++++++++++++++++++++++++++-
 include/hw/virtio/virtio-net.h |   2 +
 2 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 6b5b5dace3..874e349fee 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -38,6 +38,8 @@
 #include "qapi/qapi-events-migration.h"
 #include "hw/virtio/virtio-access.h"
 #include "migration/misc.h"
+#include "migration/migration.h"
+#include "migration/options.h"
 #include "standard-headers/linux/ethtool.h"
 #include "system/system.h"
 #include "system/replay.h"
@@ -2999,7 +3001,13 @@ static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue)
     n->multiqueue = multiqueue;
     virtio_net_change_num_queues(n, max * 2 + 1);
 
-    virtio_net_set_queue_pairs(n);
+    /*
+     * Called from set_features(0) on reset, when on target we
+     * doesn't have fds yet
+     */
+    if (!n->tap_wait_incoming) {
+        virtio_net_set_queue_pairs(n);
+    }
 }
 
 static int virtio_net_pre_load_queues(VirtIODevice *vdev, uint32_t n)
@@ -3009,6 +3017,19 @@ static int virtio_net_pre_load_queues(VirtIODevice *vdev, uint32_t n)
     return 0;
 }
 
+static int virtio_net_pre_save_device(void *opaque)
+{
+    VirtIONet *n = opaque;
+    int i, r;
+
+    for (i = 0; i < n->curr_queue_pairs; i++) {
+        r = peer_detach(n, i);
+        assert(!r);
+    }
+
+    return 0;
+}
+
 static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
                                         Error **errp)
 {
@@ -3028,6 +3049,11 @@ static uint64_t virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
 
     virtio_add_feature(&features, VIRTIO_NET_F_MAC);
 
+    if (n->tap_wait_incoming) {
+        /* Excessive feature set is OK for early initialization */
+        return features;
+    }
+
     if (!peer_has_vnet_hdr(n)) {
         virtio_clear_feature(&features, VIRTIO_NET_F_CSUM);
         virtio_clear_feature(&features, VIRTIO_NET_F_HOST_TSO4);
@@ -3494,11 +3520,69 @@ static const VMStateDescription vhost_user_net_backend_state = {
     }
 };
 
+static int virtio_net_tap_save(QEMUFile *f, void *pv, size_t size,
+                                     const VMStateField *field,
+                                     JSONWriter *vmdesc)
+{
+    VirtIONet *n = pv;
+    int i;
+
+    for (i = 0; i < n->max_queue_pairs; i++) {
+        NetClientState *nc = qemu_get_subqueue(n->nic, i);
+        assert(nc->peer->info->type == NET_CLIENT_DRIVER_TAP);
+
+        tap_save(nc->peer, f);
+    }
+
+    return 0;
+}
+
+static int virtio_net_tap_load(QEMUFile *f, void *pv, size_t size,
+                                     const VMStateField *field)
+{
+    VirtIONet *n = pv;
+    VirtIODevice *vdev = VIRTIO_DEVICE(n);
+    VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(vdev);
+    Error *local_err = NULL;
+    int i;
+
+    for (i = 0; i < n->max_queue_pairs; i++) {
+        NetClientState *nc = qemu_get_subqueue(n->nic, i);
+        assert(nc->peer->info->type == NET_CLIENT_DRIVER_TAP);
+
+        tap_load(nc->peer, f);
+    }
+
+    peer_test_vnet_hdr(n);
+    n->tap_wait_incoming = false;
+
+    vdev->host_features = vdc->get_features(vdev, vdev->host_features,
+                                            &local_err);
+    if (local_err) {
+        error_report_err(local_err);
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+static bool virtio_net_is_tap_local(void *opaque, int version_id)
+{
+    VirtIONet *n = opaque;
+    NetClientState *nc;
+
+    nc = qemu_get_queue(n->nic);
+
+    return migrate_local_tap() && nc->peer &&
+        nc->peer->info->type == NET_CLIENT_DRIVER_TAP;
+}
+
 static const VMStateDescription vmstate_virtio_net_device = {
     .name = "virtio-net-device",
     .version_id = VIRTIO_NET_VM_VERSION,
     .minimum_version_id = VIRTIO_NET_VM_VERSION,
     .post_load = virtio_net_post_load_device,
+    .pre_save = virtio_net_pre_save_device,
     .fields = (const VMStateField[]) {
         VMSTATE_UINT8_ARRAY(mac, VirtIONet, ETH_ALEN),
         VMSTATE_STRUCT_POINTER(vqs, VirtIONet,
@@ -3525,6 +3609,15 @@ static const VMStateDescription vmstate_virtio_net_device = {
          * but based on the uint.
          */
         VMSTATE_BUFFER_POINTER_UNSAFE(vlans, VirtIONet, 0, MAX_VLAN >> 3),
+        {
+            .name = "tap",
+            .info = &(const VMStateInfo) {
+                .name = "virtio-net vhost-user backend state",
+                .get = virtio_net_tap_load,
+                .put = virtio_net_tap_save,
+            },
+            .field_exists = virtio_net_is_tap_local,
+        },
         VMSTATE_WITH_TMP(VirtIONet, struct VirtIONetMigTmp,
                          vmstate_virtio_net_has_vnet),
         VMSTATE_UINT8(mac_table.multi_overflow, VirtIONet),
@@ -3954,6 +4047,11 @@ static void virtio_net_device_realize(DeviceState *dev, Error **errp)
         vhost_net_set_config(get_vhost_net(nc->peer),
             (uint8_t *)&netcfg, 0, ETH_ALEN, VHOST_SET_CONFIG_TYPE_FRONTEND);
     }
+
+    if (nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_TAP) {
+        n->tap_wait_incoming = tap_local_incoming(nc->peer);
+    }
+
     QTAILQ_INIT(&n->rsc_chains);
     n->qdev = dev;
 
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index 73fdefc0dc..04ae0e4c06 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -231,6 +231,8 @@ struct VirtIONet {
     struct EBPFRSSContext ebpf_rss;
     uint32_t nr_ebpf_rss_fds;
     char **ebpf_rss_fds;
+
+    bool tap_wait_incoming;
 };
 
 size_t virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 7/8] test/functional: exec_command_and_wait_for_pattern: add vm arg
  2025-09-03 13:36 [PATCH v2 0/8] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
                   ` (5 preceding siblings ...)
  2025-09-03 13:37 ` [PATCH v2 6/8] virtio-net: support local tap migration Vladimir Sementsov-Ogievskiy
@ 2025-09-03 13:37 ` Vladimir Sementsov-Ogievskiy
  2025-09-03 13:37 ` [PATCH v2 8/8] tests/functional: add test_x86_64_tap_fd_migration Vladimir Sementsov-Ogievskiy
  2025-09-04 14:42 ` [PATCH v2 0/8] virtio-net: live-TAP local migration Lei Yang
  8 siblings, 0 replies; 26+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-09-03 13:37 UTC (permalink / raw)
  To: jasowang
  Cc: qemu-devel, vsementsov, leiyang, steven.sistare, yc-core, peterx,
	mst, farosas, eblake, armbru, thuth, philmd, berrange

Allow to specify non default vm for the command.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Thomas Huth <thuth@redhat.com>
---
 tests/functional/qemu_test/cmd.py | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/tests/functional/qemu_test/cmd.py b/tests/functional/qemu_test/cmd.py
index dc5f422b77..28b36a3a54 100644
--- a/tests/functional/qemu_test/cmd.py
+++ b/tests/functional/qemu_test/cmd.py
@@ -172,7 +172,8 @@ def exec_command(test, command):
     _console_interaction(test, None, None, command + '\r')
 
 def exec_command_and_wait_for_pattern(test, command,
-                                      success_message, failure_message=None):
+                                      success_message, failure_message=None,
+                                      vm=None):
     """
     Send a command to a console (appending CRLF characters), then wait
     for success_message to appear on the console, while logging the.
@@ -184,9 +185,11 @@ def exec_command_and_wait_for_pattern(test, command,
     :param command: the command to send
     :param success_message: if this message appears, test succeeds
     :param failure_message: if this message appears, test fails
+    :param vm: the VM to use (defaults to test.vm if None)
     """
     assert success_message
-    _console_interaction(test, success_message, failure_message, command + '\r')
+    _console_interaction(test, success_message, failure_message, command + '\r',
+                         vm=vm)
 
 def get_qemu_img(test):
     test.log.debug('Looking for and selecting a qemu-img binary')
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 8/8] tests/functional: add test_x86_64_tap_fd_migration
  2025-09-03 13:36 [PATCH v2 0/8] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
                   ` (6 preceding siblings ...)
  2025-09-03 13:37 ` [PATCH v2 7/8] test/functional: exec_command_and_wait_for_pattern: add vm arg Vladimir Sementsov-Ogievskiy
@ 2025-09-03 13:37 ` Vladimir Sementsov-Ogievskiy
  2025-09-03 14:47   ` Daniel P. Berrangé
  2025-09-04 14:42 ` [PATCH v2 0/8] virtio-net: live-TAP local migration Lei Yang
  8 siblings, 1 reply; 26+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-09-03 13:37 UTC (permalink / raw)
  To: jasowang
  Cc: qemu-devel, vsementsov, leiyang, steven.sistare, yc-core, peterx,
	mst, farosas, eblake, armbru, thuth, philmd, berrange

Add test for a new feature of local TAP migration with fd passing
through unix socket.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
 .../test_x86_64_tap_fd_migration.py           | 347 ++++++++++++++++++
 1 file changed, 347 insertions(+)
 create mode 100644 tests/functional/test_x86_64_tap_fd_migration.py

diff --git a/tests/functional/test_x86_64_tap_fd_migration.py b/tests/functional/test_x86_64_tap_fd_migration.py
new file mode 100644
index 0000000000..f6d18fe39f
--- /dev/null
+++ b/tests/functional/test_x86_64_tap_fd_migration.py
@@ -0,0 +1,347 @@
+#!/usr/bin/env python3
+#
+# Functional test that tests TAP local migration
+# with fd passing
+#
+# Copyright (c) Yandex
+#
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+import os
+import time
+import subprocess
+import signal
+from typing import Tuple
+
+from qemu_test import (
+    LinuxKernelTest,
+    Asset,
+    exec_command_and_wait_for_pattern,
+)
+
+GUEST_IP = "10.0.1.2"
+GUEST_IP_MASK = f"{GUEST_IP}/24"
+GUEST_MAC = "d6:0d:75:f8:0f:b7"
+HOST_IP = "10.0.1.1"
+HOST_IP_MASK = f"{HOST_IP}/24"
+TAP_ID = "tap0"
+TAP_MAC = "e6:1d:44:b5:03:5d"
+
+
+def run(cmd: str, check: bool = True) -> None:
+    subprocess.run(cmd, check=check, shell=True)
+
+
+def fetch(cmd: str, check: bool = True) -> str:
+    return subprocess.run(
+        cmd, check=check, shell=True, stdout=subprocess.PIPE, text=True
+    ).stdout
+
+
+def del_tap() -> None:
+    run(f"ip tuntap del {TAP_ID} mode tap multi_queue", check=False)
+
+
+def init_tap() -> None:
+    run(f"ip tuntap add dev {TAP_ID} mode tap multi_queue")
+    run(f"ip link set dev {TAP_ID} address {TAP_MAC}")
+    run(f"ip addr add {HOST_IP_MASK} dev {TAP_ID}")
+    run(f"ip link set {TAP_ID} up")
+
+
+def parse_ping_line(line: str) -> float:
+    # suspect lines like
+    # [1748524876.590509] 64 bytes from 94.245.155.3 \
+    #      (94.245.155.3): icmp_seq=1 ttl=250 time=101 ms
+    spl = line.split()
+    return float(spl[0][1:-1])
+
+
+def parse_ping_output(out) -> Tuple[bool, float, float]:
+    lines = [x for x in out.split("\n") if x.startswith("[")]
+
+    try:
+        first_no_ans = next(
+            (ind for ind in range(len(lines)) if lines[ind][20:26] == "no ans")
+        )
+    except StopIteration:
+        return False, parse_ping_line(lines[0]), parse_ping_line(lines[-1])
+
+    last_no_ans = next(
+        (ind for ind in range(len(lines) - 1, -1, -1) if lines[ind][20:26] == "no ans")
+    )
+
+    return (
+        True,
+        parse_ping_line(lines[first_no_ans]),
+        parse_ping_line(lines[last_no_ans]),
+    )
+
+
+def wait_migration_finish(source_vm, target_vm):
+    migr_events = (
+        ("MIGRATION", {"data": {"status": "completed"}}),
+        ("MIGRATION", {"data": {"status": "failed"}}),
+    )
+
+    source_e = source_vm.events_wait(migr_events)["data"]
+    target_e = target_vm.events_wait(migr_events)["data"]
+
+    source_s = source_vm.cmd("query-status")["status"]
+    target_s = target_vm.cmd("query-status")["status"]
+
+    assert (
+        source_e["status"] == "completed"
+        and target_e["status"] == "completed"
+        and source_s == "postmigrate"
+        and target_s == "paused"
+    ), f"""Migration failed:
+    SRC status: {source_s}
+    SRC event: {source_e}
+    TGT status: {target_s}
+    TGT event:{target_e}"""
+
+
+class VhostUserBlkFdMigration(LinuxKernelTest):
+
+    ASSET_KERNEL = Asset(
+        (
+            "https://archives.fedoraproject.org/pub/archive/fedora/linux/releases"
+            "/31/Server/x86_64/os/images/pxeboot/vmlinuz"
+        ),
+        "d4738d03dbbe083ca610d0821d0a8f1488bebbdccef54ce33e3adb35fda00129",
+    )
+
+    ASSET_INITRD = Asset(
+        (
+            "https://archives.fedoraproject.org/pub/archive/fedora/linux/releases"
+            "/31/Server/x86_64/os/images/pxeboot/initrd.img"
+        ),
+        "277cd6c7adf77c7e63d73bbb2cded8ef9e2d3a2f100000e92ff1f8396513cd8b",
+    )
+
+    ASSET_ALPINE_ISO = Asset(
+        (
+            "https://dl-cdn.alpinelinux.org/"
+            "alpine/v3.22/releases/x86_64/alpine-standard-3.22.1-x86_64.iso"
+        ),
+        "96d1b44ea1b8a5a884f193526d92edb4676054e9fa903ad2f016441a0fe13089",
+    )
+
+    def setUp(self):
+        super().setUp()
+
+        init_tap()
+
+        self.outer_ping_proc = None
+
+    def tearDown(self):
+        del_tap()
+
+        if self.outer_ping_proc:
+            self.stop_outer_ping()
+
+        super().tearDown()
+
+    def start_outer_ping(self) -> None:
+        assert self.outer_ping_proc is None
+        self.outer_ping_log = open("/tmp/ping.log", "w")
+        self.outer_ping_proc = subprocess.Popen(
+            ["ping", "-i", "0", "-O", "-D", GUEST_IP],
+            text=True,
+            stdout=self.outer_ping_log,
+        )
+
+    def stop_outer_ping(self) -> str:
+        assert self.outer_ping_proc
+        self.outer_ping_proc.send_signal(signal.SIGINT)
+
+        self.outer_ping_proc.communicate(timeout=5)
+        self.outer_ping_proc = None
+        self.outer_ping_log.close()
+
+        # We need the start, the end and several lines around "no answer"
+        cmd = "cat /tmp/ping.log | grep -A 4 -B 4 'PING\\|packets\\|no ans'"
+        return fetch(cmd)
+
+    def stop_ping_and_check(self, stop_time, resume_time):
+        ping_res = self.stop_outer_ping()
+
+        discon, a, b = parse_ping_output(ping_res)
+
+        if not discon:
+            text = f"STOP: {stop_time}, RESUME: {resume_time}," f"PING: {a} - {b}"
+            if a > stop_time or b < resume_time:
+                self.fail(f"PING failed: {text}")
+            self.log.info(f"PING: no packets lost: {text}")
+            return
+
+        text = (
+            f"STOP: {stop_time}, RESUME: {resume_time}," f"PING: disconnect: {a} - {b}"
+        )
+        self.log.info(text)
+        eps = 0.01
+        if a < stop_time - eps or b > resume_time + eps:
+            self.fail(text)
+
+    def one_ping_from_guest(self, vm) -> None:
+        exec_command_and_wait_for_pattern(
+            self,
+            f"ping -c 1 -W 1 {HOST_IP}",
+            "1 packets transmitted, 1 packets received",
+            "1 packets transmitted, 0 packets received",
+            vm=vm,
+        )
+        self.wait_for_console_pattern("# ", vm=vm)
+
+    def one_ping_from_host(self) -> None:
+        run(f"ping -c 1 -W 1 {GUEST_IP}")
+
+    def setup_shared_memory(self):
+        shm_path = f"/dev/shm/qemu_test_{os.getpid()}"
+
+        try:
+            with open(shm_path, "wb") as f:
+                f.write(b"\0" * (1024 * 1024 * 1024))  # 1GB
+        except Exception as e:
+            self.fail(f"Failed to create shared memory file: {e}")
+
+        return shm_path
+
+    def prepare_and_launch_vm(self, shm_path, vhost, incoming=False, vm=None):
+        if not vm:
+            vm = self.vm
+
+        vm.set_console()
+        vm.add_args("-accel", "kvm")
+        vm.add_args("-device", "pcie-pci-bridge,id=pci.1,bus=pcie.0")
+        vm.add_args("-m", "1G")
+
+        vm.add_args(
+            "-object",
+            f"memory-backend-file,id=ram0,size=1G,mem-path={shm_path},share=on",
+        )
+        vm.add_args("-machine", "memory-backend=ram0")
+
+        vm.add_args(
+            "-drive", f"file={self.ASSET_ALPINE_ISO.fetch()},media=cdrom,format=raw"
+        )
+
+        vm.add_args("-S")
+
+        if incoming:
+            vm.add_args("-incoming", "defer")
+
+        vm_s = "target" if incoming else "source"
+        self.log.info(f"Launching {vm_s} VM")
+        vm.launch()
+
+        self.set_migration_capabilities(vm)
+        self.add_virtio_net(vm, vhost, incoming)
+
+    def add_virtio_net(self, vm, vhost: bool, incoming: bool = False):
+        netdev_params = {
+            "id": "netdev.1",
+            "vhost": vhost,
+            "type": "tap",
+            "ifname": "tap0",
+            "downscript": "no",
+            "queues": 4,
+        }
+
+        if incoming:
+            netdev_params["local-incoming"] = True
+        else:
+            netdev_params["script"] = "no"
+
+        vm.cmd("netdev_add", netdev_params)
+
+        vm.cmd(
+            "device_add",
+            driver="virtio-net-pci",
+            romfile="",
+            id="vnet.1",
+            netdev="netdev.1",
+            mq=True,
+            vectors=18,
+            bus="pci.1",
+            mac=GUEST_MAC,
+            disable_legacy="off",
+        )
+
+    def set_migration_capabilities(self, vm):
+        capabilities = [
+            {"capability": "events", "state": True},
+            {"capability": "x-ignore-shared", "state": True},
+            {"capability": "local-tap", "state": True},
+        ]
+        vm.cmd("migrate-set-capabilities", {"capabilities": capabilities})
+
+    def setup_guest_network(self) -> None:
+        exec_command_and_wait_for_pattern(self, "ip addr", "# ")
+        exec_command_and_wait_for_pattern(
+            self,
+            f"ip addr add {GUEST_IP_MASK} dev eth0 && ip link set eth0 up && echo OK",
+            "OK",
+        )
+        self.wait_for_console_pattern("# ")
+
+    def do_test_tap_fd_migration(self, vhost):
+        self.require_accelerator("kvm")
+        self.set_machine("q35")
+
+        socket_dir = self.socket_dir()
+        migration_socket = os.path.join(socket_dir.name, "migration.sock")
+
+        shm_path = self.setup_shared_memory()
+
+        self.prepare_and_launch_vm(shm_path, vhost)
+        self.vm.cmd("cont")
+        self.wait_for_console_pattern("login:")
+        exec_command_and_wait_for_pattern(self, "root", "# ")
+
+        self.setup_guest_network()
+
+        self.one_ping_from_guest(self.vm)
+        self.one_ping_from_host()
+        self.start_outer_ping()
+
+        # Get some successful pings before migration
+        time.sleep(0.5)
+
+        target_vm = self.get_vm(name="target")
+        self.prepare_and_launch_vm(shm_path, vhost, incoming=True, vm=target_vm)
+
+        target_vm.cmd("migrate-incoming", {"uri": f"unix:{migration_socket}"})
+
+        self.log.info("Starting migration")
+        freeze_start = time.time()
+        self.vm.cmd("migrate", {"uri": f"unix:{migration_socket}"})
+
+        self.log.info("Waiting for migration completion")
+        wait_migration_finish(self.vm, target_vm)
+
+        target_vm.cmd("cont")
+        freeze_end = time.time()
+
+        self.vm.shutdown()
+
+        self.log.info("Verifying PING on target VM after migration")
+        self.one_ping_from_guest(target_vm)
+        self.one_ping_from_host()
+
+        # And a bit more pings after source shutdown
+        time.sleep(0.3)
+        self.stop_ping_and_check(freeze_start, freeze_end)
+
+        target_vm.shutdown()
+
+    def test_tap_fd_migration(self):
+        self.do_test_tap_fd_migration(False)
+
+    def test_tap_fd_migration_vhost(self):
+        self.do_test_tap_fd_migration(True)
+
+
+if __name__ == "__main__":
+    LinuxKernelTest.main()
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/8] net/tap: add some trace points
  2025-09-03 13:36 ` [PATCH v2 1/8] net/tap: add some trace points Vladimir Sementsov-Ogievskiy
@ 2025-09-03 14:11   ` Daniel P. Berrangé
  2025-09-03 14:26     ` Vladimir Sementsov-Ogievskiy
  2025-09-05 12:33     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 2 replies; 26+ messages in thread
From: Daniel P. Berrangé @ 2025-09-03 14:11 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: jasowang, qemu-devel, leiyang, steven.sistare, yc-core, peterx,
	mst, farosas, eblake, armbru, thuth, philmd

On Wed, Sep 03, 2025 at 04:36:58PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Add trace points to simplify debugging migration.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> ---
>  net/tap.c        | 58 ++++++++++++++++++++++++++++++++++++++++++++++++
>  net/trace-events |  7 ++++++
>  2 files changed, 65 insertions(+)
> 
> diff --git a/net/tap.c b/net/tap.c
> index 2530627a9a..224fb37310 100644
> --- a/net/tap.c
> +++ b/net/tap.c
> @@ -43,6 +43,7 @@
>  #include "qemu/main-loop.h"
>  #include "qemu/sockets.h"
>  #include "hw/virtio/vhost.h"
> +#include "trace.h"
>  
>  #include "net/tap.h"
>  
> @@ -148,6 +149,9 @@ static ssize_t tap_receive_iov(NetClientState *nc, const struct iovec *iov,
>      g_autofree struct iovec *iov_copy = NULL;
>      struct virtio_net_hdr hdr = { };
>  
> +    trace_tap_receive_iov(s->using_vnet_hdr, s->host_vnet_hdr_len, iovcnt,
> +                          iov->iov_len);
> +
>      if (s->host_vnet_hdr_len && !s->using_vnet_hdr) {
>          iov_copy = g_new(struct iovec, iovcnt + 1);
>          iov_copy[0].iov_base = &hdr;
> @@ -183,6 +187,49 @@ static void tap_send_completed(NetClientState *nc, ssize_t len)
>      tap_read_poll(s, true);
>  }
>  
> +static char *tap_dump_packet(const uint8_t *buf, int size)
> +{
> +    int i, j;
> +    char hex_line[80];  /* Enough space for hex pairs + spaces */
> +    char ascii_line[17]; /* 16 + 1 for null terminator */
> +    GString *dump_str = g_string_new(NULL);
> +
> +    g_string_append_printf(dump_str, "Packet dump (%d bytes):\n", size);

It occurrs to me that this ability to dump a buffer is likely useful for
other areas of QEMU. How about moving this to somewhere in util/ and
renaming it to something like "dump_buffer".

You can drop the 'Packet dump' prefix, and bytes count, as you can
include that info in the trace-events format string instead.

> +
> +    for (i = 0; i < size; i += 16) {
> +        memset(hex_line, 0, sizeof(hex_line));
> +        memset(ascii_line, 0, sizeof(ascii_line));
> +
> +        /* Build hex line in groups of 2 bytes (4 hex chars) */
> +        int hex_pos = 0;
> +        for (j = 0; j < 16 && (i + j) < size; j += 2) {
> +            if (i + j + 1 < size) {
> +                /* Two bytes available */
> +                hex_pos += snprintf(hex_line + hex_pos,
> +                                    sizeof(hex_line) - hex_pos,
> +                                    "%02x%02x ", buf[i + j], buf[i + j + 1]);
> +            } else {
> +                /* Only one byte left */
> +                hex_pos += snprintf(hex_line + hex_pos,
> +                                    sizeof(hex_line) - hex_pos,
> +                                    "%02x   ", buf[i + j]);
> +            }
> +        }
> +
> +        /* Build ASCII line */
> +        for (j = 0; j < 16 && (i + j) < size; j++) {
> +            uint8_t byte = buf[i + j];
> +            ascii_line[j] = (byte >= 32 && byte <= 126) ? byte : '.';
> +        }
> +
> +        /* Add the line in tcpdump-like format */
> +        g_string_append_printf(dump_str, "\t0x%04x:  %-40s %s\n",
> +                               i, hex_line, ascii_line);
> +    }
> +
> +    return g_string_free(dump_str, false);
> +}

> diff --git a/net/trace-events b/net/trace-events
> index cda960f42b..b51427f539 100644
> --- a/net/trace-events
> +++ b/net/trace-events
> @@ -29,3 +29,10 @@ vhost_vdpa_set_address_space_id(void *v, unsigned vq_group, unsigned asid_num) "
>  vhost_vdpa_net_load_cmd(void *s, uint8_t class, uint8_t cmd, int data_num, int data_size) "vdpa state: %p class: %u cmd: %u sg_num: %d size: %d"
>  vhost_vdpa_net_load_cmd_retval(void *s, uint8_t class, uint8_t cmd, int r) "vdpa state: %p class: %u cmd: %u retval: %d"
>  vhost_vdpa_net_load_mq(void *s, int ncurqps) "vdpa state: %p current_qpairs: %d"
> +
> +# tap.c
> +tap_receive_iov(bool using_vnet_hdr, uint32_t host_vnet_hdr_len, int iovcnt, size_t iov_len) "using_vnet_hdr:%d host_vnet_hdr_len:%u iovcnt:%d iov_len:%zu"
> +tap_send_packet(bool using_vnet_hdr, uint32_t host_vnet_hdr_len, int size) "using_vnet_hdr:%d host_vnet_hdr_len:%u size:%d"
> +tap_enable(void) "tap enabled"
> +tap_disable(void) "tap disabled"
> +tap_packet_dump(const char *dump_str) "%s"

This could be 

     tap_packet_dump(size_t bytes, const char *dump_str) "Package dump (%zu bytes) %s"

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/8] net/tap: add some trace points
  2025-09-03 14:11   ` Daniel P. Berrangé
@ 2025-09-03 14:26     ` Vladimir Sementsov-Ogievskiy
  2025-09-05 12:33     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 26+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-09-03 14:26 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: jasowang, qemu-devel, leiyang, steven.sistare, yc-core, peterx,
	mst, farosas, eblake, armbru, thuth, philmd

On 03.09.25 17:11, Daniel P. Berrangé wrote:
> On Wed, Sep 03, 2025 at 04:36:58PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> Add trace points to simplify debugging migration.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>> ---
>>   net/tap.c        | 58 ++++++++++++++++++++++++++++++++++++++++++++++++
>>   net/trace-events |  7 ++++++
>>   2 files changed, 65 insertions(+)
>>
>> diff --git a/net/tap.c b/net/tap.c
>> index 2530627a9a..224fb37310 100644
>> --- a/net/tap.c
>> +++ b/net/tap.c
>> @@ -43,6 +43,7 @@
>>   #include "qemu/main-loop.h"
>>   #include "qemu/sockets.h"
>>   #include "hw/virtio/vhost.h"
>> +#include "trace.h"
>>   
>>   #include "net/tap.h"
>>   
>> @@ -148,6 +149,9 @@ static ssize_t tap_receive_iov(NetClientState *nc, const struct iovec *iov,
>>       g_autofree struct iovec *iov_copy = NULL;
>>       struct virtio_net_hdr hdr = { };
>>   
>> +    trace_tap_receive_iov(s->using_vnet_hdr, s->host_vnet_hdr_len, iovcnt,
>> +                          iov->iov_len);
>> +
>>       if (s->host_vnet_hdr_len && !s->using_vnet_hdr) {
>>           iov_copy = g_new(struct iovec, iovcnt + 1);
>>           iov_copy[0].iov_base = &hdr;
>> @@ -183,6 +187,49 @@ static void tap_send_completed(NetClientState *nc, ssize_t len)
>>       tap_read_poll(s, true);
>>   }
>>   
>> +static char *tap_dump_packet(const uint8_t *buf, int size)
>> +{
>> +    int i, j;
>> +    char hex_line[80];  /* Enough space for hex pairs + spaces */
>> +    char ascii_line[17]; /* 16 + 1 for null terminator */
>> +    GString *dump_str = g_string_new(NULL);
>> +
>> +    g_string_append_printf(dump_str, "Packet dump (%d bytes):\n", size);
> 
> It occurrs to me that this ability to dump a buffer is likely useful for
> other areas of QEMU. How about moving this to somewhere in util/ and
> renaming it to something like "dump_buffer".

Agreed, will do.

> 
> You can drop the 'Packet dump' prefix, and bytes count, as you can
> include that info in the trace-events format string instead.
> 
>> +
>> +    for (i = 0; i < size; i += 16) {
>> +        memset(hex_line, 0, sizeof(hex_line));
>> +        memset(ascii_line, 0, sizeof(ascii_line));
>> +
>> +        /* Build hex line in groups of 2 bytes (4 hex chars) */
>> +        int hex_pos = 0;
>> +        for (j = 0; j < 16 && (i + j) < size; j += 2) {
>> +            if (i + j + 1 < size) {
>> +                /* Two bytes available */
>> +                hex_pos += snprintf(hex_line + hex_pos,
>> +                                    sizeof(hex_line) - hex_pos,
>> +                                    "%02x%02x ", buf[i + j], buf[i + j + 1]);
>> +            } else {
>> +                /* Only one byte left */
>> +                hex_pos += snprintf(hex_line + hex_pos,
>> +                                    sizeof(hex_line) - hex_pos,
>> +                                    "%02x   ", buf[i + j]);
>> +            }
>> +        }
>> +
>> +        /* Build ASCII line */
>> +        for (j = 0; j < 16 && (i + j) < size; j++) {
>> +            uint8_t byte = buf[i + j];
>> +            ascii_line[j] = (byte >= 32 && byte <= 126) ? byte : '.';
>> +        }
>> +
>> +        /* Add the line in tcpdump-like format */
>> +        g_string_append_printf(dump_str, "\t0x%04x:  %-40s %s\n",
>> +                               i, hex_line, ascii_line);
>> +    }
>> +
>> +    return g_string_free(dump_str, false);
>> +}
> 
>> diff --git a/net/trace-events b/net/trace-events
>> index cda960f42b..b51427f539 100644
>> --- a/net/trace-events
>> +++ b/net/trace-events
>> @@ -29,3 +29,10 @@ vhost_vdpa_set_address_space_id(void *v, unsigned vq_group, unsigned asid_num) "
>>   vhost_vdpa_net_load_cmd(void *s, uint8_t class, uint8_t cmd, int data_num, int data_size) "vdpa state: %p class: %u cmd: %u sg_num: %d size: %d"
>>   vhost_vdpa_net_load_cmd_retval(void *s, uint8_t class, uint8_t cmd, int r) "vdpa state: %p class: %u cmd: %u retval: %d"
>>   vhost_vdpa_net_load_mq(void *s, int ncurqps) "vdpa state: %p current_qpairs: %d"
>> +
>> +# tap.c
>> +tap_receive_iov(bool using_vnet_hdr, uint32_t host_vnet_hdr_len, int iovcnt, size_t iov_len) "using_vnet_hdr:%d host_vnet_hdr_len:%u iovcnt:%d iov_len:%zu"
>> +tap_send_packet(bool using_vnet_hdr, uint32_t host_vnet_hdr_len, int size) "using_vnet_hdr:%d host_vnet_hdr_len:%u size:%d"
>> +tap_enable(void) "tap enabled"
>> +tap_disable(void) "tap disabled"
>> +tap_packet_dump(const char *dump_str) "%s"
> 
> This could be
> 
>       tap_packet_dump(size_t bytes, const char *dump_str) "Package dump (%zu bytes) %s"
> 
> With regards,
> Daniel


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 5/8] net/tap: implement interfaces for local migration
  2025-09-03 13:37 ` [PATCH v2 5/8] net/tap: implement interfaces for local migration Vladimir Sementsov-Ogievskiy
@ 2025-09-03 14:34   ` Daniel P. Berrangé
  2025-09-03 15:31     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 26+ messages in thread
From: Daniel P. Berrangé @ 2025-09-03 14:34 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: jasowang, qemu-devel, leiyang, steven.sistare, yc-core, peterx,
	mst, farosas, eblake, armbru, thuth, philmd

On Wed, Sep 03, 2025 at 04:37:02PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Handle local-incoming option:
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> ---
>  include/net/tap.h |   4 ++
>  net/tap.c         | 136 +++++++++++++++++++++++++++++++++++++++-------
>  2 files changed, 119 insertions(+), 21 deletions(-)
> 
> diff --git a/include/net/tap.h b/include/net/tap.h
> index 6f34f13eae..3ef2e2dbae 100644
> --- a/include/net/tap.h
> +++ b/include/net/tap.h
> @@ -30,7 +30,11 @@
>  
>  int tap_enable(NetClientState *nc);
>  int tap_disable(NetClientState *nc);
> +bool tap_local_incoming(NetClientState *nc);
>  
>  int tap_get_fd(NetClientState *nc);
>  
> +int tap_load(NetClientState *nc, QEMUFile *f);
> +int tap_save(NetClientState *nc, QEMUFile *f);
> +
>  #endif /* QEMU_NET_TAP_H */
> diff --git a/net/tap.c b/net/tap.c
> index a9d955ac5f..499db756ea 100644
> --- a/net/tap.c
> +++ b/net/tap.c
> @@ -35,6 +35,8 @@
>  #include "net/eth.h"
>  #include "net/net.h"
>  #include "clients.h"
> +#include "migration/migration.h"
> +#include "migration/qemu-file.h"
>  #include "monitor/monitor.h"
>  #include "system/system.h"
>  #include "qapi/error.h"
> @@ -82,6 +84,7 @@ typedef struct TAPState {
>      VHostNetState *vhost_net;
>      unsigned host_vnet_hdr_len;
>      Notifier exit;
> +    bool local_incoming;
>  } TAPState;
>  
>  static void launch_script(const char *setup_script, const char *ifname,
> @@ -803,6 +806,40 @@ static int net_tap_init_vhost(TAPState *s, Error **errp) {
>      return 0;
>  }
>  
> +int tap_save(NetClientState *nc, QEMUFile *f)
> +{
> +    TAPState *s = DO_UPCAST(TAPState, nc, nc);
> +
> +    qemu_file_put_fd(f, s->fd);
> +    qemu_put_byte(f, s->using_vnet_hdr);
> +    qemu_put_byte(f, s->has_ufo);
> +    qemu_put_byte(f, s->has_uso);
> +    qemu_put_byte(f, s->enabled);
> +    qemu_put_be32(f, s->host_vnet_hdr_len);


Is it neccessary to transfer that metadata, or is there perhaps a way
for the other side to query the TAP FD configuration from the kernel
to detect this ?

I'm concerned that this code / wire format is not extensible if we ever
add another similar field to TAPState in the future.

> +
> +    return 0;
> +}
> +
> +int tap_load(NetClientState *nc, QEMUFile *f)
> +{
> +    TAPState *s = DO_UPCAST(TAPState, nc, nc);
> +
> +    s->fd = qemu_file_get_fd(f);
> +    if (s->fd < 0) {
> +        return -1;
> +    }
> +
> +    s->using_vnet_hdr = qemu_get_byte(f);
> +    s->has_ufo = qemu_get_byte(f);
> +    s->has_uso = qemu_get_byte(f);
> +    s->enabled = qemu_get_byte(f);
> +    qemu_get_be32s(f, &s->host_vnet_hdr_len);
> +
> +    tap_read_poll(s, true);
> +
> +    return net_tap_init_vhost(s, NULL);
> +}
> +
>  static int net_tap_fd_init_common(const Netdev *netdev, NetClientState *peer,
>                                    const char *model, const char *name,
>                                    const char *ifname, const char *script,

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 8/8] tests/functional: add test_x86_64_tap_fd_migration
  2025-09-03 13:37 ` [PATCH v2 8/8] tests/functional: add test_x86_64_tap_fd_migration Vladimir Sementsov-Ogievskiy
@ 2025-09-03 14:47   ` Daniel P. Berrangé
  2025-09-03 15:14     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 26+ messages in thread
From: Daniel P. Berrangé @ 2025-09-03 14:47 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: jasowang, qemu-devel, leiyang, steven.sistare, yc-core, peterx,
	mst, farosas, eblake, armbru, thuth, philmd

On Wed, Sep 03, 2025 at 04:37:05PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Add test for a new feature of local TAP migration with fd passing
> through unix socket.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> ---
>  .../test_x86_64_tap_fd_migration.py           | 347 ++++++++++++++++++
>  1 file changed, 347 insertions(+)
>  create mode 100644 tests/functional/test_x86_64_tap_fd_migration.py
> 
> diff --git a/tests/functional/test_x86_64_tap_fd_migration.py b/tests/functional/test_x86_64_tap_fd_migration.py
> new file mode 100644
> index 0000000000..f6d18fe39f
> --- /dev/null
> +++ b/tests/functional/test_x86_64_tap_fd_migration.py
> @@ -0,0 +1,347 @@
> +#!/usr/bin/env python3
> +#
> +# Functional test that tests TAP local migration
> +# with fd passing
> +#
> +# Copyright (c) Yandex
> +#
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +
> +import os
> +import time
> +import subprocess
> +import signal
> +from typing import Tuple
> +
> +from qemu_test import (
> +    LinuxKernelTest,
> +    Asset,
> +    exec_command_and_wait_for_pattern,
> +)
> +
> +GUEST_IP = "10.0.1.2"
> +GUEST_IP_MASK = f"{GUEST_IP}/24"
> +GUEST_MAC = "d6:0d:75:f8:0f:b7"
> +HOST_IP = "10.0.1.1"
> +HOST_IP_MASK = f"{HOST_IP}/24"
> +TAP_ID = "tap0"
> +TAP_MAC = "e6:1d:44:b5:03:5d"
> +
> +
> +def run(cmd: str, check: bool = True) -> None:
> +    subprocess.run(cmd, check=check, shell=True)

I don't see need to be using shell here - just use
"check_call()" and pass the argv as an array instead
of a string at which point this 'run' helper can be
removed.

> +
> +
> +def fetch(cmd: str, check: bool = True) -> str:
> +    return subprocess.run(
> +        cmd, check=check, shell=True, stdout=subprocess.PIPE, text=True
> +    ).stdout
> +
> +
> +def del_tap() -> None:
> +    run(f"ip tuntap del {TAP_ID} mode tap multi_queue", check=False)
> +
> +
> +def init_tap() -> None:
> +    run(f"ip tuntap add dev {TAP_ID} mode tap multi_queue")
> +    run(f"ip link set dev {TAP_ID} address {TAP_MAC}")
> +    run(f"ip addr add {HOST_IP_MASK} dev {TAP_ID}")
> +    run(f"ip link set {TAP_ID} up")

$ ip tuntap add dev foo mode tap multi_queue
ioctl(TUNSETIFF): Operation not permitted


The functional tests run as the developer's normal unprivileged user
account, so it doesn't look like this can work ?

Were you testing this as root ?

> +
> +
> +def parse_ping_line(line: str) -> float:
> +    # suspect lines like
> +    # [1748524876.590509] 64 bytes from 94.245.155.3 \
> +    #      (94.245.155.3): icmp_seq=1 ttl=250 time=101 ms
> +    spl = line.split()
> +    return float(spl[0][1:-1])
> +
> +
> +def parse_ping_output(out) -> Tuple[bool, float, float]:
> +    lines = [x for x in out.split("\n") if x.startswith("[")]
> +
> +    try:
> +        first_no_ans = next(
> +            (ind for ind in range(len(lines)) if lines[ind][20:26] == "no ans")
> +        )
> +    except StopIteration:
> +        return False, parse_ping_line(lines[0]), parse_ping_line(lines[-1])
> +
> +    last_no_ans = next(
> +        (ind for ind in range(len(lines) - 1, -1, -1) if lines[ind][20:26] == "no ans")
> +    )
> +
> +    return (
> +        True,
> +        parse_ping_line(lines[first_no_ans]),
> +        parse_ping_line(lines[last_no_ans]),
> +    )
> +
> +
> +def wait_migration_finish(source_vm, target_vm):
> +    migr_events = (
> +        ("MIGRATION", {"data": {"status": "completed"}}),
> +        ("MIGRATION", {"data": {"status": "failed"}}),
> +    )
> +
> +    source_e = source_vm.events_wait(migr_events)["data"]
> +    target_e = target_vm.events_wait(migr_events)["data"]
> +
> +    source_s = source_vm.cmd("query-status")["status"]
> +    target_s = target_vm.cmd("query-status")["status"]
> +
> +    assert (
> +        source_e["status"] == "completed"
> +        and target_e["status"] == "completed"
> +        and source_s == "postmigrate"
> +        and target_s == "paused"
> +    ), f"""Migration failed:
> +    SRC status: {source_s}
> +    SRC event: {source_e}
> +    TGT status: {target_s}
> +    TGT event:{target_e}"""
> +
> +
> +class VhostUserBlkFdMigration(LinuxKernelTest):
> +
> +    ASSET_KERNEL = Asset(
> +        (
> +            "https://archives.fedoraproject.org/pub/archive/fedora/linux/releases"
> +            "/31/Server/x86_64/os/images/pxeboot/vmlinuz"
> +        ),
> +        "d4738d03dbbe083ca610d0821d0a8f1488bebbdccef54ce33e3adb35fda00129",
> +    )
> +
> +    ASSET_INITRD = Asset(
> +        (
> +            "https://archives.fedoraproject.org/pub/archive/fedora/linux/releases"
> +            "/31/Server/x86_64/os/images/pxeboot/initrd.img"
> +        ),
> +        "277cd6c7adf77c7e63d73bbb2cded8ef9e2d3a2f100000e92ff1f8396513cd8b",
> +    )
> +
> +    ASSET_ALPINE_ISO = Asset(
> +        (
> +            "https://dl-cdn.alpinelinux.org/"
> +            "alpine/v3.22/releases/x86_64/alpine-standard-3.22.1-x86_64.iso"
> +        ),
> +        "96d1b44ea1b8a5a884f193526d92edb4676054e9fa903ad2f016441a0fe13089",
> +    )
> +
> +    def setUp(self):
> +        super().setUp()
> +
> +        init_tap()
> +
> +        self.outer_ping_proc = None
> +
> +    def tearDown(self):
> +        del_tap()
> +
> +        if self.outer_ping_proc:
> +            self.stop_outer_ping()
> +
> +        super().tearDown()
> +
> +    def start_outer_ping(self) -> None:
> +        assert self.outer_ping_proc is None
> +        self.outer_ping_log = open("/tmp/ping.log", "w")
> +        self.outer_ping_proc = subprocess.Popen(
> +            ["ping", "-i", "0", "-O", "-D", GUEST_IP],
> +            text=True,
> +            stdout=self.outer_ping_log,
> +        )

Surely outer_ping_log can be closed immediately as the child
process will keep the FD open ?

> +
> +    def stop_outer_ping(self) -> str:
> +        assert self.outer_ping_proc
> +        self.outer_ping_proc.send_signal(signal.SIGINT)
> +
> +        self.outer_ping_proc.communicate(timeout=5)
> +        self.outer_ping_proc = None
> +        self.outer_ping_log.close()
> +
> +        # We need the start, the end and several lines around "no answer"
> +        cmd = "cat /tmp/ping.log | grep -A 4 -B 4 'PING\\|packets\\|no ans'"
> +        return fetch(cmd)

IMHO this can just read the whole of /tmp/ping.log directly into memory
and then the parse_ping_output can jjust match on it with a regex,
avoiding the pre-processing with grep.


> +
> +    def stop_ping_and_check(self, stop_time, resume_time):
> +        ping_res = self.stop_outer_ping()
> +
> +        discon, a, b = parse_ping_output(ping_res)
> +
> +        if not discon:
> +            text = f"STOP: {stop_time}, RESUME: {resume_time}," f"PING: {a} - {b}"
> +            if a > stop_time or b < resume_time:
> +                self.fail(f"PING failed: {text}")
> +            self.log.info(f"PING: no packets lost: {text}")
> +            return
> +
> +        text = (
> +            f"STOP: {stop_time}, RESUME: {resume_time}," f"PING: disconnect: {a} - {b}"
> +        )
> +        self.log.info(text)
> +        eps = 0.01
> +        if a < stop_time - eps or b > resume_time + eps:
> +            self.fail(text)
> +
> +    def one_ping_from_guest(self, vm) -> None:
> +        exec_command_and_wait_for_pattern(
> +            self,
> +            f"ping -c 1 -W 1 {HOST_IP}",
> +            "1 packets transmitted, 1 packets received",
> +            "1 packets transmitted, 0 packets received",
> +            vm=vm,
> +        )
> +        self.wait_for_console_pattern("# ", vm=vm)
> +
> +    def one_ping_from_host(self) -> None:
> +        run(f"ping -c 1 -W 1 {GUEST_IP}")
> +
> +    def setup_shared_memory(self):
> +        shm_path = f"/dev/shm/qemu_test_{os.getpid()}"
> +
> +        try:
> +            with open(shm_path, "wb") as f:
> +                f.write(b"\0" * (1024 * 1024 * 1024))  # 1GB
> +        except Exception as e:
> +            self.fail(f"Failed to create shared memory file: {e}")
> +
> +        return shm_path
> +
> +    def prepare_and_launch_vm(self, shm_path, vhost, incoming=False, vm=None):
> +        if not vm:
> +            vm = self.vm
> +
> +        vm.set_console()
> +        vm.add_args("-accel", "kvm")
> +        vm.add_args("-device", "pcie-pci-bridge,id=pci.1,bus=pcie.0")
> +        vm.add_args("-m", "1G")
> +
> +        vm.add_args(
> +            "-object",
> +            f"memory-backend-file,id=ram0,size=1G,mem-path={shm_path},share=on",
> +        )
> +        vm.add_args("-machine", "memory-backend=ram0")
> +
> +        vm.add_args(
> +            "-drive", f"file={self.ASSET_ALPINE_ISO.fetch()},media=cdrom,format=raw"
> +        )
> +
> +        vm.add_args("-S")
> +
> +        if incoming:
> +            vm.add_args("-incoming", "defer")
> +
> +        vm_s = "target" if incoming else "source"
> +        self.log.info(f"Launching {vm_s} VM")
> +        vm.launch()
> +
> +        self.set_migration_capabilities(vm)
> +        self.add_virtio_net(vm, vhost, incoming)
> +
> +    def add_virtio_net(self, vm, vhost: bool, incoming: bool = False):
> +        netdev_params = {
> +            "id": "netdev.1",
> +            "vhost": vhost,
> +            "type": "tap",
> +            "ifname": "tap0",
> +            "downscript": "no",
> +            "queues": 4,
> +        }
> +
> +        if incoming:
> +            netdev_params["local-incoming"] = True
> +        else:
> +            netdev_params["script"] = "no"
> +
> +        vm.cmd("netdev_add", netdev_params)
> +
> +        vm.cmd(
> +            "device_add",
> +            driver="virtio-net-pci",
> +            romfile="",
> +            id="vnet.1",
> +            netdev="netdev.1",
> +            mq=True,
> +            vectors=18,
> +            bus="pci.1",
> +            mac=GUEST_MAC,
> +            disable_legacy="off",
> +        )
> +
> +    def set_migration_capabilities(self, vm):
> +        capabilities = [
> +            {"capability": "events", "state": True},
> +            {"capability": "x-ignore-shared", "state": True},
> +            {"capability": "local-tap", "state": True},
> +        ]
> +        vm.cmd("migrate-set-capabilities", {"capabilities": capabilities})
> +
> +    def setup_guest_network(self) -> None:
> +        exec_command_and_wait_for_pattern(self, "ip addr", "# ")
> +        exec_command_and_wait_for_pattern(
> +            self,
> +            f"ip addr add {GUEST_IP_MASK} dev eth0 && ip link set eth0 up && echo OK",
> +            "OK",
> +        )
> +        self.wait_for_console_pattern("# ")
> +
> +    def do_test_tap_fd_migration(self, vhost):
> +        self.require_accelerator("kvm")
> +        self.set_machine("q35")
> +
> +        socket_dir = self.socket_dir()
> +        migration_socket = os.path.join(socket_dir.name, "migration.sock")
> +
> +        shm_path = self.setup_shared_memory()
> +
> +        self.prepare_and_launch_vm(shm_path, vhost)
> +        self.vm.cmd("cont")
> +        self.wait_for_console_pattern("login:")
> +        exec_command_and_wait_for_pattern(self, "root", "# ")
> +
> +        self.setup_guest_network()
> +
> +        self.one_ping_from_guest(self.vm)
> +        self.one_ping_from_host()
> +        self.start_outer_ping()
> +
> +        # Get some successful pings before migration
> +        time.sleep(0.5)
> +
> +        target_vm = self.get_vm(name="target")
> +        self.prepare_and_launch_vm(shm_path, vhost, incoming=True, vm=target_vm)
> +
> +        target_vm.cmd("migrate-incoming", {"uri": f"unix:{migration_socket}"})
> +
> +        self.log.info("Starting migration")
> +        freeze_start = time.time()
> +        self.vm.cmd("migrate", {"uri": f"unix:{migration_socket}"})
> +
> +        self.log.info("Waiting for migration completion")
> +        wait_migration_finish(self.vm, target_vm)
> +
> +        target_vm.cmd("cont")
> +        freeze_end = time.time()
> +
> +        self.vm.shutdown()
> +
> +        self.log.info("Verifying PING on target VM after migration")
> +        self.one_ping_from_guest(target_vm)
> +        self.one_ping_from_host()
> +
> +        # And a bit more pings after source shutdown
> +        time.sleep(0.3)
> +        self.stop_ping_and_check(freeze_start, freeze_end)
> +
> +        target_vm.shutdown()
> +
> +    def test_tap_fd_migration(self):
> +        self.do_test_tap_fd_migration(False)
> +
> +    def test_tap_fd_migration_vhost(self):
> +        self.do_test_tap_fd_migration(True)
> +
> +
> +if __name__ == "__main__":
> +    LinuxKernelTest.main()
> -- 
> 2.48.1
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 8/8] tests/functional: add test_x86_64_tap_fd_migration
  2025-09-03 14:47   ` Daniel P. Berrangé
@ 2025-09-03 15:14     ` Vladimir Sementsov-Ogievskiy
  2025-09-03 15:19       ` Daniel P. Berrangé
  0 siblings, 1 reply; 26+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-09-03 15:14 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: jasowang, qemu-devel, leiyang, steven.sistare, yc-core, peterx,
	mst, farosas, eblake, armbru, thuth, philmd

On 03.09.25 17:47, Daniel P. Berrangé wrote:
> On Wed, Sep 03, 2025 at 04:37:05PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> Add test for a new feature of local TAP migration with fd passing
>> through unix socket.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>> ---
>>   .../test_x86_64_tap_fd_migration.py           | 347 ++++++++++++++++++
>>   1 file changed, 347 insertions(+)
>>   create mode 100644 tests/functional/test_x86_64_tap_fd_migration.py
>>
>> diff --git a/tests/functional/test_x86_64_tap_fd_migration.py b/tests/functional/test_x86_64_tap_fd_migration.py
>> new file mode 100644
>> index 0000000000..f6d18fe39f
>> --- /dev/null
>> +++ b/tests/functional/test_x86_64_tap_fd_migration.py
>> @@ -0,0 +1,347 @@
>> +#!/usr/bin/env python3
>> +#
>> +# Functional test that tests TAP local migration
>> +# with fd passing
>> +#
>> +# Copyright (c) Yandex
>> +#
>> +# SPDX-License-Identifier: GPL-2.0-or-later
>> +
>> +import os
>> +import time
>> +import subprocess
>> +import signal
>> +from typing import Tuple
>> +
>> +from qemu_test import (
>> +    LinuxKernelTest,
>> +    Asset,
>> +    exec_command_and_wait_for_pattern,
>> +)
>> +
>> +GUEST_IP = "10.0.1.2"
>> +GUEST_IP_MASK = f"{GUEST_IP}/24"
>> +GUEST_MAC = "d6:0d:75:f8:0f:b7"
>> +HOST_IP = "10.0.1.1"
>> +HOST_IP_MASK = f"{HOST_IP}/24"
>> +TAP_ID = "tap0"
>> +TAP_MAC = "e6:1d:44:b5:03:5d"
>> +
>> +
>> +def run(cmd: str, check: bool = True) -> None:
>> +    subprocess.run(cmd, check=check, shell=True)
> 
> I don't see need to be using shell here - just use
> "check_call()" and pass the argv as an array instead
> of a string at which point this 'run' helper can be
> removed.

I just like how commands look in one string more than array. But array is probably better choice for production. Not problem to change, if we I understand what to do with access denied)

> 
>> +
>> +
>> +def fetch(cmd: str, check: bool = True) -> str:
>> +    return subprocess.run(
>> +        cmd, check=check, shell=True, stdout=subprocess.PIPE, text=True
>> +    ).stdout
>> +
>> +
>> +def del_tap() -> None:
>> +    run(f"ip tuntap del {TAP_ID} mode tap multi_queue", check=False)
>> +
>> +
>> +def init_tap() -> None:
>> +    run(f"ip tuntap add dev {TAP_ID} mode tap multi_queue")
>> +    run(f"ip link set dev {TAP_ID} address {TAP_MAC}")
>> +    run(f"ip addr add {HOST_IP_MASK} dev {TAP_ID}")
>> +    run(f"ip link set {TAP_ID} up")
> 
> $ ip tuntap add dev foo mode tap multi_queue
> ioctl(TUNSETIFF): Operation not permitted
> 
> 
> The functional tests run as the developer's normal unprivileged user
> account, so it doesn't look like this can work ?
> 
> Were you testing this as root ?

Yes I run the test with sudo..

Do we have any tests in QEMU that requires root? Or anyway to do this for functional tests? I'm afraid there is no way to setup TAP interface in unprivileged user account :(

> 
>> +
>> +
>> +def parse_ping_line(line: str) -> float:
>> +    # suspect lines like
>> +    # [1748524876.590509] 64 bytes from 94.245.155.3 \
>> +    #      (94.245.155.3): icmp_seq=1 ttl=250 time=101 ms
>> +    spl = line.split()
>> +    return float(spl[0][1:-1])
>> +
>> +
>> +def parse_ping_output(out) -> Tuple[bool, float, float]:
>> +    lines = [x for x in out.split("\n") if x.startswith("[")]
>> +
>> +    try:
>> +        first_no_ans = next(
>> +            (ind for ind in range(len(lines)) if lines[ind][20:26] == "no ans")
>> +        )
>> +    except StopIteration:
>> +        return False, parse_ping_line(lines[0]), parse_ping_line(lines[-1])
>> +
>> +    last_no_ans = next(
>> +        (ind for ind in range(len(lines) - 1, -1, -1) if lines[ind][20:26] == "no ans")
>> +    )
>> +
>> +    return (
>> +        True,
>> +        parse_ping_line(lines[first_no_ans]),
>> +        parse_ping_line(lines[last_no_ans]),
>> +    )
>> +
>> +
>> +def wait_migration_finish(source_vm, target_vm):
>> +    migr_events = (
>> +        ("MIGRATION", {"data": {"status": "completed"}}),
>> +        ("MIGRATION", {"data": {"status": "failed"}}),
>> +    )
>> +
>> +    source_e = source_vm.events_wait(migr_events)["data"]
>> +    target_e = target_vm.events_wait(migr_events)["data"]
>> +
>> +    source_s = source_vm.cmd("query-status")["status"]
>> +    target_s = target_vm.cmd("query-status")["status"]
>> +
>> +    assert (
>> +        source_e["status"] == "completed"
>> +        and target_e["status"] == "completed"
>> +        and source_s == "postmigrate"
>> +        and target_s == "paused"
>> +    ), f"""Migration failed:
>> +    SRC status: {source_s}
>> +    SRC event: {source_e}
>> +    TGT status: {target_s}
>> +    TGT event:{target_e}"""
>> +
>> +
>> +class VhostUserBlkFdMigration(LinuxKernelTest):
>> +
>> +    ASSET_KERNEL = Asset(
>> +        (
>> +            "https://archives.fedoraproject.org/pub/archive/fedora/linux/releases"
>> +            "/31/Server/x86_64/os/images/pxeboot/vmlinuz"
>> +        ),
>> +        "d4738d03dbbe083ca610d0821d0a8f1488bebbdccef54ce33e3adb35fda00129",
>> +    )
>> +
>> +    ASSET_INITRD = Asset(
>> +        (
>> +            "https://archives.fedoraproject.org/pub/archive/fedora/linux/releases"
>> +            "/31/Server/x86_64/os/images/pxeboot/initrd.img"
>> +        ),
>> +        "277cd6c7adf77c7e63d73bbb2cded8ef9e2d3a2f100000e92ff1f8396513cd8b",
>> +    )
>> +
>> +    ASSET_ALPINE_ISO = Asset(
>> +        (
>> +            "https://dl-cdn.alpinelinux.org/"
>> +            "alpine/v3.22/releases/x86_64/alpine-standard-3.22.1-x86_64.iso"
>> +        ),
>> +        "96d1b44ea1b8a5a884f193526d92edb4676054e9fa903ad2f016441a0fe13089",
>> +    )
>> +
>> +    def setUp(self):
>> +        super().setUp()
>> +
>> +        init_tap()
>> +
>> +        self.outer_ping_proc = None
>> +
>> +    def tearDown(self):
>> +        del_tap()
>> +
>> +        if self.outer_ping_proc:
>> +            self.stop_outer_ping()
>> +
>> +        super().tearDown()
>> +
>> +    def start_outer_ping(self) -> None:
>> +        assert self.outer_ping_proc is None
>> +        self.outer_ping_log = open("/tmp/ping.log", "w")
>> +        self.outer_ping_proc = subprocess.Popen(
>> +            ["ping", "-i", "0", "-O", "-D", GUEST_IP],
>> +            text=True,
>> +            stdout=self.outer_ping_log,
>> +        )
> 
> Surely outer_ping_log can be closed immediately as the child
> process will keep the FD open ?

Will try.

> 
>> +
>> +    def stop_outer_ping(self) -> str:
>> +        assert self.outer_ping_proc
>> +        self.outer_ping_proc.send_signal(signal.SIGINT)
>> +
>> +        self.outer_ping_proc.communicate(timeout=5)
>> +        self.outer_ping_proc = None
>> +        self.outer_ping_log.close()
>> +
>> +        # We need the start, the end and several lines around "no answer"
>> +        cmd = "cat /tmp/ping.log | grep -A 4 -B 4 'PING\\|packets\\|no ans'"
>> +        return fetch(cmd)
> 
> IMHO this can just read the whole of /tmp/ping.log directly into memory
> and then the parse_ping_output can jjust match on it with a regex,
> avoiding the pre-processing with grep.
> 

Agree, that would be better. Will do.

> 
>> +
>> +    def stop_ping_and_check(self, stop_time, resume_time):
>> +        ping_res = self.stop_outer_ping()
>> +
>> +        discon, a, b = parse_ping_output(ping_res)
>> +
>> +        if not discon:
>> +            text = f"STOP: {stop_time}, RESUME: {resume_time}," f"PING: {a} - {b}"
>> +            if a > stop_time or b < resume_time:
>> +                self.fail(f"PING failed: {text}")
>> +            self.log.info(f"PING: no packets lost: {text}")
>> +            return
>> +
>> +        text = (
>> +            f"STOP: {stop_time}, RESUME: {resume_time}," f"PING: disconnect: {a} - {b}"
>> +        )
>> +        self.log.info(text)
>> +        eps = 0.01
>> +        if a < stop_time - eps or b > resume_time + eps:
>> +            self.fail(text)
>> +
>> +    def one_ping_from_guest(self, vm) -> None:
>> +        exec_command_and_wait_for_pattern(
>> +            self,
>> +            f"ping -c 1 -W 1 {HOST_IP}",
>> +            "1 packets transmitted, 1 packets received",
>> +            "1 packets transmitted, 0 packets received",
>> +            vm=vm,
>> +        )
>> +        self.wait_for_console_pattern("# ", vm=vm)
>> +
>> +    def one_ping_from_host(self) -> None:
>> +        run(f"ping -c 1 -W 1 {GUEST_IP}")
>> +
>> +    def setup_shared_memory(self):
>> +        shm_path = f"/dev/shm/qemu_test_{os.getpid()}"
>> +
>> +        try:
>> +            with open(shm_path, "wb") as f:
>> +                f.write(b"\0" * (1024 * 1024 * 1024))  # 1GB
>> +        except Exception as e:
>> +            self.fail(f"Failed to create shared memory file: {e}")
>> +
>> +        return shm_path
>> +
>> +    def prepare_and_launch_vm(self, shm_path, vhost, incoming=False, vm=None):
>> +        if not vm:
>> +            vm = self.vm
>> +
>> +        vm.set_console()
>> +        vm.add_args("-accel", "kvm")
>> +        vm.add_args("-device", "pcie-pci-bridge,id=pci.1,bus=pcie.0")
>> +        vm.add_args("-m", "1G")
>> +
>> +        vm.add_args(
>> +            "-object",
>> +            f"memory-backend-file,id=ram0,size=1G,mem-path={shm_path},share=on",
>> +        )
>> +        vm.add_args("-machine", "memory-backend=ram0")
>> +
>> +        vm.add_args(
>> +            "-drive", f"file={self.ASSET_ALPINE_ISO.fetch()},media=cdrom,format=raw"
>> +        )
>> +
>> +        vm.add_args("-S")
>> +
>> +        if incoming:
>> +            vm.add_args("-incoming", "defer")
>> +
>> +        vm_s = "target" if incoming else "source"
>> +        self.log.info(f"Launching {vm_s} VM")
>> +        vm.launch()
>> +
>> +        self.set_migration_capabilities(vm)
>> +        self.add_virtio_net(vm, vhost, incoming)
>> +
>> +    def add_virtio_net(self, vm, vhost: bool, incoming: bool = False):
>> +        netdev_params = {
>> +            "id": "netdev.1",
>> +            "vhost": vhost,
>> +            "type": "tap",
>> +            "ifname": "tap0",
>> +            "downscript": "no",
>> +            "queues": 4,
>> +        }
>> +
>> +        if incoming:
>> +            netdev_params["local-incoming"] = True
>> +        else:
>> +            netdev_params["script"] = "no"
>> +
>> +        vm.cmd("netdev_add", netdev_params)
>> +
>> +        vm.cmd(
>> +            "device_add",
>> +            driver="virtio-net-pci",
>> +            romfile="",
>> +            id="vnet.1",
>> +            netdev="netdev.1",
>> +            mq=True,
>> +            vectors=18,
>> +            bus="pci.1",
>> +            mac=GUEST_MAC,
>> +            disable_legacy="off",
>> +        )
>> +
>> +    def set_migration_capabilities(self, vm):
>> +        capabilities = [
>> +            {"capability": "events", "state": True},
>> +            {"capability": "x-ignore-shared", "state": True},
>> +            {"capability": "local-tap", "state": True},
>> +        ]
>> +        vm.cmd("migrate-set-capabilities", {"capabilities": capabilities})
>> +
>> +    def setup_guest_network(self) -> None:
>> +        exec_command_and_wait_for_pattern(self, "ip addr", "# ")
>> +        exec_command_and_wait_for_pattern(
>> +            self,
>> +            f"ip addr add {GUEST_IP_MASK} dev eth0 && ip link set eth0 up && echo OK",
>> +            "OK",
>> +        )
>> +        self.wait_for_console_pattern("# ")
>> +
>> +    def do_test_tap_fd_migration(self, vhost):
>> +        self.require_accelerator("kvm")
>> +        self.set_machine("q35")
>> +
>> +        socket_dir = self.socket_dir()
>> +        migration_socket = os.path.join(socket_dir.name, "migration.sock")
>> +
>> +        shm_path = self.setup_shared_memory()
>> +
>> +        self.prepare_and_launch_vm(shm_path, vhost)
>> +        self.vm.cmd("cont")
>> +        self.wait_for_console_pattern("login:")
>> +        exec_command_and_wait_for_pattern(self, "root", "# ")
>> +
>> +        self.setup_guest_network()
>> +
>> +        self.one_ping_from_guest(self.vm)
>> +        self.one_ping_from_host()
>> +        self.start_outer_ping()
>> +
>> +        # Get some successful pings before migration
>> +        time.sleep(0.5)
>> +
>> +        target_vm = self.get_vm(name="target")
>> +        self.prepare_and_launch_vm(shm_path, vhost, incoming=True, vm=target_vm)
>> +
>> +        target_vm.cmd("migrate-incoming", {"uri": f"unix:{migration_socket}"})
>> +
>> +        self.log.info("Starting migration")
>> +        freeze_start = time.time()
>> +        self.vm.cmd("migrate", {"uri": f"unix:{migration_socket}"})
>> +
>> +        self.log.info("Waiting for migration completion")
>> +        wait_migration_finish(self.vm, target_vm)
>> +
>> +        target_vm.cmd("cont")
>> +        freeze_end = time.time()
>> +
>> +        self.vm.shutdown()
>> +
>> +        self.log.info("Verifying PING on target VM after migration")
>> +        self.one_ping_from_guest(target_vm)
>> +        self.one_ping_from_host()
>> +
>> +        # And a bit more pings after source shutdown
>> +        time.sleep(0.3)
>> +        self.stop_ping_and_check(freeze_start, freeze_end)
>> +
>> +        target_vm.shutdown()
>> +
>> +    def test_tap_fd_migration(self):
>> +        self.do_test_tap_fd_migration(False)
>> +
>> +    def test_tap_fd_migration_vhost(self):
>> +        self.do_test_tap_fd_migration(True)
>> +
>> +
>> +if __name__ == "__main__":
>> +    LinuxKernelTest.main()
>> -- 
>> 2.48.1
>>
> 
> With regards,
> Daniel


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 8/8] tests/functional: add test_x86_64_tap_fd_migration
  2025-09-03 15:14     ` Vladimir Sementsov-Ogievskiy
@ 2025-09-03 15:19       ` Daniel P. Berrangé
  2025-09-03 15:24         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 26+ messages in thread
From: Daniel P. Berrangé @ 2025-09-03 15:19 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: jasowang, qemu-devel, leiyang, steven.sistare, yc-core, peterx,
	mst, farosas, eblake, armbru, thuth, philmd

On Wed, Sep 03, 2025 at 06:14:36PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 03.09.25 17:47, Daniel P. Berrangé wrote:
> > On Wed, Sep 03, 2025 at 04:37:05PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > Add test for a new feature of local TAP migration with fd passing
> > > through unix socket.
> > > 
> > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> > > ---
> > >   .../test_x86_64_tap_fd_migration.py           | 347 ++++++++++++++++++
> > >   1 file changed, 347 insertions(+)
> > >   create mode 100644 tests/functional/test_x86_64_tap_fd_migration.py
> > > 
> > > diff --git a/tests/functional/test_x86_64_tap_fd_migration.py b/tests/functional/test_x86_64_tap_fd_migration.py
> > > new file mode 100644
> > > index 0000000000..f6d18fe39f
> > > --- /dev/null
> > > +++ b/tests/functional/test_x86_64_tap_fd_migration.py


> > > +def init_tap() -> None:
> > > +    run(f"ip tuntap add dev {TAP_ID} mode tap multi_queue")
> > > +    run(f"ip link set dev {TAP_ID} address {TAP_MAC}")
> > > +    run(f"ip addr add {HOST_IP_MASK} dev {TAP_ID}")
> > > +    run(f"ip link set {TAP_ID} up")
> > 
> > $ ip tuntap add dev foo mode tap multi_queue
> > ioctl(TUNSETIFF): Operation not permitted
> > 
> > 
> > The functional tests run as the developer's normal unprivileged user
> > account, so it doesn't look like this can work ?
> > 
> > Were you testing this as root ?
> 
> Yes I run the test with sudo..
> 
> Do we have any tests in QEMU that requires root? Or anyway to do this for functional tests? I'm afraid there is no way to setup TAP interface in unprivileged user account :(

There are a variety of iotests that include calls to sudo.

These automatically mark themselves as skipped if
password-less sudo fails to work. So in practice the tests
will rarely get run, but its better than nothing I guess.

We should add a skipUnlesPasswordlessSudo() method to the
tests/functional/qemu_test/decorators.py class, and you
then then annotate this test with that. Then just add sudo
to the "ip" calls.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 8/8] tests/functional: add test_x86_64_tap_fd_migration
  2025-09-03 15:19       ` Daniel P. Berrangé
@ 2025-09-03 15:24         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 26+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-09-03 15:24 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: jasowang, qemu-devel, leiyang, steven.sistare, yc-core, peterx,
	mst, farosas, eblake, armbru, thuth, philmd

On 03.09.25 18:19, Daniel P. Berrangé wrote:
> On Wed, Sep 03, 2025 at 06:14:36PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> On 03.09.25 17:47, Daniel P. Berrangé wrote:
>>> On Wed, Sep 03, 2025 at 04:37:05PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>> Add test for a new feature of local TAP migration with fd passing
>>>> through unix socket.
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>>> ---
>>>>    .../test_x86_64_tap_fd_migration.py           | 347 ++++++++++++++++++
>>>>    1 file changed, 347 insertions(+)
>>>>    create mode 100644 tests/functional/test_x86_64_tap_fd_migration.py
>>>>
>>>> diff --git a/tests/functional/test_x86_64_tap_fd_migration.py b/tests/functional/test_x86_64_tap_fd_migration.py
>>>> new file mode 100644
>>>> index 0000000000..f6d18fe39f
>>>> --- /dev/null
>>>> +++ b/tests/functional/test_x86_64_tap_fd_migration.py
> 
> 
>>>> +def init_tap() -> None:
>>>> +    run(f"ip tuntap add dev {TAP_ID} mode tap multi_queue")
>>>> +    run(f"ip link set dev {TAP_ID} address {TAP_MAC}")
>>>> +    run(f"ip addr add {HOST_IP_MASK} dev {TAP_ID}")
>>>> +    run(f"ip link set {TAP_ID} up")
>>>
>>> $ ip tuntap add dev foo mode tap multi_queue
>>> ioctl(TUNSETIFF): Operation not permitted
>>>
>>>
>>> The functional tests run as the developer's normal unprivileged user
>>> account, so it doesn't look like this can work ?
>>>
>>> Were you testing this as root ?
>>
>> Yes I run the test with sudo..
>>
>> Do we have any tests in QEMU that requires root? Or anyway to do this for functional tests? I'm afraid there is no way to setup TAP interface in unprivileged user account :(
> 
> There are a variety of iotests that include calls to sudo.
> 
> These automatically mark themselves as skipped if
> password-less sudo fails to work. So in practice the tests
> will rarely get run, but its better than nothing I guess.
> 
> We should add a skipUnlesPasswordlessSudo() method to the
> tests/functional/qemu_test/decorators.py class, and you
> then then annotate this test with that. Then just add sudo
> to the "ip" calls.
> 

Good, that works for me.

Thanks!

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 5/8] net/tap: implement interfaces for local migration
  2025-09-03 14:34   ` Daniel P. Berrangé
@ 2025-09-03 15:31     ` Vladimir Sementsov-Ogievskiy
  2025-09-03 16:09       ` Steven Sistare
  0 siblings, 1 reply; 26+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-09-03 15:31 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: jasowang, qemu-devel, leiyang, steven.sistare, yc-core, peterx,
	mst, farosas, eblake, armbru, thuth, philmd

On 03.09.25 17:34, Daniel P. Berrangé wrote:
> On Wed, Sep 03, 2025 at 04:37:02PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> Handle local-incoming option:
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>> ---
>>   include/net/tap.h |   4 ++
>>   net/tap.c         | 136 +++++++++++++++++++++++++++++++++++++++-------
>>   2 files changed, 119 insertions(+), 21 deletions(-)
>>
>> diff --git a/include/net/tap.h b/include/net/tap.h
>> index 6f34f13eae..3ef2e2dbae 100644
>> --- a/include/net/tap.h
>> +++ b/include/net/tap.h
>> @@ -30,7 +30,11 @@
>>   
>>   int tap_enable(NetClientState *nc);
>>   int tap_disable(NetClientState *nc);
>> +bool tap_local_incoming(NetClientState *nc);
>>   
>>   int tap_get_fd(NetClientState *nc);
>>   
>> +int tap_load(NetClientState *nc, QEMUFile *f);
>> +int tap_save(NetClientState *nc, QEMUFile *f);
>> +
>>   #endif /* QEMU_NET_TAP_H */
>> diff --git a/net/tap.c b/net/tap.c
>> index a9d955ac5f..499db756ea 100644
>> --- a/net/tap.c
>> +++ b/net/tap.c
>> @@ -35,6 +35,8 @@
>>   #include "net/eth.h"
>>   #include "net/net.h"
>>   #include "clients.h"
>> +#include "migration/migration.h"
>> +#include "migration/qemu-file.h"
>>   #include "monitor/monitor.h"
>>   #include "system/system.h"
>>   #include "qapi/error.h"
>> @@ -82,6 +84,7 @@ typedef struct TAPState {
>>       VHostNetState *vhost_net;
>>       unsigned host_vnet_hdr_len;
>>       Notifier exit;
>> +    bool local_incoming;
>>   } TAPState;
>>   
>>   static void launch_script(const char *setup_script, const char *ifname,
>> @@ -803,6 +806,40 @@ static int net_tap_init_vhost(TAPState *s, Error **errp) {
>>       return 0;
>>   }
>>   
>> +int tap_save(NetClientState *nc, QEMUFile *f)
>> +{
>> +    TAPState *s = DO_UPCAST(TAPState, nc, nc);
>> +
>> +    qemu_file_put_fd(f, s->fd);
>> +    qemu_put_byte(f, s->using_vnet_hdr);
>> +    qemu_put_byte(f, s->has_ufo);
>> +    qemu_put_byte(f, s->has_uso);
>> +    qemu_put_byte(f, s->enabled);
>> +    qemu_put_be32(f, s->host_vnet_hdr_len);
> 
> 
> Is it neccessary to transfer that metadata, or is there perhaps a way
> for the other side to query the TAP FD configuration from the kernel
> to detect this ?

Oh, good question, thanks for it. I just added everything and then I was debugging other places.

for hdr_len we have TUNGETVNETHDRSZ, so it's possible.

using_vnet_hdr, seems is equal to initial vnet_hdr option (with default to 1 if not specified), will doublecheck

for ufo/uso, which are set through TUNSETOFFLOAD, we don't have direct way to
get the state. But we can use the fact, that qemu tries to set them once,
and these variables are unchanged after initialization. So we can try set
same flags on target the same way, to understand what we have. Still,
this doesn't seem absolutely safe.. Kernel may behave differently than
for previous initialization, probably due to some changed settings.

for enabled it seems not possible, but we handle it in virtio layer.. Oops,
probably I always migrate enabled=false with this code, will check.

---

On the other hand, calling extra ioctls to learn something lead to extra downtime
(should be measured to be a good argument).

Also, just architecturally: seems better not ask third agent about metadata that we already know.

---

About forward-compatibility (if we add new fields here) - agree.

Maybe turn several boolean fields into one flags field. This way we'll get several "reserved" bits for future changes.

> 
> I'm concerned that this code / wire format is not extensible if we ever
> add another similar field to TAPState in the future.
> 
>> +
>> +    return 0;
>> +}
>> +
>> +int tap_load(NetClientState *nc, QEMUFile *f)
>> +{
>> +    TAPState *s = DO_UPCAST(TAPState, nc, nc);
>> +
>> +    s->fd = qemu_file_get_fd(f);
>> +    if (s->fd < 0) {
>> +        return -1;
>> +    }
>> +
>> +    s->using_vnet_hdr = qemu_get_byte(f);
>> +    s->has_ufo = qemu_get_byte(f);
>> +    s->has_uso = qemu_get_byte(f);
>> +    s->enabled = qemu_get_byte(f);
>> +    qemu_get_be32s(f, &s->host_vnet_hdr_len);
>> +
>> +    tap_read_poll(s, true);
>> +
>> +    return net_tap_init_vhost(s, NULL);
>> +}
>> +
>>   static int net_tap_fd_init_common(const Netdev *netdev, NetClientState *peer,
>>                                     const char *model, const char *name,
>>                                     const char *ifname, const char *script,
> 
> With regards,
> Daniel


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 5/8] net/tap: implement interfaces for local migration
  2025-09-03 15:31     ` Vladimir Sementsov-Ogievskiy
@ 2025-09-03 16:09       ` Steven Sistare
  2025-09-04  7:41         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 26+ messages in thread
From: Steven Sistare @ 2025-09-03 16:09 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Daniel P. Berrangé
  Cc: jasowang, qemu-devel, leiyang, yc-core, peterx, mst, farosas,
	eblake, armbru, thuth, philmd

On 9/3/2025 11:31 AM, Vladimir Sementsov-Ogievskiy wrote:
> On 03.09.25 17:34, Daniel P. Berrangé wrote:
>> On Wed, Sep 03, 2025 at 04:37:02PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>> Handle local-incoming option:
>>>
>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>> ---
>>>   include/net/tap.h |   4 ++
>>>   net/tap.c         | 136 +++++++++++++++++++++++++++++++++++++++-------
>>>   2 files changed, 119 insertions(+), 21 deletions(-)
>>>
>>> diff --git a/include/net/tap.h b/include/net/tap.h
>>> index 6f34f13eae..3ef2e2dbae 100644
>>> --- a/include/net/tap.h
>>> +++ b/include/net/tap.h
>>> @@ -30,7 +30,11 @@
>>>   int tap_enable(NetClientState *nc);
>>>   int tap_disable(NetClientState *nc);
>>> +bool tap_local_incoming(NetClientState *nc);
>>>   int tap_get_fd(NetClientState *nc);
>>> +int tap_load(NetClientState *nc, QEMUFile *f);
>>> +int tap_save(NetClientState *nc, QEMUFile *f);
>>> +
>>>   #endif /* QEMU_NET_TAP_H */
>>> diff --git a/net/tap.c b/net/tap.c
>>> index a9d955ac5f..499db756ea 100644
>>> --- a/net/tap.c
>>> +++ b/net/tap.c
>>> @@ -35,6 +35,8 @@
>>>   #include "net/eth.h"
>>>   #include "net/net.h"
>>>   #include "clients.h"
>>> +#include "migration/migration.h"
>>> +#include "migration/qemu-file.h"
>>>   #include "monitor/monitor.h"
>>>   #include "system/system.h"
>>>   #include "qapi/error.h"
>>> @@ -82,6 +84,7 @@ typedef struct TAPState {
>>>       VHostNetState *vhost_net;
>>>       unsigned host_vnet_hdr_len;
>>>       Notifier exit;
>>> +    bool local_incoming;
>>>   } TAPState;
>>>   static void launch_script(const char *setup_script, const char *ifname,
>>> @@ -803,6 +806,40 @@ static int net_tap_init_vhost(TAPState *s, Error **errp) {
>>>       return 0;
>>>   }
>>> +int tap_save(NetClientState *nc, QEMUFile *f)
>>> +{
>>> +    TAPState *s = DO_UPCAST(TAPState, nc, nc);
>>> +
>>> +    qemu_file_put_fd(f, s->fd);
>>> +    qemu_put_byte(f, s->using_vnet_hdr);
>>> +    qemu_put_byte(f, s->has_ufo);
>>> +    qemu_put_byte(f, s->has_uso);
>>> +    qemu_put_byte(f, s->enabled);
>>> +    qemu_put_be32(f, s->host_vnet_hdr_len);
>>
>>
>> Is it neccessary to transfer that metadata, or is there perhaps a way
>> for the other side to query the TAP FD configuration from the kernel
>> to detect this ?
> 
> Oh, good question, thanks for it. I just added everything and then I was debugging other places.
> 
> for hdr_len we have TUNGETVNETHDRSZ, so it's possible.
> 
> using_vnet_hdr, seems is equal to initial vnet_hdr option (with default to 1 if not specified), will doublecheck
> 
> for ufo/uso, which are set through TUNSETOFFLOAD, we don't have direct way to
> get the state. But we can use the fact, that qemu tries to set them once,
> and these variables are unchanged after initialization. So we can try set
> same flags on target the same way, to understand what we have. Still,
> this doesn't seem absolutely safe.. Kernel may behave differently than
> for previous initialization, probably due to some changed settings.
> 
> for enabled it seems not possible, but we handle it in virtio layer.. Oops,
> probably I always migrate enabled=false with this code, will check.
> 
> ---
> 
> On the other hand, calling extra ioctls to learn something lead to extra downtime
> (should be measured to be a good argument).
> 
> Also, just architecturally: seems better not ask third agent about metadata that we already know.
> 
> ---
> 
> About forward-compatibility (if we add new fields here) - agree.
> 
> Maybe turn several boolean fields into one flags field. This way we'll get several "reserved" bits for future changes.
> 
>>
>> I'm concerned that this code / wire format is not extensible if we ever
>> add another similar field to TAPState in the future.

tap_save and tap_load should be replaced with a VMStateDescription for future
extensibility.  Use VMSTATE_FD for the fd.  Define a postload hook for
tap_read_poll and net_tap_init_vhost.

- Steve

>>> +
>>> +    return 0;
>>> +}
>>> +
>>> +int tap_load(NetClientState *nc, QEMUFile *f)
>>> +{
>>> +    TAPState *s = DO_UPCAST(TAPState, nc, nc);
>>> +
>>> +    s->fd = qemu_file_get_fd(f);
>>> +    if (s->fd < 0) {
>>> +        return -1;
>>> +    }
>>> +
>>> +    s->using_vnet_hdr = qemu_get_byte(f);
>>> +    s->has_ufo = qemu_get_byte(f);
>>> +    s->has_uso = qemu_get_byte(f);
>>> +    s->enabled = qemu_get_byte(f);
>>> +    qemu_get_be32s(f, &s->host_vnet_hdr_len);
>>> +
>>> +    tap_read_poll(s, true);
>>> +
>>> +    return net_tap_init_vhost(s, NULL);
>>> +}
>>> +
>>>   static int net_tap_fd_init_common(const Netdev *netdev, NetClientState *peer,
>>>                                     const char *model, const char *name,
>>>                                     const char *ifname, const char *script,
>>
>> With regards,
>> Daniel
> 
> 



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 5/8] net/tap: implement interfaces for local migration
  2025-09-03 16:09       ` Steven Sistare
@ 2025-09-04  7:41         ` Vladimir Sementsov-Ogievskiy
  2025-09-05 10:14           ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 26+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-09-04  7:41 UTC (permalink / raw)
  To: Steven Sistare, Daniel P. Berrangé
  Cc: jasowang, qemu-devel, leiyang, yc-core, peterx, mst, farosas,
	eblake, armbru, thuth, philmd

On 03.09.25 19:09, Steven Sistare wrote:
> On 9/3/2025 11:31 AM, Vladimir Sementsov-Ogievskiy wrote:
>> On 03.09.25 17:34, Daniel P. Berrangé wrote:
>>> On Wed, Sep 03, 2025 at 04:37:02PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>> Handle local-incoming option:
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>>> ---
>>>>   include/net/tap.h |   4 ++
>>>>   net/tap.c         | 136 +++++++++++++++++++++++++++++++++++++++-------
>>>>   2 files changed, 119 insertions(+), 21 deletions(-)
>>>>
>>>> diff --git a/include/net/tap.h b/include/net/tap.h
>>>> index 6f34f13eae..3ef2e2dbae 100644
>>>> --- a/include/net/tap.h
>>>> +++ b/include/net/tap.h
>>>> @@ -30,7 +30,11 @@
>>>>   int tap_enable(NetClientState *nc);
>>>>   int tap_disable(NetClientState *nc);
>>>> +bool tap_local_incoming(NetClientState *nc);
>>>>   int tap_get_fd(NetClientState *nc);
>>>> +int tap_load(NetClientState *nc, QEMUFile *f);
>>>> +int tap_save(NetClientState *nc, QEMUFile *f);
>>>> +
>>>>   #endif /* QEMU_NET_TAP_H */
>>>> diff --git a/net/tap.c b/net/tap.c
>>>> index a9d955ac5f..499db756ea 100644
>>>> --- a/net/tap.c
>>>> +++ b/net/tap.c
>>>> @@ -35,6 +35,8 @@
>>>>   #include "net/eth.h"
>>>>   #include "net/net.h"
>>>>   #include "clients.h"
>>>> +#include "migration/migration.h"
>>>> +#include "migration/qemu-file.h"
>>>>   #include "monitor/monitor.h"
>>>>   #include "system/system.h"
>>>>   #include "qapi/error.h"
>>>> @@ -82,6 +84,7 @@ typedef struct TAPState {
>>>>       VHostNetState *vhost_net;
>>>>       unsigned host_vnet_hdr_len;
>>>>       Notifier exit;
>>>> +    bool local_incoming;
>>>>   } TAPState;
>>>>   static void launch_script(const char *setup_script, const char *ifname,
>>>> @@ -803,6 +806,40 @@ static int net_tap_init_vhost(TAPState *s, Error **errp) {
>>>>       return 0;
>>>>   }
>>>> +int tap_save(NetClientState *nc, QEMUFile *f)
>>>> +{
>>>> +    TAPState *s = DO_UPCAST(TAPState, nc, nc);
>>>> +
>>>> +    qemu_file_put_fd(f, s->fd);
>>>> +    qemu_put_byte(f, s->using_vnet_hdr);
>>>> +    qemu_put_byte(f, s->has_ufo);
>>>> +    qemu_put_byte(f, s->has_uso);
>>>> +    qemu_put_byte(f, s->enabled);
>>>> +    qemu_put_be32(f, s->host_vnet_hdr_len);
>>>
>>>
>>> Is it neccessary to transfer that metadata, or is there perhaps a way
>>> for the other side to query the TAP FD configuration from the kernel
>>> to detect this ?
>>
>> Oh, good question, thanks for it. I just added everything and then I was debugging other places.
>>
>> for hdr_len we have TUNGETVNETHDRSZ, so it's possible.
>>
>> using_vnet_hdr, seems is equal to initial vnet_hdr option (with default to 1 if not specified), will doublecheck
>>
>> for ufo/uso, which are set through TUNSETOFFLOAD, we don't have direct way to
>> get the state. But we can use the fact, that qemu tries to set them once,
>> and these variables are unchanged after initialization. So we can try set
>> same flags on target the same way, to understand what we have. Still,
>> this doesn't seem absolutely safe.. Kernel may behave differently than
>> for previous initialization, probably due to some changed settings.
>>
>> for enabled it seems not possible, but we handle it in virtio layer.. Oops,
>> probably I always migrate enabled=false with this code, will check.
>>
>> ---
>>
>> On the other hand, calling extra ioctls to learn something lead to extra downtime
>> (should be measured to be a good argument).
>>
>> Also, just architecturally: seems better not ask third agent about metadata that we already know.
>>
>> ---
>>
>> About forward-compatibility (if we add new fields here) - agree.
>>
>> Maybe turn several boolean fields into one flags field. This way we'll get several "reserved" bits for future changes.
>>
>>>
>>> I'm concerned that this code / wire format is not extensible if we ever
>>> add another similar field to TAPState in the future.
> 
> tap_save and tap_load should be replaced with a VMStateDescription for future
> extensibility.  Use VMSTATE_FD for the fd.  Define a postload hook for
> tap_read_poll and net_tap_init_vhost.

How it works? I thought, if I add new field to vmsd, destination will try to load it anyway (as it loads them in a loop in vmstate_load_state()).. So, we'll have to add same new capabilities anyway to "enable" new fields (with help of .field_exists)? Same way we can add new field to current realization, with new migration capability and "if" in _load() function..

Still, seems using VMSD is better anyway, so I should do it.

> 
>>>> +
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +int tap_load(NetClientState *nc, QEMUFile *f)
>>>> +{
>>>> +    TAPState *s = DO_UPCAST(TAPState, nc, nc);
>>>> +
>>>> +    s->fd = qemu_file_get_fd(f);
>>>> +    if (s->fd < 0) {
>>>> +        return -1;
>>>> +    }
>>>> +
>>>> +    s->using_vnet_hdr = qemu_get_byte(f);
>>>> +    s->has_ufo = qemu_get_byte(f);
>>>> +    s->has_uso = qemu_get_byte(f);
>>>> +    s->enabled = qemu_get_byte(f);
>>>> +    qemu_get_be32s(f, &s->host_vnet_hdr_len);
>>>> +
>>>> +    tap_read_poll(s, true);
>>>> +
>>>> +    return net_tap_init_vhost(s, NULL);
>>>> +}
>>>> +
>>>>   static int net_tap_fd_init_common(const Netdev *netdev, NetClientState *peer,
>>>>                                     const char *model, const char *name,
>>>>                                     const char *ifname, const char *script,
>>>
>>> With regards,
>>> Daniel
>>
>>
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/8] virtio-net: live-TAP local migration
  2025-09-03 13:36 [PATCH v2 0/8] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
                   ` (7 preceding siblings ...)
  2025-09-03 13:37 ` [PATCH v2 8/8] tests/functional: add test_x86_64_tap_fd_migration Vladimir Sementsov-Ogievskiy
@ 2025-09-04 14:42 ` Lei Yang
  2025-09-04 15:05   ` Vladimir Sementsov-Ogievskiy
  8 siblings, 1 reply; 26+ messages in thread
From: Lei Yang @ 2025-09-04 14:42 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: jasowang, qemu-devel, steven.sistare, yc-core, peterx, mst,
	farosas, eblake, armbru, thuth, philmd, berrange

Tested the current series of patches, mixed with patches from series
[1] and [2], and the virtio-net regression tests passed. I also tested
local VM migration under multiple NIC queues enabled and disabled, it
also passed.

[1] https://patchwork.ozlabs.org/project/qemu-devel/cover/20250903094411.1029449-1-vsementsov@yandex-team.ru/
[2] https://patchwork.ozlabs.org/project/qemu-devel/cover/20250903124934.1169899-1-vsementsov@yandex-team.ru/

Tested-by: Lei Yang <leiyang@redhat.com>

On Wed, Sep 3, 2025 at 9:37 PM Vladimir Sementsov-Ogievskiy
<vsementsov@yandex-team.ru> wrote:
>
> Hi all!
>
> Here is a  new migration capability "local-tap", which
> allows local migration of TAP device, including its properties
> and open fds.
>
> With this new option, management software doesn't need to
> initialize new TAP and do a switch to it. Nothing should be
> done around virtio-net in local migration: it just migrates
> and continues to use same TAP device. So we avoid extra logic
> in management software, extra allocations in kenel (for new TAP),
> and corresponding extra delay in migration downtime.
>
> Note that patch 07 is reused from
> [PATCH 00/33] vhost-user-blk: live-backend local migration
> to not create extra dependency on a big series.
>
> Based on [PATCH v3 00/19] TAP initialization refactoring, or in other
> words:
> Based-on: <20250903124934.1169899-1-vsementsov@yandex-team.ru>
>
> Vladimir Sementsov-Ogievskiy (8):
>   net/tap: add some trace points
>   net/tap: keep exit notifier only when downscript set
>   net/tap: refactor net_tap_setup_vhost()
>   qapi: add interface for local TAP migration
>   net/tap: implement interfaces for local migration
>   virtio-net: support local tap migration
>   test/functional: exec_command_and_wait_for_pattern: add vm arg
>   tests/functional: add test_x86_64_tap_fd_migration
>
>  hw/net/virtio-net.c                           | 100 ++++-
>  include/hw/virtio/virtio-net.h                |   2 +
>  include/net/tap.h                             |   4 +
>  migration/options.c                           |   7 +
>  migration/options.h                           |   1 +
>  net/tap.c                                     | 323 ++++++++++++----
>  net/trace-events                              |   7 +
>  qapi/migration.json                           |   9 +-
>  qapi/net.json                                 |  12 +-
>  tests/functional/qemu_test/cmd.py             |   7 +-
>  .../test_x86_64_tap_fd_migration.py           | 347 ++++++++++++++++++
>  11 files changed, 734 insertions(+), 85 deletions(-)
>  create mode 100644 tests/functional/test_x86_64_tap_fd_migration.py
>
> --
> 2.48.1
>



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/8] virtio-net: live-TAP local migration
  2025-09-04 14:42 ` [PATCH v2 0/8] virtio-net: live-TAP local migration Lei Yang
@ 2025-09-04 15:05   ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 26+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-09-04 15:05 UTC (permalink / raw)
  To: Lei Yang
  Cc: jasowang, qemu-devel, steven.sistare, yc-core, peterx, mst,
	farosas, eblake, armbru, thuth, philmd, berrange

On 04.09.25 17:42, Lei Yang wrote:
> Tested the current series of patches, mixed with patches from series
> [1] and [2], and the virtio-net regression tests passed. I also tested
> local VM migration under multiple NIC queues enabled and disabled, it
> also passed.
> 
> [1] https://patchwork.ozlabs.org/project/qemu-devel/cover/20250903094411.1029449-1-vsementsov@yandex-team.ru/
> [2] https://patchwork.ozlabs.org/project/qemu-devel/cover/20250903124934.1169899-1-vsementsov@yandex-team.ru/
> 
> Tested-by: Lei Yang <leiyang@redhat.com>
> 


Thanks a lot for testing!


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 5/8] net/tap: implement interfaces for local migration
  2025-09-04  7:41         ` Vladimir Sementsov-Ogievskiy
@ 2025-09-05 10:14           ` Vladimir Sementsov-Ogievskiy
  2025-09-05 12:23             ` Steven Sistare
  0 siblings, 1 reply; 26+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-09-05 10:14 UTC (permalink / raw)
  To: Steven Sistare, Daniel P. Berrangé
  Cc: jasowang, qemu-devel, leiyang, yc-core, peterx, mst, farosas,
	eblake, armbru, thuth, philmd

On 04.09.25 10:41, Vladimir Sementsov-Ogievskiy wrote:
> On 03.09.25 19:09, Steven Sistare wrote:
>> On 9/3/2025 11:31 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> On 03.09.25 17:34, Daniel P. Berrangé wrote:
>>>> On Wed, Sep 03, 2025 at 04:37:02PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>>> Handle local-incoming option:
>>>>>
>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>>>> ---
>>>>>   include/net/tap.h |   4 ++
>>>>>   net/tap.c         | 136 +++++++++++++++++++++++++++++++++++++++-------
>>>>>   2 files changed, 119 insertions(+), 21 deletions(-)
>>>>>
>>>>> diff --git a/include/net/tap.h b/include/net/tap.h
>>>>> index 6f34f13eae..3ef2e2dbae 100644
>>>>> --- a/include/net/tap.h
>>>>> +++ b/include/net/tap.h
>>>>> @@ -30,7 +30,11 @@
>>>>>   int tap_enable(NetClientState *nc);
>>>>>   int tap_disable(NetClientState *nc);
>>>>> +bool tap_local_incoming(NetClientState *nc);
>>>>>   int tap_get_fd(NetClientState *nc);
>>>>> +int tap_load(NetClientState *nc, QEMUFile *f);
>>>>> +int tap_save(NetClientState *nc, QEMUFile *f);
>>>>> +
>>>>>   #endif /* QEMU_NET_TAP_H */
>>>>> diff --git a/net/tap.c b/net/tap.c
>>>>> index a9d955ac5f..499db756ea 100644
>>>>> --- a/net/tap.c
>>>>> +++ b/net/tap.c
>>>>> @@ -35,6 +35,8 @@
>>>>>   #include "net/eth.h"
>>>>>   #include "net/net.h"
>>>>>   #include "clients.h"
>>>>> +#include "migration/migration.h"
>>>>> +#include "migration/qemu-file.h"
>>>>>   #include "monitor/monitor.h"
>>>>>   #include "system/system.h"
>>>>>   #include "qapi/error.h"
>>>>> @@ -82,6 +84,7 @@ typedef struct TAPState {
>>>>>       VHostNetState *vhost_net;
>>>>>       unsigned host_vnet_hdr_len;
>>>>>       Notifier exit;
>>>>> +    bool local_incoming;
>>>>>   } TAPState;
>>>>>   static void launch_script(const char *setup_script, const char *ifname,
>>>>> @@ -803,6 +806,40 @@ static int net_tap_init_vhost(TAPState *s, Error **errp) {
>>>>>       return 0;
>>>>>   }
>>>>> +int tap_save(NetClientState *nc, QEMUFile *f)
>>>>> +{
>>>>> +    TAPState *s = DO_UPCAST(TAPState, nc, nc);
>>>>> +
>>>>> +    qemu_file_put_fd(f, s->fd);
>>>>> +    qemu_put_byte(f, s->using_vnet_hdr);
>>>>> +    qemu_put_byte(f, s->has_ufo);
>>>>> +    qemu_put_byte(f, s->has_uso);
>>>>> +    qemu_put_byte(f, s->enabled);
>>>>> +    qemu_put_be32(f, s->host_vnet_hdr_len);
>>>>
>>>>
>>>> Is it neccessary to transfer that metadata, or is there perhaps a way
>>>> for the other side to query the TAP FD configuration from the kernel
>>>> to detect this ?
>>>
>>> Oh, good question, thanks for it. I just added everything and then I was debugging other places.
>>>
>>> for hdr_len we have TUNGETVNETHDRSZ, so it's possible.
>>>
>>> using_vnet_hdr, seems is equal to initial vnet_hdr option (with default to 1 if not specified), will doublecheck
>>>
>>> for ufo/uso, which are set through TUNSETOFFLOAD, we don't have direct way to
>>> get the state. But we can use the fact, that qemu tries to set them once,
>>> and these variables are unchanged after initialization. So we can try set
>>> same flags on target the same way, to understand what we have. Still,
>>> this doesn't seem absolutely safe.. Kernel may behave differently than
>>> for previous initialization, probably due to some changed settings.
>>>
>>> for enabled it seems not possible, but we handle it in virtio layer.. Oops,
>>> probably I always migrate enabled=false with this code, will check.
>>>
>>> ---
>>>
>>> On the other hand, calling extra ioctls to learn something lead to extra downtime
>>> (should be measured to be a good argument).
>>>
>>> Also, just architecturally: seems better not ask third agent about metadata that we already know.
>>>
>>> ---
>>>
>>> About forward-compatibility (if we add new fields here) - agree.
>>>
>>> Maybe turn several boolean fields into one flags field. This way we'll get several "reserved" bits for future changes.
>>>
>>>>
>>>> I'm concerned that this code / wire format is not extensible if we ever
>>>> add another similar field to TAPState in the future.
>>
>> tap_save and tap_load should be replaced with a VMStateDescription for future
>> extensibility.  Use VMSTATE_FD for the fd.  Define a postload hook for
>> tap_read_poll and net_tap_init_vhost.
> 
> How it works? I thought, if I add new field to vmsd, destination will try to load it anyway (as it loads them in a loop in vmstate_load_state()).. So, we'll have to add same new capabilities anyway to "enable" new fields (with help of .field_exists)? Same way we can add new field to current realization, with new migration capability and "if" in _load() function..
> 
> Still, seems using VMSD is better anyway, so I should do it.
> 

answering myself: at least, fields has versioning feature, which is intended for future-compatibility.

>>
>>>>> +
>>>>> +    return 0;
>>>>> +}
>>>>> +
>>>>> +int tap_load(NetClientState *nc, QEMUFile *f)
>>>>> +{
>>>>> +    TAPState *s = DO_UPCAST(TAPState, nc, nc);
>>>>> +
>>>>> +    s->fd = qemu_file_get_fd(f);
>>>>> +    if (s->fd < 0) {
>>>>> +        return -1;
>>>>> +    }
>>>>> +
>>>>> +    s->using_vnet_hdr = qemu_get_byte(f);
>>>>> +    s->has_ufo = qemu_get_byte(f);
>>>>> +    s->has_uso = qemu_get_byte(f);
>>>>> +    s->enabled = qemu_get_byte(f);
>>>>> +    qemu_get_be32s(f, &s->host_vnet_hdr_len);
>>>>> +
>>>>> +    tap_read_poll(s, true);
>>>>> +
>>>>> +    return net_tap_init_vhost(s, NULL);
>>>>> +}
>>>>> +
>>>>>   static int net_tap_fd_init_common(const Netdev *netdev, NetClientState *peer,
>>>>>                                     const char *model, const char *name,
>>>>>                                     const char *ifname, const char *script,
>>>>
>>>> With regards,
>>>> Daniel
>>>
>>>
>>
> 
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 5/8] net/tap: implement interfaces for local migration
  2025-09-05 10:14           ` Vladimir Sementsov-Ogievskiy
@ 2025-09-05 12:23             ` Steven Sistare
  0 siblings, 0 replies; 26+ messages in thread
From: Steven Sistare @ 2025-09-05 12:23 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy, Daniel P. Berrangé
  Cc: jasowang, qemu-devel, leiyang, yc-core, peterx, mst, farosas,
	eblake, armbru, thuth, philmd

On 9/5/2025 6:14 AM, Vladimir Sementsov-Ogievskiy wrote:
> On 04.09.25 10:41, Vladimir Sementsov-Ogievskiy wrote:
>> On 03.09.25 19:09, Steven Sistare wrote:
>>> On 9/3/2025 11:31 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>> On 03.09.25 17:34, Daniel P. Berrangé wrote:
>>>>> On Wed, Sep 03, 2025 at 04:37:02PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> Handle local-incoming option:
>>>>>>
>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>>>>> ---
>>>>>>   include/net/tap.h |   4 ++
>>>>>>   net/tap.c         | 136 +++++++++++++++++++++++++++++++++++++++-------
>>>>>>   2 files changed, 119 insertions(+), 21 deletions(-)
>>>>>>
>>>>>> diff --git a/include/net/tap.h b/include/net/tap.h
>>>>>> index 6f34f13eae..3ef2e2dbae 100644
>>>>>> --- a/include/net/tap.h
>>>>>> +++ b/include/net/tap.h
>>>>>> @@ -30,7 +30,11 @@
>>>>>>   int tap_enable(NetClientState *nc);
>>>>>>   int tap_disable(NetClientState *nc);
>>>>>> +bool tap_local_incoming(NetClientState *nc);
>>>>>>   int tap_get_fd(NetClientState *nc);
>>>>>> +int tap_load(NetClientState *nc, QEMUFile *f);
>>>>>> +int tap_save(NetClientState *nc, QEMUFile *f);
>>>>>> +
>>>>>>   #endif /* QEMU_NET_TAP_H */
>>>>>> diff --git a/net/tap.c b/net/tap.c
>>>>>> index a9d955ac5f..499db756ea 100644
>>>>>> --- a/net/tap.c
>>>>>> +++ b/net/tap.c
>>>>>> @@ -35,6 +35,8 @@
>>>>>>   #include "net/eth.h"
>>>>>>   #include "net/net.h"
>>>>>>   #include "clients.h"
>>>>>> +#include "migration/migration.h"
>>>>>> +#include "migration/qemu-file.h"
>>>>>>   #include "monitor/monitor.h"
>>>>>>   #include "system/system.h"
>>>>>>   #include "qapi/error.h"
>>>>>> @@ -82,6 +84,7 @@ typedef struct TAPState {
>>>>>>       VHostNetState *vhost_net;
>>>>>>       unsigned host_vnet_hdr_len;
>>>>>>       Notifier exit;
>>>>>> +    bool local_incoming;
>>>>>>   } TAPState;
>>>>>>   static void launch_script(const char *setup_script, const char *ifname,
>>>>>> @@ -803,6 +806,40 @@ static int net_tap_init_vhost(TAPState *s, Error **errp) {
>>>>>>       return 0;
>>>>>>   }
>>>>>> +int tap_save(NetClientState *nc, QEMUFile *f)
>>>>>> +{
>>>>>> +    TAPState *s = DO_UPCAST(TAPState, nc, nc);
>>>>>> +
>>>>>> +    qemu_file_put_fd(f, s->fd);
>>>>>> +    qemu_put_byte(f, s->using_vnet_hdr);
>>>>>> +    qemu_put_byte(f, s->has_ufo);
>>>>>> +    qemu_put_byte(f, s->has_uso);
>>>>>> +    qemu_put_byte(f, s->enabled);
>>>>>> +    qemu_put_be32(f, s->host_vnet_hdr_len);
>>>>>
>>>>>
>>>>> Is it neccessary to transfer that metadata, or is there perhaps a way
>>>>> for the other side to query the TAP FD configuration from the kernel
>>>>> to detect this ?
>>>>
>>>> Oh, good question, thanks for it. I just added everything and then I was debugging other places.
>>>>
>>>> for hdr_len we have TUNGETVNETHDRSZ, so it's possible.
>>>>
>>>> using_vnet_hdr, seems is equal to initial vnet_hdr option (with default to 1 if not specified), will doublecheck
>>>>
>>>> for ufo/uso, which are set through TUNSETOFFLOAD, we don't have direct way to
>>>> get the state. But we can use the fact, that qemu tries to set them once,
>>>> and these variables are unchanged after initialization. So we can try set
>>>> same flags on target the same way, to understand what we have. Still,
>>>> this doesn't seem absolutely safe.. Kernel may behave differently than
>>>> for previous initialization, probably due to some changed settings.
>>>>
>>>> for enabled it seems not possible, but we handle it in virtio layer.. Oops,
>>>> probably I always migrate enabled=false with this code, will check.
>>>>
>>>> ---
>>>>
>>>> On the other hand, calling extra ioctls to learn something lead to extra downtime
>>>> (should be measured to be a good argument).
>>>>
>>>> Also, just architecturally: seems better not ask third agent about metadata that we already know.
>>>>
>>>> ---
>>>>
>>>> About forward-compatibility (if we add new fields here) - agree.
>>>>
>>>> Maybe turn several boolean fields into one flags field. This way we'll get several "reserved" bits for future changes.
>>>>
>>>>>
>>>>> I'm concerned that this code / wire format is not extensible if we ever
>>>>> add another similar field to TAPState in the future.
>>>
>>> tap_save and tap_load should be replaced with a VMStateDescription for future
>>> extensibility.  Use VMSTATE_FD for the fd.  Define a postload hook for
>>> tap_read_poll and net_tap_init_vhost.
>>
>> How it works? I thought, if I add new field to vmsd, destination will try to load it anyway (as it loads them in a loop in vmstate_load_state()).. So, we'll have to add same new capabilities anyway to "enable" new fields (with help of .field_exists)? Same way we can add new field to current realization, with new migration capability and "if" in _load() function..
>>
>> Still, seems using VMSD is better anyway, so I should do it.
> 
> answering myself: at least, fields has versioning feature, which is intended for future-compatibility.

Plus you can add new fields compatibly by adding a subsection.
See Subsections in docs/devel/migration/main.rst.

- steve



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 1/8] net/tap: add some trace points
  2025-09-03 14:11   ` Daniel P. Berrangé
  2025-09-03 14:26     ` Vladimir Sementsov-Ogievskiy
@ 2025-09-05 12:33     ` Vladimir Sementsov-Ogievskiy
  1 sibling, 0 replies; 26+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-09-05 12:33 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: jasowang, qemu-devel, leiyang, steven.sistare, yc-core, peterx,
	mst, farosas, eblake, armbru, thuth, philmd

On 03.09.25 17:11, Daniel P. Berrangé wrote:
> It occurrs to me that this ability to dump a buffer is likely useful for
> other areas of QEMU. How about moving this to somewhere in util/ and
> renaming it to something like "dump_buffer".

Ha, I started to do it, and found that we already have util/hexdump.c :)

-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 4/8] qapi: add interface for local TAP migration
  2025-09-03 13:37 ` [PATCH v2 4/8] qapi: add interface for local TAP migration Vladimir Sementsov-Ogievskiy
@ 2025-09-10  6:28   ` Markus Armbruster
  2025-09-10 10:29     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 26+ messages in thread
From: Markus Armbruster @ 2025-09-10  6:28 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: jasowang, qemu-devel, leiyang, steven.sistare, yc-core, peterx,
	mst, farosas, eblake, thuth, philmd, berrange

Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> writes:

> To migrate TAP device (including open fds) locally, user should:
>
> 1. enable local-tap migration capability both on source and target
> 2. use additional local-incoming=true option for tap on target
>
> Why capability is not enough? We need an option to modify early
> initialization of TAP, to avoid opening new fds. The capability
> may not be set at the moment of netdev initialization.

Bummer.  No way around that?

> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

[...]

> diff --git a/qapi/migration.json b/qapi/migration.json
> index 2387c21e9c..992a5b1e2b 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -517,6 +517,12 @@
>  #     each RAM page.  Requires a migration URI that supports seeking,
>  #     such as a file.  (since 9.0)
>  #
> +# @local-tap: Migrate TAPs locally, keeping backend alive. Open file
> +#     descriptors and TAP-related state are migrated. Only may be
> +#     used when migration channel is unix socket. For target device
> +#     also @local-incoming option must be specified (since 10.2)
> +#     (since 10.2)
> +#
>  # Features:
>  #
>  # @unstable: Members @x-colo and @x-ignore-shared are experimental.

Missing here: local-tap is also experimental.

> @@ -536,7 +542,8 @@
>             { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
>             'validate-uuid', 'background-snapshot',
>             'zero-copy-send', 'postcopy-preempt', 'switchover-ack',
> -           'dirty-limit', 'mapped-ram'] }
> +           'dirty-limit', 'mapped-ram',
> +           { 'name': 'local-tap', 'features': [ 'unstable' ] } ] }
>  
>  ##
>  # @MigrationCapabilityStatus:
> diff --git a/qapi/net.json b/qapi/net.json
> index 78bcc9871e..8f53549d58 100644
> --- a/qapi/net.json
> +++ b/qapi/net.json
> @@ -353,6 +353,15 @@
>  # @poll-us: maximum number of microseconds that could be spent on busy
>  #     polling for tap (since 2.7)
>  #
> +# @local-incoming: Do load open file descriptor for that TAP
> +#     on incoming migration. May be used only if QEMU is started
> +#     for incoming migration. Will work only together with local-tap
> +#     migration capability enabled (default: false) (Since: 10.2)

Scratch "Do".

Re "Maybe be used only": what happens when you use it without incoming
migration or when local-tap is off?

Does "local-incoming": false count as invalid use then?

Two spaces between sentences for consistency, please.

> +#
> +# Features:
> +#
> +# @unstable: Member @local-incoming is experimental

Period at the end for consistency, please.

> +#
>  # Since: 1.2
>  ##
>  { 'struct': 'NetdevTapOptions',
> @@ -371,7 +380,8 @@
>      '*vhostfds':   'str',
>      '*vhostforce': 'bool',
>      '*queues':     'uint32',
> -    '*poll-us':    'uint32'} }
> +    '*poll-us':    'uint32',
> +    '*local-incoming': { 'type': 'bool', 'features': [ 'unstable' ] } } }
>  
>  ##
>  # @NetdevSocketOptions:



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 4/8] qapi: add interface for local TAP migration
  2025-09-10  6:28   ` Markus Armbruster
@ 2025-09-10 10:29     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 26+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-09-10 10:29 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: jasowang, qemu-devel, leiyang, steven.sistare, yc-core, peterx,
	mst, farosas, eblake, thuth, philmd, berrange

On 10.09.25 09:28, Markus Armbruster wrote:
> Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> writes:
> 
>> To migrate TAP device (including open fds) locally, user should:
>>
>> 1. enable local-tap migration capability both on source and target
>> 2. use additional local-incoming=true option for tap on target
>>
>> Why capability is not enough? We need an option to modify early
>> initialization of TAP, to avoid opening new fds. The capability
>> may not be set at the moment of netdev initialization.
> 
> Bummer.  No way around that?

Thanks, you made me think about it once again)

Let me first describe the problem:

At initialization time, we want to know, are we going to get the
live backend from migration stream, or we should initialize it
by hand (e.g. calling open(), and other syscalls, and making other
preparations).

If we don't know, we have to postpone the initialization to some
later point, when we have an information.

The most simple thing is to postpone it to start of the vm.

But that's work bad: at this point we can't clearly rollback the migration.

So, we have to postpone to post-load in case of incoming migration, and
to start for normal start of QEMU (not incoming).

Still, there is still a significant disadvantage:

In case of non-fds migration, we move initilization of backing into downtime,
downtime becomes longer.

What to do?

We need a point in time, when downtime is not started, but we can check for
fds-passing global capability.

And this point in time is migration_incoming_state_setup(), actually.



Peter, could we add .incoming_setup() handler to VMStateDescription ?



So, final interface could look like:

1. global fds-passing migration capability, to enable/disable the whole feature

2. per-device fds-passing option, on by default for all supporting devices, to be
able to disable backing migration for some devices. (we discussed it here: https://lore.kernel.org/all/aL8kuXQ2JF1TV3M7@x1.local/ ).
Still, normally these options are always on by default.
And more over, I can postpone their implementation to separate series, to reduce discussion field, and to check that everything may work without additional user input.


And how it works (using the example of TAP)

1. Normal start of vm

- On device initialization we don't open /dev/net/tun, and don't intialize it
- On vm start (in device's set_status), we see that we are still don't have open fd, so open /dev/net/tun, and do all other actions around it

2. Usual migration without fds

on target:
- On device initialization we don't open /dev/net/tun, and don't intialize it
- In .incoming_setup(), we see that we are still don't have open fd, and we see that fds-capability is disabled, so open /dev/net/tun, and do all other actions around it, source is still running!
- Next, incoming migration starts, downtime starts.
- On vm start, we see that TAP is already initialized, nothing to do

3. Local migration with fds

- enable fds capability both on source and target
- In .incoming_setup(), we see that fds-capability is enabled, so nothing to do (we can check, that fd must not be set)
- During load of migration, we get fds
- In .post_load(), we do any remaining initialization actions around incoming fd (not too much, as backend device is already initialized)
- On vm start, we see that TAP is already initialized, nothing to do

> 
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> 
> [...]
> 
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index 2387c21e9c..992a5b1e2b 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -517,6 +517,12 @@
>>   #     each RAM page.  Requires a migration URI that supports seeking,
>>   #     such as a file.  (since 9.0)
>>   #
>> +# @local-tap: Migrate TAPs locally, keeping backend alive. Open file
>> +#     descriptors and TAP-related state are migrated. Only may be
>> +#     used when migration channel is unix socket. For target device
>> +#     also @local-incoming option must be specified (since 10.2)
>> +#     (since 10.2)
>> +#
>>   # Features:
>>   #
>>   # @unstable: Members @x-colo and @x-ignore-shared are experimental.
> 
> Missing here: local-tap is also experimental.
> 
>> @@ -536,7 +542,8 @@
>>              { 'name': 'x-ignore-shared', 'features': [ 'unstable' ] },
>>              'validate-uuid', 'background-snapshot',
>>              'zero-copy-send', 'postcopy-preempt', 'switchover-ack',
>> -           'dirty-limit', 'mapped-ram'] }
>> +           'dirty-limit', 'mapped-ram',
>> +           { 'name': 'local-tap', 'features': [ 'unstable' ] } ] }
>>   
>>   ##
>>   # @MigrationCapabilityStatus:
>> diff --git a/qapi/net.json b/qapi/net.json
>> index 78bcc9871e..8f53549d58 100644
>> --- a/qapi/net.json
>> +++ b/qapi/net.json
>> @@ -353,6 +353,15 @@
>>   # @poll-us: maximum number of microseconds that could be spent on busy
>>   #     polling for tap (since 2.7)
>>   #
>> +# @local-incoming: Do load open file descriptor for that TAP
>> +#     on incoming migration. May be used only if QEMU is started
>> +#     for incoming migration. Will work only together with local-tap
>> +#     migration capability enabled (default: false) (Since: 10.2)
> 
> Scratch "Do".
> 
> Re "Maybe be used only": what happens when you use it without incoming
> migration or when local-tap is off?
> 
> Does "local-incoming": false count as invalid use then?
> 
> Two spaces between sentences for consistency, please.
> 
>> +#
>> +# Features:
>> +#
>> +# @unstable: Member @local-incoming is experimental
> 
> Period at the end for consistency, please.
> 
>> +#
>>   # Since: 1.2
>>   ##
>>   { 'struct': 'NetdevTapOptions',
>> @@ -371,7 +380,8 @@
>>       '*vhostfds':   'str',
>>       '*vhostforce': 'bool',
>>       '*queues':     'uint32',
>> -    '*poll-us':    'uint32'} }
>> +    '*poll-us':    'uint32',
>> +    '*local-incoming': { 'type': 'bool', 'features': [ 'unstable' ] } } }
>>   
>>   ##
>>   # @NetdevSocketOptions:
> 


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2025-09-10 10:30 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-03 13:36 [PATCH v2 0/8] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
2025-09-03 13:36 ` [PATCH v2 1/8] net/tap: add some trace points Vladimir Sementsov-Ogievskiy
2025-09-03 14:11   ` Daniel P. Berrangé
2025-09-03 14:26     ` Vladimir Sementsov-Ogievskiy
2025-09-05 12:33     ` Vladimir Sementsov-Ogievskiy
2025-09-03 13:36 ` [PATCH v2 2/8] net/tap: keep exit notifier only when downscript set Vladimir Sementsov-Ogievskiy
2025-09-03 13:37 ` [PATCH v2 3/8] net/tap: refactor net_tap_setup_vhost() Vladimir Sementsov-Ogievskiy
2025-09-03 13:37 ` [PATCH v2 4/8] qapi: add interface for local TAP migration Vladimir Sementsov-Ogievskiy
2025-09-10  6:28   ` Markus Armbruster
2025-09-10 10:29     ` Vladimir Sementsov-Ogievskiy
2025-09-03 13:37 ` [PATCH v2 5/8] net/tap: implement interfaces for local migration Vladimir Sementsov-Ogievskiy
2025-09-03 14:34   ` Daniel P. Berrangé
2025-09-03 15:31     ` Vladimir Sementsov-Ogievskiy
2025-09-03 16:09       ` Steven Sistare
2025-09-04  7:41         ` Vladimir Sementsov-Ogievskiy
2025-09-05 10:14           ` Vladimir Sementsov-Ogievskiy
2025-09-05 12:23             ` Steven Sistare
2025-09-03 13:37 ` [PATCH v2 6/8] virtio-net: support local tap migration Vladimir Sementsov-Ogievskiy
2025-09-03 13:37 ` [PATCH v2 7/8] test/functional: exec_command_and_wait_for_pattern: add vm arg Vladimir Sementsov-Ogievskiy
2025-09-03 13:37 ` [PATCH v2 8/8] tests/functional: add test_x86_64_tap_fd_migration Vladimir Sementsov-Ogievskiy
2025-09-03 14:47   ` Daniel P. Berrangé
2025-09-03 15:14     ` Vladimir Sementsov-Ogievskiy
2025-09-03 15:19       ` Daniel P. Berrangé
2025-09-03 15:24         ` Vladimir Sementsov-Ogievskiy
2025-09-04 14:42 ` [PATCH v2 0/8] virtio-net: live-TAP local migration Lei Yang
2025-09-04 15:05   ` Vladimir Sementsov-Ogievskiy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).