[PATCH v7 00/19] virtio-net: live-TAP local migration

* [PATCH v7 00/19] virtio-net: live-TAP local migration
@ 2025-10-10 17:39 Vladimir Sementsov-Ogievskiy
  2025-10-10 17:39 ` [PATCH v7 01/19] net/tap: net_init_tap_one(): drop extra error propagation Vladimir Sementsov-Ogievskiy
                   ` (19 more replies)
  0 siblings, 20 replies; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-10 17:39 UTC (permalink / raw)
  To: mst, jasowang
  Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
	qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
	yc-core, vsementsov, raphael.s.norwitz

Hi all!

Here is a new migration parameter, backend-transfer, which enables
local migration of the TAP virtio-net backend, including its
properties and open fds.

With this new option, management software doesn't need to
initialize a new TAP device and switch to it. Nothing needs to be
done around virtio-net during local migration: it just migrates
and continues to use the same TAP device. So we avoid extra logic
in management software, extra allocations in the kernel (for the new
TAP), and the corresponding extra delay in migration downtime.

v7:

01-13,18: r-b by Maxim Davydov
          t-b by Lei Yang

05: fix tap->script to tap->downscript
07: tiny rebase conflict around "NetOffloads ol = {}"

14: reworked to vmsd handler
    tap is migrated inside virtio-net, and we support backend-transfer
    only for virtio-net+tap. So it's better to support postponing
    initialization directly in virtio-net: the code is simpler, and we
    don't have to manage a global list of taps.

15: reworked on top of 14

16: - drop QAPI_LIST_CONTAINS macro
    - improve commit message
    - improve QAPI documentation comments

17: - don't add an extra check to virtio_net_update_host_features(),
      as we can now call it only when needed (more explicit logic)
    - drop extra includes
    - no need for the "attached_to_virtio_net" variable anymore
    - add .has_tunnel to the state

19: also add test cases for TAP migration without backend-transfer
    (to be sure that we don't break it with the new feature)

Vladimir Sementsov-Ogievskiy (19):
  net/tap: net_init_tap_one(): drop extra error propagation
  net/tap: net_init_tap_one(): move parameter checking earlier
  net/tap: rework net_tap_init()
  net/tap: pass NULL to net_init_tap_one() in cases when scripts are
    NULL
  net/tap: rework scripts handling
  net/tap: setup exit notifier only when needed
  net/tap: split net_tap_fd_init()
  net/tap: tap_set_sndbuf(): add return value
  net/tap: rework tap_set_sndbuf()
  net/tap: rework sndbuf handling
  net/tap: introduce net_tap_setup()
  net/tap: move vhost fd initialization to net_tap_new()
  net/tap: finalize net_tap_set_fd() logic
  migration: introduce .pre_incoming() vmsd handler
  net/tap: postpone tap setup to pre-incoming
  qapi: add interface for backend-transfer virtio-net/tap migration
  virtio-net: support backend-transfer migration for virtio-net/tap
  tests/functional: add skipWithoutSudo() decorator
  tests/functional: add test_x86_64_tap_migration

 hw/net/virtio-net.c                           | 150 ++++++-
 include/migration/vmstate.h                   |   1 +
 include/net/tap.h                             |   5 +
 migration/migration.c                         |   4 +
 migration/options.c                           |  33 ++
 migration/options.h                           |   2 +
 migration/savevm.c                            |  15 +
 migration/savevm.h                            |   1 +
 net/tap-bsd.c                                 |   3 +-
 net/tap-linux.c                               |  19 +-
 net/tap-solaris.c                             |   3 +-
 net/tap-stub.c                                |   3 +-
 net/tap-win32.c                               |  11 +
 net/tap.c                                     | 425 +++++++++++++-----
 net/tap_int.h                                 |   3 +-
 qapi/migration.json                           |  42 +-
 tests/functional/qemu_test/decorators.py      |  16 +
 tests/functional/test_x86_64_tap_migration.py | 396 ++++++++++++++++
 18 files changed, 1001 insertions(+), 131 deletions(-)
 create mode 100644 tests/functional/test_x86_64_tap_migration.py

-- 
2.48.1




* [PATCH v7 01/19] net/tap: net_init_tap_one(): drop extra error propagation
  2025-10-10 17:39 [PATCH v7 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
@ 2025-10-10 17:39 ` Vladimir Sementsov-Ogievskiy
  2025-10-10 17:39 ` [PATCH v7 02/19] net/tap: net_init_tap_one(): move parameter checking earlier Vladimir Sementsov-Ogievskiy
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-10 17:39 UTC (permalink / raw)
  To: mst, jasowang
  Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
	qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
	yc-core, vsementsov, raphael.s.norwitz

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
 net/tap.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index abe3b2d036..70de798fe8 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -736,9 +736,8 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
         }
 
         if (vhostfdname) {
-            vhostfd = monitor_fd_param(monitor_cur(), vhostfdname, &err);
+            vhostfd = monitor_fd_param(monitor_cur(), vhostfdname, errp);
             if (vhostfd == -1) {
-                error_propagate(errp, err);
                 goto failed;
             }
             if (!qemu_set_blocking(vhostfd, false, errp)) {
-- 
2.48.1




* [PATCH v7 02/19] net/tap: net_init_tap_one(): move parameter checking earlier
  2025-10-10 17:39 [PATCH v7 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
  2025-10-10 17:39 ` [PATCH v7 01/19] net/tap: net_init_tap_one(): drop extra error propagation Vladimir Sementsov-Ogievskiy
@ 2025-10-10 17:39 ` Vladimir Sementsov-Ogievskiy
  2025-10-10 17:39 ` [PATCH v7 03/19] net/tap: rework net_tap_init() Vladimir Sementsov-Ogievskiy
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-10 17:39 UTC (permalink / raw)
  To: mst, jasowang
  Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
	qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
	yc-core, vsementsov, raphael.s.norwitz

Let's keep all similar argument checking in the net_init_tap() function.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
 net/tap.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index 70de798fe8..f90050c3a0 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -768,9 +768,6 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
                        "vhost-net requested but could not be initialized");
             goto failed;
         }
-    } else if (vhostfdname) {
-        error_setg(errp, "vhostfd(s)= is not valid without vhost");
-        goto failed;
     }
 
     return;
@@ -832,6 +829,11 @@ int net_init_tap(const Netdev *netdev, const char *name,
         return -1;
     }
 
+    if (tap->has_vhost && !tap->vhost && (tap->vhostfds || tap->vhostfd)) {
+        error_setg(errp, "vhostfd(s)= is not valid without vhost");
+        return -1;
+    }
+
     if (tap->fd) {
         if (tap->ifname || tap->script || tap->downscript ||
             tap->has_vnet_hdr || tap->helper || tap->has_queues ||
-- 
2.48.1




* [PATCH v7 03/19] net/tap: rework net_tap_init()
  2025-10-10 17:39 [PATCH v7 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
  2025-10-10 17:39 ` [PATCH v7 01/19] net/tap: net_init_tap_one(): drop extra error propagation Vladimir Sementsov-Ogievskiy
  2025-10-10 17:39 ` [PATCH v7 02/19] net/tap: net_init_tap_one(): move parameter checking earlier Vladimir Sementsov-Ogievskiy
@ 2025-10-10 17:39 ` Vladimir Sementsov-Ogievskiy
  2025-10-10 17:39 ` [PATCH v7 04/19] net/tap: pass NULL to net_init_tap_one() in cases when scripts are NULL Vladimir Sementsov-Ogievskiy
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-10 17:39 UTC (permalink / raw)
  To: mst, jasowang
  Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
	qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
	yc-core, vsementsov, raphael.s.norwitz

In the future (to support backend-transfer migration for virtio-net/tap,
which includes passing fds through a unix socket) we'll want to postpone
fd initialization to a later point, when the QAPI structured parameters
are no longer available. So, let's rework the function now so that its
interface does not take the "tap" parameter.

Also, rename it to net_tap_open(), as it's just a wrapper around
tap_open(), and having both net_tap_init() and net_init_tap() in one
file is confusing.
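
To make the reworked calling convention concrete, here is a minimal
standalone sketch of how the caller can derive the two vnet_hdr values
before calling net_tap_open(). The struct and helper names are
hypothetical stand-ins, not QEMU API; only the two expressions mirror
what the diff below computes from tap->has_vnet_hdr and tap->vnet_hdr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two QAPI fields the caller reads. */
struct vnet_hdr_opts {
    bool has_vnet_hdr;  /* was vnet_hdr= given explicitly? */
    bool vnet_hdr;      /* its value, meaningful only if has_vnet_hdr */
};

/* Initial value handed to the open call: default is "try to enable". */
static int vnet_hdr_initial(const struct vnet_hdr_opts *o)
{
    assert(o != NULL);
    return o->has_vnet_hdr ? o->vnet_hdr : 1;
}

/* The header is mandatory only when the user explicitly asked for it. */
static bool vnet_hdr_is_required(const struct vnet_hdr_opts *o)
{
    assert(o != NULL);
    return o->has_vnet_hdr && o->vnet_hdr;
}
```

With no option given, the sketch yields an initial value of 1 that is
not required, which matches the behaviour of the removed block.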

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
 net/tap.c | 18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index f90050c3a0..b1b64c508d 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -655,20 +655,12 @@ int net_init_bridge(const Netdev *netdev, const char *name,
     return 0;
 }
 
-static int net_tap_init(const NetdevTapOptions *tap, int *vnet_hdr,
+static int net_tap_open(int *vnet_hdr, bool vnet_hdr_required,
                         const char *setup_script, char *ifname,
                         size_t ifname_sz, int mq_required, Error **errp)
 {
     Error *err = NULL;
-    int fd, vnet_hdr_required;
-
-    if (tap->has_vnet_hdr) {
-        *vnet_hdr = tap->vnet_hdr;
-        vnet_hdr_required = *vnet_hdr;
-    } else {
-        *vnet_hdr = 1;
-        vnet_hdr_required = 0;
-    }
+    int fd;
 
     fd = RETRY_ON_EINTR(tap_open(ifname, ifname_sz, vnet_hdr, vnet_hdr_required,
                       mq_required, errp));
@@ -977,6 +969,8 @@ free_fail:
     } else {
         g_autofree char *default_script = NULL;
         g_autofree char *default_downscript = NULL;
+        bool vnet_hdr_required = tap->has_vnet_hdr && tap->vnet_hdr;
+
         if (tap->vhostfds) {
             error_setg(errp, "vhostfds= is invalid if fds= wasn't specified");
             return -1;
@@ -997,7 +991,9 @@ free_fail:
         }
 
         for (i = 0; i < queues; i++) {
-            fd = net_tap_init(tap, &vnet_hdr, i >= 1 ? "no" : script,
+            vnet_hdr = tap->has_vnet_hdr ? tap->vnet_hdr : 1;
+            fd = net_tap_open(&vnet_hdr, vnet_hdr_required,
+                              i >= 1 ? "no" : script,
                               ifname, sizeof ifname, queues > 1, errp);
             if (fd == -1) {
                 return -1;
-- 
2.48.1




* [PATCH v7 04/19] net/tap: pass NULL to net_init_tap_one() in cases when scripts are NULL
  2025-10-10 17:39 [PATCH v7 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
                   ` (2 preceding siblings ...)
  2025-10-10 17:39 ` [PATCH v7 03/19] net/tap: rework net_tap_init() Vladimir Sementsov-Ogievskiy
@ 2025-10-10 17:39 ` Vladimir Sementsov-Ogievskiy
  2025-10-10 17:39 ` [PATCH v7 05/19] net/tap: rework scripts handling Vladimir Sementsov-Ogievskiy
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-10 17:39 UTC (permalink / raw)
  To: mst, jasowang
  Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
	qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
	yc-core, vsementsov, raphael.s.norwitz

Directly pass NULL in cases where we report an error if script or
downscript are set.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
 net/tap.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index b1b64c508d..a05cc7ef64 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -800,8 +800,6 @@ int net_init_tap(const Netdev *netdev, const char *name,
     const NetdevTapOptions *tap;
     int fd, vnet_hdr = 0, i = 0, queues;
     /* for the no-fd, no-helper case */
-    const char *script;
-    const char *downscript;
     Error *err = NULL;
     const char *vhostfdname;
     char ifname[128];
@@ -811,8 +809,6 @@ int net_init_tap(const Netdev *netdev, const char *name,
     tap = &netdev->u.tap;
     queues = tap->has_queues ? tap->queues : 1;
     vhostfdname = tap->vhostfd;
-    script = tap->script;
-    downscript = tap->downscript;
 
     /* QEMU hubs do not support multiqueue tap, in this case peer is set.
      * For -netdev, peer is always NULL. */
@@ -853,7 +849,7 @@ int net_init_tap(const Netdev *netdev, const char *name,
         }
 
         net_init_tap_one(tap, peer, "tap", name, NULL,
-                         script, downscript,
+                         NULL, NULL,
                          vhostfdname, vnet_hdr, fd, &err);
         if (err) {
             error_propagate(errp, err);
@@ -914,7 +910,7 @@ int net_init_tap(const Netdev *netdev, const char *name,
             }
 
             net_init_tap_one(tap, peer, "tap", name, ifname,
-                             script, downscript,
+                             NULL, NULL,
                              tap->vhostfds ? vhost_fds[i] : NULL,
                              vnet_hdr, fd, &err);
             if (err) {
@@ -959,7 +955,7 @@ free_fail:
         }
 
         net_init_tap_one(tap, peer, "bridge", name, ifname,
-                         script, downscript, vhostfdname,
+                         NULL, NULL, vhostfdname,
                          vnet_hdr, fd, &err);
         if (err) {
             error_propagate(errp, err);
@@ -967,6 +963,8 @@ free_fail:
             return -1;
         }
     } else {
+        const char *script = tap->script;
+        const char *downscript = tap->downscript;
         g_autofree char *default_script = NULL;
         g_autofree char *default_downscript = NULL;
         bool vnet_hdr_required = tap->has_vnet_hdr && tap->vnet_hdr;
-- 
2.48.1




* [PATCH v7 05/19] net/tap: rework scripts handling
  2025-10-10 17:39 [PATCH v7 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
                   ` (3 preceding siblings ...)
  2025-10-10 17:39 ` [PATCH v7 04/19] net/tap: pass NULL to net_init_tap_one() in cases when scripts are NULL Vladimir Sementsov-Ogievskiy
@ 2025-10-10 17:39 ` Vladimir Sementsov-Ogievskiy
  2025-10-10 17:39 ` [PATCH v7 06/19] net/tap: setup exit notifier only when needed Vladimir Sementsov-Ogievskiy
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-10 17:39 UTC (permalink / raw)
  To: mst, jasowang
  Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
	qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
	yc-core, vsementsov, raphael.s.norwitz

Simplify script handling: parse all these "no" and '\0' cases once, and
then keep simpler logic in net_tap_open() and net_init_tap_one(): NULL
means no script to run, otherwise run the script.
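
The normalization rule can be sketched in plain libc as follows. This
is only an illustration: parse_script() is a hypothetical name, it uses
malloc() instead of glib, and it takes the default path as-is, whereas
the real helper in the diff below resolves it with get_relocated_path().

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Returns a malloc()ed copy of the script path to run, or NULL when no
 * script should run (argument is "" or "no", or allocation failed).
 * A NULL script_arg selects default_path.
 */
static char *parse_script(const char *script_arg, const char *default_path)
{
    const char *chosen = script_arg ? script_arg : default_path;
    char *copy;

    assert(chosen != NULL);

    if (chosen[0] == '\0' || strcmp(chosen, "no") == 0) {
        return NULL;
    }

    copy = malloc(strlen(chosen) + 1);
    if (copy) {
        strcpy(copy, chosen);
    }
    return copy;
}
```

After this step callers only need a NULL check, which is what lets the
"no" string comparisons disappear from the rest of the file.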

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
 net/tap.c | 45 +++++++++++++++++++++++++--------------------
 1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index a05cc7ef64..994e885c5f 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -91,6 +91,21 @@ static void launch_script(const char *setup_script, const char *ifname,
 static void tap_send(void *opaque);
 static void tap_writable(void *opaque);
 
+static char *tap_parse_script(const char *script_arg, const char *default_path)
+{
+    g_autofree char *res = g_strdup(script_arg);
+
+    if (!res) {
+        res = get_relocated_path(default_path);
+    }
+
+    if (res[0] == '\0' || strcmp(res, "no") == 0) {
+        return NULL;
+    }
+
+    return g_steal_pointer(&res);
+}
+
 static void tap_update_fd_handler(TAPState *s)
 {
     qemu_set_fd_handler(s->fd,
@@ -668,9 +683,7 @@ static int net_tap_open(int *vnet_hdr, bool vnet_hdr_required,
         return -1;
     }
 
-    if (setup_script &&
-        setup_script[0] != '\0' &&
-        strcmp(setup_script, "no") != 0) {
+    if (setup_script) {
         launch_script(setup_script, ifname, fd, &err);
         if (err) {
             error_propagate(errp, err);
@@ -706,9 +719,9 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
         qemu_set_info_str(&s->nc, "helper=%s", tap->helper);
     } else {
         qemu_set_info_str(&s->nc, "ifname=%s,script=%s,downscript=%s", ifname,
-                          script, downscript);
+                          script ?: "no", downscript ?: "no");
 
-        if (strcmp(downscript, "no") != 0) {
+        if (downscript) {
             snprintf(s->down_script, sizeof(s->down_script), "%s", downscript);
             snprintf(s->down_script_arg, sizeof(s->down_script_arg),
                      "%s", ifname);
@@ -963,10 +976,10 @@ free_fail:
             return -1;
         }
     } else {
-        const char *script = tap->script;
-        const char *downscript = tap->downscript;
-        g_autofree char *default_script = NULL;
-        g_autofree char *default_downscript = NULL;
+        g_autofree char *script =
+            tap_parse_script(tap->script, DEFAULT_NETWORK_SCRIPT);
+        g_autofree char *downscript =
+            tap_parse_script(tap->downscript, DEFAULT_NETWORK_DOWN_SCRIPT);
         bool vnet_hdr_required = tap->has_vnet_hdr && tap->vnet_hdr;
 
         if (tap->vhostfds) {
@@ -974,14 +987,6 @@ free_fail:
             return -1;
         }
 
-        if (!script) {
-            script = default_script = get_relocated_path(DEFAULT_NETWORK_SCRIPT);
-        }
-        if (!downscript) {
-            downscript = default_downscript =
-                                 get_relocated_path(DEFAULT_NETWORK_DOWN_SCRIPT);
-        }
-
         if (tap->ifname) {
             pstrcpy(ifname, sizeof ifname, tap->ifname);
         } else {
@@ -991,7 +996,7 @@ free_fail:
         for (i = 0; i < queues; i++) {
             vnet_hdr = tap->has_vnet_hdr ? tap->vnet_hdr : 1;
             fd = net_tap_open(&vnet_hdr, vnet_hdr_required,
-                              i >= 1 ? "no" : script,
+                              i >= 1 ? NULL : script,
                               ifname, sizeof ifname, queues > 1, errp);
             if (fd == -1) {
                 return -1;
@@ -1006,8 +1011,8 @@ free_fail:
             }
 
             net_init_tap_one(tap, peer, "tap", name, ifname,
-                             i >= 1 ? "no" : script,
-                             i >= 1 ? "no" : downscript,
+                             i >= 1 ? NULL : script,
+                             i >= 1 ? NULL : downscript,
                              vhostfdname, vnet_hdr, fd, &err);
             if (err) {
                 error_propagate(errp, err);
-- 
2.48.1




* [PATCH v7 06/19] net/tap: setup exit notifier only when needed
  2025-10-10 17:39 [PATCH v7 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
                   ` (4 preceding siblings ...)
  2025-10-10 17:39 ` [PATCH v7 05/19] net/tap: rework scripts handling Vladimir Sementsov-Ogievskiy
@ 2025-10-10 17:39 ` Vladimir Sementsov-Ogievskiy
  2025-10-10 17:39 ` [PATCH v7 07/19] net/tap: split net_tap_fd_init() Vladimir Sementsov-Ogievskiy
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-10 17:39 UTC (permalink / raw)
  To: mst, jasowang
  Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
	qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
	yc-core, vsementsov, raphael.s.norwitz

There is no reason to set up a notifier on each queue of a multiqueue
tap when we actually want to run the downscript only once.
Also, let's not set up the notifier when the downscript is
not enabled (downscript="no").
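
The resulting lifecycle can be modelled with a small standalone sketch.
All names here are hypothetical and the "script launch" is only a
counter; the point is the pattern used in the diff below: the callback
pointer is set only when a downscript exists, and cleanup uses that
pointer as the "registered" flag and clears it after running.

```c
#include <assert.h>
#include <stddef.h>

struct tap_sketch;
typedef void (*exit_fn)(struct tap_sketch *);

struct tap_sketch {
    const char *down_script;  /* NULL means "no downscript" */
    exit_fn on_exit;          /* NULL means "no notifier registered" */
    int down_runs;            /* counts simulated downscript launches */
};

/* Stand-in for launching the downscript. */
static void run_down_script(struct tap_sketch *s)
{
    assert(s->down_script != NULL);
    s->down_runs++;
}

/* Register the exit callback only when there is something to run. */
static void tap_sketch_setup(struct tap_sketch *s, const char *down_script)
{
    s->down_script = down_script;
    s->on_exit = NULL;
    s->down_runs = 0;

    if (down_script) {
        s->on_exit = run_down_script;
    }
}

/* Run and unregister at most once; safe to call repeatedly. */
static void tap_sketch_cleanup(struct tap_sketch *s)
{
    if (s->on_exit) {
        s->on_exit(s);
        s->on_exit = NULL;
    }
}
```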

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
 net/tap.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index 994e885c5f..17ad561f9c 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -326,11 +326,9 @@ static void tap_exit_notify(Notifier *notifier, void *data)
     TAPState *s = container_of(notifier, TAPState, exit);
     Error *err = NULL;
 
-    if (s->down_script[0]) {
-        launch_script(s->down_script, s->down_script_arg, s->fd, &err);
-        if (err) {
-            error_report_err(err);
-        }
+    launch_script(s->down_script, s->down_script_arg, s->fd, &err);
+    if (err) {
+        error_report_err(err);
     }
 }
 
@@ -346,8 +344,11 @@ static void tap_cleanup(NetClientState *nc)
 
     qemu_purge_queued_packets(nc);
 
-    tap_exit_notify(&s->exit, NULL);
-    qemu_remove_exit_notifier(&s->exit);
+    if (s->exit.notify) {
+        tap_exit_notify(&s->exit, NULL);
+        qemu_remove_exit_notifier(&s->exit);
+        s->exit.notify = NULL;
+    }
 
     tap_read_poll(s, false);
     tap_write_poll(s, false);
@@ -443,9 +444,6 @@ static TAPState *net_tap_fd_init(NetClientState *peer,
     tap_read_poll(s, true);
     s->vhost_net = NULL;
 
-    s->exit.notify = tap_exit_notify;
-    qemu_add_exit_notifier(&s->exit);
-
     return s;
 }
 
@@ -725,6 +723,8 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
             snprintf(s->down_script, sizeof(s->down_script), "%s", downscript);
             snprintf(s->down_script_arg, sizeof(s->down_script_arg),
                      "%s", ifname);
+            s->exit.notify = tap_exit_notify;
+            qemu_add_exit_notifier(&s->exit);
         }
     }
 
-- 
2.48.1




* [PATCH v7 07/19] net/tap: split net_tap_fd_init()
  2025-10-10 17:39 [PATCH v7 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
                   ` (5 preceding siblings ...)
  2025-10-10 17:39 ` [PATCH v7 06/19] net/tap: setup exit notifier only when needed Vladimir Sementsov-Ogievskiy
@ 2025-10-10 17:39 ` Vladimir Sementsov-Ogievskiy
  2025-10-10 17:39 ` [PATCH v7 08/19] net/tap: tap_set_sndbuf(): add return value Vladimir Sementsov-Ogievskiy
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-10 17:39 UTC (permalink / raw)
  To: mst, jasowang
  Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
	qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
	yc-core, vsementsov, raphael.s.norwitz

Split the function into separate net_tap_new() and net_tap_set_fd().

We start moving towards the following picture:

net_tap_new() - takes the QAPI @tap parameter, but doesn't have @fd;
initializes the net client; called during initialization.

net_tap_setup() - doesn't have @tap (QAPI), but has the @fd parameter;
may be called at a later point.

In this commit we introduce the first function.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
 net/tap.c | 31 +++++++++++++++++--------------
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index 17ad561f9c..7cb694e683 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -412,19 +412,20 @@ static NetClientInfo net_tap_info = {
     .get_vhost_net = tap_get_vhost_net,
 };
 
-static TAPState *net_tap_fd_init(NetClientState *peer,
-                                 const char *model,
-                                 const char *name,
-                                 int fd,
-                                 int vnet_hdr)
+static TAPState *net_tap_new(NetClientState *peer, const char *model,
+                             const char *name)
 {
-    NetOffloads ol = {};
-    NetClientState *nc;
-    TAPState *s;
+    NetClientState *nc = qemu_new_net_client(&net_tap_info, peer, model, name);
+    TAPState *s = DO_UPCAST(TAPState, nc, nc);
 
-    nc = qemu_new_net_client(&net_tap_info, peer, model, name);
+    s->fd = -1;
 
-    s = DO_UPCAST(TAPState, nc, nc);
+    return s;
+}
+
+static void net_tap_set_fd(TAPState *s, int fd, int vnet_hdr)
+{
+    NetOffloads ol = {};
 
     s->fd = fd;
     s->host_vnet_hdr_len = vnet_hdr ? sizeof(struct virtio_net_hdr) : 0;
@@ -443,8 +444,6 @@ static TAPState *net_tap_fd_init(NetClientState *peer,
     }
     tap_read_poll(s, true);
     s->vhost_net = NULL;
-
-    return s;
 }
 
 static void close_all_fds_after_fork(int excluded_fd)
@@ -661,7 +660,9 @@ int net_init_bridge(const Netdev *netdev, const char *name,
         close(fd);
         return -1;
     }
-    s = net_tap_fd_init(peer, "bridge", name, fd, vnet_hdr);
+
+    s = net_tap_new(peer, "bridge", name);
+    net_tap_set_fd(s, fd, vnet_hdr);
 
     qemu_set_info_str(&s->nc, "helper=%s,br=%s", helper, br);
 
@@ -702,9 +703,11 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
                              int vnet_hdr, int fd, Error **errp)
 {
     Error *err = NULL;
-    TAPState *s = net_tap_fd_init(peer, model, name, fd, vnet_hdr);
+    TAPState *s = net_tap_new(peer, model, name);
     int vhostfd;
 
+    net_tap_set_fd(s, fd, vnet_hdr);
+
     tap_set_sndbuf(s->fd, tap, &err);
     if (err) {
         error_propagate(errp, err);
-- 
2.48.1




* [PATCH v7 08/19] net/tap: tap_set_sndbuf(): add return value
  2025-10-10 17:39 [PATCH v7 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
                   ` (6 preceding siblings ...)
  2025-10-10 17:39 ` [PATCH v7 07/19] net/tap: split net_tap_fd_init() Vladimir Sementsov-Ogievskiy
@ 2025-10-10 17:39 ` Vladimir Sementsov-Ogievskiy
  2025-10-10 17:39 ` [PATCH v7 09/19] net/tap: rework tap_set_sndbuf() Vladimir Sementsov-Ogievskiy
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-10 17:39 UTC (permalink / raw)
  To: mst, jasowang
  Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
	qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
	yc-core, vsementsov, raphael.s.norwitz

Follow the common recommendation in include/qapi/error.h to have
a return value together with errp. This allows us to avoid error
propagation.
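
As a standalone illustration of why the return value helps, the sketch
below uses a toy error type (not QEMU's Error) and hypothetical function
names. The callee reports failure both through errp and through its
return value, so the caller can pass errp straight through without a
local error variable or a propagate step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for an error object; not QEMU's Error type. */
struct err_sketch {
    const char *msg;
};

/* Callee: sets *errp (if errp is non-NULL) and returns false on failure. */
static bool set_sndbuf_sketch(int fd, struct err_sketch **errp)
{
    static struct err_sketch bad_fd = { "bad fd" };

    /* Convention assumed here: a non-NULL errp points to a NULL error. */
    assert(errp == NULL || *errp == NULL);

    if (fd < 0) {
        if (errp) {
            *errp = &bad_fd;
        }
        return false;
    }
    return true;
}

/* Caller: checks the return value and forwards errp unchanged. */
static bool init_one_sketch(int fd, struct err_sketch **errp)
{
    if (!set_sndbuf_sketch(fd, errp)) {
        return false;
    }
    return true;
}
```

Because success is visible in the return value, the caller also works
when errp is NULL, which a void function checked via a local error
cannot offer without the extra variable.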

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
 net/tap-bsd.c     | 3 ++-
 net/tap-linux.c   | 5 ++++-
 net/tap-solaris.c | 3 ++-
 net/tap-stub.c    | 3 ++-
 net/tap.c         | 5 +----
 net/tap_int.h     | 2 +-
 6 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/net/tap-bsd.c b/net/tap-bsd.c
index bbf84d1828..9bd282b69c 100644
--- a/net/tap-bsd.c
+++ b/net/tap-bsd.c
@@ -206,8 +206,9 @@ error:
 }
 #endif /* __FreeBSD__ */
 
-void tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
+bool tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
 {
+    return true;
 }
 
 int tap_probe_vnet_hdr(int fd, Error **errp)
diff --git a/net/tap-linux.c b/net/tap-linux.c
index 2a90b58467..db68693bbf 100644
--- a/net/tap-linux.c
+++ b/net/tap-linux.c
@@ -145,7 +145,7 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
  */
 #define TAP_DEFAULT_SNDBUF 0
 
-void tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
+bool tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
 {
     int sndbuf;
 
@@ -159,7 +159,10 @@ void tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
 
     if (ioctl(fd, TUNSETSNDBUF, &sndbuf) == -1 && tap->has_sndbuf) {
         error_setg_errno(errp, errno, "TUNSETSNDBUF ioctl failed");
+        return false;
     }
+
+    return true;
 }
 
 int tap_probe_vnet_hdr(int fd, Error **errp)
diff --git a/net/tap-solaris.c b/net/tap-solaris.c
index 75397e6c54..e5ba89d926 100644
--- a/net/tap-solaris.c
+++ b/net/tap-solaris.c
@@ -208,8 +208,9 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
     return fd;
 }
 
-void tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
+bool tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
 {
+    return true;
 }
 
 int tap_probe_vnet_hdr(int fd, Error **errp)
diff --git a/net/tap-stub.c b/net/tap-stub.c
index f7a5e0c163..86d7d38e0f 100644
--- a/net/tap-stub.c
+++ b/net/tap-stub.c
@@ -33,8 +33,9 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
     return -1;
 }
 
-void tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
+bool tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
 {
+    return true;
 }
 
 int tap_probe_vnet_hdr(int fd, Error **errp)
diff --git a/net/tap.c b/net/tap.c
index 7cb694e683..25dedd8492 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -702,15 +702,12 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
                              const char *downscript, const char *vhostfdname,
                              int vnet_hdr, int fd, Error **errp)
 {
-    Error *err = NULL;
     TAPState *s = net_tap_new(peer, model, name);
     int vhostfd;
 
     net_tap_set_fd(s, fd, vnet_hdr);
 
-    tap_set_sndbuf(s->fd, tap, &err);
-    if (err) {
-        error_propagate(errp, err);
+    if (!tap_set_sndbuf(s->fd, tap, errp)) {
         goto failed;
     }
 
diff --git a/net/tap_int.h b/net/tap_int.h
index b76a05044b..7963dd6aae 100644
--- a/net/tap_int.h
+++ b/net/tap_int.h
@@ -34,7 +34,7 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
 
 ssize_t tap_read_packet(int tapfd, uint8_t *buf, int maxlen);
 
-void tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp);
+bool tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp);
 int tap_probe_vnet_hdr(int fd, Error **errp);
 int tap_probe_has_ufo(int fd);
 int tap_probe_has_uso(int fd);
-- 
2.48.1




* [PATCH v7 09/19] net/tap: rework tap_set_sndbuf()
  2025-10-10 17:39 [PATCH v7 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
                   ` (7 preceding siblings ...)
  2025-10-10 17:39 ` [PATCH v7 08/19] net/tap: tap_set_sndbuf(): add return value Vladimir Sementsov-Ogievskiy
@ 2025-10-10 17:39 ` Vladimir Sementsov-Ogievskiy
  2025-10-10 17:39 ` [PATCH v7 10/19] net/tap: rework sndbuf handling Vladimir Sementsov-Ogievskiy
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-10 17:39 UTC (permalink / raw)
  To: mst, jasowang
  Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
	qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
	yc-core, vsementsov, raphael.s.norwitz

Keep NetdevTapOptions-related logic in tap.c, and make tap_set_sndbuf() a
simple system call wrapper, more like the other functions in tap-linux.c.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
 net/tap-bsd.c     |  2 +-
 net/tap-linux.c   | 16 ++--------------
 net/tap-solaris.c |  2 +-
 net/tap-stub.c    |  2 +-
 net/tap.c         |  6 +++++-
 net/tap_int.h     |  3 +--
 6 files changed, 11 insertions(+), 20 deletions(-)
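
For review purposes, the value selection and error policy that this patch
moves into net_init_tap_one() can be restated as two small helpers. This is
only an illustrative sketch: pick_sndbuf() and sndbuf_step_ok() are
hypothetical names that do not exist in the patch, and the uint64_t type for
the option value is an assumption about the QAPI field.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch). It mirrors the expression
 *   (tap->has_sndbuf && tap->sndbuf) ? MIN(tap->sndbuf, INT_MAX) : INT_MAX
 * An absent or zero option selects INT_MAX; larger values are clamped.
 */
static int pick_sndbuf(bool has_sndbuf, uint64_t sndbuf)
{
    if (!has_sndbuf || sndbuf == 0) {
        return INT_MAX;
    }
    return sndbuf > (uint64_t)INT_MAX ? INT_MAX : (int)sndbuf;
}

/*
 * Hypothetical helper (not in the patch). A failed TUNSETSNDBUF is fatal
 * only when the user explicitly passed sndbuf; otherwise it is ignored.
 */
static bool sndbuf_step_ok(bool ioctl_ok, bool has_sndbuf)
{
    return ioctl_ok || !has_sndbuf;
}
```

With these, an unset option and sndbuf=0 both yield INT_MAX, and a failing
ioctl only aborts initialization when has_sndbuf is true.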

diff --git a/net/tap-bsd.c b/net/tap-bsd.c
index 9bd282b69c..4cea60664e 100644
--- a/net/tap-bsd.c
+++ b/net/tap-bsd.c
@@ -206,7 +206,7 @@ error:
 }
 #endif /* __FreeBSD__ */
 
-bool tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
+bool tap_set_sndbuf(int fd, int sndbuf, Error **errp)
 {
     return true;
 }
diff --git a/net/tap-linux.c b/net/tap-linux.c
index db68693bbf..bb73fa4b13 100644
--- a/net/tap-linux.c
+++ b/net/tap-linux.c
@@ -143,21 +143,9 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
  * Ethernet NICs generally have txqueuelen=1000, so 1Mb is
  * a good value, given a 1500 byte MTU.
  */
-#define TAP_DEFAULT_SNDBUF 0
-
-bool tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
+bool tap_set_sndbuf(int fd, int sndbuf, Error **errp)
 {
-    int sndbuf;
-
-    sndbuf = !tap->has_sndbuf       ? TAP_DEFAULT_SNDBUF :
-             tap->sndbuf > INT_MAX  ? INT_MAX :
-             tap->sndbuf;
-
-    if (!sndbuf) {
-        sndbuf = INT_MAX;
-    }
-
-    if (ioctl(fd, TUNSETSNDBUF, &sndbuf) == -1 && tap->has_sndbuf) {
+    if (ioctl(fd, TUNSETSNDBUF, &sndbuf) == -1) {
         error_setg_errno(errp, errno, "TUNSETSNDBUF ioctl failed");
         return false;
     }
diff --git a/net/tap-solaris.c b/net/tap-solaris.c
index e5ba89d926..e925ca8ae9 100644
--- a/net/tap-solaris.c
+++ b/net/tap-solaris.c
@@ -208,7 +208,7 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
     return fd;
 }
 
-bool tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
+bool tap_set_sndbuf(int fd, int sndbuf, Error **errp)
 {
     return true;
 }
diff --git a/net/tap-stub.c b/net/tap-stub.c
index 86d7d38e0f..6aa60d96ad 100644
--- a/net/tap-stub.c
+++ b/net/tap-stub.c
@@ -33,7 +33,7 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
     return -1;
 }
 
-bool tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
+bool tap_set_sndbuf(int fd, int sndbuf, Error **errp)
 {
     return true;
 }
diff --git a/net/tap.c b/net/tap.c
index 25dedd8492..f5830f4b00 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -704,10 +704,14 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
 {
     TAPState *s = net_tap_new(peer, model, name);
     int vhostfd;
+    bool sndbuf_required = tap->has_sndbuf;
+    int sndbuf =
+        (tap->has_sndbuf && tap->sndbuf) ? MIN(tap->sndbuf, INT_MAX) : INT_MAX;
 
     net_tap_set_fd(s, fd, vnet_hdr);
 
-    if (!tap_set_sndbuf(s->fd, tap, errp)) {
+    if (!tap_set_sndbuf(fd, sndbuf, sndbuf_required ? errp : NULL) &&
+        sndbuf_required) {
         goto failed;
     }
 
diff --git a/net/tap_int.h b/net/tap_int.h
index 7963dd6aae..dc4f484006 100644
--- a/net/tap_int.h
+++ b/net/tap_int.h
@@ -26,7 +26,6 @@
 #ifndef NET_TAP_INT_H
 #define NET_TAP_INT_H
 
-#include "qapi/qapi-types-net.h"
 #include "net/net.h"
 
 int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
@@ -34,7 +33,7 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
 
 ssize_t tap_read_packet(int tapfd, uint8_t *buf, int maxlen);
 
-bool tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp);
+bool tap_set_sndbuf(int fd, int sndbuf, Error **errp);
 int tap_probe_vnet_hdr(int fd, Error **errp);
 int tap_probe_has_ufo(int fd);
 int tap_probe_has_uso(int fd);
-- 
2.48.1




* [PATCH v7 10/19] net/tap: rework sndbuf handling
  2025-10-10 17:39 [PATCH v7 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
                   ` (8 preceding siblings ...)
  2025-10-10 17:39 ` [PATCH v7 09/19] net/tap: rework tap_set_sndbuf() Vladimir Sementsov-Ogievskiy
@ 2025-10-10 17:39 ` Vladimir Sementsov-Ogievskiy
  2025-10-10 17:39 ` [PATCH v7 11/19] net/tap: introduce net_tap_setup() Vladimir Sementsov-Ogievskiy
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-10 17:39 UTC (permalink / raw)
  To: mst, jasowang
  Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
	qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
	yc-core, vsementsov, raphael.s.norwitz

Continue the main idea: avoid dependency on @tap in net_tap_setup().
So, move QAPI parsing to net_tap_new().
Move setting sndbuf to net_tap_set_fd(), as it's a more appropriate place
(other initial fd settings are there).

Note that net_tap_new() and net_tap_set_fd() are shared with
net_init_bridge(), which didn't set sndbuf. Handle this case with sndbuf=0
(we never pass zero to tap_set_sndbuf(), so let this specific value mean
that we don't want to touch sndbuf).

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
 net/tap.c | 38 ++++++++++++++++++++++++++------------
 1 file changed, 26 insertions(+), 12 deletions(-)
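
The sndbuf=0 convention described above can be exercised in isolation with a
fake setter. This is a sketch under stated assumptions: struct tap_sketch,
fake_set_sndbuf() and apply_sndbuf() are hypothetical stand-ins, and no ioctl
is issued; only the control flow mirrors the tail of net_tap_set_fd().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two fields added to TAPState. */
struct tap_sketch {
    bool sndbuf_required;
    int sndbuf;             /* 0 means "leave the send buffer alone" */
};

static int set_calls;       /* how many times the fake setter was called */
static bool fake_set_ok;    /* result the fake setter reports */

/* Fake replacement for tap_set_sndbuf(); it only records the call. */
static bool fake_set_sndbuf(int sndbuf)
{
    (void)sndbuf;
    set_calls++;
    return fake_set_ok;
}

/* Mirrors the sndbuf handling at the end of net_tap_set_fd(). */
static bool apply_sndbuf(const struct tap_sketch *s)
{
    if (s->sndbuf) {
        if (!fake_set_sndbuf(s->sndbuf) && s->sndbuf_required) {
            return false;
        }
    }
    return true;
}
```

The bridge path corresponds to sndbuf == 0: the setter is never called, so
that path cannot fail on sndbuf.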

diff --git a/net/tap.c b/net/tap.c
index f5830f4b00..b5ac856a3d 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -83,6 +83,9 @@ typedef struct TAPState {
     VHostNetState *vhost_net;
     unsigned host_vnet_hdr_len;
     Notifier exit;
+
+    bool sndbuf_required;
+    int sndbuf;
 } TAPState;
 
 static void launch_script(const char *setup_script, const char *ifname,
@@ -413,17 +416,25 @@ static NetClientInfo net_tap_info = {
 };
 
 static TAPState *net_tap_new(NetClientState *peer, const char *model,
-                             const char *name)
+                             const char *name, const NetdevTapOptions *tap)
 {
     NetClientState *nc = qemu_new_net_client(&net_tap_info, peer, model, name);
     TAPState *s = DO_UPCAST(TAPState, nc, nc);
 
     s->fd = -1;
 
+    if (!tap) {
+        return s;
+    }
+
+    s->sndbuf_required = tap->has_sndbuf;
+    s->sndbuf =
+        (tap->has_sndbuf && tap->sndbuf) ? MIN(tap->sndbuf, INT_MAX) : INT_MAX;
+
     return s;
 }
 
-static void net_tap_set_fd(TAPState *s, int fd, int vnet_hdr)
+static bool net_tap_set_fd(TAPState *s, int fd, int vnet_hdr, Error **errp)
 {
     NetOffloads ol = {};
 
@@ -444,6 +455,15 @@ static void net_tap_set_fd(TAPState *s, int fd, int vnet_hdr)
     }
     tap_read_poll(s, true);
     s->vhost_net = NULL;
+
+    if (s->sndbuf) {
+        Error **e = s->sndbuf_required ? errp : NULL;
+        if (!tap_set_sndbuf(s->fd, s->sndbuf, e) && s->sndbuf_required) {
+            return false;
+        }
+    }
+
+    return true;
 }
 
 static void close_all_fds_after_fork(int excluded_fd)
@@ -661,8 +681,8 @@ int net_init_bridge(const Netdev *netdev, const char *name,
         return -1;
     }
 
-    s = net_tap_new(peer, "bridge", name);
-    net_tap_set_fd(s, fd, vnet_hdr);
+    s = net_tap_new(peer, "bridge", name, NULL);
+    net_tap_set_fd(s, fd, vnet_hdr, &error_abort);
 
     qemu_set_info_str(&s->nc, "helper=%s,br=%s", helper, br);
 
@@ -702,16 +722,10 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
                              const char *downscript, const char *vhostfdname,
                              int vnet_hdr, int fd, Error **errp)
 {
-    TAPState *s = net_tap_new(peer, model, name);
+    TAPState *s = net_tap_new(peer, model, name, tap);
     int vhostfd;
-    bool sndbuf_required = tap->has_sndbuf;
-    int sndbuf =
-        (tap->has_sndbuf && tap->sndbuf) ? MIN(tap->sndbuf, INT_MAX) : INT_MAX;
-
-    net_tap_set_fd(s, fd, vnet_hdr);
 
-    if (!tap_set_sndbuf(fd, sndbuf, sndbuf_required ? errp : NULL) &&
-        sndbuf_required) {
+    if (!net_tap_set_fd(s, fd, vnet_hdr, errp)) {
         goto failed;
     }
 
-- 
2.48.1




* [PATCH v7 11/19] net/tap: introduce net_tap_setup()
  2025-10-10 17:39 [PATCH v7 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
                   ` (9 preceding siblings ...)
  2025-10-10 17:39 ` [PATCH v7 10/19] net/tap: rework sndbuf handling Vladimir Sementsov-Ogievskiy
@ 2025-10-10 17:39 ` Vladimir Sementsov-Ogievskiy
  2025-10-10 17:39 ` [PATCH v7 12/19] net/tap: move vhost fd initialization to net_tap_new() Vladimir Sementsov-Ogievskiy
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-10 17:39 UTC (permalink / raw)
  To: mst, jasowang
  Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
	qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
	yc-core, vsementsov, raphael.s.norwitz

Move most of net_init_tap_one() to net_tap_setup(), the future counterpart
of net_tap_new(), for postponed setup.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
 net/tap.c | 39 +++++++++++++++++++++++++--------------
 1 file changed, 25 insertions(+), 14 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index b5ac856a3d..b01cd4d6c2 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -88,6 +88,10 @@ typedef struct TAPState {
     int sndbuf;
 } TAPState;
 
+static bool net_tap_setup(TAPState *s, const NetdevTapOptions *tap,
+                          const char *vhostfdname,
+                          int fd, int vnet_hdr, Error **errp);
+
 static void launch_script(const char *setup_script, const char *ifname,
                           int fd, Error **errp);
 
@@ -723,11 +727,6 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
                              int vnet_hdr, int fd, Error **errp)
 {
     TAPState *s = net_tap_new(peer, model, name, tap);
-    int vhostfd;
-
-    if (!net_tap_set_fd(s, fd, vnet_hdr, errp)) {
-        goto failed;
-    }
 
     if (tap->fd || tap->fds) {
         qemu_set_info_str(&s->nc, "fd=%d", fd);
@@ -746,6 +745,21 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
         }
     }
 
+    if (!net_tap_setup(s, tap, vhostfdname, fd, vnet_hdr, errp)) {
+        qemu_del_net_client(&s->nc);
+    }
+}
+
+static bool net_tap_setup(TAPState *s, const NetdevTapOptions *tap,
+                          const char *vhostfdname,
+                          int fd, int vnet_hdr, Error **errp)
+{
+    int vhostfd;
+
+    if (!net_tap_set_fd(s, fd, vnet_hdr, errp)) {
+        return false;
+    }
+
     if (tap->has_vhost ? tap->vhost :
         vhostfdname || (tap->has_vhostforce && tap->vhostforce)) {
         VhostNetOptions options;
@@ -761,20 +775,20 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
         if (vhostfdname) {
             vhostfd = monitor_fd_param(monitor_cur(), vhostfdname, errp);
             if (vhostfd == -1) {
-                goto failed;
+                return false;
             }
             if (!qemu_set_blocking(vhostfd, false, errp)) {
-                goto failed;
+                return false;
             }
         } else {
             vhostfd = open("/dev/vhost-net", O_RDWR);
             if (vhostfd < 0) {
                 error_setg_errno(errp, errno,
                                  "tap: open vhost char device failed");
-                goto failed;
+                return false;
             }
             if (!qemu_set_blocking(vhostfd, false, errp)) {
-                goto failed;
+                return false;
             }
         }
         options.opaque = (void *)(uintptr_t)vhostfd;
@@ -789,14 +803,11 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
         if (!s->vhost_net) {
             error_setg(errp,
                        "vhost-net requested but could not be initialized");
-            goto failed;
+            return false;
         }
     }
 
-    return;
-
-failed:
-    qemu_del_net_client(&s->nc);
+    return true;
 }
 
 static int get_fds(char *str, char *fds[], int max)
-- 
2.48.1




* [PATCH v7 12/19] net/tap: move vhost fd initialization to net_tap_new()
  2025-10-10 17:39 [PATCH v7 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
                   ` (10 preceding siblings ...)
  2025-10-10 17:39 ` [PATCH v7 11/19] net/tap: introduce net_tap_setup() Vladimir Sementsov-Ogievskiy
@ 2025-10-10 17:39 ` Vladimir Sementsov-Ogievskiy
  2025-10-10 17:39 ` [PATCH v7 13/19] net/tap: finalize net_tap_set_fd() logic Vladimir Sementsov-Ogievskiy
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-10 17:39 UTC (permalink / raw)
  To: mst, jasowang
  Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
	qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
	yc-core, vsementsov, raphael.s.norwitz

Continue the track to avoid dependency on @tap in net_tap_setup():
now move the vhost fd initialization to net_tap_new(). So in
net_tap_setup() we simply check whether we have a vhostfd at this
point or not.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
 net/tap.c | 90 ++++++++++++++++++++++++++++++-------------------------
 1 file changed, 50 insertions(+), 40 deletions(-)
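
The condition that decides whether net_tap_new() obtains a vhost fd is dense,
so here it is as a standalone predicate. A sketch only: struct vhost_opts and
want_vhost() are hypothetical names carrying just the fields the condition
reads; they are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of NetdevTapOptions, for illustration only. */
struct vhost_opts {
    bool has_vhost, vhost;
    bool has_vhostforce, vhostforce;
};

/*
 * Mirrors:
 *   tap->has_vhost ? tap->vhost
 *                  : vhostfdname || (tap->has_vhostforce && tap->vhostforce)
 * An explicit vhost=on/off always wins; otherwise a passed vhost fd
 * or vhostforce=on enables vhost.
 */
static bool want_vhost(const struct vhost_opts *o, const char *vhostfdname)
{
    if (o->has_vhost) {
        return o->vhost;
    }
    return vhostfdname != NULL || (o->has_vhostforce && o->vhostforce);
}
```

Note that an explicit vhost=off wins even when a vhost fd name is supplied.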

diff --git a/net/tap.c b/net/tap.c
index b01cd4d6c2..d08ef070e9 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -86,11 +86,11 @@ typedef struct TAPState {
 
     bool sndbuf_required;
     int sndbuf;
+    int vhostfd;
+    uint32_t vhost_busyloop_timeout;
 } TAPState;
 
-static bool net_tap_setup(TAPState *s, const NetdevTapOptions *tap,
-                          const char *vhostfdname,
-                          int fd, int vnet_hdr, Error **errp);
+static bool net_tap_setup(TAPState *s, int fd, int vnet_hdr, Error **errp);
 
 static void launch_script(const char *setup_script, const char *ifname,
                           int fd, Error **errp);
@@ -361,6 +361,11 @@ static void tap_cleanup(NetClientState *nc)
     tap_write_poll(s, false);
     close(s->fd);
     s->fd = -1;
+
+    if (s->vhostfd != -1) {
+        close(s->vhostfd);
+        s->vhostfd = -1;
+    }
 }
 
 static void tap_poll(NetClientState *nc, bool enable)
@@ -420,12 +425,14 @@ static NetClientInfo net_tap_info = {
 };
 
 static TAPState *net_tap_new(NetClientState *peer, const char *model,
-                             const char *name, const NetdevTapOptions *tap)
+                             const char *name, const NetdevTapOptions *tap,
+                             const char *vhostfdname, Error **errp)
 {
     NetClientState *nc = qemu_new_net_client(&net_tap_info, peer, model, name);
     TAPState *s = DO_UPCAST(TAPState, nc, nc);
 
     s->fd = -1;
+    s->vhostfd = -1;
 
     if (!tap) {
         return s;
@@ -435,7 +442,36 @@ static TAPState *net_tap_new(NetClientState *peer, const char *model,
     s->sndbuf =
         (tap->has_sndbuf && tap->sndbuf) ? MIN(tap->sndbuf, INT_MAX) : INT_MAX;
 
+    if (tap->has_vhost ? tap->vhost :
+        vhostfdname || (tap->has_vhostforce && tap->vhostforce)) {
+        if (vhostfdname) {
+            s->vhostfd = monitor_fd_param(monitor_cur(), vhostfdname, errp);
+            if (s->vhostfd == -1) {
+                goto failed;
+            }
+            if (!qemu_set_blocking(s->vhostfd, false, errp)) {
+                goto failed;
+            }
+        } else {
+            s->vhostfd = open("/dev/vhost-net", O_RDWR);
+            if (s->vhostfd < 0) {
+                error_setg_errno(errp, errno,
+                                 "tap: open vhost char device failed");
+                goto failed;
+            }
+            if (!qemu_set_blocking(s->vhostfd, false, errp)) {
+                goto failed;
+            }
+        }
+
+        s->vhost_busyloop_timeout = tap->has_poll_us ? tap->poll_us : 0;
+    }
+
     return s;
+
+failed:
+    qemu_del_net_client(&s->nc);
+    return NULL;
 }
 
 static bool net_tap_set_fd(TAPState *s, int fd, int vnet_hdr, Error **errp)
@@ -685,7 +721,7 @@ int net_init_bridge(const Netdev *netdev, const char *name,
         return -1;
     }
 
-    s = net_tap_new(peer, "bridge", name, NULL);
+    s = net_tap_new(peer, "bridge", name, NULL, NULL, &error_abort);
     net_tap_set_fd(s, fd, vnet_hdr, &error_abort);
 
     qemu_set_info_str(&s->nc, "helper=%s,br=%s", helper, br);
@@ -726,7 +762,10 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
                              const char *downscript, const char *vhostfdname,
                              int vnet_hdr, int fd, Error **errp)
 {
-    TAPState *s = net_tap_new(peer, model, name, tap);
+    TAPState *s = net_tap_new(peer, model, name, tap, vhostfdname, errp);
+    if (!s) {
+        return;
+    }
 
     if (tap->fd || tap->fds) {
         qemu_set_info_str(&s->nc, "fd=%d", fd);
@@ -745,53 +784,24 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
         }
     }
 
-    if (!net_tap_setup(s, tap, vhostfdname, fd, vnet_hdr, errp)) {
+    if (!net_tap_setup(s, fd, vnet_hdr, errp)) {
         qemu_del_net_client(&s->nc);
     }
 }
 
-static bool net_tap_setup(TAPState *s, const NetdevTapOptions *tap,
-                          const char *vhostfdname,
-                          int fd, int vnet_hdr, Error **errp)
+static bool net_tap_setup(TAPState *s, int fd, int vnet_hdr, Error **errp)
 {
-    int vhostfd;
-
     if (!net_tap_set_fd(s, fd, vnet_hdr, errp)) {
         return false;
     }
 
-    if (tap->has_vhost ? tap->vhost :
-        vhostfdname || (tap->has_vhostforce && tap->vhostforce)) {
+    if (s->vhostfd != -1) {
         VhostNetOptions options;
 
         options.backend_type = VHOST_BACKEND_TYPE_KERNEL;
         options.net_backend = &s->nc;
-        if (tap->has_poll_us) {
-            options.busyloop_timeout = tap->poll_us;
-        } else {
-            options.busyloop_timeout = 0;
-        }
-
-        if (vhostfdname) {
-            vhostfd = monitor_fd_param(monitor_cur(), vhostfdname, errp);
-            if (vhostfd == -1) {
-                return false;
-            }
-            if (!qemu_set_blocking(vhostfd, false, errp)) {
-                return false;
-            }
-        } else {
-            vhostfd = open("/dev/vhost-net", O_RDWR);
-            if (vhostfd < 0) {
-                error_setg_errno(errp, errno,
-                                 "tap: open vhost char device failed");
-                return false;
-            }
-            if (!qemu_set_blocking(vhostfd, false, errp)) {
-                return false;
-            }
-        }
-        options.opaque = (void *)(uintptr_t)vhostfd;
+        options.busyloop_timeout = s->vhost_busyloop_timeout;
+        options.opaque = (void *)(uintptr_t)s->vhostfd;
         options.nvqs = 2;
         options.feature_bits = kernel_feature_bits;
         options.get_acked_features = NULL;
-- 
2.48.1




* [PATCH v7 13/19] net/tap: finalize net_tap_set_fd() logic
  2025-10-10 17:39 [PATCH v7 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
                   ` (11 preceding siblings ...)
  2025-10-10 17:39 ` [PATCH v7 12/19] net/tap: move vhost fd initialization to net_tap_new() Vladimir Sementsov-Ogievskiy
@ 2025-10-10 17:39 ` Vladimir Sementsov-Ogievskiy
  2025-10-10 17:39 ` [PATCH v7 14/19] migration: introduce .pre_incoming() vmsd handler Vladimir Sementsov-Ogievskiy
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-10 17:39 UTC (permalink / raw)
  To: mst, jasowang
  Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
	qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
	yc-core, vsementsov, raphael.s.norwitz

Let net_tap_set_fd() do only fd-related setup.

For the upcoming backend-transfer migration of virtio-net/tap
we'll want to skip net_tap_set_fd() (as incoming fds are already
prepared by the source QEMU). So move tap_read_poll() to net_tap_setup().

There is no need to set using_vnet_hdr and vhost_net explicitly: the state
is zero-initialized.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
 net/tap.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index d08ef070e9..7e85444ace 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -480,7 +480,6 @@ static bool net_tap_set_fd(TAPState *s, int fd, int vnet_hdr, Error **errp)
 
     s->fd = fd;
     s->host_vnet_hdr_len = vnet_hdr ? sizeof(struct virtio_net_hdr) : 0;
-    s->using_vnet_hdr = false;
     s->has_ufo = tap_probe_has_ufo(s->fd);
     s->has_uso = tap_probe_has_uso(s->fd);
     s->has_tunnel = tap_probe_has_tunnel(s->fd);
@@ -493,8 +492,6 @@ static bool net_tap_set_fd(TAPState *s, int fd, int vnet_hdr, Error **errp)
     if (vnet_hdr) {
         tap_fd_set_vnet_hdr_len(s->fd, s->host_vnet_hdr_len);
     }
-    tap_read_poll(s, true);
-    s->vhost_net = NULL;
 
     if (s->sndbuf) {
         Error **e = s->sndbuf_required ? errp : NULL;
@@ -795,6 +792,8 @@ static bool net_tap_setup(TAPState *s, int fd, int vnet_hdr, Error **errp)
         return false;
     }
 
+    tap_read_poll(s, true);
+
     if (s->vhostfd != -1) {
         VhostNetOptions options;
 
-- 
2.48.1




* [PATCH v7 14/19] migration: introduce .pre_incoming() vmsd handler
  2025-10-10 17:39 [PATCH v7 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
                   ` (12 preceding siblings ...)
  2025-10-10 17:39 ` [PATCH v7 13/19] net/tap: finalize net_tap_set_fd() logic Vladimir Sementsov-Ogievskiy
@ 2025-10-10 17:39 ` Vladimir Sementsov-Ogievskiy
  2025-10-14 16:26   ` Peter Xu
  2025-10-10 17:39 ` [PATCH v7 15/19] net/tap: postpone tap setup to pre-incoming Vladimir Sementsov-Ogievskiy
                   ` (5 subsequent siblings)
  19 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-10 17:39 UTC (permalink / raw)
  To: mst, jasowang
  Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
	qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
	yc-core, vsementsov, raphael.s.norwitz

Add a possibility for devices to hook into the top of the migrate-incoming
QMP command. It's a place where migration capabilities and parameters
are already set, but migration downtime has not yet started (the source
is still running). So here devices may do some remaining initialization
that depends on migration capabilities. This will be used in a further
commit to support the backend-transfer migration feature for virtio-net/tap.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
 include/migration/vmstate.h |  1 +
 migration/migration.c       |  4 ++++
 migration/savevm.c          | 15 +++++++++++++++
 migration/savevm.h          |  1 +
 4 files changed, 21 insertions(+)
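
The dispatch pattern of qemu_pre_incoming() (call every registered handler,
skip entries without one, stop at the first failure) can be shown with a
plain array. This is a simplified sketch: struct entry, run_pre_incoming()
and the two counting handlers are hypothetical, the QTAILQ is replaced by an
array, and the Error **errp argument is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*pre_incoming_fn)(void *opaque);

/* Hypothetical stand-in for SaveStateEntry plus its vmsd handler. */
struct entry {
    pre_incoming_fn pre_incoming;   /* may be NULL: no handler registered */
    void *opaque;
};

/* Run all handlers in order; the first failure aborts the walk. */
static bool run_pre_incoming(const struct entry *entries, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (entries[i].pre_incoming &&
            !entries[i].pre_incoming(entries[i].opaque)) {
            return false;
        }
    }
    return true;
}

/* Example handlers: count invocations through the opaque pointer. */
static bool count_and_succeed(void *opaque)
{
    ++*(int *)opaque;
    return true;
}

static bool count_and_fail(void *opaque)
{
    ++*(int *)opaque;
    return false;
}
```

Because the walk stops at the first failure, handlers that already ran are
not rolled back; callers relying on this hook need to tolerate that.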

diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 63ccaee07a..f243518fb5 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -217,6 +217,7 @@ struct VMStateDescription {
     int version_id;
     int minimum_version_id;
     MigrationPriority priority;
+    bool (*pre_incoming)(void *opaque, Error **errp);
     int (*pre_load)(void *opaque);
     int (*pre_load_errp)(void *opaque, Error **errp);
     int (*post_load)(void *opaque, int version_id);
diff --git a/migration/migration.c b/migration/migration.c
index a63b46bbef..6ed6a10f57 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1983,6 +1983,10 @@ void qmp_migrate_incoming(const char *uri, bool has_channels,
         return;
     }
 
+    if (!qemu_pre_incoming(errp)) {
+        return;
+    }
+
     if (!yank_register_instance(MIGRATION_YANK_INSTANCE, errp)) {
         return;
     }
diff --git a/migration/savevm.c b/migration/savevm.c
index 7b35ec4dd0..6e240ea100 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1268,6 +1268,21 @@ bool qemu_savevm_state_blocked(Error **errp)
     return false;
 }
 
+bool qemu_pre_incoming(Error **errp)
+{
+    SaveStateEntry *se;
+
+    QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+        if (se->vmsd && se->vmsd->pre_incoming) {
+            if (!se->vmsd->pre_incoming(se->opaque, errp)) {
+                return false;
+            }
+        }
+    }
+
+    return true;
+}
+
 void qemu_savevm_non_migratable_list(strList **reasons)
 {
     SaveStateEntry *se;
diff --git a/migration/savevm.h b/migration/savevm.h
index c337e3e3d1..4ad8997f94 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -29,6 +29,7 @@
 #define QEMU_VM_COMMAND              0x08
 #define QEMU_VM_SECTION_FOOTER       0x7e
 
+bool qemu_pre_incoming(Error **errp);
 bool qemu_savevm_state_blocked(Error **errp);
 void qemu_savevm_non_migratable_list(strList **reasons);
 int qemu_savevm_state_prepare(Error **errp);
-- 
2.48.1




* [PATCH v7 15/19] net/tap: postpone tap setup to pre-incoming
  2025-10-10 17:39 [PATCH v7 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
                   ` (13 preceding siblings ...)
  2025-10-10 17:39 ` [PATCH v7 14/19] migration: introduce .pre_incoming() vmsd handler Vladimir Sementsov-Ogievskiy
@ 2025-10-10 17:39 ` Vladimir Sementsov-Ogievskiy
  2025-10-10 17:39 ` [PATCH v7 16/19] qapi: add interface for backend-transfer virtio-net/tap migration Vladimir Sementsov-Ogievskiy
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-10 17:39 UTC (permalink / raw)
  To: mst, jasowang
  Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
	qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
	yc-core, vsementsov, raphael.s.norwitz

As described in the previous commit, to support backend-transfer migration
for virtio-net/tap, we need to postpone the decision whether to open the
device or to wait for incoming fds until the pre-incoming point (when we
can actually decide).

This commit only postpones the TAP-open case of initialization.
We don't try to postpone all cases of initialization, as that would
require a lot more refactoring of the code.

So we postpone only the simple case, for which we are going to support
fd-incoming migration:

1. No fds / fd parameters: obviously, if the user gives fd/fds, they should
be used, and no incoming backend-transfer migration is possible.

2. No helper: just for simplicity. It is probably possible to allow it (and
just ignore it in case of backend-transfer migration), to let the user use
the same cmdline on the target QEMU. But that is questionable, and
postponable.

3. No script/downscript. It's not simple to support downscript:
we should pass the responsibility to call it to the target QEMU with
the migration, and back to the source QEMU on migration failure. It is
feasible, but may be implemented later on demand.

4. Concrete ifname: to not try to share it between queues, when we can
only set up queues as separate entities. Supporting an undecided ifname
would require creating some extra netdev state, connecting all the taps, to
be able to iterate through them.

No part of backend-transfer migration is here; we only prepare the code
for its future implementation.

Are net drivers prepared for postponed initialization of NICs?
For the future backend-transfer migration feature, we are mainly
interested in virtio-net. So, let's prepare virtio-net to work with
postponed initialization of TAP (two places about early set/get
features), and for other drivers let's simply finalize initialization on
setting the netdev property. Support for other drivers may be added later
if needed.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
 hw/net/virtio-net.c |  78 ++++++++++++++++++++++++-
 include/net/tap.h   |   3 +
 net/tap-win32.c     |  11 ++++
 net/tap.c           | 136 +++++++++++++++++++++++++++++++++++++++++++-
 4 files changed, 226 insertions(+), 2 deletions(-)
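
The four conditions listed in the commit message amount to one predicate
over the netdev options. The sketch below only restates that list: struct
tap_cfg and may_postpone() are hypothetical names, the boolean fields are a
simplification of the real NetdevTapOptions, and the check actually added to
net/tap.c may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified view of the options that matter here. */
struct tap_cfg {
    bool has_fd;            /* fd= given */
    bool has_fds;           /* fds= given */
    bool has_helper;        /* helper= given */
    bool runs_script;       /* an up script would be executed */
    bool runs_downscript;   /* a down script would be executed */
    bool has_ifname;        /* a concrete ifname= given */
};

/* True only for the simple case the commit message describes. */
static bool may_postpone(const struct tap_cfg *c)
{
    return !c->has_fd && !c->has_fds &&
           !c->has_helper &&
           !c->runs_script && !c->runs_downscript &&
           c->has_ifname;
}
```

Any configuration outside this case keeps the existing, immediate
initialization path.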

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 33116712eb..661413c72f 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -719,6 +719,30 @@ default_value:
     return VIRTIO_NET_TX_QUEUE_DEFAULT_SIZE;
 }
 
+static bool peer_wait_incoming(VirtIONet *n)
+{
+    NetClientState *nc = qemu_get_queue(n->nic);
+
+    if (!nc->peer) {
+        return false;
+    }
+
+    if (nc->peer->info->type != NET_CLIENT_DRIVER_TAP) {
+        return false;
+    }
+
+    return tap_wait_incoming(nc->peer);
+}
+
+static bool peer_postponed_init(VirtIONet *n, int index, Error **errp)
+{
+    NetClientState *nc = qemu_get_subqueue(n->nic, index);
+
+    assert(nc->peer->info->type == NET_CLIENT_DRIVER_TAP);
+
+    return tap_postponed_init(nc->peer, errp);
+}
+
 static int peer_attach(VirtIONet *n, int index)
 {
     NetClientState *nc = qemu_get_subqueue(n->nic, index);
@@ -3060,7 +3084,17 @@ static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue)
     n->multiqueue = multiqueue;
     virtio_net_change_num_queues(n, max * 2 + 1);
 
-    virtio_net_set_queue_pairs(n);
+    /*
+     * virtio_net_set_multiqueue() is called from set_features(0) on early
+     * reset, when the peer may be waiting for incoming (and is not
+     * initialized yet).
+     * Don't worry about it: virtio_net_set_queue_pairs() will be called
+     * later from virtio_net_post_load_device(), and anyway will be a
+     * no-op for local incoming migration with live backend passing.
+     */
+    if (!peer_wait_incoming(n)) {
+        virtio_net_set_queue_pairs(n);
+    }
 }
 
 static int virtio_net_pre_load_queues(VirtIODevice *vdev, uint32_t n)
@@ -3089,6 +3123,17 @@ static void virtio_net_get_features(VirtIODevice *vdev, uint64_t *features,
 
     virtio_add_feature_ex(features, VIRTIO_NET_F_MAC);
 
+    if (peer_wait_incoming(n)) {
+        /*
+         * Excessive feature set is OK for early initialization when
+         * we wait for local incoming migration: actual guest-negotiated
+         * features will come with migration stream anyway. And we are sure
+         * that we support same host-features as source, because the backend
+         * is the same (the same TAP device, for example).
+         */
+        return;
+    }
+
     if (!peer_has_vnet_hdr(n)) {
         virtio_clear_feature_ex(features, VIRTIO_NET_F_CSUM);
         virtio_clear_feature_ex(features, VIRTIO_NET_F_HOST_TSO4);
@@ -3180,6 +3225,18 @@ static void virtio_net_get_features(VirtIODevice *vdev, uint64_t *features,
     }
 }
 
+static bool virtio_net_update_host_features(VirtIONet *n, Error **errp)
+{
+    ERRP_GUARD();
+    VirtIODevice *vdev = VIRTIO_DEVICE(n);
+
+    peer_test_vnet_hdr(n);
+
+    virtio_net_get_features(vdev, &vdev->host_features, errp);
+
+    return !*errp;
+}
+
 static int virtio_net_post_load_device(void *opaque, int version_id)
 {
     VirtIONet *n = opaque;
@@ -4177,6 +4234,24 @@ static bool dev_unplug_pending(void *opaque)
     return vdc->primary_unplug_pending(dev);
 }
 
+static bool vhost_user_blk_pre_incoming(void *opaque, Error **errp)
+{
+    VirtIONet *n = opaque;
+    int i;
+
+    if (peer_wait_incoming(n)) {
+        for (i = 0; i < n->max_queue_pairs; i++) {
+            if (!peer_postponed_init(n, i, errp)) {
+                return false;
+            }
+        }
+
+        return virtio_net_update_host_features(n, errp);
+    }
+
+    return true;
+}
+
 static const VMStateDescription vmstate_virtio_net = {
     .name = "virtio-net",
     .minimum_version_id = VIRTIO_NET_VM_VERSION,
@@ -4185,6 +4260,7 @@ static const VMStateDescription vmstate_virtio_net = {
         VMSTATE_VIRTIO_DEVICE,
         VMSTATE_END_OF_LIST()
     },
+    .pre_incoming = vhost_user_blk_pre_incoming,
     .pre_save = virtio_net_pre_save,
     .dev_unplug_pending = dev_unplug_pending,
 };
diff --git a/include/net/tap.h b/include/net/tap.h
index 6f34f13eae..5a926ba513 100644
--- a/include/net/tap.h
+++ b/include/net/tap.h
@@ -33,4 +33,7 @@ int tap_disable(NetClientState *nc);
 
 int tap_get_fd(NetClientState *nc);
 
+bool tap_wait_incoming(NetClientState *nc);
+bool tap_postponed_init(NetClientState *nc, Error **errp);
+
 #endif /* QEMU_NET_TAP_H */
diff --git a/net/tap-win32.c b/net/tap-win32.c
index 38baf90e0b..7430cdf6fa 100644
--- a/net/tap-win32.c
+++ b/net/tap-win32.c
@@ -766,3 +766,14 @@ int tap_disable(NetClientState *nc)
 {
     abort();
 }
+
+bool tap_wait_incoming(NetClientState *nc)
+{
+    return false;
+}
+
+bool tap_postponed_init(NetClientState *nc, Error **errp)
+{
+    error_setg(errp, "win32 tap postponed init is not supported");
+    return false;
+}
diff --git a/net/tap.c b/net/tap.c
index 7e85444ace..8afbf3b407 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -35,7 +35,9 @@
 #include "net/eth.h"
 #include "net/net.h"
 #include "clients.h"
+#include "migration/misc.h"
 #include "monitor/monitor.h"
+#include "system/runstate.h"
 #include "system/system.h"
 #include "qapi/error.h"
 #include "qemu/cutils.h"
@@ -88,6 +90,13 @@ typedef struct TAPState {
     int sndbuf;
     int vhostfd;
     uint32_t vhost_busyloop_timeout;
+
+    /* for postponed setup */
+    QTAILQ_ENTRY(TAPState) next;
+    bool vnet_hdr_required;
+    int vnet_hdr;
+    bool mq_required;
+    char *ifname;
 } TAPState;
 
 static bool net_tap_setup(TAPState *s, int fd, int vnet_hdr, Error **errp);
@@ -366,6 +375,8 @@ static void tap_cleanup(NetClientState *nc)
         close(s->vhostfd);
         s->vhostfd = -1;
     }
+
+    g_free(s->ifname);
 }
 
 static void tap_poll(NetClientState *nc, bool enable)
@@ -383,6 +394,25 @@ static bool tap_set_steering_ebpf(NetClientState *nc, int prog_fd)
     return tap_fd_set_steering_ebpf(s->fd, prog_fd) == 0;
 }
 
+static bool tap_check_peer_type(NetClientState *nc, ObjectClass *oc,
+                                Error **errp)
+{
+    TAPState *s = DO_UPCAST(TAPState, nc, nc);
+    const char *driver = object_class_get_name(oc);
+
+    if (!g_str_has_prefix(driver, "virtio-net-")) {
+        /*
+         * Only virtio-net supports postponed TAP initialization, so
+         * for other drivers let's finalize initialization now.
+         */
+        if (tap_wait_incoming(nc)) {
+            return tap_postponed_init(&s->nc, errp);
+        }
+    }
+
+    return true;
+}
+
 int tap_get_fd(NetClientState *nc)
 {
     TAPState *s = DO_UPCAST(TAPState, nc, nc);
@@ -422,6 +452,7 @@ static NetClientInfo net_tap_info = {
     .set_vnet_be = tap_set_vnet_be,
     .set_steering_ebpf = tap_set_steering_ebpf,
     .get_vhost_net = tap_get_vhost_net,
+    .check_peer_type = tap_check_peer_type,
 };
 
 static TAPState *net_tap_new(NetClientState *peer, const char *model,
@@ -845,6 +876,93 @@ static int get_fds(char *str, char *fds[], int max)
     return i;
 }
 
+#define TAP_OPEN_IFNAME_SZ 128
+
+bool tap_postponed_init(NetClientState *nc, Error **errp)
+{
+    TAPState *s = DO_UPCAST(TAPState, nc, nc);
+    char ifname[TAP_OPEN_IFNAME_SZ];
+    int vnet_hdr = s->vnet_hdr;
+    int fd;
+
+    pstrcpy(ifname, sizeof(ifname), s->ifname);
+    fd = net_tap_open(&vnet_hdr, s->vnet_hdr_required, NULL,
+                      ifname, sizeof(ifname),
+                      s->mq_required, errp);
+    if (fd < 0) {
+        goto fail;
+    }
+
+    if (!net_tap_setup(s, fd, vnet_hdr, errp)) {
+        goto fail;
+    }
+
+    return true;
+
+fail:
+    qemu_del_net_client(&s->nc);
+    return false;
+}
+
+static bool check_no_script(const char *script_arg)
+{
+    return script_arg &&
+        (script_arg[0] == '\0' || strcmp(script_arg, "no") == 0);
+}
+
+static bool tap_postpone_init(const NetdevTapOptions *tap,
+                              const char *name, NetClientState *peer,
+                              bool *postponed, Error **errp)
+{
+    int queues = tap->has_queues ? tap->queues : 1;
+
+    *postponed = false;
+
+    if (!runstate_check(RUN_STATE_INMIGRATE)) {
+        return true;
+    }
+
+    if (tap->fd || tap->fds || tap->helper || tap->vhostfds) {
+        return true;
+    }
+
+    if (!tap->ifname || tap->ifname[0] == '\0' ||
+        strstr(tap->ifname, "%d") != NULL) {
+        /*
+         * It's hard to postpone the logic of handling a template or
+         * absent ifname
+         */
+        return true;
+    }
+
+    /*
+     * Supporting downscript means understanding and implementing the
+     * transfer of responsibility for calling it to the target QEMU process,
+     * or back to the source QEMU process if migration fails. So for
+     * simplicity we don't support scripts together with fds migration.
+     */
+    if (!check_no_script(tap->script) || !check_no_script(tap->downscript)) {
+        return true;
+    }
+
+    for (int i = 0; i < queues; i++) {
+        TAPState *s = net_tap_new(peer, "tap", name, tap, NULL, errp);
+        if (!s) {
+            return false;
+        }
+
+        s->vnet_hdr_required = tap->has_vnet_hdr && tap->vnet_hdr;
+        s->vnet_hdr = tap->has_vnet_hdr ? tap->vnet_hdr : 1;
+        s->mq_required = queues > 1;
+        s->ifname = g_strdup(tap->ifname);
+        qemu_set_info_str(&s->nc, "ifname=%s,script=no,downscript=no",
+                          tap->ifname);
+    }
+
+    *postponed = true;
+    return true;
+}
+
 int net_init_tap(const Netdev *netdev, const char *name,
                  NetClientState *peer, Error **errp)
 {
@@ -853,8 +971,9 @@ int net_init_tap(const Netdev *netdev, const char *name,
     /* for the no-fd, no-helper case */
     Error *err = NULL;
     const char *vhostfdname;
-    char ifname[128];
+    char ifname[TAP_OPEN_IFNAME_SZ];
     int ret = 0;
+    bool postponed = false;
 
     assert(netdev->type == NET_CLIENT_DRIVER_TAP);
     tap = &netdev->u.tap;
@@ -873,6 +992,14 @@ int net_init_tap(const Netdev *netdev, const char *name,
         return -1;
     }
 
+    if (!tap_postpone_init(tap, name, peer, &postponed, errp)) {
+        return -1;
+    }
+
+    if (postponed) {
+        return 0;
+    }
+
     if (tap->fd) {
         if (tap->ifname || tap->script || tap->downscript ||
             tap->has_vnet_hdr || tap->helper || tap->has_queues ||
@@ -1097,3 +1224,10 @@ int tap_disable(NetClientState *nc)
         return ret;
     }
 }
+
+bool tap_wait_incoming(NetClientState *nc)
+{
+    TAPState *s = DO_UPCAST(TAPState, nc, nc);
+
+    return s->fd == -1;
+}
-- 
2.48.1




* [PATCH v7 16/19] qapi: add interface for backend-transfer virtio-net/tap migration
  2025-10-10 17:39 [PATCH v7 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
                   ` (14 preceding siblings ...)
  2025-10-10 17:39 ` [PATCH v7 15/19] net/tap: postpone tap setup to pre-incoming Vladimir Sementsov-Ogievskiy
@ 2025-10-10 17:39 ` Vladimir Sementsov-Ogievskiy
  2025-10-14 16:33   ` Peter Xu
  2025-10-10 17:39 ` [PATCH v7 17/19] virtio-net: support backend-transfer migration for virtio-net/tap Vladimir Sementsov-Ogievskiy
                   ` (3 subsequent siblings)
  19 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-10 17:39 UTC (permalink / raw)
  To: mst, jasowang
  Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
	qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
	yc-core, vsementsov, raphael.s.norwitz

To migrate the virtio-net TAP device backend (including open fds) locally,
the user simply sets the migration parameter

   backend-transfer = ["virtio-net-tap"]

Why not a simple boolean? To simplify migration to future versions,
when more devices support backend-transfer migration.

Alternatively, we could add a per-device option to disable
backend-transfer migration, but still:

1. It's more convenient to set the same capabilities/parameters on both
source and target QEMU than to care about each device.

2. It keeps the design intact: machine type + device options +
migration capabilities and parameters fully define the resulting
migration stream. We would break this if we later added
backend-transfer support for more devices under a single
backend-transfer=true parameter.

This commit only brings the interface; the implementation will come in a
later commit. That's why we add a temporary not-implemented error in
migrate_params_check().

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
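For illustration, the parameter described above could be set over QMP
roughly as follows. This is a minimal sketch and not part of the patch:
the helper name backend_transfer_cmd is hypothetical, and it only builds
the JSON for the existing migrate-set-parameters command using the
backend-transfer list added by this series. Actually sending it (to both
source and target QEMU) is left to the management tool.

```python
import json

# Values of the BackendTransfer QAPI enum introduced by this series.
KNOWN_TARGETS = {"virtio-net-tap"}


def backend_transfer_cmd(targets):
    """Build a QMP migrate-set-parameters command (hypothetical helper)."""
    unknown = sorted(set(targets) - KNOWN_TARGETS)
    if unknown:
        raise ValueError(f"unknown backend-transfer targets: {unknown}")
    return {
        "execute": "migrate-set-parameters",
        "arguments": {"backend-transfer": list(targets)},
    }


cmd = backend_transfer_cmd(["virtio-net-tap"])
# The same command would be sent to both source and target QEMU.
print(json.dumps(cmd))
```

A list (rather than a boolean) keeps the command shape stable if more
enum values are added later.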
 migration/options.c | 39 +++++++++++++++++++++++++++++++++++++++
 migration/options.h |  2 ++
 qapi/migration.json | 42 ++++++++++++++++++++++++++++++++++++------
 3 files changed, 77 insertions(+), 6 deletions(-)

diff --git a/migration/options.c b/migration/options.c
index 5183112775..76709af3ab 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -13,6 +13,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/error-report.h"
+#include "qapi/util.h"
 #include "exec/target_page.h"
 #include "qapi/clone-visitor.h"
 #include "qapi/error.h"
@@ -262,6 +263,20 @@ bool migrate_mapped_ram(void)
     return s->capabilities[MIGRATION_CAPABILITY_MAPPED_RAM];
 }
 
+bool migrate_virtio_net_tap(void)
+{
+    MigrationState *s = migrate_get_current();
+    BackendTransferList *el = s->parameters.backend_transfer;
+
+    for ( ; el; el = el->next) {
+        if (el->value == BACKEND_TRANSFER_VIRTIO_NET_TAP) {
+            return true;
+        }
+    }
+
+    return false;
+}
+
 bool migrate_ignore_shared(void)
 {
     MigrationState *s = migrate_get_current();
@@ -963,6 +978,12 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
     params->cpr_exec_command = QAPI_CLONE(strList,
                                           s->parameters.cpr_exec_command);
 
+    if (s->parameters.backend_transfer) {
+        params->has_backend_transfer = true;
+        params->backend_transfer = QAPI_CLONE(BackendTransferList,
+                                              s->parameters.backend_transfer);
+    }
+
     return params;
 }
 
@@ -997,6 +1018,7 @@ void migrate_params_init(MigrationParameters *params)
     params->has_zero_page_detection = true;
     params->has_direct_io = true;
     params->has_cpr_exec_command = true;
+    params->has_backend_transfer = true;
 }
 
 /*
@@ -1183,6 +1205,12 @@ bool migrate_params_check(MigrationParameters *params, Error **errp)
         return false;
     }
 
+    /* TODO: implement backend-transfer and remove this check */
+    if (params->has_backend_transfer) {
+        error_setg(errp, "Not implemented");
+        return false;
+    }
+
     return true;
 }
 
@@ -1305,6 +1333,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
     if (params->has_cpr_exec_command) {
         dest->cpr_exec_command = params->cpr_exec_command;
     }
+
+    if (params->has_backend_transfer) {
+        dest->backend_transfer = params->backend_transfer;
+    }
 }
 
 static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
@@ -1443,6 +1475,13 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
         s->parameters.cpr_exec_command =
             QAPI_CLONE(strList, params->cpr_exec_command);
     }
+
+    if (params->has_backend_transfer) {
+        qapi_free_BackendTransferList(s->parameters.backend_transfer);
+
+        s->parameters.backend_transfer = QAPI_CLONE(BackendTransferList,
+                                                    params->backend_transfer);
+    }
 }
 
 void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
diff --git a/migration/options.h b/migration/options.h
index 82d839709e..55c0345433 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -87,6 +87,8 @@ const char *migrate_tls_hostname(void);
 uint64_t migrate_xbzrle_cache_size(void);
 ZeroPageDetection migrate_zero_page_detection(void);
 
+bool migrate_virtio_net_tap(void);
+
 /* parameters helpers */
 
 bool migrate_params_check(MigrationParameters *params, Error **errp);
diff --git a/qapi/migration.json b/qapi/migration.json
index be0f3fcc12..1bfe7df191 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -770,6 +770,19 @@
       '*transform': 'BitmapMigrationBitmapAliasTransform'
   } }
 
+##
+# @BackendTransfer:
+#
+# @virtio-net-tap: Enable backend-transfer migration for
+#     virtio-net/tap. When enabled, TAP fds and all related state are
+#     passed to the destination in the migration channel (which must
+#     be a UNIX domain socket).
+#
+# Since: 10.2
+##
+{ 'enum': 'BackendTransfer',
+  'data': [ 'virtio-net-tap' ] }
+
 ##
 # @BitmapMigrationNodeAlias:
 #
@@ -951,9 +964,13 @@
 #     is @cpr-exec.  The first list element is the program's filename,
 #     the remainder its arguments.  (Since 10.2)
 #
+# @backend-transfer: List of targets for backend-transfer migration.
+#     See description in `BackendTransfer`.  Default is no
+#     backend-transfer migration.  (Since 10.2)
+#
 # Features:
 #
-# @unstable: Members @x-checkpoint-delay and
+# @unstable: Members @backend-transfer, @x-checkpoint-delay and
 #     @x-vcpu-dirty-limit-period are experimental.
 #
 # Since: 2.4
@@ -978,7 +995,8 @@
            'mode',
            'zero-page-detection',
            'direct-io',
-           'cpr-exec-command'] }
+           'cpr-exec-command',
+           { 'name': 'backend-transfer', 'features': ['unstable'] } ] }
 
 ##
 # @MigrateSetParameters:
@@ -1137,9 +1155,13 @@
 #     is @cpr-exec.  The first list element is the program's filename,
 #     the remainder its arguments.  (Since 10.2)
 #
+# @backend-transfer: List of targets for backend-transfer migration.
+#     See description in `BackendTransfer`.  Default is no
+#     backend-transfer migration.  (Since 10.2)
+#
 # Features:
 #
-# @unstable: Members @x-checkpoint-delay and
+# @unstable: Members @backend-transfer, @x-checkpoint-delay and
 #     @x-vcpu-dirty-limit-period are experimental.
 #
 # TODO: either fuse back into `MigrationParameters`, or make
@@ -1179,7 +1201,9 @@
             '*mode': 'MigMode',
             '*zero-page-detection': 'ZeroPageDetection',
             '*direct-io': 'bool',
-            '*cpr-exec-command': [ 'str' ]} }
+            '*cpr-exec-command': [ 'str' ],
+            '*backend-transfer': { 'type': [ 'BackendTransfer' ],
+                                   'features': [ 'unstable' ] } } }
 
 ##
 # @migrate-set-parameters:
@@ -1352,9 +1376,13 @@
 #     is @cpr-exec.  The first list element is the program's filename,
 #     the remainder its arguments.  (Since 10.2)
 #
+# @backend-transfer: List of targets for backend-transfer migration.
+#     See description in `BackendTransfer`.  Default is no
+#     backend-transfer migration (Since 10.2)
+#
 # Features:
 #
-# @unstable: Members @x-checkpoint-delay and
+# @unstable: Members @backend-transfer, @x-checkpoint-delay and
 #     @x-vcpu-dirty-limit-period are experimental.
 #
 # Since: 2.4
@@ -1391,7 +1419,9 @@
             '*mode': 'MigMode',
             '*zero-page-detection': 'ZeroPageDetection',
             '*direct-io': 'bool',
-            '*cpr-exec-command': [ 'str' ]} }
+            '*cpr-exec-command': [ 'str' ],
+            '*backend-transfer': { 'type': [ 'BackendTransfer' ],
+                                   'features': [ 'unstable' ] } } }
 
 ##
 # @query-migrate-parameters:
-- 
2.48.1




* [PATCH v7 17/19] virtio-net: support backend-transfer migration for virtio-net/tap
  2025-10-10 17:39 [PATCH v7 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
                   ` (15 preceding siblings ...)
  2025-10-10 17:39 ` [PATCH v7 16/19] qapi: add interface for backend-transfer virtio-net/tap migration Vladimir Sementsov-Ogievskiy
@ 2025-10-10 17:39 ` Vladimir Sementsov-Ogievskiy
  2025-10-10 17:39 ` [PATCH v7 18/19] tests/functional: add skipWithoutSudo() decorator Vladimir Sementsov-Ogievskiy
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-10 17:39 UTC (permalink / raw)
  To: mst, jasowang
  Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
	qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
	yc-core, vsementsov, raphael.s.norwitz

Finally implement the new migration option

    backend-transfer = ["virtio-net-tap"].

With this enabled (on both source and target, of course), and with a
UNIX socket used as the migration channel, we do "migrate" the virtio-net
backend - the TAP device, with all its fds.

This way the management tool does not need to care about creating a new
TAP device or handle switching to it. Migration downtime becomes shorter.

How it works:

1. For incoming migration, we postpone TAP initialization up to the
   pre-incoming point.

2. At the pre-incoming point we see that "virtio-net-tap" is set for
   backend-transfer, so we postpone TAP initialization further, up to
   post-load.

3. During virtio-load, we get the TAP state (and fds) as part of
   the virtio-net state.

4. In post-load we finalize TAP initialization.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
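The four steps in the commit message can be sketched as a toy model.
This is only an illustration under stated assumptions, not QEMU code:
ToyTap, pre_incoming() and load_state() are hypothetical names, the fd
numbers are made up, and the only things taken from the patches are the
"fd == -1 means waiting for incoming" convention and the ordering of the
pre-incoming, load and post-load stages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyTap:
    """Stand-in for TAPState; fd == -1 means 'waiting for incoming'."""
    ifname: str
    fd: int = -1
    how: Optional[str] = None

    def wait_incoming(self) -> bool:
        return self.fd == -1


def pre_incoming(tap: ToyTap, backend_transfer: bool) -> None:
    # Step 2: without backend-transfer the TAP is opened locally right
    # now; with it, initialization stays postponed until post-load.
    if not backend_transfer and tap.wait_incoming():
        tap.fd = 100  # pretend we opened the device by ifname
        tap.how = "opened-locally"


def load_state(tap: ToyTap, fd_from_stream: int) -> None:
    # Step 3: the fd arrives as part of the virtio-net state. As in
    # tap_pre_load(), refuse if the TAP is already initialized.
    if not tap.wait_incoming():
        raise RuntimeError("TAP is already initialized")
    tap.fd = fd_from_stream
    # Step 4: post-load finalizes setup around the received fd.
    tap.how = "received-from-source"


# Step 1: created during incoming migration, nothing opened yet.
tap = ToyTap("tap0")
pre_incoming(tap, backend_transfer=True)
assert tap.wait_incoming()
load_state(tap, fd_from_stream=42)
print(tap)
```

Calling pre_incoming() with backend_transfer=False instead models the
path where the TAP is opened locally at the pre-incoming point.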
 hw/net/virtio-net.c | 74 ++++++++++++++++++++++++++++++++++++++++++++-
 include/net/tap.h   |  2 ++
 migration/options.c |  6 ----
 net/tap.c           | 45 ++++++++++++++++++++++++++-
 4 files changed, 119 insertions(+), 8 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 661413c72f..41c45a4bc7 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -38,6 +38,7 @@
 #include "qapi/qapi-events-migration.h"
 #include "hw/virtio/virtio-access.h"
 #include "migration/misc.h"
+#include "migration/options.h"
 #include "standard-headers/linux/ethtool.h"
 #include "system/system.h"
 #include "system/replay.h"
@@ -3358,6 +3359,9 @@ struct VirtIONetMigTmp {
     uint16_t        curr_queue_pairs_1;
     uint8_t         has_ufo;
     uint32_t        has_vnet_hdr;
+
+    NetClientState *ncs;
+    uint32_t max_queue_pairs;
 };
 
 /* The 2nd and subsequent tx_waiting flags are loaded later than
@@ -3627,6 +3631,71 @@ static const VMStateDescription vhost_user_net_backend_state = {
     }
 };
 
+static bool virtio_net_is_tap_mig(void *opaque, int version_id)
+{
+    VirtIONet *n = opaque;
+    NetClientState *nc;
+
+    nc = qemu_get_queue(n->nic);
+
+    return migrate_virtio_net_tap() && nc->peer &&
+        nc->peer->info->type == NET_CLIENT_DRIVER_TAP;
+}
+
+static int virtio_net_nic_pre_save(void *opaque)
+{
+    struct VirtIONetMigTmp *tmp = opaque;
+
+    tmp->ncs = tmp->parent->nic->ncs;
+    tmp->max_queue_pairs = tmp->parent->max_queue_pairs;
+
+    return 0;
+}
+
+static int virtio_net_nic_pre_load(void *opaque)
+{
+    /* Reuse the pointer setup from save */
+    virtio_net_nic_pre_save(opaque);
+
+    return 0;
+}
+
+static int virtio_net_nic_post_load(void *opaque, int version_id)
+{
+    struct VirtIONetMigTmp *tmp = opaque;
+    Error *local_err = NULL;
+
+    if (!virtio_net_update_host_features(tmp->parent, &local_err)) {
+        error_report_err(local_err);
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+static const VMStateDescription vmstate_virtio_net_nic_nc = {
+    .name = "virtio-net-nic-nc",
+    .fields = (const VMStateField[]) {
+        VMSTATE_STRUCT_POINTER(peer, NetClientState, vmstate_tap,
+                               NetClientState),
+        VMSTATE_END_OF_LIST()
+   },
+};
+
+static const VMStateDescription vmstate_virtio_net_nic = {
+    .name      = "virtio-net-nic",
+    .pre_load  = virtio_net_nic_pre_load,
+    .pre_save  = virtio_net_nic_pre_save,
+    .post_load  = virtio_net_nic_post_load,
+    .fields    = (const VMStateField[]) {
+        VMSTATE_STRUCT_VARRAY_POINTER_UINT32(ncs, struct VirtIONetMigTmp,
+                                             max_queue_pairs,
+                                             vmstate_virtio_net_nic_nc,
+                                             struct NetClientState),
+        VMSTATE_END_OF_LIST()
+    },
+};
+
 static const VMStateDescription vmstate_virtio_net_device = {
     .name = "virtio-net-device",
     .version_id = VIRTIO_NET_VM_VERSION,
@@ -3658,6 +3727,9 @@ static const VMStateDescription vmstate_virtio_net_device = {
          * but based on the uint.
          */
         VMSTATE_BUFFER_POINTER_UNSAFE(vlans, VirtIONet, 0, MAX_VLAN >> 3),
+        VMSTATE_WITH_TMP_TEST(VirtIONet, virtio_net_is_tap_mig,
+                              struct VirtIONetMigTmp,
+                              vmstate_virtio_net_nic),
         VMSTATE_WITH_TMP(VirtIONet, struct VirtIONetMigTmp,
                          vmstate_virtio_net_has_vnet),
         VMSTATE_UINT8(mac_table.multi_overflow, VirtIONet),
@@ -4239,7 +4311,7 @@ static bool vhost_user_blk_pre_incoming(void *opaque, Error **errp)
     VirtIONet *n = opaque;
     int i;
 
-    if (peer_wait_incoming(n)) {
+    if (!virtio_net_is_tap_mig(opaque, 0) && peer_wait_incoming(n)) {
         for (i = 0; i < n->max_queue_pairs; i++) {
             if (!peer_postponed_init(n, i, errp)) {
                 return false;
diff --git a/include/net/tap.h b/include/net/tap.h
index 5a926ba513..506f7ab719 100644
--- a/include/net/tap.h
+++ b/include/net/tap.h
@@ -36,4 +36,6 @@ int tap_get_fd(NetClientState *nc);
 bool tap_wait_incoming(NetClientState *nc);
 bool tap_postponed_init(NetClientState *nc, Error **errp);
 
+extern const VMStateDescription vmstate_tap;
+
 #endif /* QEMU_NET_TAP_H */
diff --git a/migration/options.c b/migration/options.c
index 76709af3ab..7c7df6c484 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -1205,12 +1205,6 @@ bool migrate_params_check(MigrationParameters *params, Error **errp)
         return false;
     }
 
-    /* TODO: implement backend-transfer and remove this check */
-    if (params->has_backend_transfer) {
-        error_setg(errp, "Not implemented");
-        return false;
-    }
-
     return true;
 }
 
diff --git a/net/tap.c b/net/tap.c
index 8afbf3b407..b9c12dd64c 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -819,7 +819,7 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
 
 static bool net_tap_setup(TAPState *s, int fd, int vnet_hdr, Error **errp)
 {
-    if (!net_tap_set_fd(s, fd, vnet_hdr, errp)) {
+    if (fd != -1 && !net_tap_set_fd(s, fd, vnet_hdr, errp)) {
         return false;
     }
 
@@ -1225,6 +1225,49 @@ int tap_disable(NetClientState *nc)
     }
 }
 
+static int tap_pre_load(void *opaque)
+{
+    TAPState *s = opaque;
+
+    if (s->fd != -1) {
+        error_report(
+            "TAP is already initialized and cannot receive incoming fd");
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+static int tap_post_load(void *opaque, int version_id)
+{
+    TAPState *s = opaque;
+    Error *local_err = NULL;
+
+    if (!net_tap_setup(s, -1, -1, &local_err)) {
+        error_report_err(local_err);
+        qemu_del_net_client(&s->nc);
+        return -EINVAL;
+    }
+
+    return 0;
+}
+
+const VMStateDescription vmstate_tap = {
+    .name = "net-tap",
+    .pre_load = tap_pre_load,
+    .post_load = tap_post_load,
+    .fields = (const VMStateField[]) {
+        VMSTATE_FD(fd, TAPState),
+        VMSTATE_BOOL(using_vnet_hdr, TAPState),
+        VMSTATE_BOOL(has_ufo, TAPState),
+        VMSTATE_BOOL(has_uso, TAPState),
+        VMSTATE_BOOL(has_tunnel, TAPState),
+        VMSTATE_BOOL(enabled, TAPState),
+        VMSTATE_UINT32(host_vnet_hdr_len, TAPState),
+        VMSTATE_END_OF_LIST()
+    }
+};
+
 bool tap_wait_incoming(NetClientState *nc)
 {
     TAPState *s = DO_UPCAST(TAPState, nc, nc);
-- 
2.48.1




* [PATCH v7 18/19] tests/functional: add skipWithoutSudo() decorator
  2025-10-10 17:39 [PATCH v7 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
                   ` (16 preceding siblings ...)
  2025-10-10 17:39 ` [PATCH v7 17/19] virtio-net: support backend-transfer migration for virtio-net/tap Vladimir Sementsov-Ogievskiy
@ 2025-10-10 17:39 ` Vladimir Sementsov-Ogievskiy
  2025-10-10 17:39 ` [PATCH v7 19/19] tests/functional: add test_x86_64_tap_migration Vladimir Sementsov-Ogievskiy
  2025-10-11 15:26 ` [PATCH v7 00/19] virtio-net: live-TAP local migration Lei Yang
  19 siblings, 0 replies; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-10 17:39 UTC (permalink / raw)
  To: mst, jasowang
  Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
	qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
	yc-core, vsementsov, raphael.s.norwitz

To be used in the next commit, which adds a test for TAP networking
that needs to set up a TAP device.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
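A sketch of how such a class-level skip decorator behaves, for
illustration only. skip_unless_command_ok() is a hypothetical
generalization of skipWithoutSudo(): it probes an arbitrary command
instead of "sudo -n /bin/true" so that the example runs without sudo,
and it additionally treats a missing binary as "not available", which
the decorator in this patch does not do.

```python
import subprocess
import sys
import unittest
from unittest import skipUnless


def skip_unless_command_ok(argv, reason):
    """Hypothetical generalization of skipWithoutSudo()."""
    try:
        proc = subprocess.run(argv, stdin=subprocess.DEVNULL,
                              stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT,
                              text=True, check=False)
        ok, out = proc.returncode == 0, proc.stdout
    except FileNotFoundError as e:
        # Binary not installed at all: treat as "not available".
        ok, out = False, str(e)
    return skipUnless(ok, f"{reason}: {out}")


# The real decorator probes ["sudo", "-n", "/bin/true"]; the Python
# interpreter is probed here so the sketch runs anywhere.
@skip_unless_command_ok([sys.executable, "-c", "raise SystemExit(1)"],
                        "requires a probe that succeeds")
class SkippedTest(unittest.TestCase):
    def test_never_runs(self):
        self.fail("should have been skipped")


@skip_unless_command_ok([sys.executable, "-c", "pass"], "probe failed")
class RunsTest(unittest.TestCase):
    def test_runs(self):
        self.assertTrue(True)


suite = unittest.TestSuite()
loader = unittest.defaultTestLoader
suite.addTests(loader.loadTestsFromTestCase(SkippedTest))
suite.addTests(loader.loadTestsFromTestCase(RunsTest))
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Note that the probe runs once, at import time of the test module, not
per test.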
 tests/functional/qemu_test/decorators.py | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/tests/functional/qemu_test/decorators.py b/tests/functional/qemu_test/decorators.py
index b239295804..125d31dda6 100644
--- a/tests/functional/qemu_test/decorators.py
+++ b/tests/functional/qemu_test/decorators.py
@@ -6,6 +6,7 @@
 import os
 import platform
 import resource
+import subprocess
 from unittest import skipIf, skipUnless
 
 from .cmd import which
@@ -167,3 +168,18 @@ def skipLockedMemoryTest(locked_memory):
         ulimit_memory == resource.RLIM_INFINITY or ulimit_memory >= locked_memory * 1024,
         f'Test required {locked_memory} kB of available locked memory',
     )
+
+'''
+Decorator to skip execution of a test if passwordless
+sudo command is not available.
+'''
+def skipWithoutSudo():
+    proc = subprocess.run(["sudo", "-n", "/bin/true"],
+                          stdin=subprocess.PIPE,
+                          stdout=subprocess.PIPE,
+                          stderr=subprocess.STDOUT,
+                          universal_newlines=True,
+                          check=False)
+
+    return skipUnless(proc.returncode == 0,
+                      f'requires password-less sudo access: {proc.stdout}')
-- 
2.48.1




* [PATCH v7 19/19] tests/functional: add test_x86_64_tap_migration
  2025-10-10 17:39 [PATCH v7 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
                   ` (17 preceding siblings ...)
  2025-10-10 17:39 ` [PATCH v7 18/19] tests/functional: add skipWithoutSudo() decorator Vladimir Sementsov-Ogievskiy
@ 2025-10-10 17:39 ` Vladimir Sementsov-Ogievskiy
  2025-10-11 15:26 ` [PATCH v7 00/19] virtio-net: live-TAP local migration Lei Yang
  19 siblings, 0 replies; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-10 17:39 UTC (permalink / raw)
  To: mst, jasowang
  Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
	qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
	yc-core, vsementsov, raphael.s.norwitz

Add a test for the new backend-transfer migration of virtio-net/tap, with
fd passing through a UNIX socket.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
 tests/functional/test_x86_64_tap_migration.py | 396 ++++++++++++++++++
 1 file changed, 396 insertions(+)
 create mode 100644 tests/functional/test_x86_64_tap_migration.py

diff --git a/tests/functional/test_x86_64_tap_migration.py b/tests/functional/test_x86_64_tap_migration.py
new file mode 100644
index 0000000000..d7d1ed72bf
--- /dev/null
+++ b/tests/functional/test_x86_64_tap_migration.py
@@ -0,0 +1,396 @@
+#!/usr/bin/env python3
+#
+# Functional test that tests TAP local migration
+# with fd passing
+#
+# Copyright (c) Yandex Technologies LLC, 2025
+#
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+import os
+import time
+import subprocess
+from subprocess import run
+import signal
+from typing import Tuple
+
+from qemu_test import (
+    LinuxKernelTest,
+    Asset,
+    exec_command_and_wait_for_pattern,
+)
+from qemu_test.decorators import skipWithoutSudo
+
+GUEST_IP = "10.0.1.2"
+GUEST_IP_MASK = f"{GUEST_IP}/24"
+GUEST_MAC = "d6:0d:75:f8:0f:b7"
+HOST_IP = "10.0.1.1"
+HOST_IP_MASK = f"{HOST_IP}/24"
+TAP_ID = "tap0"
+TAP_ID2 = "tap1"
+TAP_MAC = "e6:1d:44:b5:03:5d"
+
+
+def ip(args, check=True) -> None:
+    """Run ip command with sudo"""
+    run(["sudo", "ip"] + args, check=check)
+
+
+def del_tap(tap_name: str = TAP_ID) -> None:
+    ip(["tuntap", "del", tap_name, "mode", "tap", "multi_queue"], check=False)
+
+
+def init_tap(tap_name: str = TAP_ID, with_ip: bool = True) -> None:
+    ip(["tuntap", "add", "dev", tap_name, "mode", "tap", "multi_queue"])
+    if with_ip:
+        ip(["link", "set", "dev", tap_name, "address", TAP_MAC])
+        ip(["addr", "add", HOST_IP_MASK, "dev", tap_name])
+    ip(["link", "set", tap_name, "up"])
+
+
+def switch_network_to_tap2() -> None:
+    ip(["link", "set", TAP_ID2, "down"])
+    ip(["link", "set", TAP_ID, "down"])
+    ip(["addr", "delete", HOST_IP_MASK, "dev", TAP_ID])
+    ip(["link", "set", "dev", TAP_ID2, "address", TAP_MAC])
+    ip(["addr", "add", HOST_IP_MASK, "dev", TAP_ID2])
+    ip(["link", "set", TAP_ID2, "up"])
+
+
+def parse_ping_line(line: str) -> float:
+    # expect lines like
+    # [1748524876.590509] 64 bytes from 94.245.155.3 \
+    #      (94.245.155.3): icmp_seq=1 ttl=250 time=101 ms
+    spl = line.split()
+    return float(spl[0][1:-1])
+
+
+def parse_ping_output(out) -> Tuple[bool, float, float]:
+    lines = [x for x in out.split("\n") if x.startswith("[")]
+
+    try:
+        first_no_ans = next(
+            (ind for ind in range(len(lines)) if lines[ind][20:26] == "no ans")
+        )
+    except StopIteration:
+        return False, parse_ping_line(lines[0]), parse_ping_line(lines[-1])
+
+    last_no_ans = next(
+        ind
+        for ind in range(len(lines) - 1, -1, -1)
+        if lines[ind][20:26] == "no ans"
+    )
+
+    return (
+        True,
+        parse_ping_line(lines[first_no_ans]),
+        parse_ping_line(lines[last_no_ans]),
+    )
+
+
+def wait_migration_finish(source_vm, target_vm):
+    migr_events = (
+        ("MIGRATION", {"data": {"status": "completed"}}),
+        ("MIGRATION", {"data": {"status": "failed"}}),
+    )
+
+    source_e = source_vm.events_wait(migr_events)["data"]
+    target_e = target_vm.events_wait(migr_events)["data"]
+
+    source_s = source_vm.cmd("query-status")["status"]
+    target_s = target_vm.cmd("query-status")["status"]
+
+    assert (
+        source_e["status"] == "completed"
+        and target_e["status"] == "completed"
+        and source_s == "postmigrate"
+        and target_s == "paused"
+    ), f"""Migration failed:
+    SRC status: {source_s}
+    SRC event: {source_e}
+    TGT status: {target_s}
+    TGT event: {target_e}"""
+
+
+@skipWithoutSudo()
+class VhostUserBlkFdMigration(LinuxKernelTest):
+
+    ASSET_KERNEL = Asset(
+        (
+            "https://archives.fedoraproject.org/pub/archive/fedora/linux/releases"
+            "/31/Server/x86_64/os/images/pxeboot/vmlinuz"
+        ),
+        "d4738d03dbbe083ca610d0821d0a8f1488bebbdccef54ce33e3adb35fda00129",
+    )
+
+    ASSET_INITRD = Asset(
+        (
+            "https://archives.fedoraproject.org/pub/archive/fedora/linux/releases"
+            "/31/Server/x86_64/os/images/pxeboot/initrd.img"
+        ),
+        "277cd6c7adf77c7e63d73bbb2cded8ef9e2d3a2f100000e92ff1f8396513cd8b",
+    )
+
+    ASSET_ALPINE_ISO = Asset(
+        (
+            "https://dl-cdn.alpinelinux.org/"
+            "alpine/v3.22/releases/x86_64/alpine-standard-3.22.1-x86_64.iso"
+        ),
+        "96d1b44ea1b8a5a884f193526d92edb4676054e9fa903ad2f016441a0fe13089",
+    )
+
+    def setUp(self):
+        super().setUp()
+
+        init_tap()
+
+        self.outer_ping_proc = None
+
+    def tearDown(self):
+        try:
+            del_tap(TAP_ID)
+            del_tap(TAP_ID2)
+
+            if self.outer_ping_proc:
+                self.stop_outer_ping()
+        finally:
+            super().tearDown()
+
+    def start_outer_ping(self) -> None:
+        assert self.outer_ping_proc is None
+        self.outer_ping_log = self.scratch_file("ping.log")
+        with open(self.outer_ping_log, "w") as f:
+            self.outer_ping_proc = subprocess.Popen(
+                ["ping", "-i", "0", "-O", "-D", GUEST_IP],
+                text=True,
+                stdout=f,
+            )
+
+    def stop_outer_ping(self) -> str:
+        assert self.outer_ping_proc
+        self.outer_ping_proc.send_signal(signal.SIGINT)
+
+        self.outer_ping_proc.communicate(timeout=5)
+        self.outer_ping_proc = None
+
+        with open(self.outer_ping_log) as f:
+            return f.read()
+
+    def stop_ping_and_check(self, stop_time, resume_time):
+        ping_res = self.stop_outer_ping()
+
+        discon, a, b = parse_ping_output(ping_res)
+
+        if not discon:
+            text = (
+                f"STOP: {stop_time}, RESUME: {resume_time}, PING: {a} - {b}"
+            )
+            if a > stop_time or b < resume_time:
+                self.fail(f"PING failed: {text}")
+            self.log.info(f"PING: no packets lost: {text}")
+            return
+
+        text = (
+            f"STOP: {stop_time}, RESUME: {resume_time}, "
+            f"PING: disconnect: {a} - {b}"
+        )
+        self.log.info(text)
+        eps = 0.01
+        if a < stop_time - eps or b > resume_time + eps:
+            self.fail(text)
+
+    def one_ping_from_guest(self, vm) -> None:
+        exec_command_and_wait_for_pattern(
+            self,
+            f"ping -c 1 -W 1 {HOST_IP}",
+            "1 packets transmitted, 1 packets received",
+            "1 packets transmitted, 0 packets received",
+            vm=vm,
+        )
+        self.wait_for_console_pattern("# ", vm=vm)
+
+    def one_ping_from_host(self) -> None:
+        run(["ping", "-c", "1", "-W", "1", GUEST_IP])
+
+    def setup_shared_memory(self):
+        shm_path = f"/dev/shm/qemu_test_{os.getpid()}"
+
+        try:
+            with open(shm_path, "wb") as f:
+                f.write(b"\0" * (1024 * 1024 * 1024))  # 1GB
+        except Exception as e:
+            self.fail(f"Failed to create shared memory file: {e}")
+
+        return shm_path
+
+    def prepare_and_launch_vm(
+        self, shm_path, vhost, incoming=False, vm=None, backend_transfer=True
+    ):
+        if not vm:
+            vm = self.vm
+
+        vm.set_console()
+        vm.add_args("-accel", "kvm")
+        vm.add_args("-device", "pcie-pci-bridge,id=pci.1,bus=pcie.0")
+        vm.add_args("-m", "1G")
+
+        vm.add_args(
+            "-object",
+            f"memory-backend-file,id=ram0,size=1G,mem-path={shm_path},share=on",
+        )
+        vm.add_args("-machine", "memory-backend=ram0")
+
+        vm.add_args(
+            "-drive",
+            f"file={self.ASSET_ALPINE_ISO.fetch()},media=cdrom,format=raw",
+        )
+
+        vm.add_args("-S")
+
+        if incoming:
+            vm.add_args("-incoming", "defer")
+
+        vm_s = "target" if incoming else "source"
+        self.log.info(f"Launching {vm_s} VM")
+        vm.launch()
+
+        self.set_migration_capabilities(vm, backend_transfer)
+
+        if not backend_transfer:
+            tap_name = TAP_ID2 if incoming else TAP_ID
+        else:
+            tap_name = TAP_ID
+
+        self.add_virtio_net(vm, vhost, tap_name)
+
+    def add_virtio_net(self, vm, vhost: bool, tap_name: str = "tap0"):
+        netdev_params = {
+            "id": "netdev.1",
+            "vhost": vhost,
+            "type": "tap",
+            "ifname": tap_name,
+            "script": "no",
+            "downscript": "no",
+            "queues": 4,
+            "vnet_hdr": True,
+        }
+
+        vm.cmd("netdev_add", netdev_params)
+
+        vm.cmd(
+            "device_add",
+            driver="virtio-net-pci",
+            romfile="",
+            id="vnet.1",
+            netdev="netdev.1",
+            mq=True,
+            vectors=18,
+            bus="pci.1",
+            mac=GUEST_MAC,
+            disable_legacy="off",
+        )
+
+    def set_migration_capabilities(self, vm, backend_transfer=True):
+        capabilities = [
+            {"capability": "events", "state": True},
+            {"capability": "x-ignore-shared", "state": True},
+        ]
+        vm.cmd("migrate-set-capabilities", {"capabilities": capabilities})
+        if backend_transfer:
+            vm.cmd(
+                "migrate-set-parameters",
+                {"backend-transfer": ["virtio-net-tap"]},
+            )
+
+    def setup_guest_network(self) -> None:
+        exec_command_and_wait_for_pattern(self, "ip addr", "# ")
+        exec_command_and_wait_for_pattern(
+            self,
+            f"ip addr add {GUEST_IP_MASK} dev eth0 && "
+            "ip link set eth0 up && echo OK",
+            "OK",
+        )
+        self.wait_for_console_pattern("# ")
+
+    def do_test_tap_fd_migration(self, vhost, backend_transfer=True):
+        self.require_accelerator("kvm")
+        self.set_machine("q35")
+
+        socket_dir = self.socket_dir()
+        migration_socket = os.path.join(socket_dir.name, "migration.sock")
+
+        shm_path = self.setup_shared_memory()
+
+        # Setup second TAP if needed
+        if not backend_transfer:
+            del_tap(TAP_ID2)
+            init_tap(TAP_ID2, with_ip=False)
+
+        self.prepare_and_launch_vm(
+            shm_path, vhost, backend_transfer=backend_transfer
+        )
+        self.vm.cmd("cont")
+        self.wait_for_console_pattern("login:")
+        exec_command_and_wait_for_pattern(self, "root", "# ")
+
+        self.setup_guest_network()
+
+        self.one_ping_from_guest(self.vm)
+        self.one_ping_from_host()
+        self.start_outer_ping()
+
+        # Get some successful pings before migration
+        time.sleep(0.5)
+
+        target_vm = self.get_vm(name="target")
+        self.prepare_and_launch_vm(
+            shm_path,
+            vhost,
+            incoming=True,
+            vm=target_vm,
+            backend_transfer=backend_transfer,
+        )
+
+        target_vm.cmd("migrate-incoming", {"uri": f"unix:{migration_socket}"})
+
+        self.log.info("Starting migration")
+        freeze_start = time.time()
+        self.vm.cmd("migrate", {"uri": f"unix:{migration_socket}"})
+
+        self.log.info("Waiting for migration completion")
+        wait_migration_finish(self.vm, target_vm)
+
+        # Switch network to tap1 if not using backend transfer
+        if not backend_transfer:
+            switch_network_to_tap2()
+
+        target_vm.cmd("cont")
+        freeze_end = time.time()
+
+        self.vm.shutdown()
+
+        self.log.info("Verifying PING on target VM after migration")
+        self.one_ping_from_guest(target_vm)
+        self.one_ping_from_host()
+
+        # And a bit more pings after source shutdown
+        time.sleep(0.3)
+        self.stop_ping_and_check(freeze_start, freeze_end)
+
+        target_vm.shutdown()
+
+    def test_tap_fd_migration(self):
+        self.do_test_tap_fd_migration(False)
+
+    def test_tap_fd_migration_vhost(self):
+        self.do_test_tap_fd_migration(True)
+
+    def test_tap_new_tap_migration(self):
+        self.do_test_tap_fd_migration(False, backend_transfer=False)
+
+    def test_tap_new_tap_migration_vhost(self):
+        self.do_test_tap_fd_migration(True, backend_transfer=False)
+
+
+if __name__ == "__main__":
+    LinuxKernelTest.main()
-- 
2.48.1




* Re: [PATCH v7 00/19] virtio-net: live-TAP local migration
  2025-10-10 17:39 [PATCH v7 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
                   ` (18 preceding siblings ...)
  2025-10-10 17:39 ` [PATCH v7 19/19] tests/functional: add test_x86_64_tap_migration Vladimir Sementsov-Ogievskiy
@ 2025-10-11 15:26 ` Lei Yang
  19 siblings, 0 replies; 32+ messages in thread
From: Lei Yang @ 2025-10-11 15:26 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: mst, jasowang, peterx, farosas, sw, eblake, armbru, thuth, philmd,
	berrange, qemu-devel, michael.roth, steven.sistare, davydov-max,
	yc-core, raphael.s.norwitz

I tested this series of patches with the virtio-net regression tests;
everything works fine.

Tested-by: Lei Yang <leiyang@redhat.com>

On Sat, Oct 11, 2025 at 1:40 AM Vladimir Sementsov-Ogievskiy
<vsementsov@yandex-team.ru> wrote:
>
> Hi all!
>
> Here is a new migration parameter backend-transfer, which allows to
> enable local migration of TAP virtio-net backend, including its
> properties and open fds.
>
> With this new option, management software doesn't need to
> initialize new TAP and do a switch to it. Nothing should be
> done around virtio-net in local migration: it just migrates
> and continues to use same TAP device. So we avoid extra logic
> in management software, extra allocations in kernel (for new TAP),
> and corresponding extra delay in migration downtime.
>
> v7:
>
> 01-13,18: r-b by Maxim Davydov
>           t-b by Lei Yang
>
> 05: fix tap->script to tap->downscript
> 07: tiny rebase conflict around "NetOffloadsd ol = {}"
>
> 14: reworked to vmsd handler
>     tap is migrated inside virtio-net. And we support backend-transfer
>     only for virtio-net+tap. So, it's better to support initialization
>     postponing directly in virtio-net, the code is simplified, and we
>     don't have to manage global list of taps.
>
> 15: reworked on top of 14
>
> 16: - drop QAPI_LIST_CONTAINS macro
>     - improve commit message
>     - improve QAPI documentation comments
>
> 17: - don't add extra check into virtio_net_update_host_features(),
>       as we now can call it only when needed (more explicit logic)
>     - drop extra includes
>     - no need in "attached_to_virtio_net" variable anymore
>     - add .has_tunnel to the state
>
> 19: add also test-cases for TAP migration without backend-transfer
>     (to be sure, that we don't break it with new feature:)
>
> Vladimir Sementsov-Ogievskiy (19):
>   net/tap: net_init_tap_one(): drop extra error propagation
>   net/tap: net_init_tap_one(): move parameter checking earlier
>   net/tap: rework net_tap_init()
>   net/tap: pass NULL to net_init_tap_one() in cases when scripts are
>     NULL
>   net/tap: rework scripts handling
>   net/tap: setup exit notifier only when needed
>   net/tap: split net_tap_fd_init()
>   net/tap: tap_set_sndbuf(): add return value
>   net/tap: rework tap_set_sndbuf()
>   net/tap: rework sndbuf handling
>   net/tap: introduce net_tap_setup()
>   net/tap: move vhost fd initialization to net_tap_new()
>   net/tap: finalize net_tap_set_fd() logic
>   migration: introduce .pre_incoming() vmsd handler
>   net/tap: postpone tap setup to pre-incoming
>   qapi: add interface for backend-transfer virtio-net/tap migration
>   virtio-net: support backend-transfer migration for virtio-net/tap
>   tests/functional: add skipWithoutSudo() decorator
>   tests/functional: add test_x86_64_tap_migration
>
>  hw/net/virtio-net.c                           | 150 ++++++-
>  include/migration/vmstate.h                   |   1 +
>  include/net/tap.h                             |   5 +
>  migration/migration.c                         |   4 +
>  migration/options.c                           |  33 ++
>  migration/options.h                           |   2 +
>  migration/savevm.c                            |  15 +
>  migration/savevm.h                            |   1 +
>  net/tap-bsd.c                                 |   3 +-
>  net/tap-linux.c                               |  19 +-
>  net/tap-solaris.c                             |   3 +-
>  net/tap-stub.c                                |   3 +-
>  net/tap-win32.c                               |  11 +
>  net/tap.c                                     | 425 +++++++++++++-----
>  net/tap_int.h                                 |   3 +-
>  qapi/migration.json                           |  42 +-
>  tests/functional/qemu_test/decorators.py      |  16 +
>  tests/functional/test_x86_64_tap_migration.py | 396 ++++++++++++++++
>  18 files changed, 1001 insertions(+), 131 deletions(-)
>  create mode 100644 tests/functional/test_x86_64_tap_migration.py
>
> --
> 2.48.1
>




* Re: [PATCH v7 14/19] migration: introduce .pre_incoming() vmsd handler
  2025-10-10 17:39 ` [PATCH v7 14/19] migration: introduce .pre_incoming() vmsd handler Vladimir Sementsov-Ogievskiy
@ 2025-10-14 16:26   ` Peter Xu
  0 siblings, 0 replies; 32+ messages in thread
From: Peter Xu @ 2025-10-14 16:26 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: mst, jasowang, farosas, sw, eblake, armbru, thuth, philmd,
	berrange, qemu-devel, michael.roth, steven.sistare, leiyang,
	davydov-max, yc-core, raphael.s.norwitz

On Fri, Oct 10, 2025 at 08:39:52PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Add possibility for devices to hook into top of migrate-incoming QMP
> command. It's a place, where migration capabilities and parameters
> are already set, but migration downtime is not yet started (source
> is still running). So here devices may do some remaining initializations
> dependent on migration capabilities. This will be used in further commit
> to support backend-transfer migration feature for vhost-user-blk.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>

Acked-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu




* Re: [PATCH v7 16/19] qapi: add interface for backend-transfer virtio-net/tap migration
  2025-10-10 17:39 ` [PATCH v7 16/19] qapi: add interface for backend-transfer virtio-net/tap migration Vladimir Sementsov-Ogievskiy
@ 2025-10-14 16:33   ` Peter Xu
  2025-10-14 19:31     ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Xu @ 2025-10-14 16:33 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: mst, jasowang, farosas, sw, eblake, armbru, thuth, philmd,
	berrange, qemu-devel, michael.roth, steven.sistare, leiyang,
	davydov-max, yc-core, raphael.s.norwitz

On Fri, Oct 10, 2025 at 08:39:54PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> To migrate virtio-net TAP device backend (including open fds) locally,
> user should simply set migration parameter
> 
>    backend-transfer = ["virtio-net-tap"]
> 
> Why not simple boolean? To simplify migration to further versions,
> when more devices will support backend-transfer migration.
> 
> Alternatively, we may add per-device option to disable backend-transfer
> migration, but still:
> 
> 1. It's more comfortable to set same capabilities/parameters on both
> source and target QEMU, than care about each device.

But it loses per-device control, right?  Say, we can have two devices, and
the admin can decide if only one of the devices will enable this feature.

> 
> 2. To not break the design, that machine-type + device options +
> migration capabilities and parameters are fully define the resulting
> migration stream. We'll break this if add in future more
> backend-transfer support in devices under same backend-transfer=true
> parameter.

Could you elaborate?

I thought last time we discussed, we planned to have both the global knob
and a per-device flag, then the feature is enabled only if both flags are
set.

If these parameters are all set the same on src/dst, would it also not
break the design when new devices start to support it (and the new device
will need to introduce its own per-device flags)?

> 
> The commit only brings the interface, the realization will come in later
> commit. That's why we add a temporary not-implemented error in
> migrate_params_check().
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> ---
>  migration/options.c | 39 +++++++++++++++++++++++++++++++++++++++
>  migration/options.h |  2 ++
>  qapi/migration.json | 42 ++++++++++++++++++++++++++++++++++++------
>  3 files changed, 77 insertions(+), 6 deletions(-)
> 
> diff --git a/migration/options.c b/migration/options.c
> index 5183112775..76709af3ab 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -13,6 +13,7 @@
>  
>  #include "qemu/osdep.h"
>  #include "qemu/error-report.h"
> +#include "qapi/util.h"
>  #include "exec/target_page.h"
>  #include "qapi/clone-visitor.h"
>  #include "qapi/error.h"
> @@ -262,6 +263,20 @@ bool migrate_mapped_ram(void)
>      return s->capabilities[MIGRATION_CAPABILITY_MAPPED_RAM];
>  }
>  
> +bool migrate_virtio_net_tap(void)
> +{
> +    MigrationState *s = migrate_get_current();
> +    BackendTransferList *el = s->parameters.backend_transfer;
> +
> +    for ( ; el; el = el->next) {
> +        if (el->value == BACKEND_TRANSFER_VIRTIO_NET_TAP) {

So this is also something I want to avoid.  The hope is we don't
necessarily need to invent new device names into qapi/migration.json.
OTOH, we can export a helper in migration/misc.h so that devices can query
whether the global feature is enabled or not, using that to AND the
per-device flag.

Thanks,

> +            return true;
> +        }
> +    }
> +
> +    return false;
> +}
> +
>  bool migrate_ignore_shared(void)
>  {
>      MigrationState *s = migrate_get_current();
> @@ -963,6 +978,12 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
>      params->cpr_exec_command = QAPI_CLONE(strList,
>                                            s->parameters.cpr_exec_command);
>  
> +    if (s->parameters.backend_transfer) {
> +        params->has_backend_transfer = true;
> +        params->backend_transfer = QAPI_CLONE(BackendTransferList,
> +                                              s->parameters.backend_transfer);
> +    }
> +
>      return params;
>  }
>  
> @@ -997,6 +1018,7 @@ void migrate_params_init(MigrationParameters *params)
>      params->has_zero_page_detection = true;
>      params->has_direct_io = true;
>      params->has_cpr_exec_command = true;
> +    params->has_backend_transfer = true;
>  }
>  
>  /*
> @@ -1183,6 +1205,12 @@ bool migrate_params_check(MigrationParameters *params, Error **errp)
>          return false;
>      }
>  
> +    /* TODO: implement backend-transfer and remove this check */
> +    if (params->has_backend_transfer) {
> +        error_setg(errp, "Not implemented");
> +        return false;
> +    }
> +
>      return true;
>  }
>  
> @@ -1305,6 +1333,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
>      if (params->has_cpr_exec_command) {
>          dest->cpr_exec_command = params->cpr_exec_command;
>      }
> +
> +    if (params->has_backend_transfer) {
> +        dest->backend_transfer = params->backend_transfer;
> +    }
>  }
>  
>  static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
> @@ -1443,6 +1475,13 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
>          s->parameters.cpr_exec_command =
>              QAPI_CLONE(strList, params->cpr_exec_command);
>      }
> +
> +    if (params->has_backend_transfer) {
> +        qapi_free_BackendTransferList(s->parameters.backend_transfer);
> +
> +        s->parameters.backend_transfer = QAPI_CLONE(BackendTransferList,
> +                                                    params->backend_transfer);
> +    }
>  }
>  
>  void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
> diff --git a/migration/options.h b/migration/options.h
> index 82d839709e..55c0345433 100644
> --- a/migration/options.h
> +++ b/migration/options.h
> @@ -87,6 +87,8 @@ const char *migrate_tls_hostname(void);
>  uint64_t migrate_xbzrle_cache_size(void);
>  ZeroPageDetection migrate_zero_page_detection(void);
>  
> +bool migrate_virtio_net_tap(void);
> +
>  /* parameters helpers */
>  
>  bool migrate_params_check(MigrationParameters *params, Error **errp);
> diff --git a/qapi/migration.json b/qapi/migration.json
> index be0f3fcc12..1bfe7df191 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -770,6 +770,19 @@
>        '*transform': 'BitmapMigrationBitmapAliasTransform'
>    } }
>  
> +##
> +# @BackendTransfer:
> +#
> +# @virtio-net-tap: Enable backend-transfer migration for
> +#     virtio-net/tap. When enabled, TAP fds and all related state are
> +#     passed to the destination in the migration channel (which must
> +#     be a UNIX domain socket).
> +#
> +# Since: 10.2
> +##
> +{ 'enum': 'BackendTransfer',
> +  'data': [ 'virtio-net-tap' ] }
> +
>  ##
>  # @BitmapMigrationNodeAlias:
>  #
> @@ -951,9 +964,13 @@
>  #     is @cpr-exec.  The first list element is the program's filename,
>  #     the remainder its arguments.  (Since 10.2)
>  #
> +# @backend-transfer: List of targets for backend-transfer migration.
> +#     See description in `BackendTransfer`.  Default is no
> +#     backend-transfer migration (Since 10.2)
> +#
>  # Features:
>  #
> -# @unstable: Members @x-checkpoint-delay and
> +# @unstable: Members @backend-transfer, @x-checkpoint-delay and
>  #     @x-vcpu-dirty-limit-period are experimental.
>  #
>  # Since: 2.4
> @@ -978,7 +995,8 @@
>             'mode',
>             'zero-page-detection',
>             'direct-io',
> -           'cpr-exec-command'] }
> +           'cpr-exec-command',
> +           { 'name': 'backend-transfer', 'features': ['unstable'] } ] }
>  
>  ##
>  # @MigrateSetParameters:
> @@ -1137,9 +1155,13 @@
>  #     is @cpr-exec.  The first list element is the program's filename,
>  #     the remainder its arguments.  (Since 10.2)
>  #
> +# @backend-transfer: List of targets for backend-transfer migration.
> +#     See description in `BackendTransfer`.  Default is no
> +#     backend-transfer migration (Since 10.2)
> +#
>  # Features:
>  #
> -# @unstable: Members @x-checkpoint-delay and
> +# @unstable: Members @backend-transfer, @x-checkpoint-delay and
>  #     @x-vcpu-dirty-limit-period are experimental.
>  #
>  # TODO: either fuse back into `MigrationParameters`, or make
> @@ -1179,7 +1201,9 @@
>              '*mode': 'MigMode',
>              '*zero-page-detection': 'ZeroPageDetection',
>              '*direct-io': 'bool',
> -            '*cpr-exec-command': [ 'str' ]} }
> +            '*cpr-exec-command': [ 'str' ],
> +            '*backend-transfer': { 'type': [ 'BackendTransfer' ],
> +                                   'features': [ 'unstable' ] } } }
>  
>  ##
>  # @migrate-set-parameters:
> @@ -1352,9 +1376,13 @@
>  #     is @cpr-exec.  The first list element is the program's filename,
>  #     the remainder its arguments.  (Since 10.2)
>  #
> +# @backend-transfer: List of targets for backend-transfer migration.
> +#     See description in `BackendTransfer`.  Default is no
> +#     backend-transfer migration (Since 10.2)
> +#
>  # Features:
>  #
> -# @unstable: Members @x-checkpoint-delay and
> +# @unstable: Members @backend-transfer, @x-checkpoint-delay and
>  #     @x-vcpu-dirty-limit-period are experimental.
>  #
>  # Since: 2.4
> @@ -1391,7 +1419,9 @@
>              '*mode': 'MigMode',
>              '*zero-page-detection': 'ZeroPageDetection',
>              '*direct-io': 'bool',
> -            '*cpr-exec-command': [ 'str' ]} }
> +            '*cpr-exec-command': [ 'str' ],
> +            '*backend-transfer': { 'type': [ 'BackendTransfer' ],
> +                                   'features': [ 'unstable' ] } } }
>  
>  ##
>  # @query-migrate-parameters:
> -- 
> 2.48.1
> 

-- 
Peter Xu




* Re: [PATCH v7 16/19] qapi: add interface for backend-transfer virtio-net/tap migration
  2025-10-14 16:33   ` Peter Xu
@ 2025-10-14 19:31     ` Vladimir Sementsov-Ogievskiy
  2025-10-14 20:25       ` Peter Xu
  0 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-14 19:31 UTC (permalink / raw)
  To: Peter Xu
  Cc: mst, jasowang, farosas, sw, eblake, armbru, thuth, philmd,
	berrange, qemu-devel, michael.roth, steven.sistare, leiyang,
	davydov-max, yc-core, raphael.s.norwitz

On 14.10.25 19:33, Peter Xu wrote:
> On Fri, Oct 10, 2025 at 08:39:54PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> To migrate virtio-net TAP device backend (including open fds) locally,
>> user should simply set migration parameter
>>
>>     backend-transfer = ["virtio-net-tap"]
>>
>> Why not simple boolean? To simplify migration to further versions,
>> when more devices will support backend-transfer migration.
>>
>> Alternatively, we may add per-device option to disable backend-transfer
>> migration, but still:
>>
>> 1. It's more comfortable to set same capabilities/parameters on both
>> source and target QEMU, than care about each device.
> 
> But it loses per-device control, right?  Say, we can have two devices, and
> the admin can decide if only one of the devices will enable this feature.
> 

Right. But, in short:

1. I'm not sure, that such granularity is necessary.

2. It may be implemented later, on top of the feature.

>>
>> 2. To not break the design, that machine-type + device options +
>> migration capabilities and parameters are fully define the resulting
>> migration stream. We'll break this if add in future more
>> backend-transfer support in devices under same backend-transfer=true
>> parameter.
> 
> Could you elaborate?
> 
> I thought last time we discussed, we planned to have both the global knob
> and a per-device flag, then the feature is enabled only if both flags are
> set.

Right, here in v3: https://lists.nongnu.org/archive/html/qemu-devel/2025-09/msg01644.html

Still at this point, I also needed local-incoming=true target option, so I
considered all the parameters like "I can't make feature without extra
per-device options, so here they are".

A day later, after motivating comment from Markus (accidentally in v2),
I found and suggested the way:

https://lists.nongnu.org/archive/html/qemu-devel/2025-09/msg01960.html

And further versions v4-v7 were the realization of the idea. Still, main
benefit is possibility to get rid of per-device local-incoming=true
options for target, not about a kind of per-device "capability" flag we
discuss now.

Ah, and here I said [1]:

> 1. global fds-passing migration capability, to enable/disable the whole feature
> 
> 2. per-device fds-passing option, on by default for all supporting devices, to 
> be
> able to disable backing migration for some devices. (we discussed it here: 
> https://lore.kernel.org/all/aL8kuXQ2JF1TV3M7@x1.local/ ).
> Still, normally these options are always on by default.
> And more over, I can postpone their implementation to separate series, to 
> reduce discussion field, and to check that everything may work without 
> additional user input.

And then I went this way, postponing the implementation of per-device
options.

And then, developing a similar migration for vhost-user-blk, I found
that I can't use one boolean capability for such features; the reason
is in the commit message we are discussing now.

Then the current design came in v5 (v4 was skipped). And I even got an
approval from Fabiano :)

https://lists.nongnu.org/archive/html/qemu-devel/2025-09/msg03999.html

> 
> If these parameters are all set the same on src/dst, would it also not
> break the design when new devices start to support it (and the new device
> will need to introduce its own per-device flags)?

Yes, right.

What I missed is that, while postponing (probably forever) the
implementation of per-device options, I started to implement another way
to solve the same problem (switching from one boolean capability to a
backend-transfer list).

In other words, if per-device options are implemented at some point,
they will partly overlap in functionality with the current complex
migration parameter.

-

But still, I think, that parameter backend-transfer = [list of targets]
is better than per-device option. With per-device options we'll have to
care about them forever. I can't imagine a way to make them TRUE by
default.

Using the machine type to set the option to TRUE by default in a new MT,
and to FALSE in all previous ones, doesn't make real sense: we never
migrate to another MT, but we can migrate from a QEMU without support
for virtio-net backend transfer to a QEMU with such support. And on the
target QEMU we'll want to enable virtio-net backend-transfer for further
migrations.

So, I think modifying machine types is the wrong idea here. We have to
keep new options FALSE by default, and the management tool has to take
care to set them appropriately.

-

Let's look from the POV of management tool.

With complex parameter (list of backend-transfer targets, suggested with
this series), what should we do?

1. With introspection, get backend-transfer targets supported by source
    and target QEMUs
2. Get and intersection, assume X
3. Set same backend-transfer=X on source and target
4. Start a migration
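
A minimal sketch of steps 2 and 3 above, as a management tool might do
them. The helper names are hypothetical, and how the supported targets are
discovered (step 1) is left out; the only thing taken from this series is
the migrate-set-parameters call with a backend-transfer list. The VM
objects are assumed to expose a cmd(name, args) method like the ones used
in tests/functional.

```python
from typing import Iterable, List


def common_backend_transfer(src_supported: Iterable[str],
                            dst_supported: Iterable[str]) -> List[str]:
    """Step 2: targets supported by both QEMUs, in a stable order."""
    return sorted(set(src_supported) & set(dst_supported))


def setup_backend_transfer(src_vm, dst_vm,
                           src_supported: Iterable[str],
                           dst_supported: Iterable[str]) -> List[str]:
    """Step 3: set the same backend-transfer list on source and target.

    src_vm and dst_vm are assumed to provide cmd(name, args), which
    sends one QMP command.
    """
    targets = common_backend_transfer(src_supported, dst_supported)
    for vm in (src_vm, dst_vm):
        vm.cmd("migrate-set-parameters", {"backend-transfer": targets})
    return targets
```

After that, step 4 is the usual migrate-incoming on the target and migrate
on the source, with no per-device handling.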

But with per-device parameters it becomes a lot more complicated and
error prone:

1. Somehow understand (how?), which devices support backend-transfer on
    source and target
2. Get an intersection
3. Set all the backend-transfer options on both vms correspondingly,
    doing personal qom-set for each device
4. Start a migration
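
For comparison, a sketch of what step 3 could look like in the per-device
variant. This is only an illustration: the "backend-transfer" device
property does not exist in this series and is a hypothetical name, as is
the helper itself. qom-set is the existing QMP command, and the VM object
is again assumed to provide cmd(name, args).

```python
from typing import Iterable


def set_per_device_backend_transfer(vm, device_paths: Iterable[str],
                                    enabled: bool) -> int:
    """Issue one qom-set per device; returns the number of commands sent.

    "backend-transfer" is a hypothetical per-device property name.
    """
    count = 0
    for path in device_paths:
        vm.cmd(
            "qom-set",
            {"path": path, "property": "backend-transfer", "value": enabled},
        )
        count += 1
    return count
```

The management tool would have to call this on both VMs with matching
device lists, so the number of QMP commands grows with the number of
devices instead of staying at one per VM.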

-

In short:

1. per device - is too high granularity, making management more complex

2. per feature - is what we need. And it's a normal use for migration
capabilities: we implement a new migration feature, and add new
capability. The only new bit with this series is that "we are going to"
implement similar capabilities later, and seems good to organize them
all into a list, rather than make separate booleans.

> 
>>
>> The commit only brings the interface, the realization will come in later
>> commit. That's why we add a temporary not-implemented error in
>> migrate_params_check().
>>

[..]

>>   
>> +bool migrate_virtio_net_tap(void)
>> +{
>> +    MigrationState *s = migrate_get_current();
>> +    BackendTransferList *el = s->parameters.backend_transfer;
>> +
>> +    for ( ; el; el = el->next) {
>> +        if (el->value == BACKEND_TRANSFER_VIRTIO_NET_TAP) {
> 
> So this is also something I want to avoid.  The hope is we don't
> necessarily need to invent new device names into qapi/migration.json.
> OTOH, we can export a helper in migration/misc.h so that devices can query
> whether the global feature is enabled or not, using that to AND the
> per-device flag.
> 

Understood. But I can't imagine how to keep management simple with per-device
options..

-

What do you think?

-- 
Best regards,
Vladimir


* Re: [PATCH v7 16/19] qapi: add interface for backend-transfer virtio-net/tap migration
  2025-10-14 19:31     ` Vladimir Sementsov-Ogievskiy
@ 2025-10-14 20:25       ` Peter Xu
  2025-10-14 21:46         ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Xu @ 2025-10-14 20:25 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: mst, jasowang, farosas, sw, eblake, armbru, thuth, philmd,
	berrange, qemu-devel, michael.roth, steven.sistare, leiyang,
	davydov-max, yc-core, raphael.s.norwitz

On Tue, Oct 14, 2025 at 10:31:30PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 14.10.25 19:33, Peter Xu wrote:
> > On Fri, Oct 10, 2025 at 08:39:54PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > To migrate virtio-net TAP device backend (including open fds) locally,
> > > user should simply set migration parameter
> > > 
> > >     backend-transfer = ["virtio-net-tap"]
> > > 
> > > Why not simple boolean? To simplify migration to further versions,
> > > when more devices will support backend-transfer migration.
> > > 
> > > Alternatively, we may add per-device option to disable backend-transfer
> > > migration, but still:
> > > 
> > > 1. It's more comfortable to set same capabilities/parameters on both
> > > source and target QEMU, than care about each device.
> > 
> > But it loses per-device control, right?  Say, we can have two devices, and
> > the admin can decide if only one of the devices will enable this feature.
> > 
> 
> Right. But, in short:
> 
> 1. I'm not sure, that such granularity is necessary.
> 
> 2. It may implemented later, on top of the feature.

I confess that's not a good example, but my point was that having two
layers of settings is a straightforward idea, and it provides full
flexibility.

> 
> > > 
> > > 2. To not break the design, that machine-type + device options +
> > > migration capabilities and parameters are fully define the resulting
> > > migration stream. We'll break this if add in future more
> > > backend-transfer support in devices under same backend-transfer=true
> > > parameter.
> > 
> > Could you elaborate?
> > 
> > I thought last time we discussed, we planned to have both the global knob
> > and a per-device flag, then the feature is enabled only if both flags are
> > set.
> 
> Right, here in v3: https://lists.nongnu.org/archive/html/qemu-devel/2025-09/msg01644.html
> 
> Still at this point, I also needed local-incoming=true target option, so I
> considered all the parameters like "I can't make feature without extra
> per-device options, so here they are".
> 
> A day later, after motivating comment from Markus (accidentally in v2),
> I found and suggested the way:
> 
> https://lists.nongnu.org/archive/html/qemu-devel/2025-09/msg01960.html
> 
> And further versions v4-v7 were the realization of the idea. Still, main
> benefit is possibility to get rid of per-device local-incoming=true
> options for target, not about a kind of per-device "capability" flag we
> discuss now.
> 
> A, and here I said [1]:
> 
> > 1. global fds-passing migration capability, to enable/disable the whole feature
> > 
> > 2. per-device fds-passing option, on by default for all supporting
> > devices, to be
> > able to disable backing migration for some devices. (we discussed it
> > here: https://lore.kernel.org/all/aL8kuXQ2JF1TV3M7@x1.local/ ).
> > Still, normally these options are always on by default.
> > And more over, I can postpone their implementation to separate series,
> > to reduce discussion field, and to check that everything may work
> > without additional user input.
> 
> And then, went this way, postponing realization of per-device options..

Postponing the per-device flag might still break different backends if you
specify the list with virtio-net-pci.

But only now did I notice that you were using "virtio-net-tap" instead of
"virtio-net-pci".

Ouch.. I think that's even more complicated. :(

Here I think the problem is that introducing arbitrary strings into the
migration QAPI to represent some combinations of "virtio frontend F1" and
"virtio backend B1" doesn't sound like the right thing to do.  Migration
ideally should have zero knowledge of the device topology, types of devices,
frontends or backends.  "virtio-*" as a string should not appear in
migration/ or qapi/migration.json at all..

> 
> And then, developing similar migration for vhost-user-blk, found
> that I can't use on boolean capability for such features, the reason
> in commit message, which we discuss now.

Why isn't a bool enough?  Could you share a link to that discussion?

> 
> Than, current design came in v5 (v4 was skipped).. And I even got an
> approval from Fabiano :)
> 
> https://lists.nongnu.org/archive/html/qemu-devel/2025-09/msg03999.html
> 
> > 
> > If these parameters are all set the same on src/dst, would it also not
> > break the design when new devices start to support it (and the new device
> > will need to introduce its own per-device flags)?
> 
> Yes, right.
> 
> I missed, that, "postponing (probably forever)" per-device options
> realization, I started to implement another way to solve the same
> problem (switching from one boolean capability to a backend-transfer
> list).
> 
> In other words, if at some point implement per-device options, that will
> partly intersect by functionality with current complex migration
> parameter..
> 
> -
> 
> But still, I think, that parameter backend-transfer = [list of targets]
> is better than per-device option. With per-device options we'll have to
> care about them forever. I can't imagine a way to make them TRUE by
> default.
> 
> Using machine type, to set option to TRUE by default in new MT, and to
> false in all previous ones doesn't make real sense: we never migrate on
> another MT, but we do can migrate from QEMU without support for
> virtio-net backend transfer to the QEMU with such support. And on target
> QEMU we'll want to enable virtio-net backend-transfer for further
> migrations..

So this is likely why you changed your mind.  I think machine properties
definitely make sense.

We set it OFF on old machines because with an old machine type the src QEMU
_may_ not support this feature.  We set it ON on new machines because when
the QEMU has the new machine declared anyway, it is guaranteed to support
the feature.

We can still manually set the per-device properties iff the admin is sure
that the "old" QEMUs on both sides support this feature.  However, machine
properties have worked like that for many years, and I believe that's how it
works, by always being on the safe side.

> 
> So, I think, modifying machine types is wrong idea here. So, we have to
> keep new options FALSE by default, and management tool have to care to
> set them appropriately.
> 
> -
> 
> Let's look from the POV of management tool.
> 
> With complex parameter (list of backend-transfer targets, suggested with
> this series), what should we do?
> 
> 1. With introspection, get backend-transfer targets supported by source
>    and target QEMUs
> 2. Get an intersection, call it X
> 3. Set same backend-transfer=X on source and target
> 4. Start a migration
> 
> But with per-device parameters it becomes a lot more complicated and
> error prone
> 
> 1. Somehow understand (how?), which devices support backend-transfer on
>    source and target
> 2. Get an intersection
> 3. Set all the backend-transfer options on both vms correspondingly,
>    doing personal qom-set for each device
> 4. Start a migration
> 
> -
> 
> In short:
> 
> 1. per device - is too high granularity, making management more complex

If we follow the machine property way of doing this (which I believe we
have used for years), then mgmt doesn't need any change except to properly
enable fd-passing in migration cap/params when it's a local migration.
That's all.  It doesn't need to know anything about "which device(s) supports
fd-passing", because they'll all be auto-set by the machine types.
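
A minimal sketch of that two-layer scheme, assuming a global migration flag
that is ANDed with a per-device property whose default depends on the
machine type. All names below (demo_*) are hypothetical stand-ins for
illustration, not existing QEMU symbols.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device state: the property default comes from the MT. */
struct demo_device {
    bool backend_transfer;
};

/* Hypothetical global migration setting, enabled by mgmt for local migration. */
struct demo_migration {
    bool backend_transfer;
};

/*
 * Machine-type compat default: OFF for machine versions older than the one
 * that introduced the feature, ON from that version onwards.
 */
static bool demo_machine_default(int machine_version, int first_supported)
{
    return machine_version >= first_supported;
}

/* The feature is used only when both the global and per-device flags are set. */
static bool demo_use_backend_transfer(const struct demo_migration *m,
                                      const struct demo_device *d)
{
    assert(m && d);
    return m->backend_transfer && d->backend_transfer;
}
```

With this shape mgmt only toggles the global flag; an admin who knows that
both QEMUs support the feature could still flip the per-device property on
an old machine type by hand.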

> 
> 2. per feature - is what we need. And it's a normal use for migration
> capabilities: we implement a new migration feature, and add new
> capability. The only new bit with this series is that "we are going to"
> implement similar capabilities later, and seems good to organize them
> all into a list, rather than make separate booleans.
> 
> 
> > 
> > > 
> > > The commit only brings the interface, the realization will come in later
> > > commit. That's why we add a temporary not-implemented error in
> > > migrate_params_check().
> > > 
> 
> [..]
> 
> > > +bool migrate_virtio_net_tap(void)
> > > +{
> > > +    MigrationState *s = migrate_get_current();
> > > +    BackendTransferList *el = s->parameters.backend_transfer;
> > > +
> > > +    for ( ; el; el = el->next) {
> > > +        if (el->value == BACKEND_TRANSFER_VIRTIO_NET_TAP) {
> > 
> > So this is also something I want to avoid.  The hope is we don't
> > necessarily need to invent new device names into qapi/migration.json.
> > OTOH, we can export a helper in migration/misc.h so that devices can query
> > whether the global feature is enabled or not, using that to AND the
> > per-device flag.
> > 
> 
> Understand. But I can't imagine how to keep management simple with per-device
> options..
> 
> -
> 
> What do you think?

I feel like you wanted to enable this feature _while_ using an old machine
type.  Is that what you're looking for?  Can you simply urge the users to
move to new machine types when looking for new features?  I believe that's
what we do..

MT properties have worked like that for a long time.  What you are asking
is fair, but if so I'd still like to double check with you that this is your
real purpose (enabling this feature on NEW qemus but OLD machine types, all
automatically).

Thanks,

-- 
Peter Xu




* Re: [PATCH v7 16/19] qapi: add interface for backend-transfer virtio-net/tap migration
  2025-10-14 20:25       ` Peter Xu
@ 2025-10-14 21:46         ` Vladimir Sementsov-Ogievskiy
  2025-10-14 21:54           ` Vladimir Sementsov-Ogievskiy
  2025-10-15 18:27           ` Peter Xu
  0 siblings, 2 replies; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-14 21:46 UTC (permalink / raw)
  To: Peter Xu
  Cc: mst, jasowang, farosas, sw, eblake, armbru, thuth, philmd,
	berrange, qemu-devel, michael.roth, steven.sistare, leiyang,
	davydov-max, yc-core, raphael.s.norwitz

On 14.10.25 23:25, Peter Xu wrote:
> On Tue, Oct 14, 2025 at 10:31:30PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> On 14.10.25 19:33, Peter Xu wrote:
>>> On Fri, Oct 10, 2025 at 08:39:54PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>> To migrate virtio-net TAP device backend (including open fds) locally,
>>>> user should simply set migration parameter
>>>>
>>>>      backend-transfer = ["virtio-net-tap"]
>>>>
>>>> Why not simple boolean? To simplify migration to further versions,
>>>> when more devices will support backend-transfer migration.
>>>>
>>>> Alternatively, we may add per-device option to disable backend-transfer
>>>> migration, but still:
>>>>
>>>> 1. It's more comfortable to set same capabilities/parameters on both
>>>> source and target QEMU, than care about each device.
>>>
>>> But it loses per-device control, right?  Say, we can have two devices, and
>>> the admin can decide if only one of the devices will enable this feature.
>>>
>>
>> Right. But, in short:
>>
>> 1. I'm not sure, that such granularity is necessary.
>>
>> 2. It may implemented later, on top of the feature.
> 
> I confess that's not a good example, but my point was that it was
> straightforward idea to have two layers of settings, meanwhile it provides
> full flexibility.
> 
>>
>>>>
>>>> 2. To not break the design, that machine-type + device options +
>>>> migration capabilities and parameters are fully define the resulting
>>>> migration stream. We'll break this if add in future more
>>>> backend-transfer support in devices under same backend-transfer=true
>>>> parameter.
>>>
>>> Could you elaborate?
>>>
>>> I thought last time we discussed, we planned to have both the global knob
>>> and a per-device flag, then the feature is enabled only if both flags are
>>> set.
>>
>> Right, here in v3: https://lists.nongnu.org/archive/html/qemu-devel/2025-09/msg01644.html
>>
>> Still at this point, I also needed local-incoming=true target option, so I
>> considered all the parameters like "I can't make feature without extra
>> per-device options, so here they are".
>>
>> A day later, after motivating comment from Markus (accidentally in v2),
>> I found and suggested the way:
>>
>> https://lists.nongnu.org/archive/html/qemu-devel/2025-09/msg01960.html
>>
>> And further versions v4-v7 were the realization of the idea. Still, main
>> benefit is possibility to get rid of per-device local-incoming=true
>> options for target, not about a kind of per-device "capability" flag we
>> discuss now.
>>
>> A, and here I said [1]:
>>
>>> 1. global fds-passing migration capability, to enable/disable the whole feature
>>>
>>> 2. per-device fds-passing option, on by default for all supporting
>>> devices, to be
>>> able to disable backing migration for some devices. (we discussed it
>>> here: https://lore.kernel.org/all/aL8kuXQ2JF1TV3M7@x1.local/ ).
>>> Still, normally these options are always on by default.
>>> And more over, I can postpone their implementation to separate series,
>>> to reduce discussion field, and to check that everything may work
>>> without additional user input.
>>
>> And then, went this way, postponing realization of per-device options..
> 
> Postponing the per-device flag might still break different backends if you
> specify the list with virtio-net-pci.
> 
> But only until now, I noticed you were using "virtio-net-tap" instead of
> "virtio-net-pci".
> 
> Ouch.. I think that's even more complicated. :(
> 
> Here I think the problem is, introducing some arbitrary strings into
> migration QAPI to represent some combinations of "virtio frontend F1" and
> "virtio backend B1" doesn't sound the right thing to do.  Migration ideally
> should have zero knowledge of the device topology, types of devices,
> frontends or backends.  "virtio-*" as a string should not appear in
> migration/ or qapi/migration.json at all..
> 
>>
>> And then, developing similar migration for vhost-user-blk, found
>> that I can't use on boolean capability for such features, the reason
>> in commit message, which we discuss now.
> 
> Why a bool isn't enough?  Could you share a link to that discussion?
> 
>>
>> Than, current design came in v5 (v4 was skipped).. And I even got an
>> approval from Fabiano :)
>>
>> https://lists.nongnu.org/archive/html/qemu-devel/2025-09/msg03999.html
>>
>>>
>>> If these parameters are all set the same on src/dst, would it also not
>>> break the design when new devices start to support it (and the new device
>>> will need to introduce its own per-device flags)?
>>
>> Yes, right.
>>
>> I missed, that, "postponing (probably forever)" per-device options
>> realization, I started to implement another way to solve the same
>> problem (switching from one boolean capability to a backend-transfer
>> list).
>>
>> In other words, if at some point implement per-device options, that will
>> partly intersect by functionality with current complex migration
>> parameter..
>>
>> -
>>
>> But still, I think, that parameter backend-transfer = [list of targets]
>> is better than per-device option. With per-device options we'll have to
>> care about them forever. I can't imagine a way to make them TRUE by
>> default.
>>
>> Using machine type, to set option to TRUE by default in new MT, and to
>> false in all previous ones doesn't make real sense: we never migrate on
>> another MT, but we do can migrate from QEMU without support for
>> virtio-net backend transfer to the QEMU with such support. And on target
>> QEMU we'll want to enable virtio-net backend-transfer for further
>> migrations..
> 
> So this is likely why you changed your mind.  I think machine properties
> definitely make sense.
> 
> We set it OFF on old machines because when on old machines the src QEMU
> _may_ not support this feature.  We set it ON on new machines because when
> the QEMU has the new machine declared anyway, it is guaranteed to support
> the feature.
> 
> We can still manually set the per-device properties iff the admin is sure
> that both sides of "old" QEMUs support this feature.  However machine
> properties worked like that for many years and I believe that's how it
> works, by being always on the safe side.
> 
>>
>> So, I think, modifying machine types is wrong idea here. So, we have to
>> keep new options FALSE by default, and management tool have to care to
>> set them appropriately.
>>
>> -
>>
>> Let's look from the POV of management tool.
>>
>> With complex parameter (list of backend-transfer targets, suggested with
>> this series), what should we do?
>>
>> 1. With introspection, get backend-transfer targets supported by source
>>     and target QEMUs
>> 2. Get an intersection, call it X
>> 3. Set same backend-transfer=X on source and target
>> 4. Start a migration
>>
>> But with per-device parameters it becomes a lot more complicated and
>> error prone
>>
>> 1. Somehow understand (how?), which devices support backend-transfer on
>>     source and target
>> 2. Get an intersection
>> 3. Set all the backend-transfer options on both vms correspondingly,
>>     doing personal qom-set for each device
>> 4. Start a migration
>>
>> -
>>
>> In short:
>>
>> 1. per device - is too high granularity, making management more complex
> 
> If we follow the machine property way of doing this (which I believe we
> used for years), then mgmt doesn't need any change except properly enable
> fd-passing in migration cap/params when it's a local migration.  That's
> all.  It doesn't need to know anything about "which device(s) supports
> fd-passing", because they'll all be auto-set by the machine types.
> 
>>
>> 2. per feature - is what we need. And it's a normal use for migration
>> capabilities: we implement a new migration feature, and add new
>> capability. The only new bit with this series is that "we are going to"
>> implement similar capabilities later, and seems good to organize them
>> all into a list, rather than make separate booleans.
>>
>>
>>>
>>>>
>>>> The commit only brings the interface, the realization will come in later
>>>> commit. That's why we add a temporary not-implemented error in
>>>> migrate_params_check().
>>>>
>>
>> [..]
>>
>>>> +bool migrate_virtio_net_tap(void)
>>>> +{
>>>> +    MigrationState *s = migrate_get_current();
>>>> +    BackendTransferList *el = s->parameters.backend_transfer;
>>>> +
>>>> +    for ( ; el; el = el->next) {
>>>> +        if (el->value == BACKEND_TRANSFER_VIRTIO_NET_TAP) {
>>>
>>> So this is also something I want to avoid.  The hope is we don't
>>> necessarily need to invent new device names into qapi/migration.json.
>>> OTOH, we can export a helper in migration/misc.h so that devices can query
>>> whether the global feature is enabled or not, using that to AND the
>>> per-device flag.
>>>
>>
>> Understand. But I can't imagine how to keep management simple with per-device
>> options..
>>
>> -
>>
>> What do you think?
> 
> I feel like you wanted to enable this feature _while_ using an old machine
> type.

Exactly

> Is that what you're looking for?  Can you simply urge the users to
> move to new machine types when looking for new features?  I believe that's
> what we do..
> 
> MT properties were working like that for a long time.  What you were asking
> is fair, but if so I'd still like to double check with you on that's your
> real purpose (enabling this feature on NEW qemus but OLD machine types, all
> automatically).
> 

You made me think.

On the one hand, you are right, I agree with all arguments about migration
being separate from virtio device types, their backends and frontends.

And yes, if we give up the idea of enabling the feature in old machine types
automatically, everything fits into the existing paradigm.

On the other hand, there is our downstream practice in the cloud. We introduce
new machine types _very_ seldom. Almost always, new features developed
or backported to our downstream don't require a new machine type. In such a
situation, creating a feature which theoretically (and with a simpler API!)
may be done without introducing a new MT, but creating it by introducing a new
MT, and thus postponing the moment when we start to use it widely until
most of the existing VMs die or restart naturally (as we surely won't
ask users to restart them, it would be too expensive, not to mention the
question of whether a restart is a safe way to change the MT or we'd better
recreate the VM), seems very strange to me. (too long sentence detector
blinking).

So, finally, it's OK for me to switch to per-device properties. Then, in
downstream I may implement corresponding capabilities to simplify management.
That's rather simple.

-

Interesting: could the migration "return path" somehow be used to get
information from the target on whether it supports backend transfer for a
particular device?

So that we simply enable the backend-transfer=true parameter both on
source and target. Then the source somehow finds out through the return path,
for each device, whether the target supports backend-transfer for it, and
decides what to do? Or is that too complicated?

-- 
Best regards,
Vladimir



* Re: [PATCH v7 16/19] qapi: add interface for backend-transfer virtio-net/tap migration
  2025-10-14 21:46         ` Vladimir Sementsov-Ogievskiy
@ 2025-10-14 21:54           ` Vladimir Sementsov-Ogievskiy
  2025-10-15  7:11             ` Daniel P. Berrangé
  2025-10-15 18:27           ` Peter Xu
  1 sibling, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-14 21:54 UTC (permalink / raw)
  To: Peter Xu
  Cc: mst, jasowang, farosas, sw, eblake, armbru, thuth, philmd,
	berrange, qemu-devel, michael.roth, steven.sistare, leiyang,
	davydov-max, yc-core, raphael.s.norwitz

On 15.10.25 00:46, Vladimir Sementsov-Ogievskiy wrote:
>>
>> And then, developing similar migration for vhost-user-blk, found
>> that I can't use on boolean capability for such features, the reason
>> in commit message, which we discuss now.
> 
> Why a bool isn't enough?  Could you share a link to that discussion?

I mean, one boolean is not enough for different devices, when not assisted
by per-device options. So, I came to the idea of a "list of backend targets"
in a migration parameter.

It doesn't matter, our discussion has already gone far ahead)

-- 
Best regards,
Vladimir



* Re: [PATCH v7 16/19] qapi: add interface for backend-transfer virtio-net/tap migration
  2025-10-14 21:54           ` Vladimir Sementsov-Ogievskiy
@ 2025-10-15  7:11             ` Daniel P. Berrangé
  0 siblings, 0 replies; 32+ messages in thread
From: Daniel P. Berrangé @ 2025-10-15  7:11 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: Peter Xu, mst, jasowang, farosas, sw, eblake, armbru, thuth,
	philmd, qemu-devel, michael.roth, steven.sistare, leiyang,
	davydov-max, yc-core, raphael.s.norwitz

On Wed, Oct 15, 2025 at 12:54:21AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 15.10.25 00:46, Vladimir Sementsov-Ogievskiy wrote:
> > > 
> > > And then, developing similar migration for vhost-user-blk, found
> > > that I can't use on boolean capability for such features, the reason
> > > in commit message, which we discuss now.
> > 
> > Why a bool isn't enough?  Could you share a link to that discussion?
> 
> I mean, one boolean is not enough for different devices, when not assisted
> by per-device options. So, I came to idea of "list of backend targets"
> in migration parameter.

If we need to identify backends or frontends, surely we should be using
the "id" that the mgmt app used when creating the object, that gets set
in the QOM tree.
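
As a rough illustration of that idea (an assumption about how it might look,
not a proposal for actual QEMU code): the migration side would only carry
opaque "id" strings chosen by the mgmt app, and a device would check whether
its own id is listed. demo_id_selected() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical lookup: ids[] holds the "id" strings that the mgmt app
 * assigned when creating the objects. Returns true if dev_id is listed.
 * No device type names are involved, only opaque ids.
 */
static bool demo_id_selected(const char *const *ids, size_t n_ids,
                             const char *dev_id)
{
    assert(dev_id);

    for (size_t i = 0; i < n_ids; i++) {
        if (strcmp(ids[i], dev_id) == 0) {
            return true;
        }
    }
    return false;
}
```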

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: [PATCH v7 16/19] qapi: add interface for backend-transfer virtio-net/tap migration
  2025-10-14 21:46         ` Vladimir Sementsov-Ogievskiy
  2025-10-14 21:54           ` Vladimir Sementsov-Ogievskiy
@ 2025-10-15 18:27           ` Peter Xu
  2025-10-15 20:17             ` Vladimir Sementsov-Ogievskiy
  1 sibling, 1 reply; 32+ messages in thread
From: Peter Xu @ 2025-10-15 18:27 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: mst, jasowang, farosas, sw, eblake, armbru, thuth, philmd,
	berrange, qemu-devel, michael.roth, steven.sistare, leiyang,
	davydov-max, yc-core, raphael.s.norwitz

On Wed, Oct 15, 2025 at 12:46:26AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 14.10.25 23:25, Peter Xu wrote:
> > On Tue, Oct 14, 2025 at 10:31:30PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > On 14.10.25 19:33, Peter Xu wrote:
> > > > On Fri, Oct 10, 2025 at 08:39:54PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > To migrate virtio-net TAP device backend (including open fds) locally,
> > > > > user should simply set migration parameter
> > > > > 
> > > > >      backend-transfer = ["virtio-net-tap"]
> > > > > 
> > > > > Why not simple boolean? To simplify migration to further versions,
> > > > > when more devices will support backend-transfer migration.
> > > > > 
> > > > > Alternatively, we may add per-device option to disable backend-transfer
> > > > > migration, but still:
> > > > > 
> > > > > 1. It's more comfortable to set same capabilities/parameters on both
> > > > > source and target QEMU, than care about each device.
> > > > 
> > > > But it loses per-device control, right?  Say, we can have two devices, and
> > > > the admin can decide if only one of the devices will enable this feature.
> > > > 
> > > 
> > > Right. But, in short:
> > > 
> > > 1. I'm not sure, that such granularity is necessary.
> > > 
> > > 2. It may implemented later, on top of the feature.
> > 
> > I confess that's not a good example, but my point was that it was
> > straightforward idea to have two layers of settings, meanwhile it provides
> > full flexibility.
> > 
> > > 
> > > > > 
> > > > > 2. To not break the design, that machine-type + device options +
> > > > > migration capabilities and parameters are fully define the resulting
> > > > > migration stream. We'll break this if add in future more
> > > > > backend-transfer support in devices under same backend-transfer=true
> > > > > parameter.
> > > > 
> > > > Could you elaborate?
> > > > 
> > > > I thought last time we discussed, we planned to have both the global knob
> > > > and a per-device flag, then the feature is enabled only if both flags are
> > > > set.
> > > 
> > > Right, here in v3: https://lists.nongnu.org/archive/html/qemu-devel/2025-09/msg01644.html
> > > 
> > > Still at this point, I also needed local-incoming=true target option, so I
> > > considered all the parameters like "I can't make feature without extra
> > > per-device options, so here they are".
> > > 
> > > A day later, after motivating comment from Markus (accidentally in v2),
> > > I found and suggested the way:
> > > 
> > > https://lists.nongnu.org/archive/html/qemu-devel/2025-09/msg01960.html
> > > 
> > > And further versions v4-v7 were the realization of the idea. Still, main
> > > benefit is possibility to get rid of per-device local-incoming=true
> > > options for target, not about a kind of per-device "capability" flag we
> > > discuss now.
> > > 
> > > A, and here I said [1]:
> > > 
> > > > 1. global fds-passing migration capability, to enable/disable the whole feature
> > > > 
> > > > 2. per-device fds-passing option, on by default for all supporting
> > > > devices, to be
> > > > able to disable backing migration for some devices. (we discussed it
> > > > here: https://lore.kernel.org/all/aL8kuXQ2JF1TV3M7@x1.local/ ).
> > > > Still, normally these options are always on by default.
> > > > And more over, I can postpone their implementation to separate series,
> > > > to reduce discussion field, and to check that everything may work
> > > > without additional user input.
> > > 
> > > And then, went this way, postponing realization of per-device options..
> > 
> > Postponing the per-device flag might still break different backends if you
> > specify the list with virtio-net-pci.
> > 
> > But only until now, I noticed you were using "virtio-net-tap" instead of
> > "virtio-net-pci".
> > 
> > Ouch.. I think that's even more complicated. :(
> > 
> > Here I think the problem is, introducing some arbitrary strings into
> > migration QAPI to represent some combinations of "virtio frontend F1" and
> > "virtio backend B1" doesn't sound the right thing to do.  Migration ideally
> > should have zero knowledge of the device topology, types of devices,
> > frontends or backends.  "virtio-*" as a string should not appear in
> > migration/ or qapi/migration.json at all..
> > 
> > > 
> > > And then, developing similar migration for vhost-user-blk, found
> > > that I can't use on boolean capability for such features, the reason
> > > in commit message, which we discuss now.
> > 
> > Why a bool isn't enough?  Could you share a link to that discussion?
> > 
> > > 
> > > Than, current design came in v5 (v4 was skipped).. And I even got an
> > > approval from Fabiano :)
> > > 
> > > https://lists.nongnu.org/archive/html/qemu-devel/2025-09/msg03999.html
> > > 
> > > > 
> > > > If these parameters are all set the same on src/dst, would it also not
> > > > break the design when new devices start to support it (and the new device
> > > > will need to introduce its own per-device flags)?
> > > 
> > > Yes, right.
> > > 
> > > I missed, that, "postponing (probably forever)" per-device options
> > > realization, I started to implement another way to solve the same
> > > problem (switching from one boolean capability to a backend-transfer
> > > list).
> > > 
> > > In other words, if at some point implement per-device options, that will
> > > partly intersect by functionality with current complex migration
> > > parameter..
> > > 
> > > -
> > > 
> > > But still, I think, that parameter backend-transfer = [list of targets]
> > > is better than per-device option. With per-device options we'll have to
> > > care about them forever. I can't imagine a way to make them TRUE by
> > > default.
> > > 
> > > Using machine type, to set option to TRUE by default in new MT, and to
> > > false in all previous ones doesn't make real sense: we never migrate on
> > > another MT, but we do can migrate from QEMU without support for
> > > virtio-net backend transfer to the QEMU with such support. And on target
> > > QEMU we'll want to enable virtio-net backend-transfer for further
> > > migrations..
> > 
> > So this is likely why you changed your mind.  I think machine properties
> > definitely make sense.
> > 
> > We set it OFF on old machines because when on old machines the src QEMU
> > _may_ not support this feature.  We set it ON on new machines because when
> > the QEMU has the new machine declared anyway, it is guaranteed to support
> > the feature.
> > 
> > We can still manually set the per-device properties iff the admin is sure
> > that both sides of "old" QEMUs support this feature.  However machine
> > properties worked like that for many years and I believe that's how it
> > works, by being always on the safe side.
> > 
> > > 
> > > So, I think, modifying machine types is wrong idea here. So, we have to
> > > keep new options FALSE by default, and management tool have to care to
> > > set them appropriately.
> > > 
> > > -
> > > 
> > > Let's look from the POV of management tool.
> > > 
> > > With complex parameter (list of backend-transfer targets, suggested with
> > > this series), what should we do?
> > > 
> > > 1. With introspection, get backend-transfer targets supported by source
> > >     and target QEMUs
> > > 2. Get an intersection, assume X
> > > 3. Set same backend-transfer=X on source and target
> > > 4. Start a migration
> > > 
> > > But with per-device parameters it becomes a lot more complicated and
> > > error prone
> > > 
> > > 1. Somehow understand (how?), which devices support backend-transfer on
> > >     source and target
> > > 2. Get an intersection
> > > 3. Set all the backend-transfer options on both vms correspondingly,
> > >     doing personal qom-set for each device
> > > 4. Start a migration
> > > 
> > > -
> > > 
> > > In short:
> > > 
> > > 1. per device - is too high granularity, making management more complex
> > 
> > If we follow the machine property way of doing this (which I believe we
> > used for years), then mgmt doesn't need any change except properly enable
> > fd-passing in migration cap/params when it's a local migration.  That's
> > all.  It doesn't need to know anything about "which device(s) supports
> > fd-passing", because they'll all be auto-set by the machine types.
> > 
> > > 
> > > 2. per feature - is what we need. And it's a normal use for migration
> > > capabilities: we implement a new migration feature, and add new
> > > capability. The only new bit with this series is that "we are going to"
> > > implement similar capabilities later, and seems good to organize them
> > > all into a list, rather than make separate booleans.
> > > 
> > > 
> > > > 
> > > > > 
> > > > > The commit only brings the interface, the realization will come in later
> > > > > commit. That's why we add a temporary not-implemented error in
> > > > > migrate_params_check().
> > > > > 
> > > 
> > > [..]
> > > 
> > > > > +bool migrate_virtio_net_tap(void)
> > > > > +{
> > > > > +    MigrationState *s = migrate_get_current();
> > > > > +    BackendTransferList *el = s->parameters.backend_transfer;
> > > > > +
> > > > > +    for ( ; el; el = el->next) {
> > > > > +        if (el->value == BACKEND_TRANSFER_VIRTIO_NET_TAP) {
> > > > 
> > > > So this is also something I want to avoid.  The hope is we don't
> > > > necessarily need to invent new device names into qapi/migration.json.
> > > > OTOH, we can export a helper in migration/misc.h so that devices can query
> > > > whether the global feature is enabled or not, using that to AND the
> > > > per-device flag.
> > > > 
> > > 
> > > Understand. But I can't imagine how to keep management simple with per-device
> > > options..
> > > 
> > > -
> > > 
> > > What do you think?
> > 
> > I feel like you wanted to enable this feature _while_ using an old machine
> > type.
> 
> Exactly
> 
> > Is that what you're looking for?  Can you simply urge the users to
> > move to new machine types when looking for new features?  I believe that's
> > what we do..
> > 
> > MT properties were working like that for a long time.  What you were asking
> > is fair, but if so I'd still like to double check with you on that's your
> > real purpose (enabling this feature on NEW qemus but OLD machine types, all
> > automatically).
> > 
> 
> You made me think.
> 
> On the one hand, you are right, I agree with all arguments about migration
> being separate from virtio device types, their backends and frontends.
> 
> And yes, if refuse the idea of enabling the feature in old machine types
> automatically, everything fits into existing paradigm.
> 
> On the other hand is our downstream practice in the cloud. We introduce
> new machine types _very_ seldom. Almost always, new features developed
> or backported to our downstream doesn't require new machine type. In such
> situation, creating feature, which theoretically (and more simple in API!)
> may be done without introducing new MT, but creating it by introducing new
> MT, postponing the moment when we start to widely use it up to the moment when
> most of existing vms will die or restart naturally (as for sure, we'll not
> ask users to restart them, it would be too expensive (not saying about,
> is restart a safe way to change MT, or we'd better recreate a vm), seems
> very strange for me. (too long sentence detector blinking).

Yes, I agree once more that it's a fair ask; it's just not the usual way
we do it in QEMU upstream.  Otherwise there would be no point in introducing
versioned machine types (we would still need things like pc/q35 to identify
the boards, even with no versioning on each of them).

> 
> So, finally, it's OK for me to switch to per-device properties. Then, in
> downstream I may implement corresponding capabilities to simplify management.
> That's rather simple.

With per-device properties, maybe.. it's still feasible to qom-list the
devices on both src/dst to know whether both of them would support this,
and then turn it on if qom-list reports the property on both sides.  I
didn't think deeper than that, though..
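
As a rough illustration of that idea (only a sketch, not something from this
series): assume the management tool has already run QMP qom-list for each
device path on both sides and collected the replies. The property name
"backend-transfer" and the helper names below are hypothetical; the thread
has not settled on a per-device property name.

```python
from typing import Dict, List

# Hypothetical per-device property name (an assumption for this sketch).
PROP = "backend-transfer"


def has_property(qom_list_reply: List[dict], prop: str = PROP) -> bool:
    """qom_list_reply is the list of {"name": ..., "type": ...} entries
    that QMP qom-list returns for one QOM path."""
    return any(entry.get("name") == prop for entry in qom_list_reply)


def devices_to_enable(src: Dict[str, List[dict]],
                      dst: Dict[str, List[dict]],
                      prop: str = PROP) -> List[str]:
    """Return the QOM paths present on both sides for which both the
    source and the destination report the property."""
    common_paths = src.keys() & dst.keys()
    return sorted(
        path for path in common_paths
        if has_property(src[path], prop) and has_property(dst[path], prop)
    )
```

The management tool would then set the property only on the returned paths,
on both sides, and leave every other device on the regular migration path.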

> 
> -
> 
> Interesting, could migration "return path" be somehow used to get information
> from target, does it support backend transfer for concrete device?
> 
> So that, we simply enable backend-transfer=true parameter both on
> source and target. Then, source somehow finds out through return path,
> for the device, does target support backend-transfer for it, and decide,
> what to do? Or that's too complicated?

Fabiano is looking at something like that, we called it migration
handshake.

https://wiki.qemu.org/ToDo/LiveMigration#Migration_handshake

Fundamentally one of its goals is that we can have bi-directional "talks"
between src/dst, before migration even starts, to synchronize on things
like this.  It's still likely not gonna happen this release.. though..  but
it's on the radar.  With that, dst also doesn't need to set migration
caps/params the same as src, because they'll talk things over.

> 
> -- 
> Best regards,
> Vladimir
> 

-- 
Peter Xu




* Re: [PATCH v7 16/19] qapi: add interface for backend-transfer virtio-net/tap migration
  2025-10-15 18:27           ` Peter Xu
@ 2025-10-15 20:17             ` Vladimir Sementsov-Ogievskiy
  2025-10-16 16:25               ` Peter Xu
  0 siblings, 1 reply; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-15 20:17 UTC (permalink / raw)
  To: Peter Xu
  Cc: mst, jasowang, farosas, sw, eblake, armbru, thuth, philmd,
	berrange, qemu-devel, michael.roth, steven.sistare, leiyang,
	davydov-max, yc-core, raphael.s.norwitz

On 15.10.25 21:27, Peter Xu wrote:
>> Interesting, could migration "return path" be somehow used to get information
>> from target, does it support backend transfer for concrete device?
>>
>> So that, we simply enable backend-transfer=true parameter both on
>> source and target. Then, source somehow finds out through return path,
>> for the device, does target support backend-transfer for it, and decide,
>> what to do? Or that's too complicated?
> Fabiano is looking at something like that, we called it migration
> handshake.
> 
> https://wiki.qemu.org/ToDo/LiveMigration#Migration_handshake
> 
> Fundamentally one of its goal is that we can have bi-directional "talks"
> between src/dst, before migration ever started, to synchronize on things
> like this.  It's still likely not gonna happen this release.. though..  but
> it's on the radar.  With that, dst also doesn't need to set migration
> caps/params the same as src, because they'll talk things over.

Oh, that sounds cool, I've always dreamed of something like this.

Note for myself: look through the QEMU wiki, it may contain quite interesting things,
not only "QEMU Planning" and "Submit a Patch" :)

For live-update with backend transfer, we'll probably be able not only to
check the device tree, but also to recreate it automatically, using
information from target.

> Allow QMP command "migrate[_incoming]" ..

Oh, I thought about this too.

-

Off topic:

Have you thought about moving to some context-free protocol for the migration
stream? The current protocol is tightly bound to the migration state
definitions in the code. This, for example, makes writing an external tool to
analyze the stream almost impossible. Also, any misconfiguration leads to
strange errors when we misinterpret the data on the target.

I imagine.. json? Or something like this.. So that we can always understand
the structure of an incoming object, even if we don't know what exactly we
are going to get. This also simplifies extending the state in new versions:
we just add a new field to the migratable object, and can handle an absent
field in the incoming stream.
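
To make the "absent field" point concrete, here is a minimal sketch. It is
not QEMU code; NetState, its fields and load_net_state() are all made up for
illustration. A self-describing object lets the receiver apply a default for
a field the sender does not know about, and skip a field it does not know
itself.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class NetState:
    """Hypothetical migratable object, used only for this sketch."""
    mac: str
    mtu: int = 1500  # imagined field added in a newer version, with a default


def load_net_state(blob: str) -> NetState:
    """Decode one JSON object, tolerating missing and unknown keys."""
    obj = json.loads(blob)
    known_names = {f.name for f in fields(NetState)}
    # Keys we do not know are dropped; keys that are absent fall back to
    # the dataclass defaults.
    known = {k: v for k, v in obj.items() if k in known_names}
    return NetState(**known)
```

A stream from an older sender that carries only "mac" still loads, with mtu
taking its default, and a stream from a newer sender with an extra key loads
as well. A mandatory field that is missing still fails loudly, which is
probably what we want.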

-- 
Best regards,
Vladimir


* Re: [PATCH v7 16/19] qapi: add interface for backend-transfer virtio-net/tap migration
  2025-10-15 20:17             ` Vladimir Sementsov-Ogievskiy
@ 2025-10-16 16:25               ` Peter Xu
  2025-10-16 17:06                 ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 32+ messages in thread
From: Peter Xu @ 2025-10-16 16:25 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: mst, jasowang, farosas, sw, eblake, armbru, thuth, philmd,
	berrange, qemu-devel, michael.roth, steven.sistare, leiyang,
	davydov-max, yc-core, raphael.s.norwitz

On Wed, Oct 15, 2025 at 11:17:27PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Off topic:
> 
> Didn't you think about moving to some context-free protocol for migration
> stream? Current protocol is hardly bound to migration states definitions
> in the code. This, for example, makes writing an external tool to analyze the
> stream almost impossible. As well, any misconfiguration leads to strange
> error, when we treat data wrongly on the target.
> 
> I imagine.. json? Or something like this.. So that we can always understand
> the structure of incoming object, even if we don't know, what exactly we
> are going to get. This also simplifies expanding the state in new verions:
> we just add a new field into migratable object, and can handle absent field
> in incoming stream.

Have you looked at the current encoded JSON dump within the migration
stream?  See should_send_vmdesc().

That looks like what you're describing, but definitely different in that it
should only be used for debugging purposes e.g. when a stream is dumped
into a file.  Also, the JSON should only appear on precopy as of now.

We might try to move it _before_ the real binary stream, or making the
stream itself to be JSON, but there'll be tricky things we need to think
about.

At least it should be problematic when we want to dump it before the binary
stream, because there can be VMSD fields or subsections that have a test()
function and so only conditionally appear, depending on arbitrary
conditions (e.g. device register states).  If we try to dump it beforehand,
device registers may change afterwards, and when we stop the VM and dump
the real binary stream the test() fn may return something different, so the
stream starts to mismatch with the JSON description.
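
A toy model of that race, as a sketch only: this is plain Python, not the
VMState machinery, and the Field type, field names and register layout are
all invented for illustration. Each field has a test() callback evaluated
against the current device state, the way a conditional VMSD field or
subsection would be.

```python
from typing import Callable, Dict, List, NamedTuple


class Field(NamedTuple):
    name: str
    # Plays the role of a VMSD test() callback: is the field sent or not?
    test: Callable[[Dict[str, int]], bool]


# Hypothetical device: "rx_ring" is only sent while bit 0 of "ctrl" is set.
FIELDS = [
    Field("ctrl", lambda dev: True),
    Field("rx_ring", lambda dev: (dev["ctrl"] & 1) == 1),
]


def present_fields(dev: Dict[str, int],
                   fields: List[Field] = FIELDS) -> List[str]:
    """Names of the fields that would be emitted for the current state."""
    return [f.name for f in fields if f.test(dev)]


dev = {"ctrl": 0, "rx_ring": 0}
description = present_fields(dev)  # "JSON" written while the guest runs
dev["ctrl"] = 1                    # guest flips a register afterwards
payload = present_fields(dev)      # what is really sent once the VM stops
mismatch = description != payload  # description no longer matches payload
```

Here the early description lists only "ctrl", while the payload written at
stop time also carries "rx_ring", so a reader trusting the description would
misparse the stream.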

Dumping the whole thing completely in JSON format is indeed another approach
that I am not aware of anyone having thought through.  I believe some of us
(including myself) pictured how it could look, but I am not aware that
anyone went deeper than that.  Maybe it's because the current methods work,
not great but okay, so no one has yet decided to think it all through.
In short, simple machine types use VMSD versioning, hence backward
migration is not supported.  For enterprise use, machine type properties
are used, and there aren't that many of them, so maybe it's not as bothersome.

Thanks,

-- 
Peter Xu


* Re: [PATCH v7 16/19] qapi: add interface for backend-transfer virtio-net/tap migration
  2025-10-16 16:25               ` Peter Xu
@ 2025-10-16 17:06                 ` Vladimir Sementsov-Ogievskiy
  0 siblings, 0 replies; 32+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-16 17:06 UTC (permalink / raw)
  To: Peter Xu
  Cc: mst, jasowang, farosas, sw, eblake, armbru, thuth, philmd,
	berrange, qemu-devel, michael.roth, steven.sistare, leiyang,
	davydov-max, yc-core, raphael.s.norwitz

On 16.10.25 19:25, Peter Xu wrote:
> On Wed, Oct 15, 2025 at 11:17:27PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> Off topic:
>>
>> Didn't you think about moving to some context-free protocol for migration
>> stream? Current protocol is hardly bound to migration states definitions
>> in the code. This, for example, makes writing an external tool to analyze the
>> stream almost impossible. As well, any misconfiguration leads to strange
>> error, when we treat data wrongly on the target.
>>
>> I imagine.. json? Or something like this.. So that we can always understand
>> the structure of incoming object, even if we don't know, what exactly we
>> are going to get. This also simplifies expanding the state in new verions:
>> we just add a new field into migratable object, and can handle absent field
>> in incoming stream.
> 
> Have you looked at the current encoded JSON dump within the migration
> stream?  See should_send_vmdesc().
> 
> That looks like what you're describing, but definitely different in that it
> should only be used for debugging purposes e.g. when a stream is dumped
> into a file.  The JSON should only only appear also on precopy as of now.
> 
> We might try to move it _before_ the real binary stream, or making the
> stream itself to be JSON, but there'll be tricky things we need to think
> about.
> 
> At least it should be problematic when we want to dump it before the binary
> stream, because there can be VMSD fields or subsections that has a test()
> function that will only conditionally appear depending on any possible
> conditions (e.g. device register states).  If we try to dump it before
> hand, it may mean after device registers changed and when we stop VM and
> dump the real binary stream the test() fn may return something different,
> starting to mismatch with the JSON description.
> 
> Dump the whole thing completely with JSON format is indeed another approach

Yes I meant this. Or maybe some other external binary protocol like protobuf.

> that I am not aware of anyone hought further.  I believe some of us
> (including myself) pictured how it could look like, but I am not aware
> anyone went deeper than that.  Maybe it's because the current methods work
> not as good but okay so that no one yet decided to think it all through.
> In short, for simple machine types, they use VMSD versioning hence backward
> migration is not supported.  For enterprise use, machine type properties
> are used and there aren't a huge lot so maybe not as bothering.
> 

yes. Too much work with little benefit..

another thought:

We have QAPI protocol, with quite good schema description, and we can add
new optional fields to structures, and backward compatibility works.

Maybe we could migrate QAPI-generated structures? Then we could describe
the state of devices in QAPI..

Just note: working with QEMU's migration protocol and QAPI for years,
I can say that QAPI is a lot simpler in:
- implementing new features in backward compatible style
- maintaining downstream-only features

Still, QAPI is not good for passing big chunks of raw data, like memory pages.

-- 
Best regards,
Vladimir



end of thread, other threads:[~2025-10-16 17:07 UTC | newest]

Thread overview: 32+ messages
2025-10-10 17:39 [PATCH v7 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
2025-10-10 17:39 ` [PATCH v7 01/19] net/tap: net_init_tap_one(): drop extra error propagation Vladimir Sementsov-Ogievskiy
2025-10-10 17:39 ` [PATCH v7 02/19] net/tap: net_init_tap_one(): move parameter checking earlier Vladimir Sementsov-Ogievskiy
2025-10-10 17:39 ` [PATCH v7 03/19] net/tap: rework net_tap_init() Vladimir Sementsov-Ogievskiy
2025-10-10 17:39 ` [PATCH v7 04/19] net/tap: pass NULL to net_init_tap_one() in cases when scripts are NULL Vladimir Sementsov-Ogievskiy
2025-10-10 17:39 ` [PATCH v7 05/19] net/tap: rework scripts handling Vladimir Sementsov-Ogievskiy
2025-10-10 17:39 ` [PATCH v7 06/19] net/tap: setup exit notifier only when needed Vladimir Sementsov-Ogievskiy
2025-10-10 17:39 ` [PATCH v7 07/19] net/tap: split net_tap_fd_init() Vladimir Sementsov-Ogievskiy
2025-10-10 17:39 ` [PATCH v7 08/19] net/tap: tap_set_sndbuf(): add return value Vladimir Sementsov-Ogievskiy
2025-10-10 17:39 ` [PATCH v7 09/19] net/tap: rework tap_set_sndbuf() Vladimir Sementsov-Ogievskiy
2025-10-10 17:39 ` [PATCH v7 10/19] net/tap: rework sndbuf handling Vladimir Sementsov-Ogievskiy
2025-10-10 17:39 ` [PATCH v7 11/19] net/tap: introduce net_tap_setup() Vladimir Sementsov-Ogievskiy
2025-10-10 17:39 ` [PATCH v7 12/19] net/tap: move vhost fd initialization to net_tap_new() Vladimir Sementsov-Ogievskiy
2025-10-10 17:39 ` [PATCH v7 13/19] net/tap: finalize net_tap_set_fd() logic Vladimir Sementsov-Ogievskiy
2025-10-10 17:39 ` [PATCH v7 14/19] migration: introduce .pre_incoming() vmsd handler Vladimir Sementsov-Ogievskiy
2025-10-14 16:26   ` Peter Xu
2025-10-10 17:39 ` [PATCH v7 15/19] net/tap: postpone tap setup to pre-incoming Vladimir Sementsov-Ogievskiy
2025-10-10 17:39 ` [PATCH v7 16/19] qapi: add interface for backend-transfer virtio-net/tap migration Vladimir Sementsov-Ogievskiy
2025-10-14 16:33   ` Peter Xu
2025-10-14 19:31     ` Vladimir Sementsov-Ogievskiy
2025-10-14 20:25       ` Peter Xu
2025-10-14 21:46         ` Vladimir Sementsov-Ogievskiy
2025-10-14 21:54           ` Vladimir Sementsov-Ogievskiy
2025-10-15  7:11             ` Daniel P. Berrangé
2025-10-15 18:27           ` Peter Xu
2025-10-15 20:17             ` Vladimir Sementsov-Ogievskiy
2025-10-16 16:25               ` Peter Xu
2025-10-16 17:06                 ` Vladimir Sementsov-Ogievskiy
2025-10-10 17:39 ` [PATCH v7 17/19] virtio-net: support backend-transfer migration for virtio-net/tap Vladimir Sementsov-Ogievskiy
2025-10-10 17:39 ` [PATCH v7 18/19] tests/functional: add skipWithoutSudo() decorator Vladimir Sementsov-Ogievskiy
2025-10-10 17:39 ` [PATCH v7 19/19] tests/functional: add test_x86_64_tap_migration Vladimir Sementsov-Ogievskiy
2025-10-11 15:26 ` [PATCH v7 00/19] virtio-net: live-TAP local migration Lei Yang
