* [PATCH v8 00/19] virtio-net: live-TAP local migration
@ 2025-10-15 13:21 Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 01/19] net/tap: net_init_tap_one(): drop extra error propagation Vladimir Sementsov-Ogievskiy
` (19 more replies)
0 siblings, 20 replies; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-15 13:21 UTC (permalink / raw)
To: mst, jasowang
Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
yc-core, vsementsov, raphael.s.norwitz
Hi all!
This series adds a new migration parameter, backend-transfer, which,
together with a new device property of the same name, enables local
migration of the TAP virtio-net backend, including its properties and
open fds.
With this option, management software doesn't need to initialize a new
TAP device and switch over to it. Nothing special has to be done around
virtio-net in local migration: the device just migrates and continues to
use the same TAP device. This avoids extra logic in management software,
extra allocations in the kernel (for the new TAP), and the corresponding
extra delay in migration downtime.
v8:
14: add a-b by Peter
16: rework to one boolean parameter
17: rework to use per-device property
19: update to use new API
Vladimir Sementsov-Ogievskiy (19):
net/tap: net_init_tap_one(): drop extra error propagation
net/tap: net_init_tap_one(): move parameter checking earlier
net/tap: rework net_tap_init()
net/tap: pass NULL to net_init_tap_one() in cases when scripts are
NULL
net/tap: rework scripts handling
net/tap: setup exit notifier only when needed
net/tap: split net_tap_fd_init()
net/tap: tap_set_sndbuf(): add return value
net/tap: rework tap_set_sndbuf()
net/tap: rework sndbuf handling
net/tap: introduce net_tap_setup()
net/tap: move vhost fd initialization to net_tap_new()
net/tap: finalize net_tap_set_fd() logic
migration: introduce .pre_incoming() vmsd handler
net/tap: postpone tap setup to pre-incoming
qapi: introduce backend-transfer migration parameter
virtio-net: support backend-transfer migration for virtio-net/tap
tests/functional: add skipWithoutSudo() decorator
tests/functional: add test_x86_64_tap_migration
hw/core/machine.c | 1 +
hw/net/virtio-net.c | 151 ++++++-
include/hw/virtio/virtio-net.h | 1 +
include/migration/vmstate.h | 1 +
include/net/tap.h | 5 +
migration/migration.c | 4 +
migration/options.c | 18 +
migration/options.h | 2 +
migration/savevm.c | 15 +
migration/savevm.h | 1 +
net/tap-bsd.c | 3 +-
net/tap-linux.c | 19 +-
net/tap-solaris.c | 3 +-
net/tap-stub.c | 3 +-
net/tap-win32.c | 11 +
net/tap.c | 425 +++++++++++++-----
net/tap_int.h | 3 +-
qapi/migration.json | 38 +-
tests/functional/qemu_test/decorators.py | 16 +
tests/functional/test_x86_64_tap_migration.py | 395 ++++++++++++++++
20 files changed, 984 insertions(+), 131 deletions(-)
create mode 100644 tests/functional/test_x86_64_tap_migration.py
--
2.48.1
* [PATCH v8 01/19] net/tap: net_init_tap_one(): drop extra error propagation
2025-10-15 13:21 [PATCH v8 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
@ 2025-10-15 13:21 ` Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 02/19] net/tap: net_init_tap_one(): move parameter checking earlier Vladimir Sementsov-Ogievskiy
` (18 subsequent siblings)
19 siblings, 0 replies; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-15 13:21 UTC (permalink / raw)
To: mst, jasowang
Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
yc-core, vsementsov, raphael.s.norwitz
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
net/tap.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/net/tap.c b/net/tap.c
index abe3b2d036..70de798fe8 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -736,9 +736,8 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
}
if (vhostfdname) {
- vhostfd = monitor_fd_param(monitor_cur(), vhostfdname, &err);
+ vhostfd = monitor_fd_param(monitor_cur(), vhostfdname, errp);
if (vhostfd == -1) {
- error_propagate(errp, err);
goto failed;
}
if (!qemu_set_blocking(vhostfd, false, errp)) {
--
2.48.1
* [PATCH v8 02/19] net/tap: net_init_tap_one(): move parameter checking earlier
2025-10-15 13:21 [PATCH v8 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 01/19] net/tap: net_init_tap_one(): drop extra error propagation Vladimir Sementsov-Ogievskiy
@ 2025-10-15 13:21 ` Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 03/19] net/tap: rework net_tap_init() Vladimir Sementsov-Ogievskiy
` (17 subsequent siblings)
19 siblings, 0 replies; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-15 13:21 UTC (permalink / raw)
To: mst, jasowang
Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
yc-core, vsementsov, raphael.s.norwitz
Let's keep all similar argument checking in the net_init_tap() function.
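As a rough illustration of the relocated check, here is a minimal standalone sketch. The struct tap_opts type and the tap_opts_vhost_ok() helper are hypothetical stand-ins for the few NetdevTapOptions fields involved; they are not part of the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the NetdevTapOptions fields used by the check. */
struct tap_opts {
    bool has_vhost;        /* vhost= was given explicitly */
    bool vhost;            /* value of vhost= */
    const char *vhostfd;   /* vhostfd= or NULL */
    const char *vhostfds;  /* vhostfds= or NULL */
};

/*
 * Returns false only when vhost was explicitly disabled while a vhost fd
 * was still supplied, mirroring the condition added to net_init_tap().
 */
static bool tap_opts_vhost_ok(const struct tap_opts *o)
{
    bool vhost_disabled = o->has_vhost && !o->vhost;
    bool has_vhost_fd = o->vhostfd != NULL || o->vhostfds != NULL;

    return !(vhost_disabled && has_vhost_fd);
}
```

Under these assumptions, only the explicit vhost=off plus vhostfd(s)= combination is rejected; an unset vhost together with vhostfd= still passes the check.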
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
net/tap.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/net/tap.c b/net/tap.c
index 70de798fe8..f90050c3a0 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -768,9 +768,6 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
"vhost-net requested but could not be initialized");
goto failed;
}
- } else if (vhostfdname) {
- error_setg(errp, "vhostfd(s)= is not valid without vhost");
- goto failed;
}
return;
@@ -832,6 +829,11 @@ int net_init_tap(const Netdev *netdev, const char *name,
return -1;
}
+ if (tap->has_vhost && !tap->vhost && (tap->vhostfds || tap->vhostfd)) {
+ error_setg(errp, "vhostfd(s)= is not valid without vhost");
+ return -1;
+ }
+
if (tap->fd) {
if (tap->ifname || tap->script || tap->downscript ||
tap->has_vnet_hdr || tap->helper || tap->has_queues ||
--
2.48.1
* [PATCH v8 03/19] net/tap: rework net_tap_init()
2025-10-15 13:21 [PATCH v8 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 01/19] net/tap: net_init_tap_one(): drop extra error propagation Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 02/19] net/tap: net_init_tap_one(): move parameter checking earlier Vladimir Sementsov-Ogievskiy
@ 2025-10-15 13:21 ` Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 04/19] net/tap: pass NULL to net_init_tap_one() in cases when scripts are NULL Vladimir Sementsov-Ogievskiy
` (16 subsequent siblings)
19 siblings, 0 replies; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-15 13:21 UTC (permalink / raw)
To: mst, jasowang
Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
yc-core, vsementsov, raphael.s.norwitz
In the future (to support backend-transfer migration for virtio-net/tap,
which includes passing fds through a unix socket) we'll want to postpone
fd initialization to a later point, when the QAPI structured parameters
are not available. So, let's now rework the function so that its
interface has no "tap" parameter.
Also, rename it to net_tap_open(), as it's just a wrapper around
tap_open(), and having both net_tap_init() and net_init_tap() in one
file is confusing.
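The vnet_hdr handling that moves from the callee to the caller can be summarized in a small standalone sketch. The struct vnet_hdr_req type and resolve_vnet_hdr() helper are hypothetical names used only for illustration; the patch computes the two values inline.

```c
#include <stdbool.h>

/* Hypothetical pair of values the caller now computes itself. */
struct vnet_hdr_req {
    int vnet_hdr;   /* initial request passed to tap_open() via pointer */
    bool required;  /* fail if the host cannot provide vnet_hdr */
};

/*
 * An unset option means "try vnet_hdr, but don't insist";
 * vnet_hdr=on means "insist"; vnet_hdr=off means "don't ask for it".
 */
static struct vnet_hdr_req resolve_vnet_hdr(bool has_vnet_hdr, bool vnet_hdr)
{
    struct vnet_hdr_req r;

    r.vnet_hdr = has_vnet_hdr ? vnet_hdr : 1;
    r.required = has_vnet_hdr && vnet_hdr;
    return r;
}
```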
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
net/tap.c | 18 +++++++-----------
1 file changed, 7 insertions(+), 11 deletions(-)
diff --git a/net/tap.c b/net/tap.c
index f90050c3a0..b1b64c508d 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -655,20 +655,12 @@ int net_init_bridge(const Netdev *netdev, const char *name,
return 0;
}
-static int net_tap_init(const NetdevTapOptions *tap, int *vnet_hdr,
+static int net_tap_open(int *vnet_hdr, bool vnet_hdr_required,
const char *setup_script, char *ifname,
size_t ifname_sz, int mq_required, Error **errp)
{
Error *err = NULL;
- int fd, vnet_hdr_required;
-
- if (tap->has_vnet_hdr) {
- *vnet_hdr = tap->vnet_hdr;
- vnet_hdr_required = *vnet_hdr;
- } else {
- *vnet_hdr = 1;
- vnet_hdr_required = 0;
- }
+ int fd;
fd = RETRY_ON_EINTR(tap_open(ifname, ifname_sz, vnet_hdr, vnet_hdr_required,
mq_required, errp));
@@ -977,6 +969,8 @@ free_fail:
} else {
g_autofree char *default_script = NULL;
g_autofree char *default_downscript = NULL;
+ bool vnet_hdr_required = tap->has_vnet_hdr && tap->vnet_hdr;
+
if (tap->vhostfds) {
error_setg(errp, "vhostfds= is invalid if fds= wasn't specified");
return -1;
@@ -997,7 +991,9 @@ free_fail:
}
for (i = 0; i < queues; i++) {
- fd = net_tap_init(tap, &vnet_hdr, i >= 1 ? "no" : script,
+ vnet_hdr = tap->has_vnet_hdr ? tap->vnet_hdr : 1;
+ fd = net_tap_open(&vnet_hdr, vnet_hdr_required,
+ i >= 1 ? "no" : script,
ifname, sizeof ifname, queues > 1, errp);
if (fd == -1) {
return -1;
--
2.48.1
* [PATCH v8 04/19] net/tap: pass NULL to net_init_tap_one() in cases when scripts are NULL
2025-10-15 13:21 [PATCH v8 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
` (2 preceding siblings ...)
2025-10-15 13:21 ` [PATCH v8 03/19] net/tap: rework net_tap_init() Vladimir Sementsov-Ogievskiy
@ 2025-10-15 13:21 ` Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 05/19] net/tap: rework scripts handling Vladimir Sementsov-Ogievskiy
` (15 subsequent siblings)
19 siblings, 0 replies; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-15 13:21 UTC (permalink / raw)
To: mst, jasowang
Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
yc-core, vsementsov, raphael.s.norwitz
Directly pass NULL in cases where we report an error if script or
downscript are set.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
net/tap.c | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/net/tap.c b/net/tap.c
index b1b64c508d..a05cc7ef64 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -800,8 +800,6 @@ int net_init_tap(const Netdev *netdev, const char *name,
const NetdevTapOptions *tap;
int fd, vnet_hdr = 0, i = 0, queues;
/* for the no-fd, no-helper case */
- const char *script;
- const char *downscript;
Error *err = NULL;
const char *vhostfdname;
char ifname[128];
@@ -811,8 +809,6 @@ int net_init_tap(const Netdev *netdev, const char *name,
tap = &netdev->u.tap;
queues = tap->has_queues ? tap->queues : 1;
vhostfdname = tap->vhostfd;
- script = tap->script;
- downscript = tap->downscript;
/* QEMU hubs do not support multiqueue tap, in this case peer is set.
* For -netdev, peer is always NULL. */
@@ -853,7 +849,7 @@ int net_init_tap(const Netdev *netdev, const char *name,
}
net_init_tap_one(tap, peer, "tap", name, NULL,
- script, downscript,
+ NULL, NULL,
vhostfdname, vnet_hdr, fd, &err);
if (err) {
error_propagate(errp, err);
@@ -914,7 +910,7 @@ int net_init_tap(const Netdev *netdev, const char *name,
}
net_init_tap_one(tap, peer, "tap", name, ifname,
- script, downscript,
+ NULL, NULL,
tap->vhostfds ? vhost_fds[i] : NULL,
vnet_hdr, fd, &err);
if (err) {
@@ -959,7 +955,7 @@ free_fail:
}
net_init_tap_one(tap, peer, "bridge", name, ifname,
- script, downscript, vhostfdname,
+ NULL, NULL, vhostfdname,
vnet_hdr, fd, &err);
if (err) {
error_propagate(errp, err);
@@ -967,6 +963,8 @@ free_fail:
return -1;
}
} else {
+ const char *script = tap->script;
+ const char *downscript = tap->downscript;
g_autofree char *default_script = NULL;
g_autofree char *default_downscript = NULL;
bool vnet_hdr_required = tap->has_vnet_hdr && tap->vnet_hdr;
--
2.48.1
* [PATCH v8 05/19] net/tap: rework scripts handling
2025-10-15 13:21 [PATCH v8 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
` (3 preceding siblings ...)
2025-10-15 13:21 ` [PATCH v8 04/19] net/tap: pass NULL to net_init_tap_one() in cases when scripts are NULL Vladimir Sementsov-Ogievskiy
@ 2025-10-15 13:21 ` Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 06/19] net/tap: setup exit notifier only when needed Vladimir Sementsov-Ogievskiy
` (14 subsequent siblings)
19 siblings, 0 replies; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-15 13:21 UTC (permalink / raw)
To: mst, jasowang
Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
yc-core, vsementsov, raphael.s.norwitz
Simplify script handling: parse the "no" and empty-string ('\0') cases
once, and then keep simpler logic in net_tap_open() and
net_init_tap_one(): NULL means there is no script to run, otherwise run
the script.
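The normalization can be sketched outside of QEMU roughly as follows. This is a simplified illustration: parse_script() is a hypothetical name, it uses plain malloc() instead of GLib, and it takes the default path as-is rather than relocating it as the patch does with get_relocated_path().

```c
#include <stdlib.h>
#include <string.h>

/*
 * Returns a newly allocated script path, or NULL when no script should
 * run. An unset argument falls back to default_path; "" and "no" both
 * mean "no script". The caller owns (and frees) the result.
 */
static char *parse_script(const char *script_arg, const char *default_path)
{
    const char *src = script_arg ? script_arg : default_path;
    char *copy;

    if (src == NULL || src[0] == '\0' || strcmp(src, "no") == 0) {
        return NULL;
    }

    copy = malloc(strlen(src) + 1);
    if (copy != NULL) {
        strcpy(copy, src);
    }
    return copy;
}
```

After this step, callers only need a NULL test, which is what allows the later hunks to replace the string comparisons with plain pointer checks.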
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
net/tap.c | 45 +++++++++++++++++++++++++--------------------
1 file changed, 25 insertions(+), 20 deletions(-)
diff --git a/net/tap.c b/net/tap.c
index a05cc7ef64..994e885c5f 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -91,6 +91,21 @@ static void launch_script(const char *setup_script, const char *ifname,
static void tap_send(void *opaque);
static void tap_writable(void *opaque);
+static char *tap_parse_script(const char *script_arg, const char *default_path)
+{
+ g_autofree char *res = g_strdup(script_arg);
+
+ if (!res) {
+ res = get_relocated_path(default_path);
+ }
+
+ if (res[0] == '\0' || strcmp(res, "no") == 0) {
+ return NULL;
+ }
+
+ return g_steal_pointer(&res);
+}
+
static void tap_update_fd_handler(TAPState *s)
{
qemu_set_fd_handler(s->fd,
@@ -668,9 +683,7 @@ static int net_tap_open(int *vnet_hdr, bool vnet_hdr_required,
return -1;
}
- if (setup_script &&
- setup_script[0] != '\0' &&
- strcmp(setup_script, "no") != 0) {
+ if (setup_script) {
launch_script(setup_script, ifname, fd, &err);
if (err) {
error_propagate(errp, err);
@@ -706,9 +719,9 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
qemu_set_info_str(&s->nc, "helper=%s", tap->helper);
} else {
qemu_set_info_str(&s->nc, "ifname=%s,script=%s,downscript=%s", ifname,
- script, downscript);
+ script ?: "no", downscript ?: "no");
- if (strcmp(downscript, "no") != 0) {
+ if (downscript) {
snprintf(s->down_script, sizeof(s->down_script), "%s", downscript);
snprintf(s->down_script_arg, sizeof(s->down_script_arg),
"%s", ifname);
@@ -963,10 +976,10 @@ free_fail:
return -1;
}
} else {
- const char *script = tap->script;
- const char *downscript = tap->downscript;
- g_autofree char *default_script = NULL;
- g_autofree char *default_downscript = NULL;
+ g_autofree char *script =
+ tap_parse_script(tap->script, DEFAULT_NETWORK_SCRIPT);
+ g_autofree char *downscript =
+ tap_parse_script(tap->downscript, DEFAULT_NETWORK_DOWN_SCRIPT);
bool vnet_hdr_required = tap->has_vnet_hdr && tap->vnet_hdr;
if (tap->vhostfds) {
@@ -974,14 +987,6 @@ free_fail:
return -1;
}
- if (!script) {
- script = default_script = get_relocated_path(DEFAULT_NETWORK_SCRIPT);
- }
- if (!downscript) {
- downscript = default_downscript =
- get_relocated_path(DEFAULT_NETWORK_DOWN_SCRIPT);
- }
-
if (tap->ifname) {
pstrcpy(ifname, sizeof ifname, tap->ifname);
} else {
@@ -991,7 +996,7 @@ free_fail:
for (i = 0; i < queues; i++) {
vnet_hdr = tap->has_vnet_hdr ? tap->vnet_hdr : 1;
fd = net_tap_open(&vnet_hdr, vnet_hdr_required,
- i >= 1 ? "no" : script,
+ i >= 1 ? NULL : script,
ifname, sizeof ifname, queues > 1, errp);
if (fd == -1) {
return -1;
@@ -1006,8 +1011,8 @@ free_fail:
}
net_init_tap_one(tap, peer, "tap", name, ifname,
- i >= 1 ? "no" : script,
- i >= 1 ? "no" : downscript,
+ i >= 1 ? NULL : script,
+ i >= 1 ? NULL : downscript,
vhostfdname, vnet_hdr, fd, &err);
if (err) {
error_propagate(errp, err);
--
2.48.1
* [PATCH v8 06/19] net/tap: setup exit notifier only when needed
2025-10-15 13:21 [PATCH v8 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
` (4 preceding siblings ...)
2025-10-15 13:21 ` [PATCH v8 05/19] net/tap: rework scripts handling Vladimir Sementsov-Ogievskiy
@ 2025-10-15 13:21 ` Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 07/19] net/tap: split net_tap_fd_init() Vladimir Sementsov-Ogievskiy
` (13 subsequent siblings)
19 siblings, 0 replies; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-15 13:21 UTC (permalink / raw)
To: mst, jasowang
Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
yc-core, vsementsov, raphael.s.norwitz
There is no reason to set up the exit notifier on each queue of a
multiqueue tap when we actually want to run downscript only once.
Likewise, let's not set up the notifier when downscript is
not enabled (downscript="no").
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
net/tap.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/net/tap.c b/net/tap.c
index 994e885c5f..17ad561f9c 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -326,11 +326,9 @@ static void tap_exit_notify(Notifier *notifier, void *data)
TAPState *s = container_of(notifier, TAPState, exit);
Error *err = NULL;
- if (s->down_script[0]) {
- launch_script(s->down_script, s->down_script_arg, s->fd, &err);
- if (err) {
- error_report_err(err);
- }
+ launch_script(s->down_script, s->down_script_arg, s->fd, &err);
+ if (err) {
+ error_report_err(err);
}
}
@@ -346,8 +344,11 @@ static void tap_cleanup(NetClientState *nc)
qemu_purge_queued_packets(nc);
- tap_exit_notify(&s->exit, NULL);
- qemu_remove_exit_notifier(&s->exit);
+ if (s->exit.notify) {
+ tap_exit_notify(&s->exit, NULL);
+ qemu_remove_exit_notifier(&s->exit);
+ s->exit.notify = NULL;
+ }
tap_read_poll(s, false);
tap_write_poll(s, false);
@@ -443,9 +444,6 @@ static TAPState *net_tap_fd_init(NetClientState *peer,
tap_read_poll(s, true);
s->vhost_net = NULL;
- s->exit.notify = tap_exit_notify;
- qemu_add_exit_notifier(&s->exit);
-
return s;
}
@@ -725,6 +723,8 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
snprintf(s->down_script, sizeof(s->down_script), "%s", downscript);
snprintf(s->down_script_arg, sizeof(s->down_script_arg),
"%s", ifname);
+ s->exit.notify = tap_exit_notify;
+ qemu_add_exit_notifier(&s->exit);
}
}
--
2.48.1
* [PATCH v8 07/19] net/tap: split net_tap_fd_init()
2025-10-15 13:21 [PATCH v8 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
` (5 preceding siblings ...)
2025-10-15 13:21 ` [PATCH v8 06/19] net/tap: setup exit notifier only when needed Vladimir Sementsov-Ogievskiy
@ 2025-10-15 13:21 ` Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 08/19] net/tap: tap_set_sndbuf(): add return value Vladimir Sementsov-Ogievskiy
` (12 subsequent siblings)
19 siblings, 0 replies; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-15 13:21 UTC (permalink / raw)
To: mst, jasowang
Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
yc-core, vsementsov, raphael.s.norwitz
Split the function into separate net_tap_new() and net_tap_set_fd().
We start moving towards the following picture:
net_tap_new() - takes the QAPI @tap parameter, but doesn't have @fd;
initializes the net client; called during initialization.
net_tap_setup() - doesn't have @tap (QAPI), but has the @fd parameter;
may be called at a later point.
In this commit we introduce the first function.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
net/tap.c | 31 +++++++++++++++++--------------
1 file changed, 17 insertions(+), 14 deletions(-)
diff --git a/net/tap.c b/net/tap.c
index 17ad561f9c..7cb694e683 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -412,19 +412,20 @@ static NetClientInfo net_tap_info = {
.get_vhost_net = tap_get_vhost_net,
};
-static TAPState *net_tap_fd_init(NetClientState *peer,
- const char *model,
- const char *name,
- int fd,
- int vnet_hdr)
+static TAPState *net_tap_new(NetClientState *peer, const char *model,
+ const char *name)
{
- NetOffloads ol = {};
- NetClientState *nc;
- TAPState *s;
+ NetClientState *nc = qemu_new_net_client(&net_tap_info, peer, model, name);
+ TAPState *s = DO_UPCAST(TAPState, nc, nc);
- nc = qemu_new_net_client(&net_tap_info, peer, model, name);
+ s->fd = -1;
- s = DO_UPCAST(TAPState, nc, nc);
+ return s;
+}
+
+static void net_tap_set_fd(TAPState *s, int fd, int vnet_hdr)
+{
+ NetOffloads ol = {};
s->fd = fd;
s->host_vnet_hdr_len = vnet_hdr ? sizeof(struct virtio_net_hdr) : 0;
@@ -443,8 +444,6 @@ static TAPState *net_tap_fd_init(NetClientState *peer,
}
tap_read_poll(s, true);
s->vhost_net = NULL;
-
- return s;
}
static void close_all_fds_after_fork(int excluded_fd)
@@ -661,7 +660,9 @@ int net_init_bridge(const Netdev *netdev, const char *name,
close(fd);
return -1;
}
- s = net_tap_fd_init(peer, "bridge", name, fd, vnet_hdr);
+
+ s = net_tap_new(peer, "bridge", name);
+ net_tap_set_fd(s, fd, vnet_hdr);
qemu_set_info_str(&s->nc, "helper=%s,br=%s", helper, br);
@@ -702,9 +703,11 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
int vnet_hdr, int fd, Error **errp)
{
Error *err = NULL;
- TAPState *s = net_tap_fd_init(peer, model, name, fd, vnet_hdr);
+ TAPState *s = net_tap_new(peer, model, name);
int vhostfd;
+ net_tap_set_fd(s, fd, vnet_hdr);
+
tap_set_sndbuf(s->fd, tap, &err);
if (err) {
error_propagate(errp, err);
--
2.48.1
* [PATCH v8 08/19] net/tap: tap_set_sndbuf(): add return value
2025-10-15 13:21 [PATCH v8 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
` (6 preceding siblings ...)
2025-10-15 13:21 ` [PATCH v8 07/19] net/tap: split net_tap_fd_init() Vladimir Sementsov-Ogievskiy
@ 2025-10-15 13:21 ` Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 09/19] net/tap: rework tap_set_sndbuf() Vladimir Sementsov-Ogievskiy
` (11 subsequent siblings)
19 siblings, 0 replies; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-15 13:21 UTC (permalink / raw)
To: mst, jasowang
Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
yc-core, vsementsov, raphael.s.norwitz
Follow the common recommendation in include/qapi/error.h of having
a return value together with errp. This lets callers avoid error
propagation.
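The calling convention can be illustrated with a deliberately simplified sketch. Here a plain message pointer stands in for QEMU's Error **errp, and both function names are hypothetical; the real API lives in include/qapi/error.h.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Callee: reports failure through the return value and, if the caller
 * asked for it (errp != NULL), also stores a message.
 */
static bool set_sndbuf_checked(int sndbuf, const char **errp)
{
    if (sndbuf <= 0) {
        if (errp != NULL) {
            *errp = "sndbuf must be positive";
        }
        return false;
    }
    return true;
}

/*
 * Caller: passes its own errp straight through. No local error variable
 * and no separate propagation step are needed.
 */
static bool init_one(int sndbuf, const char **errp)
{
    if (!set_sndbuf_checked(sndbuf, errp)) {
        return false;
    }
    return true;
}
```

The design point is that the return value alone tells the caller whether to bail out, so the error object never has to be inspected locally.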
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
net/tap-bsd.c | 3 ++-
net/tap-linux.c | 5 ++++-
net/tap-solaris.c | 3 ++-
net/tap-stub.c | 3 ++-
net/tap.c | 5 +----
net/tap_int.h | 2 +-
6 files changed, 12 insertions(+), 9 deletions(-)
diff --git a/net/tap-bsd.c b/net/tap-bsd.c
index bbf84d1828..9bd282b69c 100644
--- a/net/tap-bsd.c
+++ b/net/tap-bsd.c
@@ -206,8 +206,9 @@ error:
}
#endif /* __FreeBSD__ */
-void tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
+bool tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
{
+ return true;
}
int tap_probe_vnet_hdr(int fd, Error **errp)
diff --git a/net/tap-linux.c b/net/tap-linux.c
index 2a90b58467..db68693bbf 100644
--- a/net/tap-linux.c
+++ b/net/tap-linux.c
@@ -145,7 +145,7 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
*/
#define TAP_DEFAULT_SNDBUF 0
-void tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
+bool tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
{
int sndbuf;
@@ -159,7 +159,10 @@ void tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
if (ioctl(fd, TUNSETSNDBUF, &sndbuf) == -1 && tap->has_sndbuf) {
error_setg_errno(errp, errno, "TUNSETSNDBUF ioctl failed");
+ return false;
}
+
+ return true;
}
int tap_probe_vnet_hdr(int fd, Error **errp)
diff --git a/net/tap-solaris.c b/net/tap-solaris.c
index 75397e6c54..e5ba89d926 100644
--- a/net/tap-solaris.c
+++ b/net/tap-solaris.c
@@ -208,8 +208,9 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
return fd;
}
-void tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
+bool tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
{
+ return true;
}
int tap_probe_vnet_hdr(int fd, Error **errp)
diff --git a/net/tap-stub.c b/net/tap-stub.c
index f7a5e0c163..86d7d38e0f 100644
--- a/net/tap-stub.c
+++ b/net/tap-stub.c
@@ -33,8 +33,9 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
return -1;
}
-void tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
+bool tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
{
+ return true;
}
int tap_probe_vnet_hdr(int fd, Error **errp)
diff --git a/net/tap.c b/net/tap.c
index 7cb694e683..25dedd8492 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -702,15 +702,12 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
const char *downscript, const char *vhostfdname,
int vnet_hdr, int fd, Error **errp)
{
- Error *err = NULL;
TAPState *s = net_tap_new(peer, model, name);
int vhostfd;
net_tap_set_fd(s, fd, vnet_hdr);
- tap_set_sndbuf(s->fd, tap, &err);
- if (err) {
- error_propagate(errp, err);
+ if (!tap_set_sndbuf(s->fd, tap, errp)) {
goto failed;
}
diff --git a/net/tap_int.h b/net/tap_int.h
index b76a05044b..7963dd6aae 100644
--- a/net/tap_int.h
+++ b/net/tap_int.h
@@ -34,7 +34,7 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
ssize_t tap_read_packet(int tapfd, uint8_t *buf, int maxlen);
-void tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp);
+bool tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp);
int tap_probe_vnet_hdr(int fd, Error **errp);
int tap_probe_has_ufo(int fd);
int tap_probe_has_uso(int fd);
--
2.48.1
* [PATCH v8 09/19] net/tap: rework tap_set_sndbuf()
2025-10-15 13:21 [PATCH v8 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
` (7 preceding siblings ...)
2025-10-15 13:21 ` [PATCH v8 08/19] net/tap: tap_set_sndbuf(): add return value Vladimir Sementsov-Ogievskiy
@ 2025-10-15 13:21 ` Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 10/19] net/tap: rework sndbuf handling Vladimir Sementsov-Ogievskiy
` (10 subsequent siblings)
19 siblings, 0 replies; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-15 13:21 UTC (permalink / raw)
To: mst, jasowang
Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
yc-core, vsementsov, raphael.s.norwitz
Keep NetdevTapOptions-related logic in tap.c, and make tap_set_sndbuf() a
simple system call wrapper, more like the other functions in tap-linux.c.
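The option-to-value mapping that stays in tap.c can be sketched as a pure function. effective_sndbuf() is a hypothetical helper name, and the sketch assumes the QAPI sndbuf field is an unsigned 64-bit size, which the comparison against INT_MAX in the removed code suggests.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Unset or zero sndbuf means "effectively unlimited" (INT_MAX);
 * larger-than-int values are clamped to INT_MAX.
 */
static int effective_sndbuf(bool has_sndbuf, uint64_t sndbuf)
{
    if (!has_sndbuf || sndbuf == 0) {
        return INT_MAX;
    }
    return sndbuf > (uint64_t)INT_MAX ? INT_MAX : (int)sndbuf;
}
```

Whether a failing ioctl is fatal is decided separately: as in the hunk for net/tap.c, the error is only reported when the user set sndbuf explicitly.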
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
net/tap-bsd.c | 2 +-
net/tap-linux.c | 16 ++--------------
net/tap-solaris.c | 2 +-
net/tap-stub.c | 2 +-
net/tap.c | 6 +++++-
net/tap_int.h | 3 +--
6 files changed, 11 insertions(+), 20 deletions(-)
diff --git a/net/tap-bsd.c b/net/tap-bsd.c
index 9bd282b69c..4cea60664e 100644
--- a/net/tap-bsd.c
+++ b/net/tap-bsd.c
@@ -206,7 +206,7 @@ error:
}
#endif /* __FreeBSD__ */
-bool tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
+bool tap_set_sndbuf(int fd, int sndbuf, Error **errp)
{
return true;
}
diff --git a/net/tap-linux.c b/net/tap-linux.c
index db68693bbf..bb73fa4b13 100644
--- a/net/tap-linux.c
+++ b/net/tap-linux.c
@@ -143,21 +143,9 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
* Ethernet NICs generally have txqueuelen=1000, so 1Mb is
* a good value, given a 1500 byte MTU.
*/
-#define TAP_DEFAULT_SNDBUF 0
-
-bool tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
+bool tap_set_sndbuf(int fd, int sndbuf, Error **errp)
{
- int sndbuf;
-
- sndbuf = !tap->has_sndbuf ? TAP_DEFAULT_SNDBUF :
- tap->sndbuf > INT_MAX ? INT_MAX :
- tap->sndbuf;
-
- if (!sndbuf) {
- sndbuf = INT_MAX;
- }
-
- if (ioctl(fd, TUNSETSNDBUF, &sndbuf) == -1 && tap->has_sndbuf) {
+ if (ioctl(fd, TUNSETSNDBUF, &sndbuf) == -1) {
error_setg_errno(errp, errno, "TUNSETSNDBUF ioctl failed");
return false;
}
diff --git a/net/tap-solaris.c b/net/tap-solaris.c
index e5ba89d926..e925ca8ae9 100644
--- a/net/tap-solaris.c
+++ b/net/tap-solaris.c
@@ -208,7 +208,7 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
return fd;
}
-bool tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
+bool tap_set_sndbuf(int fd, int sndbuf, Error **errp)
{
return true;
}
diff --git a/net/tap-stub.c b/net/tap-stub.c
index 86d7d38e0f..6aa60d96ad 100644
--- a/net/tap-stub.c
+++ b/net/tap-stub.c
@@ -33,7 +33,7 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
return -1;
}
-bool tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
+bool tap_set_sndbuf(int fd, int sndbuf, Error **errp)
{
return true;
}
diff --git a/net/tap.c b/net/tap.c
index 25dedd8492..f5830f4b00 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -704,10 +704,14 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
{
TAPState *s = net_tap_new(peer, model, name);
int vhostfd;
+ bool sndbuf_required = tap->has_sndbuf;
+ int sndbuf =
+ (tap->has_sndbuf && tap->sndbuf) ? MIN(tap->sndbuf, INT_MAX) : INT_MAX;
net_tap_set_fd(s, fd, vnet_hdr);
- if (!tap_set_sndbuf(s->fd, tap, errp)) {
+ if (!tap_set_sndbuf(fd, sndbuf, sndbuf_required ? errp : NULL) &&
+ sndbuf_required) {
goto failed;
}
diff --git a/net/tap_int.h b/net/tap_int.h
index 7963dd6aae..dc4f484006 100644
--- a/net/tap_int.h
+++ b/net/tap_int.h
@@ -26,7 +26,6 @@
#ifndef NET_TAP_INT_H
#define NET_TAP_INT_H
-#include "qapi/qapi-types-net.h"
#include "net/net.h"
int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
@@ -34,7 +33,7 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
ssize_t tap_read_packet(int tapfd, uint8_t *buf, int maxlen);
-bool tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp);
+bool tap_set_sndbuf(int fd, int sndbuf, Error **errp);
int tap_probe_vnet_hdr(int fd, Error **errp);
int tap_probe_has_ufo(int fd);
int tap_probe_has_uso(int fd);
--
2.48.1
* [PATCH v8 10/19] net/tap: rework sndbuf handling
2025-10-15 13:21 [PATCH v8 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
` (8 preceding siblings ...)
2025-10-15 13:21 ` [PATCH v8 09/19] net/tap: rework tap_set_sndbuf() Vladimir Sementsov-Ogievskiy
@ 2025-10-15 13:21 ` Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 11/19] net/tap: introduce net_tap_setup() Vladimir Sementsov-Ogievskiy
` (9 subsequent siblings)
19 siblings, 0 replies; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-15 13:21 UTC (permalink / raw)
To: mst, jasowang
Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
yc-core, vsementsov, raphael.s.norwitz
Continue the main idea: avoid dependency on @tap in net_tap_setup().
So, move QAPI parsing to net_tap_new().
Move setting sndbuf to net_tap_set_fd(), as it's a more appropriate place
(other initial fd settings are here).
Note that net_tap_new() and net_tap_set_fd() are shared with
net_init_bridge(), which didn't set sndbuf. Handle this case by sndbuf=0
(we never pass zero to tap_set_sndbuf(), so let this specific value mean
that we don't want to touch sndbuf).
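The sndbuf rules above can be sketched outside of QEMU roughly as follows. This is only an illustration under stated assumptions: SndbufConfig, SetSndbufFn, sndbuf_config_parse(), sndbuf_config_apply() and the set_always_fails() test double are hypothetical names that do not exist in net/tap.c; the callback stands in for tap_set_sndbuf().

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two fields this patch adds to TAPState. */
typedef struct {
    bool sndbuf_required;
    int sndbuf;            /* 0: never touch the send buffer (bridge case) */
} SndbufConfig;

/* Hypothetical callback type standing in for tap_set_sndbuf(). */
typedef bool (*SetSndbufFn)(int value);

/* Mirrors the parsing moved into net_tap_new(): unset or zero means INT_MAX. */
SndbufConfig sndbuf_config_parse(bool has_sndbuf, uint64_t sndbuf)
{
    SndbufConfig c;

    c.sndbuf_required = has_sndbuf;
    if (has_sndbuf && sndbuf) {
        c.sndbuf = sndbuf < INT_MAX ? (int)sndbuf : INT_MAX;
    } else {
        c.sndbuf = INT_MAX;
    }
    return c;
}

/* Mirrors the tail of net_tap_set_fd(): a failure only matters if required. */
bool sndbuf_config_apply(const SndbufConfig *c, SetSndbufFn set)
{
    assert(c != NULL && set != NULL);

    if (c->sndbuf == 0) {
        return true;       /* sentinel: leave the kernel default alone */
    }
    if (!set(c->sndbuf) && c->sndbuf_required) {
        return false;
    }
    return true;
}

/* Test double: counts calls and always reports failure. */
int set_calls;

bool set_always_fails(int value)
{
    (void)value;
    set_calls++;
    return false;
}
```

A zero-initialized SndbufConfig corresponds to the net_init_bridge() path, where the setter is never called; a config parsed without an explicit sndbuf tolerates a failing setter.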
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
net/tap.c | 38 ++++++++++++++++++++++++++------------
1 file changed, 26 insertions(+), 12 deletions(-)
diff --git a/net/tap.c b/net/tap.c
index f5830f4b00..b5ac856a3d 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -83,6 +83,9 @@ typedef struct TAPState {
VHostNetState *vhost_net;
unsigned host_vnet_hdr_len;
Notifier exit;
+
+ bool sndbuf_required;
+ int sndbuf;
} TAPState;
static void launch_script(const char *setup_script, const char *ifname,
@@ -413,17 +416,25 @@ static NetClientInfo net_tap_info = {
};
static TAPState *net_tap_new(NetClientState *peer, const char *model,
- const char *name)
+ const char *name, const NetdevTapOptions *tap)
{
NetClientState *nc = qemu_new_net_client(&net_tap_info, peer, model, name);
TAPState *s = DO_UPCAST(TAPState, nc, nc);
s->fd = -1;
+ if (!tap) {
+ return s;
+ }
+
+ s->sndbuf_required = tap->has_sndbuf;
+ s->sndbuf =
+ (tap->has_sndbuf && tap->sndbuf) ? MIN(tap->sndbuf, INT_MAX) : INT_MAX;
+
return s;
}
-static void net_tap_set_fd(TAPState *s, int fd, int vnet_hdr)
+static bool net_tap_set_fd(TAPState *s, int fd, int vnet_hdr, Error **errp)
{
NetOffloads ol = {};
@@ -444,6 +455,15 @@ static void net_tap_set_fd(TAPState *s, int fd, int vnet_hdr)
}
tap_read_poll(s, true);
s->vhost_net = NULL;
+
+ if (s->sndbuf) {
+ Error **e = s->sndbuf_required ? errp : NULL;
+ if (!tap_set_sndbuf(s->fd, s->sndbuf, e) && s->sndbuf_required) {
+ return false;
+ }
+ }
+
+ return true;
}
static void close_all_fds_after_fork(int excluded_fd)
@@ -661,8 +681,8 @@ int net_init_bridge(const Netdev *netdev, const char *name,
return -1;
}
- s = net_tap_new(peer, "bridge", name);
- net_tap_set_fd(s, fd, vnet_hdr);
+ s = net_tap_new(peer, "bridge", name, NULL);
+ net_tap_set_fd(s, fd, vnet_hdr, &error_abort);
qemu_set_info_str(&s->nc, "helper=%s,br=%s", helper, br);
@@ -702,16 +722,10 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
const char *downscript, const char *vhostfdname,
int vnet_hdr, int fd, Error **errp)
{
- TAPState *s = net_tap_new(peer, model, name);
+ TAPState *s = net_tap_new(peer, model, name, tap);
int vhostfd;
- bool sndbuf_required = tap->has_sndbuf;
- int sndbuf =
- (tap->has_sndbuf && tap->sndbuf) ? MIN(tap->sndbuf, INT_MAX) : INT_MAX;
-
- net_tap_set_fd(s, fd, vnet_hdr);
- if (!tap_set_sndbuf(fd, sndbuf, sndbuf_required ? errp : NULL) &&
- sndbuf_required) {
+ if (!net_tap_set_fd(s, fd, vnet_hdr, errp)) {
goto failed;
}
--
2.48.1
* [PATCH v8 11/19] net/tap: introduce net_tap_setup()
2025-10-15 13:21 [PATCH v8 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
` (9 preceding siblings ...)
2025-10-15 13:21 ` [PATCH v8 10/19] net/tap: rework sndbuf handling Vladimir Sementsov-Ogievskiy
@ 2025-10-15 13:21 ` Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 12/19] net/tap: move vhost fd initialization to net_tap_new() Vladimir Sementsov-Ogievskiy
` (8 subsequent siblings)
19 siblings, 0 replies; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-15 13:21 UTC (permalink / raw)
To: mst, jasowang
Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
yc-core, vsementsov, raphael.s.norwitz
Move most of net_init_tap_one() to net_tap_setup(), the future counterpart
of net_tap_new(), for postponed setup.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
net/tap.c | 39 +++++++++++++++++++++++++--------------
1 file changed, 25 insertions(+), 14 deletions(-)
diff --git a/net/tap.c b/net/tap.c
index b5ac856a3d..b01cd4d6c2 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -88,6 +88,10 @@ typedef struct TAPState {
int sndbuf;
} TAPState;
+static bool net_tap_setup(TAPState *s, const NetdevTapOptions *tap,
+ const char *vhostfdname,
+ int fd, int vnet_hdr, Error **errp);
+
static void launch_script(const char *setup_script, const char *ifname,
int fd, Error **errp);
@@ -723,11 +727,6 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
int vnet_hdr, int fd, Error **errp)
{
TAPState *s = net_tap_new(peer, model, name, tap);
- int vhostfd;
-
- if (!net_tap_set_fd(s, fd, vnet_hdr, errp)) {
- goto failed;
- }
if (tap->fd || tap->fds) {
qemu_set_info_str(&s->nc, "fd=%d", fd);
@@ -746,6 +745,21 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
}
}
+ if (!net_tap_setup(s, tap, vhostfdname, fd, vnet_hdr, errp)) {
+ qemu_del_net_client(&s->nc);
+ }
+}
+
+static bool net_tap_setup(TAPState *s, const NetdevTapOptions *tap,
+ const char *vhostfdname,
+ int fd, int vnet_hdr, Error **errp)
+{
+ int vhostfd;
+
+ if (!net_tap_set_fd(s, fd, vnet_hdr, errp)) {
+ return false;
+ }
+
if (tap->has_vhost ? tap->vhost :
vhostfdname || (tap->has_vhostforce && tap->vhostforce)) {
VhostNetOptions options;
@@ -761,20 +775,20 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
if (vhostfdname) {
vhostfd = monitor_fd_param(monitor_cur(), vhostfdname, errp);
if (vhostfd == -1) {
- goto failed;
+ return false;
}
if (!qemu_set_blocking(vhostfd, false, errp)) {
- goto failed;
+ return false;
}
} else {
vhostfd = open("/dev/vhost-net", O_RDWR);
if (vhostfd < 0) {
error_setg_errno(errp, errno,
"tap: open vhost char device failed");
- goto failed;
+ return false;
}
if (!qemu_set_blocking(vhostfd, false, errp)) {
- goto failed;
+ return false;
}
}
options.opaque = (void *)(uintptr_t)vhostfd;
@@ -789,14 +803,11 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
if (!s->vhost_net) {
error_setg(errp,
"vhost-net requested but could not be initialized");
- goto failed;
+ return false;
}
}
- return;
-
-failed:
- qemu_del_net_client(&s->nc);
+ return true;
}
static int get_fds(char *str, char *fds[], int max)
--
2.48.1
* [PATCH v8 12/19] net/tap: move vhost fd initialization to net_tap_new()
2025-10-15 13:21 [PATCH v8 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
` (10 preceding siblings ...)
2025-10-15 13:21 ` [PATCH v8 11/19] net/tap: introduce net_tap_setup() Vladimir Sementsov-Ogievskiy
@ 2025-10-15 13:21 ` Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 13/19] net/tap: finalize net_tap_set_fd() logic Vladimir Sementsov-Ogievskiy
` (7 subsequent siblings)
19 siblings, 0 replies; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-15 13:21 UTC (permalink / raw)
To: mst, jasowang
Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
yc-core, vsementsov, raphael.s.norwitz
Continue the effort to avoid dependency on @tap in net_tap_setup():
now move the vhost fd initialization to net_tap_new(). So in
net_tap_setup() we simply check whether we have a vhostfd at this
point or not.
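For reference, the decision that moves into net_tap_new() and the check that remains in net_tap_setup() can be sketched as below. This is a simplified model, not the patch itself: VhostSwitches, TapSketch, vhost_requested() and needs_vhost_setup() are hypothetical names, and only the option fields visible in the diff are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical subset of NetdevTapOptions, reduced to the vhost switches. */
typedef struct {
    bool has_vhost, vhost;
    bool has_vhostforce, vhostforce;
} VhostSwitches;

/* Hypothetical subset of TAPState: -1 means "no vhost fd was prepared". */
typedef struct {
    int vhostfd;
} TapSketch;

/* Same condition as in the diff: an explicit vhost=on|off always wins. */
bool vhost_requested(const VhostSwitches *o, const char *vhostfdname)
{
    assert(o != NULL);

    if (o->has_vhost) {
        return o->vhost;
    }
    return vhostfdname != NULL || (o->has_vhostforce && o->vhostforce);
}

/* After this patch the setup step only looks at the stored descriptor. */
bool needs_vhost_setup(const TapSketch *s)
{
    assert(s != NULL);
    return s->vhostfd != -1;
}
```

The point of the split is that the second function needs no access to the parsed options, which is what allows the setup step to run later.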
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
net/tap.c | 90 ++++++++++++++++++++++++++++++-------------------------
1 file changed, 50 insertions(+), 40 deletions(-)
diff --git a/net/tap.c b/net/tap.c
index b01cd4d6c2..d08ef070e9 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -86,11 +86,11 @@ typedef struct TAPState {
bool sndbuf_required;
int sndbuf;
+ int vhostfd;
+ uint32_t vhost_busyloop_timeout;
} TAPState;
-static bool net_tap_setup(TAPState *s, const NetdevTapOptions *tap,
- const char *vhostfdname,
- int fd, int vnet_hdr, Error **errp);
+static bool net_tap_setup(TAPState *s, int fd, int vnet_hdr, Error **errp);
static void launch_script(const char *setup_script, const char *ifname,
int fd, Error **errp);
@@ -361,6 +361,11 @@ static void tap_cleanup(NetClientState *nc)
tap_write_poll(s, false);
close(s->fd);
s->fd = -1;
+
+ if (s->vhostfd != -1) {
+ close(s->vhostfd);
+ s->vhostfd = -1;
+ }
}
static void tap_poll(NetClientState *nc, bool enable)
@@ -420,12 +425,14 @@ static NetClientInfo net_tap_info = {
};
static TAPState *net_tap_new(NetClientState *peer, const char *model,
- const char *name, const NetdevTapOptions *tap)
+ const char *name, const NetdevTapOptions *tap,
+ const char *vhostfdname, Error **errp)
{
NetClientState *nc = qemu_new_net_client(&net_tap_info, peer, model, name);
TAPState *s = DO_UPCAST(TAPState, nc, nc);
s->fd = -1;
+ s->vhostfd = -1;
if (!tap) {
return s;
@@ -435,7 +442,36 @@ static TAPState *net_tap_new(NetClientState *peer, const char *model,
s->sndbuf =
(tap->has_sndbuf && tap->sndbuf) ? MIN(tap->sndbuf, INT_MAX) : INT_MAX;
+ if (tap->has_vhost ? tap->vhost :
+ vhostfdname || (tap->has_vhostforce && tap->vhostforce)) {
+ if (vhostfdname) {
+ s->vhostfd = monitor_fd_param(monitor_cur(), vhostfdname, errp);
+ if (s->vhostfd == -1) {
+ goto failed;
+ }
+ if (!qemu_set_blocking(s->vhostfd, false, errp)) {
+ goto failed;
+ }
+ } else {
+ s->vhostfd = open("/dev/vhost-net", O_RDWR);
+ if (s->vhostfd < 0) {
+ error_setg_errno(errp, errno,
+ "tap: open vhost char device failed");
+ goto failed;
+ }
+ if (!qemu_set_blocking(s->vhostfd, false, errp)) {
+ goto failed;
+ }
+ }
+
+ s->vhost_busyloop_timeout = tap->has_poll_us ? tap->poll_us : 0;
+ }
+
return s;
+
+failed:
+ qemu_del_net_client(&s->nc);
+ return NULL;
}
static bool net_tap_set_fd(TAPState *s, int fd, int vnet_hdr, Error **errp)
@@ -685,7 +721,7 @@ int net_init_bridge(const Netdev *netdev, const char *name,
return -1;
}
- s = net_tap_new(peer, "bridge", name, NULL);
+ s = net_tap_new(peer, "bridge", name, NULL, NULL, &error_abort);
net_tap_set_fd(s, fd, vnet_hdr, &error_abort);
qemu_set_info_str(&s->nc, "helper=%s,br=%s", helper, br);
@@ -726,7 +762,10 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
const char *downscript, const char *vhostfdname,
int vnet_hdr, int fd, Error **errp)
{
- TAPState *s = net_tap_new(peer, model, name, tap);
+ TAPState *s = net_tap_new(peer, model, name, tap, vhostfdname, errp);
+ if (!s) {
+ return;
+ }
if (tap->fd || tap->fds) {
qemu_set_info_str(&s->nc, "fd=%d", fd);
@@ -745,53 +784,24 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
}
}
- if (!net_tap_setup(s, tap, vhostfdname, fd, vnet_hdr, errp)) {
+ if (!net_tap_setup(s, fd, vnet_hdr, errp)) {
qemu_del_net_client(&s->nc);
}
}
-static bool net_tap_setup(TAPState *s, const NetdevTapOptions *tap,
- const char *vhostfdname,
- int fd, int vnet_hdr, Error **errp)
+static bool net_tap_setup(TAPState *s, int fd, int vnet_hdr, Error **errp)
{
- int vhostfd;
-
if (!net_tap_set_fd(s, fd, vnet_hdr, errp)) {
return false;
}
- if (tap->has_vhost ? tap->vhost :
- vhostfdname || (tap->has_vhostforce && tap->vhostforce)) {
+ if (s->vhostfd != -1) {
VhostNetOptions options;
options.backend_type = VHOST_BACKEND_TYPE_KERNEL;
options.net_backend = &s->nc;
- if (tap->has_poll_us) {
- options.busyloop_timeout = tap->poll_us;
- } else {
- options.busyloop_timeout = 0;
- }
-
- if (vhostfdname) {
- vhostfd = monitor_fd_param(monitor_cur(), vhostfdname, errp);
- if (vhostfd == -1) {
- return false;
- }
- if (!qemu_set_blocking(vhostfd, false, errp)) {
- return false;
- }
- } else {
- vhostfd = open("/dev/vhost-net", O_RDWR);
- if (vhostfd < 0) {
- error_setg_errno(errp, errno,
- "tap: open vhost char device failed");
- return false;
- }
- if (!qemu_set_blocking(vhostfd, false, errp)) {
- return false;
- }
- }
- options.opaque = (void *)(uintptr_t)vhostfd;
+ options.busyloop_timeout = s->vhost_busyloop_timeout;
+ options.opaque = (void *)(uintptr_t)s->vhostfd;
options.nvqs = 2;
options.feature_bits = kernel_feature_bits;
options.get_acked_features = NULL;
--
2.48.1
* [PATCH v8 13/19] net/tap: finalize net_tap_set_fd() logic
2025-10-15 13:21 [PATCH v8 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
` (11 preceding siblings ...)
2025-10-15 13:21 ` [PATCH v8 12/19] net/tap: move vhost fd initialization to net_tap_new() Vladimir Sementsov-Ogievskiy
@ 2025-10-15 13:21 ` Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 14/19] migration: introduce .pre_incoming() vmsd handler Vladimir Sementsov-Ogievskiy
` (6 subsequent siblings)
19 siblings, 0 replies; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-15 13:21 UTC (permalink / raw)
To: mst, jasowang
Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
yc-core, vsementsov, raphael.s.norwitz
Let net_tap_set_fd() do only fd-related setup.
Actually, for the upcoming backend-transfer migration for virtio-net/tap
we'll want to skip net_tap_set_fd() (as incoming fds are already
prepared by source QEMU). So move tap_read_poll() to net_tap_setup().
There is no need to reset using_vnet_hdr and vhost_net: the state is
zero-initialized.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
net/tap.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/net/tap.c b/net/tap.c
index d08ef070e9..7e85444ace 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -480,7 +480,6 @@ static bool net_tap_set_fd(TAPState *s, int fd, int vnet_hdr, Error **errp)
s->fd = fd;
s->host_vnet_hdr_len = vnet_hdr ? sizeof(struct virtio_net_hdr) : 0;
- s->using_vnet_hdr = false;
s->has_ufo = tap_probe_has_ufo(s->fd);
s->has_uso = tap_probe_has_uso(s->fd);
s->has_tunnel = tap_probe_has_tunnel(s->fd);
@@ -493,8 +492,6 @@ static bool net_tap_set_fd(TAPState *s, int fd, int vnet_hdr, Error **errp)
if (vnet_hdr) {
tap_fd_set_vnet_hdr_len(s->fd, s->host_vnet_hdr_len);
}
- tap_read_poll(s, true);
- s->vhost_net = NULL;
if (s->sndbuf) {
Error **e = s->sndbuf_required ? errp : NULL;
@@ -795,6 +792,8 @@ static bool net_tap_setup(TAPState *s, int fd, int vnet_hdr, Error **errp)
return false;
}
+ tap_read_poll(s, true);
+
if (s->vhostfd != -1) {
VhostNetOptions options;
--
2.48.1
* [PATCH v8 14/19] migration: introduce .pre_incoming() vmsd handler
2025-10-15 13:21 [PATCH v8 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
` (12 preceding siblings ...)
2025-10-15 13:21 ` [PATCH v8 13/19] net/tap: finalize net_tap_set_fd() logic Vladimir Sementsov-Ogievskiy
@ 2025-10-15 13:21 ` Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 15/19] net/tap: postpone tap setup to pre-incoming Vladimir Sementsov-Ogievskiy
` (5 subsequent siblings)
19 siblings, 0 replies; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-15 13:21 UTC (permalink / raw)
To: mst, jasowang
Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
yc-core, vsementsov, raphael.s.norwitz
Add the possibility for devices to hook into the top of the migrate-incoming
QMP command. At that point, migration capabilities and parameters
are already set, but migration downtime has not yet started (the source
is still running). So here devices may do any remaining initialization
that depends on migration capabilities. This will be used in a further commit
to support the backend-transfer migration feature for vhost-user-blk.
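The iteration this hook implies (call every registered handler that has one, stop at the first failure) can be sketched in isolation as follows. It is a simplified model under stated assumptions: HandlerSketch, PreIncomingFn, run_pre_incoming() and the two counting callbacks are hypothetical, and a plain array replaces QEMU's handler list and Error reporting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified hook signature (the real one also takes errp). */
typedef bool (*PreIncomingFn)(void *opaque);

/* Hypothetical stand-in for a registered state entry. */
typedef struct {
    const char *name;
    PreIncomingFn pre_incoming;   /* may be NULL: device has no hook */
    void *opaque;
} HandlerSketch;

/*
 * Call each hook in registration order; on the first failure, record
 * the failing entry's name in *failed (if given) and stop.
 */
bool run_pre_incoming(const HandlerSketch *handlers, size_t n,
                      const char **failed)
{
    assert(handlers != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        const HandlerSketch *h = &handlers[i];

        if (h->pre_incoming && !h->pre_incoming(h->opaque)) {
            if (failed) {
                *failed = h->name;
            }
            return false;
        }
    }
    return true;
}

/* Test doubles: bump the counter passed as opaque. */
bool count_and_succeed(void *opaque)
{
    ++*(int *)opaque;
    return true;
}

bool count_and_fail(void *opaque)
{
    ++*(int *)opaque;
    return false;
}
```

Stopping at the first failure means later hooks never run, so a hook should not assume that hooks of other devices have already been executed.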
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Acked-by: Peter Xu <peterx@redhat.com>
---
include/migration/vmstate.h | 1 +
migration/migration.c | 4 ++++
migration/savevm.c | 15 +++++++++++++++
migration/savevm.h | 1 +
4 files changed, 21 insertions(+)
diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
index 63ccaee07a..f243518fb5 100644
--- a/include/migration/vmstate.h
+++ b/include/migration/vmstate.h
@@ -217,6 +217,7 @@ struct VMStateDescription {
int version_id;
int minimum_version_id;
MigrationPriority priority;
+ bool (*pre_incoming)(void *opaque, Error **errp);
int (*pre_load)(void *opaque);
int (*pre_load_errp)(void *opaque, Error **errp);
int (*post_load)(void *opaque, int version_id);
diff --git a/migration/migration.c b/migration/migration.c
index a63b46bbef..6ed6a10f57 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1983,6 +1983,10 @@ void qmp_migrate_incoming(const char *uri, bool has_channels,
return;
}
+ if (!qemu_pre_incoming(errp)) {
+ return;
+ }
+
if (!yank_register_instance(MIGRATION_YANK_INSTANCE, errp)) {
return;
}
diff --git a/migration/savevm.c b/migration/savevm.c
index 7b35ec4dd0..6e240ea100 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1268,6 +1268,21 @@ bool qemu_savevm_state_blocked(Error **errp)
return false;
}
+bool qemu_pre_incoming(Error **errp)
+{
+ SaveStateEntry *se;
+
+ QTAILQ_FOREACH(se, &savevm_state.handlers, entry) {
+ if (se->vmsd && se->vmsd->pre_incoming) {
+ if (!se->vmsd->pre_incoming(se->opaque, errp)) {
+ return false;
+ }
+ }
+ }
+
+ return true;
+}
+
void qemu_savevm_non_migratable_list(strList **reasons)
{
SaveStateEntry *se;
diff --git a/migration/savevm.h b/migration/savevm.h
index c337e3e3d1..4ad8997f94 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -29,6 +29,7 @@
#define QEMU_VM_COMMAND 0x08
#define QEMU_VM_SECTION_FOOTER 0x7e
+bool qemu_pre_incoming(Error **errp);
bool qemu_savevm_state_blocked(Error **errp);
void qemu_savevm_non_migratable_list(strList **reasons);
int qemu_savevm_state_prepare(Error **errp);
--
2.48.1
* [PATCH v8 15/19] net/tap: postpone tap setup to pre-incoming
2025-10-15 13:21 [PATCH v8 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
` (13 preceding siblings ...)
2025-10-15 13:21 ` [PATCH v8 14/19] migration: introduce .pre_incoming() vmsd handler Vladimir Sementsov-Ogievskiy
@ 2025-10-15 13:21 ` Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter Vladimir Sementsov-Ogievskiy
` (4 subsequent siblings)
19 siblings, 0 replies; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-15 13:21 UTC (permalink / raw)
To: mst, jasowang
Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
yc-core, vsementsov, raphael.s.norwitz
As described in the previous commit, to support backend-transfer migration
for virtio-net/tap, we need to postpone the decision whether to open the
device or to wait for incoming fds until the pre-incoming point (when we
can actually decide).
This commit only postpones the TAP-open case of initialization.
We don't try to postpone all cases of initialization, as that would
require a lot more refactoring.
So we postpone only the simple case, for which we are going to support
fd-incoming migration:
1. No fds / fd parameters: obviously, if the user gives fd/fds they should
be used, and no incoming backend-transfer migration is possible.
2. No helper: just for simplicity. It is probably possible to allow it (and
just ignore it in case of backend-transfer migration), to let the user use
the same cmdline on target QEMU. But that is questionable, and postponable.
3. No script/downscript. It's not simple to support downscript:
we should pass the responsibility to call it to the target QEMU with
migration, and back to the source QEMU on migration failure. It is
feasible, but may be implemented later on demand.
4. Concrete ifname: so that we don't try to share it between queues, when
we can only set up queues as separate entities. Supporting an undecided
ifname would require creating some extra netdev state, connecting all the
taps, to be able to iterate through them.
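The conditions listed above amount to a single predicate. A minimal sketch of it, assuming a reduced option struct (TapOptsSketch, script_disabled() and can_postpone() are hypothetical names; the real check also requires the incoming-migration run state, modelled here as a plain boolean):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of the tap options; NULL means "not given". */
typedef struct {
    const char *fd, *fds, *helper, *vhostfds;
    const char *ifname, *script, *downscript;
} TapOptsSketch;

/* A script counts as disabled only when explicitly set to "" or "no". */
static bool script_disabled(const char *arg)
{
    return arg && (arg[0] == '\0' || strcmp(arg, "no") == 0);
}

/* True when opening the TAP device may be deferred to pre-incoming. */
bool can_postpone(const TapOptsSketch *o, bool incoming)
{
    assert(o != NULL);

    if (!incoming) {
        return false;                       /* normal start: open now */
    }
    if (o->fd || o->fds || o->helper || o->vhostfds) {
        return false;                       /* items 1 and 2 */
    }
    if (!o->ifname || o->ifname[0] == '\0' || strstr(o->ifname, "%d")) {
        return false;                       /* item 4: concrete ifname */
    }
    return script_disabled(o->script) &&    /* item 3 */
           script_disabled(o->downscript);
}
```

Note that an omitted script option does not qualify: the default script would be run, so only an explicit "no" or empty value allows postponing.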
No part of backend-transfer migration is here, we only prepare the code
for its future implementation.
Are net drivers prepared for postponed initialization of NICs?
For the future backend-transfer migration feature, we are mainly
interested in virtio-net. So, let's prepare virtio-net to work with
postponed initialization of TAP (two places about early set/get
features) and for other drivers let's simply finalize initialization on
setting the netdev property. Support for other drivers may be added later if
needed.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
hw/net/virtio-net.c | 78 ++++++++++++++++++++++++-
include/net/tap.h | 3 +
net/tap-win32.c | 11 ++++
net/tap.c | 136 +++++++++++++++++++++++++++++++++++++++++++-
4 files changed, 226 insertions(+), 2 deletions(-)
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 33116712eb..661413c72f 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -719,6 +719,30 @@ default_value:
return VIRTIO_NET_TX_QUEUE_DEFAULT_SIZE;
}
+static bool peer_wait_incoming(VirtIONet *n)
+{
+ NetClientState *nc = qemu_get_queue(n->nic);
+
+ if (!nc->peer) {
+ return false;
+ }
+
+ if (nc->peer->info->type != NET_CLIENT_DRIVER_TAP) {
+ return false;
+ }
+
+ return tap_wait_incoming(nc->peer);
+}
+
+static bool peer_postponed_init(VirtIONet *n, int index, Error **errp)
+{
+ NetClientState *nc = qemu_get_subqueue(n->nic, index);
+
+ assert(nc->peer->info->type == NET_CLIENT_DRIVER_TAP);
+
+ return tap_postponed_init(nc->peer, errp);
+}
+
static int peer_attach(VirtIONet *n, int index)
{
NetClientState *nc = qemu_get_subqueue(n->nic, index);
@@ -3060,7 +3084,17 @@ static void virtio_net_set_multiqueue(VirtIONet *n, int multiqueue)
n->multiqueue = multiqueue;
virtio_net_change_num_queues(n, max * 2 + 1);
- virtio_net_set_queue_pairs(n);
+ /*
+ * virtio_net_set_multiqueue() is called from set_features(0) on early
+ * reset, when the peer may wait for incoming (and is not initialized
+ * yet).
+ * Don't worry about it: virtio_net_set_queue_pairs() will be called
+ * later from virtio_net_post_load_device(), and anyway will be a
+ * noop for local incoming migration with live backend passing.
+ */
+ if (!peer_wait_incoming(n)) {
+ virtio_net_set_queue_pairs(n);
+ }
}
static int virtio_net_pre_load_queues(VirtIODevice *vdev, uint32_t n)
@@ -3089,6 +3123,17 @@ static void virtio_net_get_features(VirtIODevice *vdev, uint64_t *features,
virtio_add_feature_ex(features, VIRTIO_NET_F_MAC);
+ if (peer_wait_incoming(n)) {
+ /*
+ * Excessive feature set is OK for early initialization when
+ * we wait for local incoming migration: actual guest-negotiated
+ * features will come with migration stream anyway. And we are sure
+ * that we support same host-features as source, because the backend
+ * is the same (the same TAP device, for example).
+ */
+ return;
+ }
+
if (!peer_has_vnet_hdr(n)) {
virtio_clear_feature_ex(features, VIRTIO_NET_F_CSUM);
virtio_clear_feature_ex(features, VIRTIO_NET_F_HOST_TSO4);
@@ -3180,6 +3225,18 @@ static void virtio_net_get_features(VirtIODevice *vdev, uint64_t *features,
}
}
+static bool virtio_net_update_host_features(VirtIONet *n, Error **errp)
+{
+ ERRP_GUARD();
+ VirtIODevice *vdev = VIRTIO_DEVICE(n);
+
+ peer_test_vnet_hdr(n);
+
+ virtio_net_get_features(vdev, &vdev->host_features, errp);
+
+ return !*errp;
+}
+
static int virtio_net_post_load_device(void *opaque, int version_id)
{
VirtIONet *n = opaque;
@@ -4177,6 +4234,24 @@ static bool dev_unplug_pending(void *opaque)
return vdc->primary_unplug_pending(dev);
}
+static bool vhost_user_blk_pre_incoming(void *opaque, Error **errp)
+{
+ VirtIONet *n = opaque;
+ int i;
+
+ if (peer_wait_incoming(n)) {
+ for (i = 0; i < n->max_queue_pairs; i++) {
+ if (!peer_postponed_init(n, i, errp)) {
+ return false;
+ }
+ }
+
+ return virtio_net_update_host_features(n, errp);
+ }
+
+ return true;
+}
+
static const VMStateDescription vmstate_virtio_net = {
.name = "virtio-net",
.minimum_version_id = VIRTIO_NET_VM_VERSION,
@@ -4185,6 +4260,7 @@ static const VMStateDescription vmstate_virtio_net = {
VMSTATE_VIRTIO_DEVICE,
VMSTATE_END_OF_LIST()
},
+ .pre_incoming = vhost_user_blk_pre_incoming,
.pre_save = virtio_net_pre_save,
.dev_unplug_pending = dev_unplug_pending,
};
diff --git a/include/net/tap.h b/include/net/tap.h
index 6f34f13eae..5a926ba513 100644
--- a/include/net/tap.h
+++ b/include/net/tap.h
@@ -33,4 +33,7 @@ int tap_disable(NetClientState *nc);
int tap_get_fd(NetClientState *nc);
+bool tap_wait_incoming(NetClientState *nc);
+bool tap_postponed_init(NetClientState *nc, Error **errp);
+
#endif /* QEMU_NET_TAP_H */
diff --git a/net/tap-win32.c b/net/tap-win32.c
index 38baf90e0b..7430cdf6fa 100644
--- a/net/tap-win32.c
+++ b/net/tap-win32.c
@@ -766,3 +766,14 @@ int tap_disable(NetClientState *nc)
{
abort();
}
+
+bool tap_wait_incoming(NetClientState *nc)
+{
+ return false;
+}
+
+bool tap_postponed_init(NetClientState *nc, Error **errp)
+{
+ error_setg(errp, "win32 tap postponed init is not supported");
+ return false;
+}
diff --git a/net/tap.c b/net/tap.c
index 7e85444ace..8afbf3b407 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -35,7 +35,9 @@
#include "net/eth.h"
#include "net/net.h"
#include "clients.h"
+#include "migration/misc.h"
#include "monitor/monitor.h"
+#include "system/runstate.h"
#include "system/system.h"
#include "qapi/error.h"
#include "qemu/cutils.h"
@@ -88,6 +90,13 @@ typedef struct TAPState {
int sndbuf;
int vhostfd;
uint32_t vhost_busyloop_timeout;
+
+ /* for postponed setup */
+ QTAILQ_ENTRY(TAPState) next;
+ bool vnet_hdr_required;
+ int vnet_hdr;
+ bool mq_required;
+ char *ifname;
} TAPState;
static bool net_tap_setup(TAPState *s, int fd, int vnet_hdr, Error **errp);
@@ -366,6 +375,8 @@ static void tap_cleanup(NetClientState *nc)
close(s->vhostfd);
s->vhostfd = -1;
}
+
+ g_free(s->ifname);
}
static void tap_poll(NetClientState *nc, bool enable)
@@ -383,6 +394,25 @@ static bool tap_set_steering_ebpf(NetClientState *nc, int prog_fd)
return tap_fd_set_steering_ebpf(s->fd, prog_fd) == 0;
}
+static bool tap_check_peer_type(NetClientState *nc, ObjectClass *oc,
+ Error **errp)
+{
+ TAPState *s = DO_UPCAST(TAPState, nc, nc);
+ const char *driver = object_class_get_name(oc);
+
+ if (!g_str_has_prefix(driver, "virtio-net-")) {
+ /*
+ * Only virtio-net supports postponed TAP initialization, so
+ * for other drivers let's finalize initialization now.
+ */
+ if (tap_wait_incoming(nc)) {
+ return tap_postponed_init(&s->nc, errp);
+ }
+ }
+
+ return true;
+}
+
int tap_get_fd(NetClientState *nc)
{
TAPState *s = DO_UPCAST(TAPState, nc, nc);
@@ -422,6 +452,7 @@ static NetClientInfo net_tap_info = {
.set_vnet_be = tap_set_vnet_be,
.set_steering_ebpf = tap_set_steering_ebpf,
.get_vhost_net = tap_get_vhost_net,
+ .check_peer_type = tap_check_peer_type,
};
static TAPState *net_tap_new(NetClientState *peer, const char *model,
@@ -845,6 +876,93 @@ static int get_fds(char *str, char *fds[], int max)
return i;
}
+#define TAP_OPEN_IFNAME_SZ 128
+
+bool tap_postponed_init(NetClientState *nc, Error **errp)
+{
+ TAPState *s = DO_UPCAST(TAPState, nc, nc);
+ char ifname[TAP_OPEN_IFNAME_SZ];
+ int vnet_hdr = s->vnet_hdr;
+ int fd;
+
+ pstrcpy(ifname, sizeof(ifname), s->ifname);
+ fd = net_tap_open(&vnet_hdr, s->vnet_hdr_required, NULL,
+ ifname, sizeof(ifname),
+ s->mq_required, errp);
+ if (fd < 0) {
+ goto fail;
+ }
+
+ if (!net_tap_setup(s, fd, vnet_hdr, errp)) {
+ goto fail;
+ }
+
+ return true;
+
+fail:
+ qemu_del_net_client(&s->nc);
+ return false;
+}
+
+static bool check_no_script(const char *script_arg)
+{
+ return script_arg &&
+ (script_arg[0] == '\0' || strcmp(script_arg, "no") == 0);
+}
+
+static bool tap_postpone_init(const NetdevTapOptions *tap,
+ const char *name, NetClientState *peer,
+ bool *postponed, Error **errp)
+{
+ int queues = tap->has_queues ? tap->queues : 1;
+
+ *postponed = false;
+
+ if (!runstate_check(RUN_STATE_INMIGRATE)) {
+ return true;
+ }
+
+ if (tap->fd || tap->fds || tap->helper || tap->vhostfds) {
+ return true;
+ }
+
+ if (!tap->ifname || tap->ifname[0] == '\0' ||
+ strstr(tap->ifname, "%d") != NULL) {
+ /*
+ * It's hard to postpone logic of parsing template or
+ * absent ifname
+ */
+ return true;
+ }
+
+ /*
+ * Supporting downscript means understanding and realizing the logic of
+ * transfer of responsibility to call it in target QEMU process. Or in
+ * source QEMU process in case of migration failure. So for simplicity we
+ * don't support scripts together with fds migration.
+ */
+ if (!check_no_script(tap->script) || !check_no_script(tap->downscript)) {
+ return true;
+ }
+
+ for (int i = 0; i < queues; i++) {
+ TAPState *s = net_tap_new(peer, "tap", name, tap, NULL, errp);
+ if (!s) {
+ return false;
+ }
+
+ s->vnet_hdr_required = tap->has_vnet_hdr && tap->vnet_hdr;
+ s->vnet_hdr = tap->has_vnet_hdr ? tap->vnet_hdr : 1;
+ s->mq_required = queues > 1;
+ s->ifname = g_strdup(tap->ifname);
+ qemu_set_info_str(&s->nc, "ifname=%s,script=no,downscript=no",
+ tap->ifname);
+ }
+
+ *postponed = true;
+ return true;
+}
+
int net_init_tap(const Netdev *netdev, const char *name,
NetClientState *peer, Error **errp)
{
@@ -853,8 +971,9 @@ int net_init_tap(const Netdev *netdev, const char *name,
/* for the no-fd, no-helper case */
Error *err = NULL;
const char *vhostfdname;
- char ifname[128];
+ char ifname[TAP_OPEN_IFNAME_SZ];
int ret = 0;
+ bool postponed = false;
assert(netdev->type == NET_CLIENT_DRIVER_TAP);
tap = &netdev->u.tap;
@@ -873,6 +992,14 @@ int net_init_tap(const Netdev *netdev, const char *name,
return -1;
}
+ if (!tap_postpone_init(tap, name, peer, &postponed, errp)) {
+ return -1;
+ }
+
+ if (postponed) {
+ return 0;
+ }
+
if (tap->fd) {
if (tap->ifname || tap->script || tap->downscript ||
tap->has_vnet_hdr || tap->helper || tap->has_queues ||
@@ -1097,3 +1224,10 @@ int tap_disable(NetClientState *nc)
return ret;
}
}
+
+bool tap_wait_incoming(NetClientState *nc)
+{
+ TAPState *s = DO_UPCAST(TAPState, nc, nc);
+
+ return s->fd == -1;
+}
--
2.48.1
* [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter
2025-10-15 13:21 [PATCH v8 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
` (14 preceding siblings ...)
2025-10-15 13:21 ` [PATCH v8 15/19] net/tap: postpone tap setup to pre-incoming Vladimir Sementsov-Ogievskiy
@ 2025-10-15 13:21 ` Vladimir Sementsov-Ogievskiy
2025-10-15 18:19 ` Peter Xu
2025-10-16 10:56 ` Markus Armbruster
2025-10-15 13:21 ` [PATCH v8 17/19] virtio-net: support backend-transfer migration for virtio-net/tap Vladimir Sementsov-Ogievskiy
` (3 subsequent siblings)
19 siblings, 2 replies; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-15 13:21 UTC (permalink / raw)
To: mst, jasowang
Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
yc-core, vsementsov, raphael.s.norwitz
This parameter enables the backend-transfer feature: all devices
which support it will migrate their backends (for example a TAP
device, by passing an open file descriptor to the migration channel).
Currently there are no such devices, so the new parameter is a noop.
The next commit will add support for virtio-net, to migrate its
TAP backend.
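The way an optional boolean parameter like this one is applied can be sketched as below: a request only changes the stored value when it actually carries the field. SetParamsSketch, ParamsSketch and params_apply() are hypothetical names for illustration, not the QAPI-generated types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request: has_* tells whether the caller supplied the field. */
typedef struct {
    bool has_backend_transfer;
    bool backend_transfer;
} SetParamsSketch;

/* Hypothetical stored parameters. */
typedef struct {
    bool backend_transfer;
} ParamsSketch;

/* Copy the field only when present, so unrelated requests don't reset it. */
void params_apply(ParamsSketch *dest, const SetParamsSketch *req)
{
    assert(dest != NULL && req != NULL);

    if (req->has_backend_transfer) {
        dest->backend_transfer = req->backend_transfer;
    }
}
```

With this shape, a later request that sets some other parameter leaves a previously enabled backend-transfer untouched.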
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
migration/options.c | 18 ++++++++++++++++++
migration/options.h | 2 ++
qapi/migration.json | 38 ++++++++++++++++++++++++++++++++------
3 files changed, 52 insertions(+), 6 deletions(-)
diff --git a/migration/options.c b/migration/options.c
index 5183112775..a461b07b54 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -262,6 +262,12 @@ bool migrate_mapped_ram(void)
return s->capabilities[MIGRATION_CAPABILITY_MAPPED_RAM];
}
+bool migrate_backend_transfer(void)
+{
+ MigrationState *s = migrate_get_current();
+ return s->parameters.backend_transfer;
+}
+
bool migrate_ignore_shared(void)
{
MigrationState *s = migrate_get_current();
@@ -963,6 +969,9 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
params->cpr_exec_command = QAPI_CLONE(strList,
s->parameters.cpr_exec_command);
+ params->has_backend_transfer = true;
+ params->backend_transfer = s->parameters.backend_transfer;
+
return params;
}
@@ -997,6 +1006,7 @@ void migrate_params_init(MigrationParameters *params)
params->has_zero_page_detection = true;
params->has_direct_io = true;
params->has_cpr_exec_command = true;
+ params->has_backend_transfer = true;
}
/*
@@ -1305,6 +1315,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
if (params->has_cpr_exec_command) {
dest->cpr_exec_command = params->cpr_exec_command;
}
+
+ if (params->has_backend_transfer) {
+ dest->backend_transfer = params->backend_transfer;
+ }
}
static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
@@ -1443,6 +1457,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
s->parameters.cpr_exec_command =
QAPI_CLONE(strList, params->cpr_exec_command);
}
+
+ if (params->has_backend_transfer) {
+ s->parameters.backend_transfer = params->backend_transfer;
+ }
}
void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
diff --git a/migration/options.h b/migration/options.h
index 82d839709e..755ba1c024 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -87,6 +87,8 @@ const char *migrate_tls_hostname(void);
uint64_t migrate_xbzrle_cache_size(void);
ZeroPageDetection migrate_zero_page_detection(void);
+bool migrate_backend_transfer(void);
+
/* parameters helpers */
bool migrate_params_check(MigrationParameters *params, Error **errp);
diff --git a/qapi/migration.json b/qapi/migration.json
index be0f3fcc12..35601a1f87 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -951,9 +951,16 @@
# is @cpr-exec. The first list element is the program's filename,
# the remainder its arguments. (Since 10.2)
#
+# @backend-transfer: Enable backend-transfer feature for devices that
+# support it. In general that means that backend state and its
+# file descriptors are passed to the destination in the migration
+# channel (which must be a UNIX socket). Individual devices
+# declare support for backend-transfer via a per-device
+# backend-transfer option. (Since 10.2)
+#
# Features:
#
-# @unstable: Members @x-checkpoint-delay and
+# @unstable: Members @backend-transfer, @x-checkpoint-delay and
# @x-vcpu-dirty-limit-period are experimental.
#
# Since: 2.4
@@ -978,7 +985,8 @@
'mode',
'zero-page-detection',
'direct-io',
- 'cpr-exec-command'] }
+ 'cpr-exec-command',
+ { 'name': 'backend-transfer', 'features': ['unstable'] } ] }
##
# @MigrateSetParameters:
@@ -1137,9 +1145,16 @@
# is @cpr-exec. The first list element is the program's filename,
# the remainder its arguments. (Since 10.2)
#
+# @backend-transfer: Enable backend-transfer feature for devices that
+# support it. In general that means that backend state and its
+# file descriptors are passed to the destination in the migration
+# channel (which must be a UNIX socket). Individual devices
+# declare support for backend-transfer via a per-device
+# backend-transfer option. (Since 10.2)
+#
# Features:
#
-# @unstable: Members @x-checkpoint-delay and
+# @unstable: Members @backend-transfer, @x-checkpoint-delay and
# @x-vcpu-dirty-limit-period are experimental.
#
# TODO: either fuse back into `MigrationParameters`, or make
@@ -1179,7 +1194,9 @@
'*mode': 'MigMode',
'*zero-page-detection': 'ZeroPageDetection',
'*direct-io': 'bool',
- '*cpr-exec-command': [ 'str' ]} }
+ '*cpr-exec-command': [ 'str' ],
+ '*backend-transfer': { 'type': 'bool',
+ 'features': [ 'unstable' ] } } }
##
# @migrate-set-parameters:
@@ -1352,9 +1369,16 @@
# is @cpr-exec. The first list element is the program's filename,
# the remainder its arguments. (Since 10.2)
#
+# @backend-transfer: Enable backend-transfer feature for devices that
+# support it. In general that means that backend state and its
+# file descriptors are passed to the destination in the migration
+# channel (which must be a UNIX socket). Individual devices
+# declare support for backend-transfer via a per-device
+# backend-transfer option. (Since 10.2)
+#
# Features:
#
-# @unstable: Members @x-checkpoint-delay and
+# @unstable: Members @backend-transfer, @x-checkpoint-delay and
# @x-vcpu-dirty-limit-period are experimental.
#
# Since: 2.4
@@ -1391,7 +1415,9 @@
'*mode': 'MigMode',
'*zero-page-detection': 'ZeroPageDetection',
'*direct-io': 'bool',
- '*cpr-exec-command': [ 'str' ]} }
+ '*cpr-exec-command': [ 'str' ],
+ '*backend-transfer': { 'type': 'bool',
+ 'features': [ 'unstable' ] } } }
##
# @query-migrate-parameters:
--
2.48.1
* [PATCH v8 17/19] virtio-net: support backend-transfer migration for virtio-net/tap
2025-10-15 13:21 [PATCH v8 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
` (15 preceding siblings ...)
2025-10-15 13:21 ` [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter Vladimir Sementsov-Ogievskiy
@ 2025-10-15 13:21 ` Vladimir Sementsov-Ogievskiy
2025-10-16 8:23 ` Daniel P. Berrangé
2025-10-15 13:21 ` [PATCH v8 18/19] tests/functional: add skipWithoutSudo() decorator Vladimir Sementsov-Ogievskiy
` (2 subsequent siblings)
19 siblings, 1 reply; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-15 13:21 UTC (permalink / raw)
To: mst, jasowang
Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
yc-core, vsementsov, raphael.s.norwitz
Add virtio-net option backend-transfer, which is true by default,
but false for older machine types, which don't support the feature.
For backend-transfer migration, both the global migration parameter
backend-transfer and the virtio-net backend-transfer option must be
set to true.
With both enabled (on source and target, of course), and with a UNIX
socket used as the migration channel, we do "migrate" the virtio-net
backend, i.e. the TAP device, with all its fds.
This way the management tool doesn't need to create a new TAP device
or handle switching to it. Migration downtime becomes shorter.
How it works:
1. For incoming migration, we postpone TAP initialization up to the
pre-incoming point.
2. At the pre-incoming point we see that backend-transfer is enabled
for the virtio-net device (migration parameter and device option),
so we postpone TAP initialization further, up to post-load.
3. During virtio-load, we get the TAP state (and fds) as part of the
virtio-net state.
4. In post-load we finalize TAP initialization.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
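As a reading aid, the gating condition from the commit message above can be modelled in a few lines. This is a hedged sketch, not QEMU code: the Device class and both function names are hypothetical, and the model only mirrors the three-way check in virtio_net_is_tap_mig() from the diff below (global parameter, per-device option, TAP peer).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Device:
    """Hypothetical stand-in for a virtio-net device."""
    backend_transfer: bool = True     # per-device option, default true
    peer_type: Optional[str] = "tap"  # type of the attached netdev, if any


def is_tap_backend_transfer(migration_param: bool, dev: Device) -> bool:
    """Backend transfer happens only when all three conditions hold."""
    return (
        migration_param
        and dev.backend_transfer
        and dev.peer_type == "tap"
    )


def incoming_tap_init_point(migration_param: bool, dev: Device) -> str:
    """Where a postponed TAP peer is initialized on the incoming side.

    Only meaningful for a TAP peer whose setup was postponed.
    """
    if is_tap_backend_transfer(migration_param, dev):
        # fds arrive in the migration stream; setup is finalized afterwards.
        return "post-load"
    # No fds will arrive, so the TAP is opened locally before loading.
    return "pre-incoming"


if __name__ == "__main__":
    print(incoming_tap_init_point(True, Device()))
    print(incoming_tap_init_point(True, Device(backend_transfer=False)))
```

An older machine type corresponds to Device(backend_transfer=False) here, since the compat property in this patch turns the per-device option off.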
hw/core/machine.c | 1 +
hw/net/virtio-net.c | 75 +++++++++++++++++++++++++++++++++-
include/hw/virtio/virtio-net.h | 1 +
include/net/tap.h | 2 +
net/tap.c | 45 +++++++++++++++++++-
5 files changed, 122 insertions(+), 2 deletions(-)
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 681adbb7ac..a3d77f5604 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -40,6 +40,7 @@
GlobalProperty hw_compat_10_1[] = {
{ TYPE_ACPI_GED, "x-has-hest-addr", "false" },
+ { TYPE_VIRTIO_NET, "backend-transfer", "false" },
};
const size_t hw_compat_10_1_len = G_N_ELEMENTS(hw_compat_10_1);
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 661413c72f..5f9711dee7 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -38,6 +38,7 @@
#include "qapi/qapi-events-migration.h"
#include "hw/virtio/virtio-access.h"
#include "migration/misc.h"
+#include "migration/options.h"
#include "standard-headers/linux/ethtool.h"
#include "system/system.h"
#include "system/replay.h"
@@ -3358,6 +3359,9 @@ struct VirtIONetMigTmp {
uint16_t curr_queue_pairs_1;
uint8_t has_ufo;
uint32_t has_vnet_hdr;
+
+ NetClientState *ncs;
+ uint32_t max_queue_pairs;
};
/* The 2nd and subsequent tx_waiting flags are loaded later than
@@ -3627,6 +3631,71 @@ static const VMStateDescription vhost_user_net_backend_state = {
}
};
+static bool virtio_net_is_tap_mig(void *opaque, int version_id)
+{
+ VirtIONet *n = opaque;
+ NetClientState *nc;
+
+ nc = qemu_get_queue(n->nic);
+
+ return migrate_backend_transfer() && n->backend_transfer && nc->peer &&
+ nc->peer->info->type == NET_CLIENT_DRIVER_TAP;
+}
+
+static int virtio_net_nic_pre_save(void *opaque)
+{
+ struct VirtIONetMigTmp *tmp = opaque;
+
+ tmp->ncs = tmp->parent->nic->ncs;
+ tmp->max_queue_pairs = tmp->parent->max_queue_pairs;
+
+ return 0;
+}
+
+static int virtio_net_nic_pre_load(void *opaque)
+{
+ /* Reuse the pointer setup from save */
+ virtio_net_nic_pre_save(opaque);
+
+ return 0;
+}
+
+static int virtio_net_nic_post_load(void *opaque, int version_id)
+{
+ struct VirtIONetMigTmp *tmp = opaque;
+ Error *local_err = NULL;
+
+ if (!virtio_net_update_host_features(tmp->parent, &local_err)) {
+ error_report_err(local_err);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static const VMStateDescription vmstate_virtio_net_nic_nc = {
+ .name = "virtio-net-nic-nc",
+ .fields = (const VMStateField[]) {
+ VMSTATE_STRUCT_POINTER(peer, NetClientState, vmstate_tap,
+ NetClientState),
+ VMSTATE_END_OF_LIST()
+ },
+};
+
+static const VMStateDescription vmstate_virtio_net_nic = {
+ .name = "virtio-net-nic",
+ .pre_load = virtio_net_nic_pre_load,
+ .pre_save = virtio_net_nic_pre_save,
+ .post_load = virtio_net_nic_post_load,
+ .fields = (const VMStateField[]) {
+ VMSTATE_STRUCT_VARRAY_POINTER_UINT32(ncs, struct VirtIONetMigTmp,
+ max_queue_pairs,
+ vmstate_virtio_net_nic_nc,
+ struct NetClientState),
+ VMSTATE_END_OF_LIST()
+ },
+};
+
static const VMStateDescription vmstate_virtio_net_device = {
.name = "virtio-net-device",
.version_id = VIRTIO_NET_VM_VERSION,
@@ -3658,6 +3727,9 @@ static const VMStateDescription vmstate_virtio_net_device = {
* but based on the uint.
*/
VMSTATE_BUFFER_POINTER_UNSAFE(vlans, VirtIONet, 0, MAX_VLAN >> 3),
+ VMSTATE_WITH_TMP_TEST(VirtIONet, virtio_net_is_tap_mig,
+ struct VirtIONetMigTmp,
+ vmstate_virtio_net_nic),
VMSTATE_WITH_TMP(VirtIONet, struct VirtIONetMigTmp,
vmstate_virtio_net_has_vnet),
VMSTATE_UINT8(mac_table.multi_overflow, VirtIONet),
@@ -4239,7 +4311,7 @@ static bool vhost_user_blk_pre_incoming(void *opaque, Error **errp)
VirtIONet *n = opaque;
int i;
- if (peer_wait_incoming(n)) {
+ if (!virtio_net_is_tap_mig(opaque, 0) && peer_wait_incoming(n)) {
for (i = 0; i < n->max_queue_pairs; i++) {
if (!peer_postponed_init(n, i, errp)) {
return false;
@@ -4389,6 +4461,7 @@ static const Property virtio_net_properties[] = {
host_features_ex,
VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM,
false),
+ DEFINE_PROP_BOOL("backend-transfer", VirtIONet, backend_transfer, true),
};
static void virtio_net_class_init(ObjectClass *klass, const void *data)
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index 5b8ab7bda7..bf07f8a4cb 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -231,6 +231,7 @@ struct VirtIONet {
struct EBPFRSSContext ebpf_rss;
uint32_t nr_ebpf_rss_fds;
char **ebpf_rss_fds;
+ bool backend_transfer;
};
size_t virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
diff --git a/include/net/tap.h b/include/net/tap.h
index 5a926ba513..506f7ab719 100644
--- a/include/net/tap.h
+++ b/include/net/tap.h
@@ -36,4 +36,6 @@ int tap_get_fd(NetClientState *nc);
bool tap_wait_incoming(NetClientState *nc);
bool tap_postponed_init(NetClientState *nc, Error **errp);
+extern const VMStateDescription vmstate_tap;
+
#endif /* QEMU_NET_TAP_H */
diff --git a/net/tap.c b/net/tap.c
index 8afbf3b407..b9c12dd64c 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -819,7 +819,7 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
static bool net_tap_setup(TAPState *s, int fd, int vnet_hdr, Error **errp)
{
- if (!net_tap_set_fd(s, fd, vnet_hdr, errp)) {
+ if (fd != -1 && !net_tap_set_fd(s, fd, vnet_hdr, errp)) {
return false;
}
@@ -1225,6 +1225,49 @@ int tap_disable(NetClientState *nc)
}
}
+static int tap_pre_load(void *opaque)
+{
+ TAPState *s = opaque;
+
+ if (s->fd != -1) {
+ error_report(
+ "TAP is already initialized and cannot receive incoming fd");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int tap_post_load(void *opaque, int version_id)
+{
+ TAPState *s = opaque;
+ Error *local_err = NULL;
+
+ if (!net_tap_setup(s, -1, -1, &local_err)) {
+ error_report_err(local_err);
+ qemu_del_net_client(&s->nc);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+const VMStateDescription vmstate_tap = {
+ .name = "net-tap",
+ .pre_load = tap_pre_load,
+ .post_load = tap_post_load,
+ .fields = (const VMStateField[]) {
+ VMSTATE_FD(fd, TAPState),
+ VMSTATE_BOOL(using_vnet_hdr, TAPState),
+ VMSTATE_BOOL(has_ufo, TAPState),
+ VMSTATE_BOOL(has_uso, TAPState),
+ VMSTATE_BOOL(has_tunnel, TAPState),
+ VMSTATE_BOOL(enabled, TAPState),
+ VMSTATE_UINT32(host_vnet_hdr_len, TAPState),
+ VMSTATE_END_OF_LIST()
+ }
+};
+
bool tap_wait_incoming(NetClientState *nc)
{
TAPState *s = DO_UPCAST(TAPState, nc, nc);
--
2.48.1
* [PATCH v8 18/19] tests/functional: add skipWithoutSudo() decorator
2025-10-15 13:21 [PATCH v8 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
` (16 preceding siblings ...)
2025-10-15 13:21 ` [PATCH v8 17/19] virtio-net: support backend-transfer migration for virtio-net/tap Vladimir Sementsov-Ogievskiy
@ 2025-10-15 13:21 ` Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 19/19] tests/functional: add test_x86_64_tap_migration Vladimir Sementsov-Ogievskiy
2025-10-18 15:38 ` [PATCH v8 00/19] virtio-net: live-TAP local migration Lei Yang
19 siblings, 0 replies; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-15 13:21 UTC (permalink / raw)
To: mst, jasowang
Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
yc-core, vsementsov, raphael.s.norwitz
To be used in the next commit, which adds a test for TAP
networking that needs to set up a TAP device.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Tested-by: Lei Yang <leiyang@redhat.com>
Reviewed-by: Maksim Davydov <davydov-max@yandex-team.ru>
---
tests/functional/qemu_test/decorators.py | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/tests/functional/qemu_test/decorators.py b/tests/functional/qemu_test/decorators.py
index b239295804..125d31dda6 100644
--- a/tests/functional/qemu_test/decorators.py
+++ b/tests/functional/qemu_test/decorators.py
@@ -6,6 +6,7 @@
import os
import platform
import resource
+import subprocess
from unittest import skipIf, skipUnless
from .cmd import which
@@ -167,3 +168,18 @@ def skipLockedMemoryTest(locked_memory):
ulimit_memory == resource.RLIM_INFINITY or ulimit_memory >= locked_memory * 1024,
f'Test required {locked_memory} kB of available locked memory',
)
+
+'''
+Decorator to skip execution of a test if passwordless
+sudo command is not available.
+'''
+def skipWithoutSudo():
+ proc = subprocess.run(["sudo", "-n", "/bin/true"],
+ stdin=subprocess.PIPE,
+ stdout=subprocess.PIPE,
+ stderr=subprocess.STDOUT,
+ universal_newlines=True,
+ check=False)
+
+ return skipUnless(proc.returncode == 0,
+ f'requires password-less sudo access: {proc.stdout}')
--
2.48.1
* [PATCH v8 19/19] tests/functional: add test_x86_64_tap_migration
2025-10-15 13:21 [PATCH v8 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
` (17 preceding siblings ...)
2025-10-15 13:21 ` [PATCH v8 18/19] tests/functional: add skipWithoutSudo() decorator Vladimir Sementsov-Ogievskiy
@ 2025-10-15 13:21 ` Vladimir Sementsov-Ogievskiy
2025-10-18 15:38 ` [PATCH v8 00/19] virtio-net: live-TAP local migration Lei Yang
19 siblings, 0 replies; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-15 13:21 UTC (permalink / raw)
To: mst, jasowang
Cc: peterx, farosas, sw, eblake, armbru, thuth, philmd, berrange,
qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
yc-core, vsementsov, raphael.s.norwitz
Add a test for the new backend-transfer migration of virtio-net/tap,
with fd passing through a UNIX socket.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
---
tests/functional/test_x86_64_tap_migration.py | 395 ++++++++++++++++++
1 file changed, 395 insertions(+)
create mode 100644 tests/functional/test_x86_64_tap_migration.py
diff --git a/tests/functional/test_x86_64_tap_migration.py b/tests/functional/test_x86_64_tap_migration.py
new file mode 100644
index 0000000000..1f88ff174c
--- /dev/null
+++ b/tests/functional/test_x86_64_tap_migration.py
@@ -0,0 +1,395 @@
+#!/usr/bin/env python3
+#
+# Functional test that tests TAP local migration
+# with fd passing
+#
+# Copyright (c) Yandex Technologies LLC, 2025
+#
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+import os
+import time
+import subprocess
+from subprocess import run
+import signal
+from typing import Tuple
+
+from qemu_test import (
+ LinuxKernelTest,
+ Asset,
+ exec_command_and_wait_for_pattern,
+)
+from qemu_test.decorators import skipWithoutSudo
+
+GUEST_IP = "10.0.1.2"
+GUEST_IP_MASK = f"{GUEST_IP}/24"
+GUEST_MAC = "d6:0d:75:f8:0f:b7"
+HOST_IP = "10.0.1.1"
+HOST_IP_MASK = f"{HOST_IP}/24"
+TAP_ID = "tap0"
+TAP_ID2 = "tap1"
+TAP_MAC = "e6:1d:44:b5:03:5d"
+
+
+def ip(args, check=True) -> None:
+ """Run ip command with sudo"""
+ run(["sudo", "ip"] + args, check=check)
+
+
+def del_tap(tap_name: str = TAP_ID) -> None:
+ ip(["tuntap", "del", tap_name, "mode", "tap", "multi_queue"], check=False)
+
+
+def init_tap(tap_name: str = TAP_ID, with_ip: bool = True) -> None:
+ ip(["tuntap", "add", "dev", tap_name, "mode", "tap", "multi_queue"])
+ if with_ip:
+ ip(["link", "set", "dev", tap_name, "address", TAP_MAC])
+ ip(["addr", "add", HOST_IP_MASK, "dev", tap_name])
+ ip(["link", "set", tap_name, "up"])
+
+
+def switch_network_to_tap2() -> None:
+ ip(["link", "set", TAP_ID2, "down"])
+ ip(["link", "set", TAP_ID, "down"])
+ ip(["addr", "delete", HOST_IP_MASK, "dev", TAP_ID])
+ ip(["link", "set", "dev", TAP_ID2, "address", TAP_MAC])
+ ip(["addr", "add", HOST_IP_MASK, "dev", TAP_ID2])
+ ip(["link", "set", TAP_ID2, "up"])
+
+
+def parse_ping_line(line: str) -> float:
+ # expect lines like
+ # [1748524876.590509] 64 bytes from 94.245.155.3 \
+ # (94.245.155.3): icmp_seq=1 ttl=250 time=101 ms
+ spl = line.split()
+ return float(spl[0][1:-1])
+
+
+def parse_ping_output(out) -> Tuple[bool, float, float]:
+ lines = [x for x in out.split("\n") if x.startswith("[")]
+
+ try:
+ first_no_ans = next(
+ (ind for ind in range(len(lines)) if lines[ind][20:26] == "no ans")
+ )
+ except StopIteration:
+ return False, parse_ping_line(lines[0]), parse_ping_line(lines[-1])
+
+ last_no_ans = next(
+ ind
+ for ind in range(len(lines) - 1, -1, -1)
+ if lines[ind][20:26] == "no ans"
+ )
+
+ return (
+ True,
+ parse_ping_line(lines[first_no_ans]),
+ parse_ping_line(lines[last_no_ans]),
+ )
+
+
+def wait_migration_finish(source_vm, target_vm):
+ migr_events = (
+ ("MIGRATION", {"data": {"status": "completed"}}),
+ ("MIGRATION", {"data": {"status": "failed"}}),
+ )
+
+ source_e = source_vm.events_wait(migr_events)["data"]
+ target_e = target_vm.events_wait(migr_events)["data"]
+
+ source_s = source_vm.cmd("query-status")["status"]
+ target_s = target_vm.cmd("query-status")["status"]
+
+ assert (
+ source_e["status"] == "completed"
+ and target_e["status"] == "completed"
+ and source_s == "postmigrate"
+ and target_s == "paused"
+ ), f"""Migration failed:
+ SRC status: {source_s}
+ SRC event: {source_e}
+ TGT status: {target_s}
+ TGT event:{target_e}"""
+
+
+@skipWithoutSudo()
+class VhostUserBlkFdMigration(LinuxKernelTest):
+
+ ASSET_KERNEL = Asset(
+ (
+ "https://archives.fedoraproject.org/pub/archive/fedora/linux/releases"
+ "/31/Server/x86_64/os/images/pxeboot/vmlinuz"
+ ),
+ "d4738d03dbbe083ca610d0821d0a8f1488bebbdccef54ce33e3adb35fda00129",
+ )
+
+ ASSET_INITRD = Asset(
+ (
+ "https://archives.fedoraproject.org/pub/archive/fedora/linux/releases"
+ "/31/Server/x86_64/os/images/pxeboot/initrd.img"
+ ),
+ "277cd6c7adf77c7e63d73bbb2cded8ef9e2d3a2f100000e92ff1f8396513cd8b",
+ )
+
+ ASSET_ALPINE_ISO = Asset(
+ (
+ "https://dl-cdn.alpinelinux.org/"
+ "alpine/v3.22/releases/x86_64/alpine-standard-3.22.1-x86_64.iso"
+ ),
+ "96d1b44ea1b8a5a884f193526d92edb4676054e9fa903ad2f016441a0fe13089",
+ )
+
+ def setUp(self):
+ super().setUp()
+
+ init_tap()
+
+ self.outer_ping_proc = None
+
+ def tearDown(self):
+ try:
+ del_tap(TAP_ID)
+ del_tap(TAP_ID2)
+
+ if self.outer_ping_proc:
+ self.stop_outer_ping()
+ finally:
+ super().tearDown()
+
+ def start_outer_ping(self) -> None:
+ assert self.outer_ping_proc is None
+ self.outer_ping_log = self.scratch_file("ping.log")
+ with open(self.outer_ping_log, "w") as f:
+ self.outer_ping_proc = subprocess.Popen(
+ ["ping", "-i", "0", "-O", "-D", GUEST_IP],
+ text=True,
+ stdout=f,
+ )
+
+ def stop_outer_ping(self) -> str:
+ assert self.outer_ping_proc
+ self.outer_ping_proc.send_signal(signal.SIGINT)
+
+ self.outer_ping_proc.communicate(timeout=5)
+ self.outer_ping_proc = None
+
+ with open(self.outer_ping_log) as f:
+ return f.read()
+
+ def stop_ping_and_check(self, stop_time, resume_time):
+ ping_res = self.stop_outer_ping()
+
+ discon, a, b = parse_ping_output(ping_res)
+
+ if not discon:
+ text = (
+ f"STOP: {stop_time}, RESUME: {resume_time}, PING: {a} - {b}"
+ )
+ if a > stop_time or b < resume_time:
+ self.fail(f"PING failed: {text}")
+ self.log.info(f"PING: no packets lost: {text}")
+ return
+
+ text = (
+ f"STOP: {stop_time}, RESUME: {resume_time}, "
+ f"PING: disconnect: {a} - {b}"
+ )
+ self.log.info(text)
+ eps = 0.01
+ if a < stop_time - eps or b > resume_time + eps:
+ self.fail(text)
+
+ def one_ping_from_guest(self, vm) -> None:
+ exec_command_and_wait_for_pattern(
+ self,
+ f"ping -c 1 -W 1 {HOST_IP}",
+ "1 packets transmitted, 1 packets received",
+ "1 packets transmitted, 0 packets received",
+ vm=vm,
+ )
+ self.wait_for_console_pattern("# ", vm=vm)
+
+ def one_ping_from_host(self) -> None:
+ run(["ping", "-c", "1", "-W", "1", GUEST_IP])
+
+ def setup_shared_memory(self):
+ shm_path = f"/dev/shm/qemu_test_{os.getpid()}"
+
+ try:
+ with open(shm_path, "wb") as f:
+ f.write(b"\0" * (1024 * 1024 * 1024)) # 1GB
+ except Exception as e:
+ self.fail(f"Failed to create shared memory file: {e}")
+
+ return shm_path
+
+ def prepare_and_launch_vm(
+ self, shm_path, vhost, incoming=False, vm=None, backend_transfer=True
+ ):
+ if not vm:
+ vm = self.vm
+
+ vm.set_console()
+ vm.add_args("-accel", "kvm")
+ vm.add_args("-device", "pcie-pci-bridge,id=pci.1,bus=pcie.0")
+ vm.add_args("-m", "1G")
+
+ vm.add_args(
+ "-object",
+ f"memory-backend-file,id=ram0,size=1G,mem-path={shm_path},share=on",
+ )
+ vm.add_args("-machine", "memory-backend=ram0")
+
+ vm.add_args(
+ "-drive",
+ f"file={self.ASSET_ALPINE_ISO.fetch()},media=cdrom,format=raw",
+ )
+
+ vm.add_args("-S")
+
+ if incoming:
+ vm.add_args("-incoming", "defer")
+
+ vm_s = "target" if incoming else "source"
+ self.log.info(f"Launching {vm_s} VM")
+ vm.launch()
+
+ self.set_migration_capabilities(vm, backend_transfer)
+
+ if not backend_transfer:
+ tap_name = TAP_ID2 if incoming else TAP_ID
+ else:
+ tap_name = TAP_ID
+
+ self.add_virtio_net(vm, vhost, tap_name, backend_transfer)
+
+ def add_virtio_net(self, vm, vhost: bool, tap_name: str,
+ backend_transfer: bool):
+ netdev_params = {
+ "id": "netdev.1",
+ "vhost": vhost,
+ "type": "tap",
+ "ifname": tap_name,
+ "script": "no",
+ "downscript": "no",
+ "queues": 4,
+ "vnet_hdr": True,
+ }
+
+ vm.cmd("netdev_add", netdev_params)
+
+ vm.cmd(
+ "device_add",
+ driver="virtio-net-pci",
+ romfile="",
+ id="vnet.1",
+ netdev="netdev.1",
+ mq=True,
+ vectors=18,
+ bus="pci.1",
+ mac=GUEST_MAC,
+ disable_legacy="off",
+ backend_transfer=backend_transfer,
+ )
+
+ def set_migration_capabilities(self, vm, backend_transfer=True):
+ vm.cmd("migrate-set-capabilities", { "capabilities": [
+ {"capability": "events", "state": True},
+ {"capability": "x-ignore-shared", "state": True},
+ ]})
+ vm.cmd("migrate-set-parameters", {
+ "backend-transfer": backend_transfer
+ })
+
+ def setup_guest_network(self) -> None:
+ exec_command_and_wait_for_pattern(self, "ip addr", "# ")
+ exec_command_and_wait_for_pattern(
+ self,
+ f"ip addr add {GUEST_IP_MASK} dev eth0 && "
+ "ip link set eth0 up && echo OK",
+ "OK",
+ )
+ self.wait_for_console_pattern("# ")
+
+ def do_test_tap_fd_migration(self, vhost, backend_transfer=True):
+ self.require_accelerator("kvm")
+ self.set_machine("q35")
+
+ socket_dir = self.socket_dir()
+ migration_socket = os.path.join(socket_dir.name, "migration.sock")
+
+ shm_path = self.setup_shared_memory()
+
+ # Setup second TAP if needed
+ if not backend_transfer:
+ del_tap(TAP_ID2)
+ init_tap(TAP_ID2, with_ip=False)
+
+ self.prepare_and_launch_vm(
+ shm_path, vhost, backend_transfer=backend_transfer
+ )
+ self.vm.cmd("cont")
+ self.wait_for_console_pattern("login:")
+ exec_command_and_wait_for_pattern(self, "root", "# ")
+
+ self.setup_guest_network()
+
+ self.one_ping_from_guest(self.vm)
+ self.one_ping_from_host()
+ self.start_outer_ping()
+
+ # Get some successful pings before migration
+ time.sleep(0.5)
+
+ target_vm = self.get_vm(name="target")
+ self.prepare_and_launch_vm(
+ shm_path,
+ vhost,
+ incoming=True,
+ vm=target_vm,
+ backend_transfer=backend_transfer,
+ )
+
+ target_vm.cmd("migrate-incoming", {"uri": f"unix:{migration_socket}"})
+
+ self.log.info("Starting migration")
+ freeze_start = time.time()
+ self.vm.cmd("migrate", {"uri": f"unix:{migration_socket}"})
+
+ self.log.info("Waiting for migration completion")
+ wait_migration_finish(self.vm, target_vm)
+
+ # Switch network to tap1 if not using backend transfer
+ if not backend_transfer:
+ switch_network_to_tap2()
+
+ target_vm.cmd("cont")
+ freeze_end = time.time()
+
+ self.vm.shutdown()
+
+ self.log.info("Verifying PING on target VM after migration")
+ self.one_ping_from_guest(target_vm)
+ self.one_ping_from_host()
+
+ # And a bit more pings after source shutdown
+ time.sleep(0.3)
+ self.stop_ping_and_check(freeze_start, freeze_end)
+
+ target_vm.shutdown()
+
+ def test_tap_fd_migration(self):
+ self.do_test_tap_fd_migration(False)
+
+ def test_tap_fd_migration_vhost(self):
+ self.do_test_tap_fd_migration(True)
+
+ def test_tap_new_tap_migration(self):
+ self.do_test_tap_fd_migration(False, backend_transfer=False)
+
+ def test_tap_new_tap_migration_vhost(self):
+ self.do_test_tap_fd_migration(True, backend_transfer=False)
+
+
+if __name__ == "__main__":
+ LinuxKernelTest.main()
--
2.48.1
* Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter
2025-10-15 13:21 ` [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter Vladimir Sementsov-Ogievskiy
@ 2025-10-15 18:19 ` Peter Xu
2025-10-15 19:02 ` Vladimir Sementsov-Ogievskiy
2025-10-16 10:56 ` Markus Armbruster
1 sibling, 1 reply; 51+ messages in thread
From: Peter Xu @ 2025-10-15 18:19 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy
Cc: mst, jasowang, farosas, sw, eblake, armbru, thuth, philmd,
berrange, qemu-devel, michael.roth, steven.sistare, leiyang,
davydov-max, yc-core, raphael.s.norwitz
On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> This parameter enables backend-transfer feature: all devices
> which support it will migrate their backends (for example a TAP
> device, by passing open file descriptor to migration channel).
>
> Currently no such devices, so the new parameter is a noop.
>
> Next commit will add support for virtio-net, to migrate its
> TAP backend.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> ---
> migration/options.c | 18 ++++++++++++++++++
> migration/options.h | 2 ++
> qapi/migration.json | 38 ++++++++++++++++++++++++++++++++------
> 3 files changed, 52 insertions(+), 6 deletions(-)
>
> diff --git a/migration/options.c b/migration/options.c
> index 5183112775..a461b07b54 100644
> --- a/migration/options.c
> +++ b/migration/options.c
> @@ -262,6 +262,12 @@ bool migrate_mapped_ram(void)
> return s->capabilities[MIGRATION_CAPABILITY_MAPPED_RAM];
> }
>
> +bool migrate_backend_transfer(void)
> +{
> + MigrationState *s = migrate_get_current();
> + return s->parameters.backend_transfer;
> +}
> +
> bool migrate_ignore_shared(void)
> {
> MigrationState *s = migrate_get_current();
> @@ -963,6 +969,9 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
> params->cpr_exec_command = QAPI_CLONE(strList,
> s->parameters.cpr_exec_command);
>
> + params->has_backend_transfer = true;
> + params->backend_transfer = s->parameters.backend_transfer;
> +
> return params;
> }
>
> @@ -997,6 +1006,7 @@ void migrate_params_init(MigrationParameters *params)
> params->has_zero_page_detection = true;
> params->has_direct_io = true;
> params->has_cpr_exec_command = true;
> + params->has_backend_transfer = true;
> }
>
> /*
> @@ -1305,6 +1315,10 @@ static void migrate_params_test_apply(MigrateSetParameters *params,
> if (params->has_cpr_exec_command) {
> dest->cpr_exec_command = params->cpr_exec_command;
> }
> +
> + if (params->has_backend_transfer) {
> + dest->backend_transfer = params->backend_transfer;
> + }
> }
>
> static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
> @@ -1443,6 +1457,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
> s->parameters.cpr_exec_command =
> QAPI_CLONE(strList, params->cpr_exec_command);
> }
> +
> + if (params->has_backend_transfer) {
> + s->parameters.backend_transfer = params->backend_transfer;
> + }
> }
>
> void qmp_migrate_set_parameters(MigrateSetParameters *params, Error **errp)
> diff --git a/migration/options.h b/migration/options.h
> index 82d839709e..755ba1c024 100644
> --- a/migration/options.h
> +++ b/migration/options.h
> @@ -87,6 +87,8 @@ const char *migrate_tls_hostname(void);
> uint64_t migrate_xbzrle_cache_size(void);
> ZeroPageDetection migrate_zero_page_detection(void);
>
> +bool migrate_backend_transfer(void);
> +
> /* parameters helpers */
>
> bool migrate_params_check(MigrationParameters *params, Error **errp);
> diff --git a/qapi/migration.json b/qapi/migration.json
> index be0f3fcc12..35601a1f87 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -951,9 +951,16 @@
> # is @cpr-exec. The first list element is the program's filename,
> # the remainder its arguments. (Since 10.2)
> #
> +# @backend-transfer: Enable backend-transfer feature for devices that
> +# supports it. In general that means that backend state and its
> +# file descriptors are passed to the destination in the migraton
> +# channel (which must be a UNIX socket). Individual devices
> +# declare the support for backend-transfer by per-device
> +# backend-transfer option. (Since 10.2)
Thanks.
I still prefer the name "fd-passing" or anything more explicit than
"backend-transfer". Maybe the current name is fine for TAP, only because
TAP doesn't have its own VMSD to transfer?
Consider a backend device that already migrates its state through a VMSD:
if it later starts to allow fd-passing, this name stops being suitable,
because the device was already "transferring the backend" and has only
now started passing fds.
Meanwhile, consider another example: what if a device is not a backend at
all (e.g. vfio?), has its own VMSD, and then wants to do fd-passing?
In general, I think "fd" is really a core concept of this whole thing. To
complement that idea: IMHO this patch misses one important change, namely
that the migration framework should explicitly fail the migration if this
feature is enabled but the channel is not a unix socket (i.e., fd-passing
REQUIRES SCM_RIGHTS). Would that look more reliable? Otherwise, IIUC, it'll
throw weird errors when, e.g., we enable this feature and try to migrate
via either TCP or to a file.
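To illustrate why the transport matters, here is a minimal standalone
sketch (plain POSIX, not QEMU code) of how a file descriptor travels as
SCM_RIGHTS ancillary data. The helper names send_fd() and recv_fd() are
made up for this example. This only works on AF_UNIX sockets, which is
why a TCP or file channel cannot carry the fds:

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one payload byte plus `fd` as SCM_RIGHTS. */
static int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;           /* forces correct alignment */
    } u;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;

    assert(fd >= 0);
    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd, or return -1 if none arrived. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;
    int fd;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }

    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;                      /* no fd attached to this message */
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The received descriptor is a new number in the receiving process that
refers to the same open file description, which is what would let the
destination keep using the same TAP device.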
> +#
> # Features:
> #
> -# @unstable: Members @x-checkpoint-delay and
> +# @unstable: Members @backend-transfer, @x-checkpoint-delay and
> # @x-vcpu-dirty-limit-period are experimental.
> #
> # Since: 2.4
> @@ -978,7 +985,8 @@
> 'mode',
> 'zero-page-detection',
> 'direct-io',
> - 'cpr-exec-command'] }
> + 'cpr-exec-command',
> + { 'name': 'backend-transfer', 'features': ['unstable'] } ] }
>
> ##
> # @MigrateSetParameters:
> @@ -1137,9 +1145,16 @@
> # is @cpr-exec. The first list element is the program's filename,
> # the remainder its arguments. (Since 10.2)
> #
> +# @backend-transfer: Enable backend-transfer feature for devices that
> +# supports it. In general that means that backend state and its
> +# file descriptors are passed to the destination in the migraton
> +# channel (which must be a UNIX socket). Individual devices
> +# declare the support for backend-transfer by per-device
> +# backend-transfer option. (Since 10.2)
> +#
> # Features:
> #
> -# @unstable: Members @x-checkpoint-delay and
> +# @unstable: Members @backend-transfer, @x-checkpoint-delay and
> # @x-vcpu-dirty-limit-period are experimental.
> #
> # TODO: either fuse back into `MigrationParameters`, or make
> @@ -1179,7 +1194,9 @@
> '*mode': 'MigMode',
> '*zero-page-detection': 'ZeroPageDetection',
> '*direct-io': 'bool',
> - '*cpr-exec-command': [ 'str' ]} }
> + '*cpr-exec-command': [ 'str' ],
> + '*backend-transfer': { 'type': 'bool',
> + 'features': [ 'unstable' ] } } }
>
> ##
> # @migrate-set-parameters:
> @@ -1352,9 +1369,16 @@
> # is @cpr-exec. The first list element is the program's filename,
> # the remainder its arguments. (Since 10.2)
> #
> +# @backend-transfer: Enable backend-transfer feature for devices that
> +# supports it. In general that means that backend state and its
> +# file descriptors are passed to the destination in the migraton
> +# channel (which must be a UNIX socket). Individual devices
> +# declare the support for backend-transfer by per-device
> +# backend-transfer option. (Since 10.2)
> +#
> # Features:
> #
> -# @unstable: Members @x-checkpoint-delay and
> +# @unstable: Members @backend-transfer, @x-checkpoint-delay and
> # @x-vcpu-dirty-limit-period are experimental.
> #
> # Since: 2.4
> @@ -1391,7 +1415,9 @@
> '*mode': 'MigMode',
> '*zero-page-detection': 'ZeroPageDetection',
> '*direct-io': 'bool',
> - '*cpr-exec-command': [ 'str' ]} }
> + '*cpr-exec-command': [ 'str' ],
> + '*backend-transfer': { 'type': 'bool',
> + 'features': [ 'unstable' ] } } }
>
> ##
> # @query-migrate-parameters:
> --
> 2.48.1
>
--
Peter Xu
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter
2025-10-15 18:19 ` Peter Xu
@ 2025-10-15 19:02 ` Vladimir Sementsov-Ogievskiy
2025-10-15 20:07 ` Peter Xu
0 siblings, 1 reply; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-15 19:02 UTC (permalink / raw)
To: Peter Xu
Cc: mst, jasowang, farosas, sw, eblake, armbru, thuth, philmd,
berrange, qemu-devel, michael.roth, steven.sistare, leiyang,
davydov-max, yc-core, raphael.s.norwitz
On 15.10.25 21:19, Peter Xu wrote:
> On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> This parameter enables backend-transfer feature: all devices
>> which support it will migrate their backends (for example a TAP
>> device, by passing open file descriptor to migration channel).
>>
>> Currently no such devices, so the new parameter is a noop.
>>
>> Next commit will add support for virtio-net, to migrate its
>> TAP backend.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>> ---
[..]
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -951,9 +951,16 @@
>> # is @cpr-exec. The first list element is the program's filename,
>> # the remainder its arguments. (Since 10.2)
>> #
>> +# @backend-transfer: Enable backend-transfer feature for devices that
>> +# supports it. In general that means that backend state and its
>> +# file descriptors are passed to the destination in the migraton
>> +# channel (which must be a UNIX socket). Individual devices
>> +# declare the support for backend-transfer by per-device
>> +# backend-transfer option. (Since 10.2)
>
> Thanks.
>
> I still prefer the name "fd-passing" or anything more explicit than
> "backend-transfer". Maybe the current name is fine for TAP, only because
> TAP doesn't have its own VMSD to transfer?
>
> Consider a device that would be a backend that supports VMSDs already to be
> migrated, then if it starts to allow fd-passing, this name will stop being
> suitable there, because it used to "transfer backend" already, now it's
> just started to "fd-passing".
>
> Meanwhile, consider another example - what if a device is not a backend at
> all (e.g. vfio?), has its own VMSD, then want to do fd-passing?
Reasonable.
But consider also the discussion with Fabiano in v5, where he argues against fds
(reasonable too):
https://lore.kernel.org/qemu-devel/87y0qatqoa.fsf@suse.de/
(still, that objection was to my "fds" name for the parameter, which
really is too generic; fd-passing is not)
and the arguments for backend-transfer (to read similarly to cpr-transfer):
https://lore.kernel.org/qemu-devel/87ms6qtlgf.fsf@suse.de/
>
> In general, I think "fd" is really a core concept of this whole thing.
I think we can call any external object linked by the fd a "backend".
Still, backend/frontend terminology is so misleading when applied to
complex systems (for me, at least) that I don't really like the word
"backend" here.
fd-passing is OK for me; I can resend with it if Fabiano's arguments
don't change your mind.
> One
> thing to complement that idea is, IMHO this patch misses one important
> change, that migration framework should actually explicitly fail the
> migration if this feature is enabled but it's not a unix socket protocol
> (aka, fd-passing REQUIRES scm rights). Would that look more reliable?
> Otherwise IIUC it'll throw weird errors when e.g. when we enabled this
> feature and trying to migrate via either TCP or to a file..
>
Right. I rely on checking in qemu_file_get_fd() / qemu_file_set_fd()
handlers.
But of course, an earlier clean failure of the qmp-migrate /
qmp-incoming-migrate commands would be nice, will do.
Like this, I think:
diff --git a/migration/migration.c b/migration/migration.c
index 6ed6a10f57..0c73332706 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -255,6 +255,14 @@ migration_channels_and_transport_compatible(MigrationAddress *addr,
return false;
}
+ if (migrate_backend_transfer() &&
+ !(addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET &&
+ addr->u.socket.type == SOCKET_ADDRESS_TYPE_UNIX)) {
+ error_setg(errp, "Migration requires a UNIX domain socket as transport, "
+ "because backend-transfer is enabled");
+ return false;
+ }
+
return true;
}
--
Best regards,
Vladimir
^ permalink raw reply related [flat|nested] 51+ messages in thread
* Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter
2025-10-15 19:02 ` Vladimir Sementsov-Ogievskiy
@ 2025-10-15 20:07 ` Peter Xu
2025-10-15 21:02 ` Vladimir Sementsov-Ogievskiy
0 siblings, 1 reply; 51+ messages in thread
From: Peter Xu @ 2025-10-15 20:07 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy
Cc: mst, jasowang, farosas, sw, eblake, armbru, thuth, philmd,
berrange, qemu-devel, michael.roth, steven.sistare, leiyang,
davydov-max, yc-core, raphael.s.norwitz
On Wed, Oct 15, 2025 at 10:02:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 15.10.25 21:19, Peter Xu wrote:
> > On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > This parameter enables backend-transfer feature: all devices
> > > which support it will migrate their backends (for example a TAP
> > > device, by passing open file descriptor to migration channel).
> > >
> > > Currently no such devices, so the new parameter is a noop.
> > >
> > > Next commit will add support for virtio-net, to migrate its
> > > TAP backend.
> > >
> > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> > > ---
>
> [..]
>
> > > --- a/qapi/migration.json
> > > +++ b/qapi/migration.json
> > > @@ -951,9 +951,16 @@
> > > # is @cpr-exec. The first list element is the program's filename,
> > > # the remainder its arguments. (Since 10.2)
> > > #
> > > +# @backend-transfer: Enable backend-transfer feature for devices that
> > > +# supports it. In general that means that backend state and its
> > > +# file descriptors are passed to the destination in the migraton
> > > +# channel (which must be a UNIX socket). Individual devices
> > > +# declare the support for backend-transfer by per-device
> > > +# backend-transfer option. (Since 10.2)
> >
> > Thanks.
> >
> > I still prefer the name "fd-passing" or anything more explicit than
> > "backend-transfer". Maybe the current name is fine for TAP, only because
> > TAP doesn't have its own VMSD to transfer?
> >
> > Consider a device that would be a backend that supports VMSDs already to be
> > migrated, then if it starts to allow fd-passing, this name will stop being
> > suitable there, because it used to "transfer backend" already, now it's
> > just started to "fd-passing".
> >
> > Meanwhile, consider another example - what if a device is not a backend at
> > all (e.g. vfio?), has its own VMSD, then want to do fd-passing?
>
> Reasonable.
>
> But consider also the discussion with Fabiano in v5, where he argues against fds
> (reasonable too):
>
> https://lore.kernel.org/qemu-devel/87y0qatqoa.fsf@suse.de/
>
> (still, they were against my "fds" name for the parameter, which is
> really too generic, fd-passing is not)
>
> and the arguments for backend-transfer (to read similar with cpr-transfer)
>
> https://lore.kernel.org/qemu-devel/87ms6qtlgf.fsf@suse.de/
>
>
> >
> > In general, I think "fd" is really a core concept of this whole thing.
>
> I think, we can call "backend" any external object, linked by the fd.
>
> Still, backend/frontend terminology is so misleading, when applied to
> complex systems (for me, at least), that I don't really like "-backend"
> word here.
>
> fd-passing is OK for me, I can resend with it, if arguments by Fabiano
> not change your mind.
Ah, I didn't notice the name has been discussed.
I think it means you can vote for your own preference now because we have
one vote for each. :) Let's also see whether Fabiano will come up with
something better than both.
You explicitly mentioned the file descriptors in the qapi doc, which is what
I would strongly request. The other thing is the unix socket check; with it
added below, it all looks good now, thanks. No strong feelings on the names.
>
> > One
> > thing to complement that idea is, IMHO this patch misses one important
> > change, that migration framework should actually explicitly fail the
> > migration if this feature is enabled but it's not a unix socket protocol
> > (aka, fd-passing REQUIRES scm rights). Would that look more reliable?
> > Otherwise IIUC it'll throw weird errors when e.g. when we enabled this
> > feature and trying to migrate via either TCP or to a file..
> >
>
> Right. I rely on checking in qemu_file_get_fd() / qemu_file_set_fd()
> handlers.
>
> But of course, earlier clean failure of qmp-migrate / qmp-incoming-migate
> commands would be nice, will do.
>
> Like this, I think:
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 6ed6a10f57..0c73332706 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -255,6 +255,14 @@ migration_channels_and_transport_compatible(MigrationAddress *addr,
> return false;
> }
>
> + if (migrate_backend_transfer() &&
> + !(addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET &&
> + addr->u.socket.type == SOCKET_ADDRESS_TYPE_UNIX)) {
> + error_setg(errp, "Migration requires a UNIX domain socket as transport, "
> + "because backend-transfer is enabled");
> + return false;
> + }
> +
> return true;
> }
>
>
>
>
>
> --
> Best regards,
> Vladimir
>
--
Peter Xu
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter
2025-10-15 20:07 ` Peter Xu
@ 2025-10-15 21:02 ` Vladimir Sementsov-Ogievskiy
2025-10-16 8:32 ` Daniel P. Berrangé
0 siblings, 1 reply; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-15 21:02 UTC (permalink / raw)
To: Peter Xu
Cc: mst, jasowang, farosas, sw, eblake, armbru, thuth, philmd,
berrange, qemu-devel, michael.roth, steven.sistare, leiyang,
davydov-max, yc-core, raphael.s.norwitz
On 15.10.25 23:07, Peter Xu wrote:
> On Wed, Oct 15, 2025 at 10:02:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> On 15.10.25 21:19, Peter Xu wrote:
>>> On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>> This parameter enables backend-transfer feature: all devices
>>>> which support it will migrate their backends (for example a TAP
>>>> device, by passing open file descriptor to migration channel).
>>>>
>>>> Currently no such devices, so the new parameter is a noop.
>>>>
>>>> Next commit will add support for virtio-net, to migrate its
>>>> TAP backend.
>>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>>> ---
>>
>> [..]
>>
>>>> --- a/qapi/migration.json
>>>> +++ b/qapi/migration.json
>>>> @@ -951,9 +951,16 @@
>>>> # is @cpr-exec. The first list element is the program's filename,
>>>> # the remainder its arguments. (Since 10.2)
>>>> #
>>>> +# @backend-transfer: Enable backend-transfer feature for devices that
>>>> +# supports it. In general that means that backend state and its
>>>> +# file descriptors are passed to the destination in the migraton
>>>> +# channel (which must be a UNIX socket). Individual devices
>>>> +# declare the support for backend-transfer by per-device
>>>> +# backend-transfer option. (Since 10.2)
>>>
>>> Thanks.
>>>
>>> I still prefer the name "fd-passing" or anything more explicit than
>>> "backend-transfer". Maybe the current name is fine for TAP, only because
>>> TAP doesn't have its own VMSD to transfer?
>>>
>>> Consider a device that would be a backend that supports VMSDs already to be
>>> migrated, then if it starts to allow fd-passing, this name will stop being
>>> suitable there, because it used to "transfer backend" already, now it's
>>> just started to "fd-passing".
>>>
>>> Meanwhile, consider another example - what if a device is not a backend at
>>> all (e.g. vfio?), has its own VMSD, then want to do fd-passing?
>>
>> Reasonable.
>>
>> But consider also the discussion with Fabiano in v5, where he argues against fds
>> (reasonable too):
>>
>> https://lore.kernel.org/qemu-devel/87y0qatqoa.fsf@suse.de/
>>
>> (still, they were against my "fds" name for the parameter, which is
>> really too generic, fd-passing is not)
>>
>> and the arguments for backend-transfer (to read similar with cpr-transfer)
>>
>> https://lore.kernel.org/qemu-devel/87ms6qtlgf.fsf@suse.de/
>>
>>
>>>
>>> In general, I think "fd" is really a core concept of this whole thing.
>>
>> I think, we can call "backend" any external object, linked by the fd.
>>
>> Still, backend/frontend terminology is so misleading, when applied to
>> complex systems (for me, at least), that I don't really like "-backend"
>> word here.
>>
>> fd-passing is OK for me, I can resend with it, if arguments by Fabiano
>> not change your mind.
>
> Ah, I didn't notice the name has been discussed.
>
> I think it means you can vote for your own preference now because we have
> one vote for each. :) Let's also see whether Fabiano will come up with
> something better than both.
>
> You mentioned explicitly the file descriptors in the qapi doc, that's what
> I would strongly request for. The other thing is the unix socket check, it
> looks all good below now with it, thanks. No strong feelings on the names.
>
After a bit more thinking, I'm leaning towards keeping backend-transfer. I think
it's more meaningful for the user:
If we call it "fd-passing", the user may ask:
OK, what is it? Does it allow QEMU to pass some fds through the migration
stream, if it supports fds? Which fds? Why pass them? Finally, why can't QEMU
just check whether it is a unix socket, and pass any fds it wants if it is?
The logical question is: why not just drop the global capability, and check
only whether it is a unix socket? (OK, relying only on the socket type is wrong
anyway, as there may be some complex tunneling that includes unix sockets but
still can't pass fds; but right now I'm thinking about the feature naming.)
But we really want an explicit switch for the feature, as a QEMU update is
not the only case of local migration. Another case is changing the
backend. So the user's choices are:
1. Remote migration: we can't reuse backends (files, sockets, host devices), as
we are moving to another host. So we don't enable "backend-transfer". We don't
transfer the backend; we have to initialize a new backend on the other host.
2. Local migration to update QEMU, with minimal freeze time and minimal
extra actions: use "backend-transfer", exactly to keep the backends
(vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc.)
as is.
3. Local migration, but we want to reconfigure some backend, or switch
to another backend. We disable "backend-transfer" for that one device.
4. Some problem with "backend-transfer", maybe some bug. Disable the whole
backend-transfer feature, and do a normal local migration to a new version
with the bug fixed.
"backend-transfer" better reflects what the management layer should or
should not do with backends, depending on the migration type.
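The four cases above can be sketched as a small decision table. This is
only an illustration of the intended management-side logic, not code from
the series: the enum, the struct and both function names are hypothetical,
and the only fact taken from the patches is that a backend is transferred
when both the global parameter and the per-device property are true.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the four cases listed above. */
enum MigKind {
    MIG_REMOTE,             /* 1. move to another host */
    MIG_LOCAL_UPDATE,       /* 2. local QEMU update, keep backends */
    MIG_LOCAL_RECONFIGURE,  /* 3. local, but replace one device's backend */
    MIG_LOCAL_WORKAROUND,   /* 4. local, feature disabled because of a bug */
};

/* Settings the management layer would choose for one device. */
struct TransferPolicy {
    bool global_param;      /* migration parameter "backend-transfer" */
    bool device_prop;       /* per-device "backend-transfer" property */
};

static struct TransferPolicy transfer_policy(enum MigKind kind)
{
    switch (kind) {
    case MIG_REMOTE:
    case MIG_LOCAL_WORKAROUND:
        /* Feature off globally; the device property is left untouched. */
        return (struct TransferPolicy){ .global_param = false,
                                        .device_prop = true };
    case MIG_LOCAL_UPDATE:
        return (struct TransferPolicy){ .global_param = true,
                                        .device_prop = true };
    case MIG_LOCAL_RECONFIGURE:
        /* Feature on, but opted out for the device being reconfigured. */
        return (struct TransferPolicy){ .global_param = true,
                                        .device_prop = false };
    }
    assert(!"unknown migration kind");
    return (struct TransferPolicy){ .global_param = false,
                                    .device_prop = false };
}

/* The backend is transferred only when both switches are enabled. */
static bool backend_is_transferred(struct TransferPolicy p)
{
    return p.global_param && p.device_prop;
}
```

In cases 1, 3 and 4 the result is false, so the management layer has to
set up a backend on the destination itself; only in case 2 can it leave
the backend alone.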
>>
>>> One
>>> thing to complement that idea is, IMHO this patch misses one important
>>> change, that migration framework should actually explicitly fail the
>>> migration if this feature is enabled but it's not a unix socket protocol
>>> (aka, fd-passing REQUIRES scm rights). Would that look more reliable?
>>> Otherwise IIUC it'll throw weird errors when e.g. when we enabled this
>>> feature and trying to migrate via either TCP or to a file..
>>>
>>
>> Right. I rely on checking in qemu_file_get_fd() / qemu_file_set_fd()
>> handlers.
>>
>> But of course, earlier clean failure of qmp-migrate / qmp-incoming-migate
>> commands would be nice, will do.
>>
>> Like this, I think:
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 6ed6a10f57..0c73332706 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -255,6 +255,14 @@ migration_channels_and_transport_compatible(MigrationAddress *addr,
>> return false;
>> }
>>
>> + if (migrate_backend_transfer() &&
>> + !(addr->transport == MIGRATION_ADDRESS_TYPE_SOCKET &&
>> + addr->u.socket.type == SOCKET_ADDRESS_TYPE_UNIX)) {
>> + error_setg(errp, "Migration requires a UNIX domain socket as transport, "
>> + "because backend-transfer is enabled");
>> + return false;
>> + }
>> +
>> return true;
>> }
>>
>>
>>
>>
>>
>> --
>> Best regards,
>> Vladimir
>>
>
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v8 17/19] virtio-net: support backend-transfer migration for virtio-net/tap
2025-10-15 13:21 ` [PATCH v8 17/19] virtio-net: support backend-transfer migration for virtio-net/tap Vladimir Sementsov-Ogievskiy
@ 2025-10-16 8:23 ` Daniel P. Berrangé
2025-10-16 9:15 ` Vladimir Sementsov-Ogievskiy
0 siblings, 1 reply; 51+ messages in thread
From: Daniel P. Berrangé @ 2025-10-16 8:23 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy
Cc: mst, jasowang, peterx, farosas, sw, eblake, armbru, thuth, philmd,
qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
yc-core, raphael.s.norwitz
On Wed, Oct 15, 2025 at 04:21:33PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Add virtio-net option backend-transfer, which is true by default,
> but false for older machine types, which doesn't support the feature.
>
> For backend-transfer migration, both global migration parameter
> backend-transfer and virtio-net backend-transfer option should be
> set to true.
>
> With the parameters enabled (both on source and target) of-course, and
> with unix-socket used as migration-channel, we do "migrate" the
> virtio-net backend - TAP device, with all its fds.
>
> This way management tool should not care about creating new TAP, and
> should not handle switching to it. Migration downtime become shorter.
>
> How it works:
>
> 1. For incoming migration, we postpone TAP initialization up to
> pre-incoming point.
>
> 2. At pre-incoming point we see that "virtio-net-tap" is set for
> backend-transfer, so we postpone TAP initialization up to
> post-load
>
> 3. During virtio-load, we get TAP state (and fds) as part of
> virtio-net state
>
> 4. In post-load we finalize TAP initialization
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> ---
> hw/core/machine.c | 1 +
> hw/net/virtio-net.c | 75 +++++++++++++++++++++++++++++++++-
> include/hw/virtio/virtio-net.h | 1 +
> include/net/tap.h | 2 +
> net/tap.c | 45 +++++++++++++++++++-
> 5 files changed, 122 insertions(+), 2 deletions(-)
>
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index 681adbb7ac..a3d77f5604 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -40,6 +40,7 @@
>
> GlobalProperty hw_compat_10_1[] = {
> { TYPE_ACPI_GED, "x-has-hest-addr", "false" },
> + { TYPE_VIRTIO_NET, "backend-transfer", "false" },
> };
> const size_t hw_compat_10_1_len = G_N_ELEMENTS(hw_compat_10_1);
>
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 661413c72f..5f9711dee7 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -38,6 +38,7 @@
> #include "qapi/qapi-events-migration.h"
> #include "hw/virtio/virtio-access.h"
> #include "migration/misc.h"
> +#include "migration/options.h"
> #include "standard-headers/linux/ethtool.h"
> #include "system/system.h"
> #include "system/replay.h"
> @@ -3358,6 +3359,9 @@ struct VirtIONetMigTmp {
> uint16_t curr_queue_pairs_1;
> uint8_t has_ufo;
> uint32_t has_vnet_hdr;
> +
> + NetClientState *ncs;
> + uint32_t max_queue_pairs;
> };
>
> /* The 2nd and subsequent tx_waiting flags are loaded later than
> @@ -3627,6 +3631,71 @@ static const VMStateDescription vhost_user_net_backend_state = {
> }
> };
>
> +static bool virtio_net_is_tap_mig(void *opaque, int version_id)
> +{
> + VirtIONet *n = opaque;
> + NetClientState *nc;
> +
> + nc = qemu_get_queue(n->nic);
> +
> + return migrate_backend_transfer() && n->backend_transfer && nc->peer &&
> + nc->peer->info->type == NET_CLIENT_DRIVER_TAP;
> +}
> +
> +static int virtio_net_nic_pre_save(void *opaque)
> +{
> + struct VirtIONetMigTmp *tmp = opaque;
> +
> + tmp->ncs = tmp->parent->nic->ncs;
> + tmp->max_queue_pairs = tmp->parent->max_queue_pairs;
> +
> + return 0;
> +}
> +
> +static int virtio_net_nic_pre_load(void *opaque)
> +{
> + /* Reuse the pointer setup from save */
> + virtio_net_nic_pre_save(opaque);
> +
> + return 0;
> +}
> +
> +static int virtio_net_nic_post_load(void *opaque, int version_id)
> +{
> + struct VirtIONetMigTmp *tmp = opaque;
> + Error *local_err = NULL;
> +
> + if (!virtio_net_update_host_features(tmp->parent, &local_err)) {
> + error_report_err(local_err);
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> +static const VMStateDescription vmstate_virtio_net_nic_nc = {
> + .name = "virtio-net-nic-nc",
> + .fields = (const VMStateField[]) {
> + VMSTATE_STRUCT_POINTER(peer, NetClientState, vmstate_tap,
> + NetClientState),
> + VMSTATE_END_OF_LIST()
> + },
> +};
> +
> +static const VMStateDescription vmstate_virtio_net_nic = {
> + .name = "virtio-net-nic",
> + .pre_load = virtio_net_nic_pre_load,
> + .pre_save = virtio_net_nic_pre_save,
> + .post_load = virtio_net_nic_post_load,
> + .fields = (const VMStateField[]) {
> + VMSTATE_STRUCT_VARRAY_POINTER_UINT32(ncs, struct VirtIONetMigTmp,
> + max_queue_pairs,
> + vmstate_virtio_net_nic_nc,
> + struct NetClientState),
> + VMSTATE_END_OF_LIST()
> + },
> +};
> +
> static const VMStateDescription vmstate_virtio_net_device = {
> .name = "virtio-net-device",
> .version_id = VIRTIO_NET_VM_VERSION,
> @@ -3658,6 +3727,9 @@ static const VMStateDescription vmstate_virtio_net_device = {
> * but based on the uint.
> */
> VMSTATE_BUFFER_POINTER_UNSAFE(vlans, VirtIONet, 0, MAX_VLAN >> 3),
> + VMSTATE_WITH_TMP_TEST(VirtIONet, virtio_net_is_tap_mig,
> + struct VirtIONetMigTmp,
> + vmstate_virtio_net_nic),
> VMSTATE_WITH_TMP(VirtIONet, struct VirtIONetMigTmp,
> vmstate_virtio_net_has_vnet),
> VMSTATE_UINT8(mac_table.multi_overflow, VirtIONet),
> @@ -4239,7 +4311,7 @@ static bool vhost_user_blk_pre_incoming(void *opaque, Error **errp)
> VirtIONet *n = opaque;
> int i;
>
> - if (peer_wait_incoming(n)) {
> + if (!virtio_net_is_tap_mig(opaque, 0) && peer_wait_incoming(n)) {
> for (i = 0; i < n->max_queue_pairs; i++) {
> if (!peer_postponed_init(n, i, errp)) {
> return false;
> @@ -4389,6 +4461,7 @@ static const Property virtio_net_properties[] = {
> host_features_ex,
> VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM,
> false),
> + DEFINE_PROP_BOOL("backend-transfer", VirtIONet, backend_transfer, true),
> };
>
> static void virtio_net_class_init(ObjectClass *klass, const void *data)
I really don't like this approach, because it requires the frontend
device to know about every different backend implementation that is able
to do state transfer. This really violates the separation between the
frontend and backend. The choice of specific backend should generally
be opaque to the frontend.
This really ought to be redesigned to work in terms of a formal API
exposed by the backend, not poking at TAP backend specific details.
e.g. an API that operates on NetClientState, for which each backend
can provide an optional implementation.
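As a rough, standalone sketch of the shape such an API could take (all
type and function names here are hypothetical and deliberately not the
real QEMU ones): each backend registers optional hooks, and the frontend
only asks the generic layer whether transfer is possible, without ever
checking for TAP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct NetClient NetClient;

/* Hypothetical per-backend hooks; NULL means "transfer not supported". */
typedef struct NetBackendOps {
    const char *name;
    bool (*save_backend)(NetClient *nc);
    bool (*load_backend)(NetClient *nc);
} NetBackendOps;

struct NetClient {
    const NetBackendOps *ops;
    int fd;                     /* stand-in for backend-private state */
};

/* Generic query used by any frontend; no backend type is named here. */
static bool net_backend_can_transfer(const NetClient *nc)
{
    assert(nc && nc->ops);
    return nc->ops->save_backend && nc->ops->load_backend;
}

/* Generic save entry point: fails cleanly if the backend opted out. */
static bool net_backend_save(NetClient *nc)
{
    return net_backend_can_transfer(nc) && nc->ops->save_backend(nc);
}

/* Toy TAP-like backend: transferable as long as it holds an open fd. */
static bool tap_save(NetClient *nc) { return nc->fd >= 0; }
static bool tap_load(NetClient *nc) { return nc->fd >= 0; }

static const NetBackendOps tap_ops = { "tap", tap_save, tap_load };

/* Toy backend that provides no transfer hooks at all. */
static const NetBackendOps user_ops = { "user", NULL, NULL };
```

With this shape, adding transfer support to another backend means
filling in its two hooks; the frontend code stays unchanged.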
> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> index 5b8ab7bda7..bf07f8a4cb 100644
> --- a/include/hw/virtio/virtio-net.h
> +++ b/include/hw/virtio/virtio-net.h
> @@ -231,6 +231,7 @@ struct VirtIONet {
> struct EBPFRSSContext ebpf_rss;
> uint32_t nr_ebpf_rss_fds;
> char **ebpf_rss_fds;
> + bool backend_transfer;
> };
>
> size_t virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
> diff --git a/include/net/tap.h b/include/net/tap.h
> index 5a926ba513..506f7ab719 100644
> --- a/include/net/tap.h
> +++ b/include/net/tap.h
> @@ -36,4 +36,6 @@ int tap_get_fd(NetClientState *nc);
> bool tap_wait_incoming(NetClientState *nc);
> bool tap_postponed_init(NetClientState *nc, Error **errp);
>
> +extern const VMStateDescription vmstate_tap;
> +
> #endif /* QEMU_NET_TAP_H */
> diff --git a/net/tap.c b/net/tap.c
> index 8afbf3b407..b9c12dd64c 100644
> --- a/net/tap.c
> +++ b/net/tap.c
> @@ -819,7 +819,7 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
>
> static bool net_tap_setup(TAPState *s, int fd, int vnet_hdr, Error **errp)
> {
> - if (!net_tap_set_fd(s, fd, vnet_hdr, errp)) {
> + if (fd != -1 && !net_tap_set_fd(s, fd, vnet_hdr, errp)) {
> return false;
> }
>
> @@ -1225,6 +1225,49 @@ int tap_disable(NetClientState *nc)
> }
> }
>
> +static int tap_pre_load(void *opaque)
> +{
> + TAPState *s = opaque;
> +
> + if (s->fd != -1) {
> + error_report(
> + "TAP is already initialized and cannot receive incoming fd");
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> +static int tap_post_load(void *opaque, int version_id)
> +{
> + TAPState *s = opaque;
> + Error *local_err = NULL;
> +
> + if (!net_tap_setup(s, -1, -1, &local_err)) {
> + error_report_err(local_err);
> + qemu_del_net_client(&s->nc);
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> +const VMStateDescription vmstate_tap = {
> + .name = "net-tap",
> + .pre_load = tap_pre_load,
> + .post_load = tap_post_load,
> + .fields = (const VMStateField[]) {
> + VMSTATE_FD(fd, TAPState),
> + VMSTATE_BOOL(using_vnet_hdr, TAPState),
> + VMSTATE_BOOL(has_ufo, TAPState),
> + VMSTATE_BOOL(has_uso, TAPState),
> + VMSTATE_BOOL(has_tunnel, TAPState),
> + VMSTATE_BOOL(enabled, TAPState),
> + VMSTATE_UINT32(host_vnet_hdr_len, TAPState),
> + VMSTATE_END_OF_LIST()
> + }
> +};
> +
> bool tap_wait_incoming(NetClientState *nc)
> {
> TAPState *s = DO_UPCAST(TAPState, nc, nc);
IMHO implementing state transfer in the backends ought to be a separate
commit from adding support for using that in the frontend.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter
2025-10-15 21:02 ` Vladimir Sementsov-Ogievskiy
@ 2025-10-16 8:32 ` Daniel P. Berrangé
2025-10-16 9:23 ` Vladimir Sementsov-Ogievskiy
0 siblings, 1 reply; 51+ messages in thread
From: Daniel P. Berrangé @ 2025-10-16 8:32 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy
Cc: Peter Xu, mst, jasowang, farosas, sw, eblake, armbru, thuth,
philmd, qemu-devel, michael.roth, steven.sistare, leiyang,
davydov-max, yc-core, raphael.s.norwitz
On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 15.10.25 23:07, Peter Xu wrote:
> > On Wed, Oct 15, 2025 at 10:02:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > On 15.10.25 21:19, Peter Xu wrote:
> > > > On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > This parameter enables backend-transfer feature: all devices
> > > > > which support it will migrate their backends (for example a TAP
> > > > > device, by passing open file descriptor to migration channel).
> > > > >
> > > > > Currently no such devices, so the new parameter is a noop.
> > > > >
> > > > > Next commit will add support for virtio-net, to migrate its
> > > > > TAP backend.
> > > > >
> > > > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> > > > > ---
> > >
> > > [..]
> > >
> > > > > --- a/qapi/migration.json
> > > > > +++ b/qapi/migration.json
> > > > > @@ -951,9 +951,16 @@
> > > > > # is @cpr-exec. The first list element is the program's filename,
> > > > > # the remainder its arguments. (Since 10.2)
> > > > > #
> > > > > +# @backend-transfer: Enable backend-transfer feature for devices that
> > > > > +# supports it. In general that means that backend state and its
> > > > > +# file descriptors are passed to the destination in the migraton
> > > > > +# channel (which must be a UNIX socket). Individual devices
> > > > > +# declare the support for backend-transfer by per-device
> > > > > +# backend-transfer option. (Since 10.2)
> > > >
> > > > Thanks.
> > > >
> > > > I still prefer the name "fd-passing" or anything more explicit than
> > > > "backend-transfer". Maybe the current name is fine for TAP, only because
> > > > TAP doesn't have its own VMSD to transfer?
> > > >
> > > > Consider a device that would be a backend that supports VMSDs already to be
> > > > migrated, then if it starts to allow fd-passing, this name will stop being
> > > > suitable there, because it used to "transfer backend" already, now it's
> > > > just started to "fd-passing".
> > > >
> > > > Meanwhile, consider another example - what if a device is not a backend at
> > > > all (e.g. vfio?), has its own VMSD, then want to do fd-passing?
> > >
> > > Reasonable.
> > >
> > > But consider also the discussion with Fabiano in v5, where he argues against fds
> > > (reasonable too):
> > >
> > > https://lore.kernel.org/qemu-devel/87y0qatqoa.fsf@suse.de/
> > >
> > > (still, they were against my "fds" name for the parameter, which is
> > > really too generic, fd-passing is not)
> > >
> > > and the arguments for backend-transfer (to read similar with cpr-transfer)
> > >
> > > https://lore.kernel.org/qemu-devel/87ms6qtlgf.fsf@suse.de/
> > >
> > >
> > > >
> > > > In general, I think "fd" is really a core concept of this whole thing.
> > >
> > > I think, we can call "backend" any external object, linked by the fd.
> > >
> > > Still, backend/frontend terminology is so misleading, when applied to
> > > complex systems (for me, at least), that I don't really like "-backend"
> > > word here.
> > >
> > > fd-passing is OK for me; I can resend with it if Fabiano's arguments
> > > don't change your mind.
> >
> > Ah, I didn't notice the name has been discussed.
> >
> > I think it means you can vote for your own preference now because we have
> > one vote for each. :) Let's also see whether Fabiano will come up with
> > something better than both.
> >
> > You mentioned explicitly the file descriptors in the qapi doc, that's what
> > I would strongly request for. The other thing is the unix socket check, it
> > looks all good below now with it, thanks. No strong feelings on the names.
> >
>
> After a bit more thinking, I'm leaning towards keeping backend-transfer. I think
> it's more meaningful for the user:
>
> If we call it "fd-passing", the user may ask:
>
> OK, what is it? Does it allow QEMU to pass some fds through the migration
> stream, if it supports fds? Which fds? Why pass them? Finally, why can't QEMU
> just check whether it is a UNIX socket, and pass any fds it wants if it is?
>
> The logical question is: why not just drop the global capability and only check
> whether it is a UNIX socket? (OK, relying only on the socket type is wrong
> anyway, as there may be some complex tunneling that includes UNIX sockets but
> still can't pass fds; but I'm thinking about feature naming now.)
>
> But we really want an explicit switch for the feature, as a QEMU update is
> not the only case of local migration. Another case is changing the
> backend. So the user's choices are:
>
> 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
> we are moving to another host. So, we don't enable "backend-transfer". We don't
> transfer the backend; we have to initialize a new backend on the other host.
>
> 2. Local migration to update QEMU, with minimal freeze-time and minimal
> extra actions: use "backend-transfer", exactly to keep the backends
> (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
> as is.
>
> 3. Local migration, but we want to reconfigure some backend, or switch
> to another backend. We disable "backend-transfer" for one device.
This implies that you're changing 'backend-transfer' against the
device at the time of each migration.

This takes us back to the situation we've had historically where the
behaviour of migration depends on global properties the mgmt app has
set prior to the 'migrate' command being run. We've just tried to get
away from that model by passing everything as parameters to the
migrate command, so I'm loath to see us invent a new way to have
global state properties changing migration behaviour.

This 'backend-transfer' device property is not really a device property,
it is an indirect parameter to the 'migrate' command.

Ergo, if we need the ability to selectively migrate the backend state
of individual devices, then instead of a property on the device, we
should pass a list of device IDs as a parameter to the migrate
command in QMP.
>
> 4. Some problem with "backend-transfer", maybe some bug. Disable the whole
> backend-transfer feature, and do a normal local migration to a new version
> with the bug fixed.
>
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH v8 17/19] virtio-net: support backend-transfer migration for virtio-net/tap
2025-10-16 8:23 ` Daniel P. Berrangé
@ 2025-10-16 9:15 ` Vladimir Sementsov-Ogievskiy
0 siblings, 0 replies; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-16 9:15 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: mst, jasowang, peterx, farosas, sw, eblake, armbru, thuth, philmd,
qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
yc-core, raphael.s.norwitz
On 16.10.25 11:23, Daniel P. Berrangé wrote:
> On Wed, Oct 15, 2025 at 04:21:33PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> Add virtio-net option backend-transfer, which is true by default,
>> but false for older machine types, which don't support the feature.
>>
>> For backend-transfer migration, both global migration parameter
>> backend-transfer and virtio-net backend-transfer option should be
>> set to true.
>>
>> With the parameters enabled (on both source and target, of course), and
>> with a UNIX socket used as the migration channel, we do "migrate" the
>> virtio-net backend (the TAP device), with all its fds.
>>
>> This way the management tool does not need to care about creating a new TAP,
>> and does not need to handle switching to it. Migration downtime becomes shorter.
>>
>> How it works:
>>
>> 1. For incoming migration, we postpone TAP initialization up to
>> pre-incoming point.
>>
>> 2. At pre-incoming point we see that "virtio-net-tap" is set for
>> backend-transfer, so we postpone TAP initialization up to
>> post-load
>>
>> 3. During virtio-load, we get TAP state (and fds) as part of
>> virtio-net state
>>
>> 4. In post-load we finalize TAP initialization
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>> ---
>> hw/core/machine.c | 1 +
>> hw/net/virtio-net.c | 75 +++++++++++++++++++++++++++++++++-
>> include/hw/virtio/virtio-net.h | 1 +
>> include/net/tap.h | 2 +
>> net/tap.c | 45 +++++++++++++++++++-
>> 5 files changed, 122 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/core/machine.c b/hw/core/machine.c
>> index 681adbb7ac..a3d77f5604 100644
>> --- a/hw/core/machine.c
>> +++ b/hw/core/machine.c
>> @@ -40,6 +40,7 @@
>>
>> GlobalProperty hw_compat_10_1[] = {
>> { TYPE_ACPI_GED, "x-has-hest-addr", "false" },
>> + { TYPE_VIRTIO_NET, "backend-transfer", "false" },
>> };
>> const size_t hw_compat_10_1_len = G_N_ELEMENTS(hw_compat_10_1);
>>
>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> index 661413c72f..5f9711dee7 100644
>> --- a/hw/net/virtio-net.c
>> +++ b/hw/net/virtio-net.c
>> @@ -38,6 +38,7 @@
>> #include "qapi/qapi-events-migration.h"
>> #include "hw/virtio/virtio-access.h"
>> #include "migration/misc.h"
>> +#include "migration/options.h"
>> #include "standard-headers/linux/ethtool.h"
>> #include "system/system.h"
>> #include "system/replay.h"
>> @@ -3358,6 +3359,9 @@ struct VirtIONetMigTmp {
>> uint16_t curr_queue_pairs_1;
>> uint8_t has_ufo;
>> uint32_t has_vnet_hdr;
>> +
>> + NetClientState *ncs;
>> + uint32_t max_queue_pairs;
>> };
>>
>> /* The 2nd and subsequent tx_waiting flags are loaded later than
>> @@ -3627,6 +3631,71 @@ static const VMStateDescription vhost_user_net_backend_state = {
>> }
>> };
>>
>> +static bool virtio_net_is_tap_mig(void *opaque, int version_id)
>> +{
>> + VirtIONet *n = opaque;
>> + NetClientState *nc;
>> +
>> + nc = qemu_get_queue(n->nic);
>> +
>> + return migrate_backend_transfer() && n->backend_transfer && nc->peer &&
>> + nc->peer->info->type == NET_CLIENT_DRIVER_TAP;
>> +}
>> +
>> +static int virtio_net_nic_pre_save(void *opaque)
>> +{
>> + struct VirtIONetMigTmp *tmp = opaque;
>> +
>> + tmp->ncs = tmp->parent->nic->ncs;
>> + tmp->max_queue_pairs = tmp->parent->max_queue_pairs;
>> +
>> + return 0;
>> +}
>> +
>> +static int virtio_net_nic_pre_load(void *opaque)
>> +{
>> + /* Reuse the pointer setup from save */
>> + virtio_net_nic_pre_save(opaque);
>> +
>> + return 0;
>> +}
>> +
>> +static int virtio_net_nic_post_load(void *opaque, int version_id)
>> +{
>> + struct VirtIONetMigTmp *tmp = opaque;
>> + Error *local_err = NULL;
>> +
>> + if (!virtio_net_update_host_features(tmp->parent, &local_err)) {
>> + error_report_err(local_err);
>> + return -EINVAL;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static const VMStateDescription vmstate_virtio_net_nic_nc = {
>> + .name = "virtio-net-nic-nc",
>> + .fields = (const VMStateField[]) {
>> + VMSTATE_STRUCT_POINTER(peer, NetClientState, vmstate_tap,
>> + NetClientState),
>> + VMSTATE_END_OF_LIST()
>> + },
>> +};
>> +
>> +static const VMStateDescription vmstate_virtio_net_nic = {
>> + .name = "virtio-net-nic",
>> + .pre_load = virtio_net_nic_pre_load,
>> + .pre_save = virtio_net_nic_pre_save,
>> + .post_load = virtio_net_nic_post_load,
>> + .fields = (const VMStateField[]) {
>> + VMSTATE_STRUCT_VARRAY_POINTER_UINT32(ncs, struct VirtIONetMigTmp,
>> + max_queue_pairs,
>> + vmstate_virtio_net_nic_nc,
>> + struct NetClientState),
>> + VMSTATE_END_OF_LIST()
>> + },
>> +};
>> +
>> static const VMStateDescription vmstate_virtio_net_device = {
>> .name = "virtio-net-device",
>> .version_id = VIRTIO_NET_VM_VERSION,
>> @@ -3658,6 +3727,9 @@ static const VMStateDescription vmstate_virtio_net_device = {
>> * but based on the uint.
>> */
>> VMSTATE_BUFFER_POINTER_UNSAFE(vlans, VirtIONet, 0, MAX_VLAN >> 3),
>> + VMSTATE_WITH_TMP_TEST(VirtIONet, virtio_net_is_tap_mig,
>> + struct VirtIONetMigTmp,
>> + vmstate_virtio_net_nic),
>> VMSTATE_WITH_TMP(VirtIONet, struct VirtIONetMigTmp,
>> vmstate_virtio_net_has_vnet),
>> VMSTATE_UINT8(mac_table.multi_overflow, VirtIONet),
>> @@ -4239,7 +4311,7 @@ static bool vhost_user_blk_pre_incoming(void *opaque, Error **errp)
>> VirtIONet *n = opaque;
>> int i;
>>
>> - if (peer_wait_incoming(n)) {
>> + if (!virtio_net_is_tap_mig(opaque, 0) && peer_wait_incoming(n)) {
>> for (i = 0; i < n->max_queue_pairs; i++) {
>> if (!peer_postponed_init(n, i, errp)) {
>> return false;
>> @@ -4389,6 +4461,7 @@ static const Property virtio_net_properties[] = {
>> host_features_ex,
>> VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM,
>> false),
>> + DEFINE_PROP_BOOL("backend-transfer", VirtIONet, backend_transfer, true),
>> };
>>
>> static void virtio_net_class_init(ObjectClass *klass, const void *data)
>
> I really don't like this approach, because it requires the frontend
> device to know about every different backend implementation that is able
> to do state transfer. This really violates the separation between the
> frontend and backend. The choice of specific backend should generally
> be opaque to the frontend.
>
> This really ought to be redesigned to work in terms of a formal API
> exposed by the backend, not poking at TAP backend specific details.
> eg an API that operates on NetClientState, for which each backend
> can provide an optional implementation.
Agreed, I'll try.
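
For illustration only, one possible shape of such an API is sketched below.
This is not code from the series: NetClient, NetClientOps,
backend_transfer_supported and net_backend_can_transfer are hypothetical,
simplified stand-ins for QEMU's NetClientState/NetClientInfo. The idea is
that the frontend asks a generic question and each backend may or may not
implement the hook.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for NetClientInfo/NetClientState. */
typedef struct NetClient NetClient;

typedef struct NetClientOps {
    const char *name;
    /* Optional hook: NULL means the backend cannot transfer its state. */
    bool (*backend_transfer_supported)(const NetClient *nc);
} NetClientOps;

struct NetClient {
    const NetClientOps *ops;
    int fd;                 /* -1 when the backend holds no open fd */
};

/* Generic helper for the frontend; it never inspects the backend type. */
static bool net_backend_can_transfer(const NetClient *nc)
{
    return nc && nc->ops && nc->ops->backend_transfer_supported &&
           nc->ops->backend_transfer_supported(nc);
}

/* Example backend that opts in: transferable only while it holds an fd. */
static bool tap_like_supported(const NetClient *nc)
{
    return nc->fd != -1;
}

static const NetClientOps tap_like_ops = {
    .name = "tap-like",
    .backend_transfer_supported = tap_like_supported,
};

/* Example backend that does not implement the hook at all. */
static const NetClientOps user_like_ops = {
    .name = "user-like",
};
```

With something along these lines, virtio-net could call the generic helper
on its peer instead of comparing info->type against NET_CLIENT_DRIVER_TAP.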
>
>
>> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
>> index 5b8ab7bda7..bf07f8a4cb 100644
>> --- a/include/hw/virtio/virtio-net.h
>> +++ b/include/hw/virtio/virtio-net.h
>> @@ -231,6 +231,7 @@ struct VirtIONet {
>> struct EBPFRSSContext ebpf_rss;
>> uint32_t nr_ebpf_rss_fds;
>> char **ebpf_rss_fds;
>> + bool backend_transfer;
>> };
>>
>> size_t virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
>> diff --git a/include/net/tap.h b/include/net/tap.h
>> index 5a926ba513..506f7ab719 100644
>> --- a/include/net/tap.h
>> +++ b/include/net/tap.h
>> @@ -36,4 +36,6 @@ int tap_get_fd(NetClientState *nc);
>> bool tap_wait_incoming(NetClientState *nc);
>> bool tap_postponed_init(NetClientState *nc, Error **errp);
>>
>> +extern const VMStateDescription vmstate_tap;
>> +
>> #endif /* QEMU_NET_TAP_H */
>> diff --git a/net/tap.c b/net/tap.c
>> index 8afbf3b407..b9c12dd64c 100644
>> --- a/net/tap.c
>> +++ b/net/tap.c
>> @@ -819,7 +819,7 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
>>
>> static bool net_tap_setup(TAPState *s, int fd, int vnet_hdr, Error **errp)
>> {
>> - if (!net_tap_set_fd(s, fd, vnet_hdr, errp)) {
>> + if (fd != -1 && !net_tap_set_fd(s, fd, vnet_hdr, errp)) {
>> return false;
>> }
>>
>> @@ -1225,6 +1225,49 @@ int tap_disable(NetClientState *nc)
>> }
>> }
>>
>> +static int tap_pre_load(void *opaque)
>> +{
>> + TAPState *s = opaque;
>> +
>> + if (s->fd != -1) {
>> + error_report(
>> + "TAP is already initialized and cannot receive incoming fd");
>> + return -EINVAL;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static int tap_post_load(void *opaque, int version_id)
>> +{
>> + TAPState *s = opaque;
>> + Error *local_err = NULL;
>> +
>> + if (!net_tap_setup(s, -1, -1, &local_err)) {
>> + error_report_err(local_err);
>> + qemu_del_net_client(&s->nc);
>> + return -EINVAL;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +const VMStateDescription vmstate_tap = {
>> + .name = "net-tap",
>> + .pre_load = tap_pre_load,
>> + .post_load = tap_post_load,
>> + .fields = (const VMStateField[]) {
>> + VMSTATE_FD(fd, TAPState),
>> + VMSTATE_BOOL(using_vnet_hdr, TAPState),
>> + VMSTATE_BOOL(has_ufo, TAPState),
>> + VMSTATE_BOOL(has_uso, TAPState),
>> + VMSTATE_BOOL(has_tunnel, TAPState),
>> + VMSTATE_BOOL(enabled, TAPState),
>> + VMSTATE_UINT32(host_vnet_hdr_len, TAPState),
>> + VMSTATE_END_OF_LIST()
>> + }
>> +};
>> +
>> bool tap_wait_incoming(NetClientState *nc)
>> {
>> TAPState *s = DO_UPCAST(TAPState, nc, nc);
>
> IMHO implementing state transfer in the backends ought to be a separate
> commit from adding support for using that in the frontend.
>
Will do.
Thanks for reviewing!
--
Best regards,
Vladimir
* Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter
2025-10-16 8:32 ` Daniel P. Berrangé
@ 2025-10-16 9:23 ` Vladimir Sementsov-Ogievskiy
2025-10-16 10:38 ` Vladimir Sementsov-Ogievskiy
` (2 more replies)
0 siblings, 3 replies; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-16 9:23 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Peter Xu, mst, jasowang, farosas, sw, eblake, armbru, thuth,
philmd, qemu-devel, michael.roth, steven.sistare, leiyang,
davydov-max, yc-core, raphael.s.norwitz
On 16.10.25 11:32, Daniel P. Berrangé wrote:
> On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> On 15.10.25 23:07, Peter Xu wrote:
>>> On Wed, Oct 15, 2025 at 10:02:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>> On 15.10.25 21:19, Peter Xu wrote:
>>>>> On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>>>> This parameter enables backend-transfer feature: all devices
>>>>>> which support it will migrate their backends (for example a TAP
>>>>>> device, by passing open file descriptor to migration channel).
>>>>>>
>>>>>> Currently no such devices, so the new parameter is a noop.
>>>>>>
>>>>>> Next commit will add support for virtio-net, to migrate its
>>>>>> TAP backend.
>>>>>>
>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>>>>> ---
>>>>
>>>> [..]
>>>>
>>>>>> --- a/qapi/migration.json
>>>>>> +++ b/qapi/migration.json
>>>>>> @@ -951,9 +951,16 @@
>>>>>> # is @cpr-exec. The first list element is the program's filename,
>>>>>> # the remainder its arguments. (Since 10.2)
>>>>>> #
>>>>>> +# @backend-transfer: Enable backend-transfer feature for devices that
>>>>>> +# support it. In general that means that backend state and its
>>>>>> +# file descriptors are passed to the destination in the migration
>>>>>> +# channel (which must be a UNIX socket). Individual devices
>>>>>> +# declare the support for backend-transfer by per-device
>>>>>> +# backend-transfer option. (Since 10.2)
>>>>>
>>>>> Thanks.
>>>>>
>>>>> I still prefer the name "fd-passing" or anything more explicit than
>>>>> "backend-transfer". Maybe the current name is fine for TAP, only because
>>>>> TAP doesn't have its own VMSD to transfer?
>>>>>
>>>>> Consider a device that would be a backend that supports VMSDs already to be
>>>>> migrated, then if it starts to allow fd-passing, this name will stop being
>>>>> suitable there, because it used to "transfer backend" already, now it's
>>>>> just started to "fd-passing".
>>>>>
>>>>> Meanwhile, consider another example - what if a device is not a backend at
>>>>> all (e.g. vfio?), has its own VMSD, then want to do fd-passing?
>>>>
>>>> Reasonable.
>>>>
>>>> But consider also the discussion with Fabiano in v5, where he argues against fds
>>>> (reasonable too):
>>>>
>>>> https://lore.kernel.org/qemu-devel/87y0qatqoa.fsf@suse.de/
>>>>
>>>> (still, they were against my "fds" name for the parameter, which is
>>>> really too generic, fd-passing is not)
>>>>
>>>> and the arguments for backend-transfer (to read similarly to cpr-transfer)
>>>>
>>>> https://lore.kernel.org/qemu-devel/87ms6qtlgf.fsf@suse.de/
>>>>
>>>>
>>>>>
>>>>> In general, I think "fd" is really a core concept of this whole thing.
>>>>
>>>> I think, we can call "backend" any external object, linked by the fd.
>>>>
>>>> Still, backend/frontend terminology is so misleading, when applied to
>>>> complex systems (for me, at least), that I don't really like "-backend"
>>>> word here.
>>>>
>>>> fd-passing is OK for me; I can resend with it if Fabiano's arguments
>>>> don't change your mind.
>>>
>>> Ah, I didn't notice the name has been discussed.
>>>
>>> I think it means you can vote for your own preference now because we have
>>> one vote for each. :) Let's also see whether Fabiano will come up with
>>> something better than both.
>>>
>>> You mentioned explicitly the file descriptors in the qapi doc, that's what
>>> I would strongly request for. The other thing is the unix socket check, it
>>> looks all good below now with it, thanks. No strong feelings on the names.
>>>
>>
>> After a bit more thinking, I'm leaning towards keeping backend-transfer. I think
>> it's more meaningful for the user:
>>
>> If we call it "fd-passing", the user may ask:
>>
>> OK, what is it? Does it allow QEMU to pass some fds through the migration
>> stream, if it supports fds? Which fds? Why pass them? Finally, why can't QEMU
>> just check whether it is a UNIX socket, and pass any fds it wants if it is?
>>
>> The logical question is: why not just drop the global capability and only check
>> whether it is a UNIX socket? (OK, relying only on the socket type is wrong
>> anyway, as there may be some complex tunneling that includes UNIX sockets but
>> still can't pass fds; but I'm thinking about feature naming now.)
>>
>> But we really want an explicit switch for the feature, as a QEMU update is
>> not the only case of local migration. Another case is changing the
>> backend. So the user's choices are:
>>
>> 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
>> we are moving to another host. So, we don't enable "backend-transfer". We don't
>> transfer the backend; we have to initialize a new backend on the other host.
>>
>> 2. Local migration to update QEMU, with minimal freeze-time and minimal
>> extra actions: use "backend-transfer", exactly to keep the backends
>> (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
>> as is.
>>
>> 3. Local migration, but we want to reconfigure some backend, or switch
>> to another backend. We disable "backend-transfer" for one device.
>
> This implies that you're changing 'backend-transfer' against the
> device at time of each migration.
>
> This takes us back to the situation we've had historically where the
> behaviour of migration depends on global properties the mgmt app has
> set prior to the 'migrate' command being run. We've just tried to get
> away from that model by passing everything as parameters to the
> migrate command, so I'm loath to see us invent a new way to have
> global state properties changing migration behaviour.
>
> This 'backend-transfer' device property is not really a device property,
> it is an indirect parameter to the 'migrate' command.
>
> Ergo, if we need the ability to selectively migrate the backend state
> of individual devices, then instead of a property on the device, we
> should pass a list of device IDs as a parameter to the migrate
> command in QMP.
Understood.

So it would look like:

# @backend-transfer: List of device IDs or QOM paths to enable
# backend-transfer for. In general that means that backend
# states and their file descriptors are passed to the destination
# in the migration channel (which must be a UNIX socket), and the
# management tool doesn't have to configure new backends for the
# target QEMU (like a vhost-user server, or a TAP device in the kernel).
# Default is no backend-transfer migration. (Since 10.2)

Peter, is that OK with you?
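
As background on the "must be a UNIX socket" requirement: the usual way to
hand an open fd to another process is SCM_RIGHTS ancillary data on an
AF_UNIX socket, and TCP sockets have no equivalent. A minimal standalone
sketch of that mechanism follows. It is plain POSIX, not QEMU's channel
code, and send_fd()/recv_fd() are illustrative helper names only.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one fd over a connected UNIX socket. Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
    char byte = 0;  /* one byte of ordinary data carries the ancillary message */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;               /* forces correct alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd from a connected UNIX socket. Returns the new fd, or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;
    int fd;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }

    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;  /* no fd attached to this message */
    }

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets a new descriptor number that refers to the same open file
description, which is why the target QEMU could keep using the same TAP
device without the management tool reopening it.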
>
>>
>> 4. Some problem with "backend-transfer", maybe some bug. Disable the whole
>> backend-transfer feature, and do a normal local migration to a new version
>> with the bug fixed.
>>
>
> With regards,
> Daniel
--
Best regards,
Vladimir
* Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter
2025-10-16 9:23 ` Vladimir Sementsov-Ogievskiy
@ 2025-10-16 10:38 ` Vladimir Sementsov-Ogievskiy
2025-10-16 10:55 ` Daniel P. Berrangé
2025-10-16 18:40 ` Peter Xu
2025-10-16 20:26 ` Vladimir Sementsov-Ogievskiy
2 siblings, 1 reply; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-16 10:38 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Peter Xu, mst, jasowang, farosas, sw, eblake, armbru, thuth,
philmd, qemu-devel, michael.roth, steven.sistare, leiyang,
davydov-max, yc-core, raphael.s.norwitz
On 16.10.25 12:23, Vladimir Sementsov-Ogievskiy wrote:
> On 16.10.25 11:32, Daniel P. Berrangé wrote:
>> On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>> On 15.10.25 23:07, Peter Xu wrote:
>>>> On Wed, Oct 15, 2025 at 10:02:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>>> On 15.10.25 21:19, Peter Xu wrote:
>>>>>> On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>> This parameter enables backend-transfer feature: all devices
>>>>>>> which support it will migrate their backends (for example a TAP
>>>>>>> device, by passing open file descriptor to migration channel).
>>>>>>>
>>>>>>> Currently no such devices, so the new parameter is a noop.
>>>>>>>
>>>>>>> Next commit will add support for virtio-net, to migrate its
>>>>>>> TAP backend.
>>>>>>>
>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>>>>>> ---
>>>>>
>>>>> [..]
>>>>>
>>>>>>> --- a/qapi/migration.json
>>>>>>> +++ b/qapi/migration.json
>>>>>>> @@ -951,9 +951,16 @@
>>>>>>> # is @cpr-exec. The first list element is the program's filename,
>>>>>>> # the remainder its arguments. (Since 10.2)
>>>>>>> #
>>>>>>> +# @backend-transfer: Enable backend-transfer feature for devices that
>>>>>>> +# support it. In general that means that backend state and its
>>>>>>> +# file descriptors are passed to the destination in the migration
>>>>>>> +# channel (which must be a UNIX socket). Individual devices
>>>>>>> +# declare the support for backend-transfer by per-device
>>>>>>> +# backend-transfer option. (Since 10.2)
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> I still prefer the name "fd-passing" or anything more explicit than
>>>>>> "backend-transfer". Maybe the current name is fine for TAP, only because
>>>>>> TAP doesn't have its own VMSD to transfer?
>>>>>>
>>>>>> Consider a device that would be a backend that supports VMSDs already to be
>>>>>> migrated, then if it starts to allow fd-passing, this name will stop being
>>>>>> suitable there, because it used to "transfer backend" already, now it's
>>>>>> just started to "fd-passing".
>>>>>>
>>>>>> Meanwhile, consider another example - what if a device is not a backend at
>>>>>> all (e.g. vfio?), has its own VMSD, then want to do fd-passing?
>>>>>
>>>>> Reasonable.
>>>>>
>>>>> But consider also the discussion with Fabiano in v5, where he argues against fds
>>>>> (reasonable too):
>>>>>
>>>>> https://lore.kernel.org/qemu-devel/87y0qatqoa.fsf@suse.de/
>>>>>
>>>>> (still, they were against my "fds" name for the parameter, which is
>>>>> really too generic, fd-passing is not)
>>>>>
>>>>> and the arguments for backend-transfer (to read similarly to cpr-transfer)
>>>>>
>>>>> https://lore.kernel.org/qemu-devel/87ms6qtlgf.fsf@suse.de/
>>>>>
>>>>>
>>>>>>
>>>>>> In general, I think "fd" is really a core concept of this whole thing.
>>>>>
>>>>> I think, we can call "backend" any external object, linked by the fd.
>>>>>
>>>>> Still, backend/frontend terminology is so misleading, when applied to
>>>>> complex systems (for me, at least), that I don't really like "-backend"
>>>>> word here.
>>>>>
>>>>> fd-passing is OK for me; I can resend with it if Fabiano's arguments
>>>>> don't change your mind.
>>>>
>>>> Ah, I didn't notice the name has been discussed.
>>>>
>>>> I think it means you can vote for your own preference now because we have
>>>> one vote for each. :) Let's also see whether Fabiano will come up with
>>>> something better than both.
>>>>
>>>> You mentioned explicitly the file descriptors in the qapi doc, that's what
>>>> I would strongly request for. The other thing is the unix socket check, it
>>>> looks all good below now with it, thanks. No strong feelings on the names.
>>>>
>>>
>>> After a bit more thinking, I'm leaning towards keeping backend-transfer. I think
>>> it's more meaningful for the user:
>>>
>>> If we call it "fd-passing", the user may ask:
>>>
>>> OK, what is it? Does it allow QEMU to pass some fds through the migration
>>> stream, if it supports fds? Which fds? Why pass them? Finally, why can't QEMU
>>> just check whether it is a UNIX socket, and pass any fds it wants if it is?
>>>
>>> The logical question is: why not just drop the global capability and only check
>>> whether it is a UNIX socket? (OK, relying only on the socket type is wrong
>>> anyway, as there may be some complex tunneling that includes UNIX sockets but
>>> still can't pass fds; but I'm thinking about feature naming now.)
>>>
>>> But we really want an explicit switch for the feature, as a QEMU update is
>>> not the only case of local migration. Another case is changing the
>>> backend. So the user's choices are:
>>>
>>> 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
>>> we are moving to another host. So, we don't enable "backend-transfer". We don't
>>> transfer the backend; we have to initialize a new backend on the other host.
>>>
>>> 2. Local migration to update QEMU, with minimal freeze-time and minimal
>>> extra actions: use "backend-transfer", exactly to keep the backends
>>> (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
>>> as is.
>>>
>>> 3. Local migration, but we want to reconfigure some backend, or switch
>>> to another backend. We disable "backend-transfer" for one device.
>>
>> This implies that you're changing 'backend-transfer' against the
>> device at time of each migration.
>>
>> This takes us back to the situation we've had historically where the
>> behaviour of migration depends on global properties the mgmt app has
>> set prior to the 'migrate' command being run. We've just tried to get
>> away from that model by passing everything as parameters to the
>> migrate command, so I'm loath to see us invent a new way to have
>> global state properties changing migration behaviour.
>>
>> This 'backend-transfer' device property is not really a device property,
>> it is an indirect parameter to the 'migrate' command.
>>
>> Ergo, if we need the ability to selectively migrate the backend state
>> of individual devices, then instead of a property on the device, we
>> should pass a list of device IDs as a parameter to the migrate
>> command in QMP.
>
> Understood.
>
> So it would look like:
>
> # @backend-transfer: List of device IDs or QOM paths to enable
> # backend-transfer for. In general that means that backend
> # states and their file descriptors are passed to the destination
> # in the migration channel (which must be a UNIX socket), and the
> # management tool doesn't have to configure new backends for the
> # target QEMU (like a vhost-user server, or a TAP device in the kernel).
> # Default is no backend-transfer migration. (Since 10.2)
>
>
> Peter, is that OK with you?
>
>
Or maybe we can just continue with two simple experimental boolean parameters:
@backend-transfer-vhost-user-blk
and
@backend-transfer-virtio-net-tap
and not bother implementing a good, final, complex API while it's unstable anyway?
--
Best regards,
Vladimir
* Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter
2025-10-16 10:38 ` Vladimir Sementsov-Ogievskiy
@ 2025-10-16 10:55 ` Daniel P. Berrangé
0 siblings, 0 replies; 51+ messages in thread
From: Daniel P. Berrangé @ 2025-10-16 10:55 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy
Cc: Peter Xu, mst, jasowang, farosas, sw, eblake, armbru, thuth,
philmd, qemu-devel, michael.roth, steven.sistare, leiyang,
davydov-max, yc-core, raphael.s.norwitz
On Thu, Oct 16, 2025 at 01:38:25PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 16.10.25 12:23, Vladimir Sementsov-Ogievskiy wrote:
> > On 16.10.25 11:32, Daniel P. Berrangé wrote:
> > > On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > On 15.10.25 23:07, Peter Xu wrote:
> > > > > On Wed, Oct 15, 2025 at 10:02:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > > On 15.10.25 21:19, Peter Xu wrote:
> > > > > > > On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > > > > This parameter enables backend-transfer feature: all devices
> > > > > > > > which support it will migrate their backends (for example a TAP
> > > > > > > > device, by passing open file descriptor to migration channel).
> > > > > > > >
> > > > > > > > Currently no such devices, so the new parameter is a noop.
> > > > > > > >
> > > > > > > > Next commit will add support for virtio-net, to migrate its
> > > > > > > > TAP backend.
> > > > > > > >
> > > > > > > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> > > > > > > > ---
> > > > > >
> > > > > > [..]
> > > > > >
> > > > > > > > --- a/qapi/migration.json
> > > > > > > > +++ b/qapi/migration.json
> > > > > > > > @@ -951,9 +951,16 @@
> > > > > > > > # is @cpr-exec. The first list element is the program's filename,
> > > > > > > > # the remainder its arguments. (Since 10.2)
> > > > > > > > #
> > > > > > > > +# @backend-transfer: Enable backend-transfer feature for devices that
> > > > > > > > +# support it. In general that means that backend state and its
> > > > > > > > +# file descriptors are passed to the destination in the migration
> > > > > > > > +# channel (which must be a UNIX socket). Individual devices
> > > > > > > > +# declare the support for backend-transfer by per-device
> > > > > > > > +# backend-transfer option. (Since 10.2)
> > > > > > >
> > > > > > > Thanks.
> > > > > > >
> > > > > > > I still prefer the name "fd-passing" or anything more explicit than
> > > > > > > "backend-transfer". Maybe the current name is fine for TAP, only because
> > > > > > > TAP doesn't have its own VMSD to transfer?
> > > > > > >
> > > > > > > Consider a device that would be a backend that supports VMSDs already to be
> > > > > > > migrated, then if it starts to allow fd-passing, this name will stop being
> > > > > > > suitable there, because it used to "transfer backend" already, now it's
> > > > > > > just started to "fd-passing".
> > > > > > >
> > > > > > > Meanwhile, consider another example - what if a device is not a backend at
> > > > > > > all (e.g. vfio?), has its own VMSD, then want to do fd-passing?
> > > > > >
> > > > > > Reasonable.
> > > > > >
> > > > > > But consider also the discussion with Fabiano in v5, where he argues against fds
> > > > > > (reasonable too):
> > > > > >
> > > > > > https://lore.kernel.org/qemu-devel/87y0qatqoa.fsf@suse.de/
> > > > > >
> > > > > > (still, they were against my "fds" name for the parameter, which is
> > > > > > really too generic, fd-passing is not)
> > > > > >
> > > > > > and the arguments for backend-transfer (to read similarly to cpr-transfer)
> > > > > >
> > > > > > https://lore.kernel.org/qemu-devel/87ms6qtlgf.fsf@suse.de/
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > In general, I think "fd" is really a core concept of this whole thing.
> > > > > >
> > > > > > I think, we can call "backend" any external object, linked by the fd.
> > > > > >
> > > > > > Still, backend/frontend terminology is so misleading, when applied to
> > > > > > complex systems (for me, at least), that I don't really like "-backend"
> > > > > > word here.
> > > > > >
> > > > > > fd-passing is OK for me, I can resend with it, if arguments by Fabiano
> > > > > > not change your mind.
> > > > >
> > > > > Ah, I didn't notice the name has been discussed.
> > > > >
> > > > > I think it means you can vote for your own preference now because we have
> > > > > one vote for each. :) Let's also see whether Fabiano will come up with
> > > > > something better than both.
> > > > >
> > > > > You mentioned explicitly the file descriptors in the qapi doc, that's what
> > > > > I would strongly request for. The other thing is the unix socket check, it
> > > > > looks all good below now with it, thanks. No strong feelings on the names.
> > > > >
> > > >
> > > > After a bit more thinking, I leaning towards keeping backend-transfer. I think
> > > > it's more meaningful for the user:
> > > >
> > > > If we call it "fd-passing", user may ask:
> > > >
> > > > Ok, what is it? Allow QEMU to pass some fds through migration stream, if it
> > > > supports fds? Which fds? Why to pass them? Finally, why QEMU can't just check
> > > > is it unix socket or not, and pass any fds it wants if it is?
> > > >
> > > > Logical question is, why not just drop the global capability, and check only
> > > > is it unix socket or not? (OK, relying only on socket type is wrong anyway,
> > > > as it may be some complex tunneling, which includes unix sockets, but still
> > > > can't pass fds, but I think now about feature naming)
> > > >
> > > > But we really want an explicit switch for the feature. As qemu-update is
> > > > not the only case of local migration. The another case is changing the
> > > > backend. So for the user's choice is:
> > > >
> > > > 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
> > > > we are moving to another host. So, we don't enable "backend-transfer". We don't
> > > > transfer the backend, we have to initialize new backend on another host.
> > > >
> > > > 2. Local migration to update QEMU, with minimal freeze-time and minimal
> > > > extra actions: use "backend-transfer", exactly to keep the backends
> > > > (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
> > > > as is.
> > > >
> > > > 3. Local migration, but we want to reconfigure some backend, or switch
> > > > to another backend. We disable "backend-transfer" for one device.
> > >
> > > This implies that you're changing 'backend-transfer' against the
> > > device at time of each migration.
> > >
> > > This takes us back to the situation we've had historically where the
> > > behaviour of migration depends on global properties the mgmt app has
> > > set prior to the 'migrate' command being run. We've just tried to get
> > > away from that model by passing everything as parameters to the
> > > migrate command, so I'm loathe to see us invent a new way to have
> > > global state properties changing migration behaviour.
> > >
> > > This 'backend-transfer' device property is not really a device property,
> > > it is an indirect parameter to the 'migrate' command.
> > >
> > > Ergo, if we need the ability to selectively migrate the backend state
> > > of individal devices, then instead of a property on the device, we
> > > should pass a list of device IDs as a parameter to the migrate
> > > command in QMP.
> >
> > Understand.
> >
> > So, it will look like
> >
> > # @backend-transfer: List of devices IDs or QOM paths, to enable
> > # backend-transfer for. In general that means that backend
> > # states and their file descriptors are passed to the destination
> > # in the migration channel (which must be a UNIX socket), and
> > # management tool doesn't have to configure new backends for
> > # target QEMU (like vhost-user server, or TAP device in the kernel).
> > # Default is no backend-transfer migration (Since 10.2)
> >
> >
> > Peter, is it OK for you?
>
> Or, may be, we just can continue with two simple experimental boolean parameters:
>
> @backend-transfer-vhost-user-blk
>
> and
>
> @backend-transfer-virtio-net-tap
>
>
> and not care to implement good-final-complex-API, while it's unstable anyway?
Even if declared unstable, that still has a negative impact on the internal
code structure, because it's putting special cases for certain device types
into the migration framework and the device code, with no time limit on how
long this technical debt will last.

With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter
2025-10-15 13:21 ` [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter Vladimir Sementsov-Ogievskiy
2025-10-15 18:19 ` Peter Xu
@ 2025-10-16 10:56 ` Markus Armbruster
2025-10-16 12:07 ` Vladimir Sementsov-Ogievskiy
1 sibling, 1 reply; 51+ messages in thread
From: Markus Armbruster @ 2025-10-16 10:56 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy
Cc: mst, jasowang, peterx, farosas, sw, eblake, thuth, philmd,
berrange, qemu-devel, michael.roth, steven.sistare, leiyang,
davydov-max, yc-core, raphael.s.norwitz
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> writes:
> This parameter enables backend-transfer feature: all devices
> which support it will migrate their backends (for example a TAP
> device, by passing open file descriptor to migration channel).
>
> Currently no such devices, so the new parameter is a noop.
>
> Next commit will add support for virtio-net, to migrate its
> TAP backend.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
[...]
> diff --git a/qapi/migration.json b/qapi/migration.json
> index be0f3fcc12..35601a1f87 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -951,9 +951,16 @@
> # is @cpr-exec. The first list element is the program's filename,
> # the remainder its arguments. (Since 10.2)
> #
> +# @backend-transfer: Enable backend-transfer feature for devices that

Either "Enable the backend transfer feature" or "Enable backend transfer"

> +# supports it. In general that means that backend state and its

support

> +# file descriptors are passed to the destination in the migraton
> +# channel (which must be a UNIX socket). Individual devices
> +# declare the support for backend-transfer by per-device
> +# backend-transfer option. (Since 10.2)
> +#

I'm not sure I understand this.

What is a "per-device backend-transfer option"? Is it a device
property?

If yes, I guess the device declares its capability to do this by having
this property. Correct?

Does the property's value matter? How?

> # Features:
> #
> -# @unstable: Members @x-checkpoint-delay and
> +# @unstable: Members @backend-transfer, @x-checkpoint-delay and
> # @x-vcpu-dirty-limit-period are experimental.
> #
> # Since: 2.4
> @@ -978,7 +985,8 @@
> 'mode',
> 'zero-page-detection',
> 'direct-io',
> - 'cpr-exec-command'] }
> + 'cpr-exec-command',
> + { 'name': 'backend-transfer', 'features': ['unstable'] } ] }
>
> ##
> # @MigrateSetParameters:
[...]
* Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter
2025-10-16 10:56 ` Markus Armbruster
@ 2025-10-16 12:07 ` Vladimir Sementsov-Ogievskiy
0 siblings, 0 replies; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-16 12:07 UTC (permalink / raw)
To: Markus Armbruster
Cc: mst, jasowang, peterx, farosas, sw, eblake, thuth, philmd,
berrange, qemu-devel, michael.roth, steven.sistare, leiyang,
davydov-max, yc-core, raphael.s.norwitz
On 16.10.25 13:56, Markus Armbruster wrote:
> Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru> writes:
>
>> This parameter enables backend-transfer feature: all devices
>> which support it will migrate their backends (for example a TAP
>> device, by passing open file descriptor to migration channel).
>>
>> Currently no such devices, so the new parameter is a noop.
>>
>> Next commit will add support for virtio-net, to migrate its
>> TAP backend.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>
> [...]
>
>> diff --git a/qapi/migration.json b/qapi/migration.json
>> index be0f3fcc12..35601a1f87 100644
>> --- a/qapi/migration.json
>> +++ b/qapi/migration.json
>> @@ -951,9 +951,16 @@
>> # is @cpr-exec. The first list element is the program's filename,
>> # the remainder its arguments. (Since 10.2)
>> #
>> +# @backend-transfer: Enable backend-transfer feature for devices that
>
> Either "Enable the backend transfer feature" or "Enable backend transfer"
Then I'll go with "Enable the backend-transfer feature".
>
>> +# supports it. In general that means that backend state and its
>
> support
>
>> +# file descriptors are passed to the destination in the migraton
>> +# channel (which must be a UNIX socket). Individual devices
>> +# declare the support for backend-transfer by per-device
>> +# backend-transfer option. (Since 10.2)
>> +#
>
> I'm not sure I understand this.
>
> What is a "per-device backend-transfer option"? Is it a device
> property?
>
> If yes, I guess the device declares its capability to do this by having
> this property. Correct?
No, the user may set or unset this property to say whether the device
should participate in backend-transfer or not.

Still, as you can see in the parallel thread, Daniel has strong arguments
against such an API, so it seems it will change again in v9:

https://lore.kernel.org/qemu-devel/aPCtkB-GvFNuqlHn@redhat.com/
>
> Does the property's value matter? How?
>
>> # Features:
>> #
>> -# @unstable: Members @x-checkpoint-delay and
>> +# @unstable: Members @backend-transfer, @x-checkpoint-delay and
>> # @x-vcpu-dirty-limit-period are experimental.
>> #
>> # Since: 2.4
>> @@ -978,7 +985,8 @@
>> 'mode',
>> 'zero-page-detection',
>> 'direct-io',
>> - 'cpr-exec-command'] }
>> + 'cpr-exec-command',
>> + { 'name': 'backend-transfer', 'features': ['unstable'] } ] }
>>
>> ##
>> # @MigrateSetParameters:
>
> [...]
>
--
Best regards,
Vladimir
* Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter
2025-10-16 9:23 ` Vladimir Sementsov-Ogievskiy
2025-10-16 10:38 ` Vladimir Sementsov-Ogievskiy
@ 2025-10-16 18:40 ` Peter Xu
2025-10-16 18:51 ` Daniel P. Berrangé
2025-10-16 20:26 ` Vladimir Sementsov-Ogievskiy
2 siblings, 1 reply; 51+ messages in thread
From: Peter Xu @ 2025-10-16 18:40 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy
Cc: Daniel P. Berrangé, mst, jasowang, farosas, sw, eblake,
armbru, thuth, philmd, qemu-devel, michael.roth, steven.sistare,
leiyang, davydov-max, yc-core, raphael.s.norwitz
On Thu, Oct 16, 2025 at 12:23:35PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 16.10.25 11:32, Daniel P. Berrangé wrote:
> > On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > On 15.10.25 23:07, Peter Xu wrote:
> > > > On Wed, Oct 15, 2025 at 10:02:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > On 15.10.25 21:19, Peter Xu wrote:
> > > > > > On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > > > This parameter enables backend-transfer feature: all devices
> > > > > > > which support it will migrate their backends (for example a TAP
> > > > > > > device, by passing open file descriptor to migration channel).
> > > > > > >
> > > > > > > Currently no such devices, so the new parameter is a noop.
> > > > > > >
> > > > > > > Next commit will add support for virtio-net, to migrate its
> > > > > > > TAP backend.
> > > > > > >
> > > > > > > Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
> > > > > > > ---
> > > > >
> > > > > [..]
> > > > >
> > > > > > > --- a/qapi/migration.json
> > > > > > > +++ b/qapi/migration.json
> > > > > > > @@ -951,9 +951,16 @@
> > > > > > > # is @cpr-exec. The first list element is the program's filename,
> > > > > > > # the remainder its arguments. (Since 10.2)
> > > > > > > #
> > > > > > > +# @backend-transfer: Enable backend-transfer feature for devices that
> > > > > > > +# supports it. In general that means that backend state and its
> > > > > > > +# file descriptors are passed to the destination in the migraton
> > > > > > > +# channel (which must be a UNIX socket). Individual devices
> > > > > > > +# declare the support for backend-transfer by per-device
> > > > > > > +# backend-transfer option. (Since 10.2)
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > I still prefer the name "fd-passing" or anything more explicit than
> > > > > > "backend-transfer". Maybe the current name is fine for TAP, only because
> > > > > > TAP doesn't have its own VMSD to transfer?
> > > > > >
> > > > > > Consider a device that would be a backend that supports VMSDs already to be
> > > > > > migrated, then if it starts to allow fd-passing, this name will stop being
> > > > > > suitable there, because it used to "transfer backend" already, now it's
> > > > > > just started to "fd-passing".
> > > > > >
> > > > > > Meanwhile, consider another example - what if a device is not a backend at
> > > > > > all (e.g. vfio?), has its own VMSD, then want to do fd-passing?
> > > > >
> > > > > Reasonable.
> > > > >
> > > > > But consider also the discussion with Fabiano in v5, where he argues against fds
> > > > > (reasonable too):
> > > > >
> > > > > https://lore.kernel.org/qemu-devel/87y0qatqoa.fsf@suse.de/
> > > > >
> > > > > (still, they were against my "fds" name for the parameter, which is
> > > > > really too generic, fd-passing is not)
> > > > >
> > > > > and the arguments for backend-transfer (to read similar with cpr-transfer)
> > > > >
> > > > > https://lore.kernel.org/qemu-devel/87ms6qtlgf.fsf@suse.de/
> > > > >
> > > > >
> > > > > >
> > > > > > In general, I think "fd" is really a core concept of this whole thing.
> > > > >
> > > > > I think, we can call "backend" any external object, linked by the fd.
> > > > >
> > > > > Still, backend/frontend terminology is so misleading, when applied to
> > > > > complex systems (for me, at least), that I don't really like "-backend"
> > > > > word here.
> > > > >
> > > > > fd-passing is OK for me, I can resend with it, if arguments by Fabiano
> > > > > not change your mind.
> > > >
> > > > Ah, I didn't notice the name has been discussed.
> > > >
> > > > I think it means you can vote for your own preference now because we have
> > > > one vote for each. :) Let's also see whether Fabiano will come up with
> > > > something better than both.
> > > >
> > > > You mentioned explicitly the file descriptors in the qapi doc, that's what
> > > > I would strongly request for. The other thing is the unix socket check, it
> > > > looks all good below now with it, thanks. No strong feelings on the names.
> > > >
> > >
> > > After a bit more thinking, I leaning towards keeping backend-transfer. I think
> > > it's more meaningful for the user:
> > >
> > > If we call it "fd-passing", user may ask:
> > >
> > > Ok, what is it? Allow QEMU to pass some fds through migration stream, if it
> > > supports fds? Which fds? Why to pass them? Finally, why QEMU can't just check
> > > is it unix socket or not, and pass any fds it wants if it is?
> > >
> > > Logical question is, why not just drop the global capability, and check only
> > > is it unix socket or not? (OK, relying only on socket type is wrong anyway,
> > > as it may be some complex tunneling, which includes unix sockets, but still
> > > can't pass fds, but I think now about feature naming)
> > >
> > > But we really want an explicit switch for the feature. As qemu-update is
> > > not the only case of local migration. The another case is changing the
> > > backend. So for the user's choice is:
> > >
> > > 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
> > > we are moving to another host. So, we don't enable "backend-transfer". We don't
> > > transfer the backend, we have to initialize new backend on another host.
> > >
> > > 2. Local migration to update QEMU, with minimal freeze-time and minimal
> > > extra actions: use "backend-transfer", exactly to keep the backends
> > > (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
> > > as is.
> > >
> > > 3. Local migration, but we want to reconfigure some backend, or switch
> > > to another backend. We disable "backend-transfer" for one device.
> >
> > This implies that you're changing 'backend-transfer' against the
> > device at time of each migration.
> >
> > This takes us back to the situation we've had historically where the
> > behaviour of migration depends on global properties the mgmt app has
> > set prior to the 'migrate' command being run. We've just tried to get
> > away from that model by passing everything as parameters to the
> > migrate command, so I'm loathe to see us invent a new way to have
> > global state properties changing migration behaviour.
> >
> > This 'backend-transfer' device property is not really a device property,
> > it is an indirect parameter to the 'migrate' command.
I was not seeing it like that.

I was treating the per-device parameter as a flag showing whether the device
is capable of passing over FDs, which is more like a device attribute.

Those things (once set by the machine type) should never change, and the only
thing to be changed is the global "backend-transfer" boolean that can be
set in the "migrate" QMP command, and should be decided by the admin when
one wants to initiate the migration process.
> >
> > Ergo, if we need the ability to selectively migrate the backend state
> > of individal devices, then instead of a property on the device, we
> > should pass a list of device IDs as a parameter to the migrate
> > command in QMP.
I doubt whether we would really need that in reality.

Likely the admin should only worry about whether to set the global
"backend-transfer"; the admin may not even need to know which devices, and
how many of them, will benefit from having this feature enabled.

It just says, "we're doing a local migration via unix sockets, so
any device that can should try to reuse its backend if possible".
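
For illustration only, the flow described above could look roughly like
this from the management side. This is a sketch, not part of the series:
it assumes the boolean lands as the "backend-transfer" migration parameter
proposed in this patch (still unstable, so the name may change), set via
the existing migrate-set-parameters command, and the socket path is a
made-up example.

```python
import json


def local_migration_commands(socket_path: str, backend_transfer: bool) -> list[dict]:
    """Build the QMP commands a management tool might send, in order."""
    # "backend-transfer" is the parameter proposed in this series; it is
    # marked unstable and may be renamed or reshaped before merging.
    set_params = {
        "execute": "migrate-set-parameters",
        "arguments": {"backend-transfer": backend_transfer},
    }
    # fd passing needs a UNIX socket channel, hence the "unix:" URI.
    migrate = {
        "execute": "migrate",
        "arguments": {"uri": f"unix:{socket_path}"},
    }
    return [set_params, migrate]


if __name__ == "__main__":
    # Hypothetical socket path, for demonstration only.
    for cmd in local_migration_commands("/run/qemu/migrate.sock", True):
        print(json.dumps(cmd))
```

Note that the admin names no devices here: one boolean plus the channel
type is the whole interface in this model.
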
Thanks,
--
Peter Xu
* Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter
2025-10-16 18:40 ` Peter Xu
@ 2025-10-16 18:51 ` Daniel P. Berrangé
2025-10-16 19:19 ` Daniel P. Berrangé
2025-10-16 19:29 ` Peter Xu
0 siblings, 2 replies; 51+ messages in thread
From: Daniel P. Berrangé @ 2025-10-16 18:51 UTC (permalink / raw)
To: Peter Xu
Cc: Vladimir Sementsov-Ogievskiy, mst, jasowang, farosas, sw, eblake,
armbru, thuth, philmd, qemu-devel, michael.roth, steven.sistare,
leiyang, davydov-max, yc-core, raphael.s.norwitz
On Thu, Oct 16, 2025 at 02:40:58PM -0400, Peter Xu wrote:
> On Thu, Oct 16, 2025 at 12:23:35PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > On 16.10.25 11:32, Daniel P. Berrangé wrote:
> > > On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
> > > > we are moving to another host. So, we don't enable "backend-transfer". We don't
> > > > transfer the backend, we have to initialize new backend on another host.
> > > >
> > > > 2. Local migration to update QEMU, with minimal freeze-time and minimal
> > > > extra actions: use "backend-transfer", exactly to keep the backends
> > > > (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
> > > > as is.
> > > >
> > > > 3. Local migration, but we want to reconfigure some backend, or switch
> > > > to another backend. We disable "backend-transfer" for one device.
> > >
> > > This implies that you're changing 'backend-transfer' against the
> > > device at time of each migration.
> > >
> > > This takes us back to the situation we've had historically where the
> > > behaviour of migration depends on global properties the mgmt app has
> > > set prior to the 'migrate' command being run. We've just tried to get
> > > away from that model by passing everything as parameters to the
> > > migrate command, so I'm loathe to see us invent a new way to have
> > > global state properties changing migration behaviour.
> > >
> > > This 'backend-transfer' device property is not really a device property,
> > > it is an indirect parameter to the 'migrate' command.
>
> I was not seeing it like that.
>
> I was treating per-device parameter to be a flag showing whether the device
> is capable of passing over FDs, which is more like a device attribute.
>
> Those things (after set by machine type) should never change, and the only
> thing to be changed is the global "backend-transfer" boolean that can be
> set in the "migrate" QMP command, and should be decided by the admin when
> one wants to initiate the migration process.
>
> > >
> > > Ergo, if we need the ability to selectively migrate the backend state
> > > of individal devices, then instead of a property on the device, we
> > > should pass a list of device IDs as a parameter to the migrate
> > > command in QMP.
>
> I doubt whether we would really need that in reality.
>
> Likely the admin should only worry about whether setting the global
> "backend-transfer", the admin may not even need to know which device, and
> how many devices, will be beneficial to this feature enabled.
>
> It just says, "we're doing local migration and via unix sockets, so
> whatever devices can try to reuse their backends if possible".
An individual device can only use backend transfer if both the old and
new QEMU agree that it can be done. At the time we start the origin
QEMU we know which set of devices is capable of doing an outgoing
backend transfer, but we don't know which set of devices is capable
of doing an incoming backend transfer.

If we don't have a per-device toggle at time of migration, then we
have to assume that the target QEMU can always support at least the
same set of incoming backends as the src QEMU's outgoing backends. This
feels like a potentially risky assumption.

Another scenario is where you are doing a localhost migration as a
mechanism to let you change a device backend. In that case you'll
want to do a backend transfer of all devices, except the one that
you want to change.
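
To make the two scenarios above concrete, here is a small sketch of how
the set of devices to transfer could be derived. The function and the
device IDs are hypothetical; nothing like this exists in QEMU or in this
series, it only models the reasoning.

```python
def devices_to_transfer(src_capable, dst_capable, reconfigure=()):
    """Return the device IDs whose backends can safely be transferred.

    src_capable: IDs the source QEMU can send a backend for.
    dst_capable: IDs the target QEMU can accept a backend for.
    reconfigure: IDs whose backend the admin wants to replace, so they
                 must be excluded from the transfer.
    """
    # Both sides must agree, so take the intersection, then drop the
    # devices that are being deliberately changed.
    return (set(src_capable) & set(dst_capable)) - set(reconfigure)


if __name__ == "__main__":
    # Hypothetical device IDs, for demonstration only.
    print(sorted(devices_to_transfer({"net0", "blk0"}, {"net0"})))
    print(sorted(devices_to_transfer({"net0", "blk0"}, {"net0", "blk0"}, ["blk0"])))
```

With only a global boolean, the intersection is implicitly assumed to be
equal to the source set and the exclusion list is always empty.
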
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter
2025-10-16 18:51 ` Daniel P. Berrangé
@ 2025-10-16 19:19 ` Daniel P. Berrangé
2025-10-16 19:39 ` Peter Xu
2025-10-16 19:29 ` Peter Xu
1 sibling, 1 reply; 51+ messages in thread
From: Daniel P. Berrangé @ 2025-10-16 19:19 UTC (permalink / raw)
To: Peter Xu, Vladimir Sementsov-Ogievskiy, mst, jasowang, farosas,
sw, eblake, armbru, thuth, philmd, qemu-devel, michael.roth,
steven.sistare, leiyang, davydov-max, yc-core, raphael.s.norwitz
On Thu, Oct 16, 2025 at 07:51:42PM +0100, Daniel P. Berrangé wrote:
> On Thu, Oct 16, 2025 at 02:40:58PM -0400, Peter Xu wrote:
> > On Thu, Oct 16, 2025 at 12:23:35PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > On 16.10.25 11:32, Daniel P. Berrangé wrote:
> > > > On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
> > > > > we are moving to another host. So, we don't enable "backend-transfer". We don't
> > > > > transfer the backend, we have to initialize new backend on another host.
> > > > >
> > > > > 2. Local migration to update QEMU, with minimal freeze-time and minimal
> > > > > extra actions: use "backend-transfer", exactly to keep the backends
> > > > > (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
> > > > > as is.
> > > > >
> > > > > 3. Local migration, but we want to reconfigure some backend, or switch
> > > > > to another backend. We disable "backend-transfer" for one device.
> > > >
> > > > This implies that you're changing 'backend-transfer' against the
> > > > device at time of each migration.
> > > >
> > > > This takes us back to the situation we've had historically where the
> > > > behaviour of migration depends on global properties the mgmt app has
> > > > set prior to the 'migrate' command being run. We've just tried to get
> > > > away from that model by passing everything as parameters to the
> > > > migrate command, so I'm loathe to see us invent a new way to have
> > > > global state properties changing migration behaviour.
> > > >
> > > > This 'backend-transfer' device property is not really a device property,
> > > > it is an indirect parameter to the 'migrate' command.
> >
> > I was not seeing it like that.
> >
> > I was treating per-device parameter to be a flag showing whether the device
> > is capable of passing over FDs, which is more like a device attribute.
Whether a backend is technically capable of transfer shouldn't require a
user-specified property - there should be an internal API to query whether
the current backend configuration is transferable or not, based on the
code implementation. Allowing a mgmt app to specify this can only lead
to mistakes, because it doesn't know the internal constraints of the
implementation.

The mgmt app should only be concerned with whether it wants to transfer
a backend or not, which is a time-of-use decision rather than a launch-time
decision.
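
A rough model of that split, with capability computed internally and the
request supplied at migration time, might look like the sketch below. All
names are hypothetical, and the single "has_open_fd" check merely stands
in for whatever real constraints an implementation would have.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    device_id: str
    has_open_fd: bool  # hypothetical stand-in for implementation constraints

    def can_transfer(self, channel_is_unix_socket: bool) -> bool:
        # Capability is derived from the current configuration and is
        # never supplied by the management application.
        return self.has_open_fd and channel_is_unix_socket


def plan_transfer(backends, requested_ids, channel_is_unix_socket):
    """Return the IDs to transfer, rejecting requests that cannot be honoured."""
    by_id = {b.device_id: b for b in backends}
    for dev_id in requested_ids:
        backend = by_id.get(dev_id)
        if backend is None:
            raise KeyError(f"unknown device: {dev_id}")
        if not backend.can_transfer(channel_is_unix_socket):
            raise ValueError(f"backend of {dev_id} is not transferable")
    return sorted(requested_ids)
```

The mgmt app only contributes "requested_ids"; a request that the code
cannot satisfy fails early instead of being silently trusted.
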
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter
2025-10-16 18:51 ` Daniel P. Berrangé
2025-10-16 19:19 ` Daniel P. Berrangé
@ 2025-10-16 19:29 ` Peter Xu
2025-10-16 19:57 ` Daniel P. Berrangé
1 sibling, 1 reply; 51+ messages in thread
From: Peter Xu @ 2025-10-16 19:29 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Vladimir Sementsov-Ogievskiy, mst, jasowang, farosas, sw, eblake,
armbru, thuth, philmd, qemu-devel, michael.roth, steven.sistare,
leiyang, davydov-max, yc-core, raphael.s.norwitz
On Thu, Oct 16, 2025 at 07:51:42PM +0100, Daniel P. Berrangé wrote:
> On Thu, Oct 16, 2025 at 02:40:58PM -0400, Peter Xu wrote:
> > On Thu, Oct 16, 2025 at 12:23:35PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > On 16.10.25 11:32, Daniel P. Berrangé wrote:
> > > > On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
> > > > > we are moving to another host. So, we don't enable "backend-transfer". We don't
> > > > > transfer the backend, we have to initialize new backend on another host.
> > > > >
> > > > > 2. Local migration to update QEMU, with minimal freeze-time and minimal
> > > > > extra actions: use "backend-transfer", exactly to keep the backends
> > > > > (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
> > > > > as is.
> > > > >
> > > > > 3. Local migration, but we want to reconfigure some backend, or switch
> > > > > to another backend. We disable "backend-transfer" for one device.
> > > >
> > > > This implies that you're changing 'backend-transfer' against the
> > > > device at time of each migration.
> > > >
> > > > This takes us back to the situation we've had historically where the
> > > > behaviour of migration depends on global properties the mgmt app has
> > > > set prior to the 'migrate' command being run. We've just tried to get
> > > > away from that model by passing everything as parameters to the
> > > > migrate command, so I'm loathe to see us invent a new way to have
> > > > global state properties changing migration behaviour.
> > > >
> > > > This 'backend-transfer' device property is not really a device property,
> > > > it is an indirect parameter to the 'migrate' command.
> >
> > I was not seeing it like that.
> >
> > I was treating per-device parameter to be a flag showing whether the device
> > is capable of passing over FDs, which is more like a device attribute.
> >
> > Those things (after set by machine type) should never change, and the only
> > thing to be changed is the global "backend-transfer" boolean that can be
> > set in the "migrate" QMP command, and should be decided by the admin when
> > one wants to initiate the migration process.
> >
> > > >
> > > > Ergo, if we need the ability to selectively migrate the backend state
> > > > of individal devices, then instead of a property on the device, we
> > > > should pass a list of device IDs as a parameter to the migrate
> > > > command in QMP.
> >
> > I doubt whether we would really need that in reality.
> >
> > Likely the admin should only worry about whether setting the global
> > "backend-transfer", the admin may not even need to know which device, and
> > how many devices, will be beneficial to this feature enabled.
> >
> > It just says, "we're doing local migration and via unix sockets, so
> > whatever devices can try to reuse their backends if possible".
>
> An individual device can only use backend transfer if both the old and
> new QEMU agree that it can be done. At the time we start the origin
> QEMU we know which set of devices are capable of doing an outgoing
> backend transfer, but we don't know what set of devices are capable
> of doing an incoming backend transfer.
>
> If we don't have a per-device toggle at time of migration, then we
> have to assume that the target QEMU can always support at least the
> same set of incoming backends as the src QEMU outgoing backend. This
> feels like a potentially risky assumption.
When using machine properties, these things should already be set by the
machine types.

E.g. if this is a new QEMU with an old machine type, we should have this
per-device property set to OFF forever when booting the VM, and should keep
it like that after any number of migrations, because any VM using the old
machine type _might_ be migrated back to an older QEMU that won't support
it. So IIUC that strictly follows how we use versioned machine types.

What Vladimir mentioned previously would be something very special, but
indeed when there's no machine type versioning we may need to toggle this
before each migration. However, since upstream has been doing this the
machine type properties way for N years, do we need to worry
about that?
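
As a sketch of the versioned machine type argument above: the per-device
property gets a default from the machine type and stays fixed, while the
migration parameter is the only thing decided per migration. The table
below is hypothetical; "10.2" comes from the "(Since 10.2)" note in the
patch, while "10.1" as the older type and both default values are
assumptions.

```python
# Hypothetical compat table: machine type version -> default value of the
# per-device backend-transfer property. Illustrative values only.
COMPAT_DEFAULTS = {
    "10.1": False,  # older machine type: may migrate back to an old QEMU
    "10.2": True,   # first version where the feature is assumed to exist
}


def device_allows_transfer(machine_version: str) -> bool:
    # Unknown machine types fall back to OFF, the conservative choice.
    return COMPAT_DEFAULTS.get(machine_version, False)


def backend_is_transferred(machine_version: str, migrate_param: bool) -> bool:
    # The backend is transferred only if the machine type permits it AND
    # the admin enabled the global parameter for this migration.
    return device_allows_transfer(machine_version) and migrate_param


if __name__ == "__main__":
    print(backend_is_transferred("10.1", True))
    print(backend_is_transferred("10.2", True))
```
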
>
> Another scenario is where you are doing a localhost migration as a
> mechanism to let you change a device backend. In that case you'll
> want to do a backend transfer of all devices, except the one that
> you want to change.
Right, this might be a real need if it exists. That said, it's so special
that I'm not sure whether the admin could simply migrate with the global
backend-transfer set to OFF in this rare case.

In general, I would prefer to avoid introducing any form of device list
into the migration system if at all possible. I agree that if we must
introduce one, it should at least be a list of IDs rather than an ad hoc
array of strings. However, I still want to see whether we can completely
avoid it.

Thanks,
--
Peter Xu
* Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter
2025-10-16 19:19 ` Daniel P. Berrangé
@ 2025-10-16 19:39 ` Peter Xu
2025-10-16 20:00 ` Daniel P. Berrangé
0 siblings, 1 reply; 51+ messages in thread
From: Peter Xu @ 2025-10-16 19:39 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Vladimir Sementsov-Ogievskiy, mst, jasowang, farosas, sw, eblake,
armbru, thuth, philmd, qemu-devel, michael.roth, steven.sistare,
leiyang, davydov-max, yc-core, raphael.s.norwitz
On Thu, Oct 16, 2025 at 08:19:37PM +0100, Daniel P. Berrangé wrote:
> On Thu, Oct 16, 2025 at 07:51:42PM +0100, Daniel P. Berrangé wrote:
> > On Thu, Oct 16, 2025 at 02:40:58PM -0400, Peter Xu wrote:
> > > On Thu, Oct 16, 2025 at 12:23:35PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > On 16.10.25 11:32, Daniel P. Berrangé wrote:
> > > > > On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > > 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
> > > > > > we are moving to another host. So, we don't enable "backend-transfer". We don't
> > > > > > transfer the backend, we have to initialize new backend on another host.
> > > > > >
> > > > > > 2. Local migration to update QEMU, with minimal freeze-time and minimal
> > > > > > extra actions: use "backend-transfer", exactly to keep the backends
> > > > > > (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
> > > > > > as is.
> > > > > >
> > > > > > 3. Local migration, but we want to reconfigure some backend, or switch
> > > > > > to another backend. We disable "backend-transfer" for one device.
> > > > >
> > > > > This implies that you're changing 'backend-transfer' against the
> > > > > device at time of each migration.
> > > > >
> > > > > This takes us back to the situation we've had historically where the
> > > > > behaviour of migration depends on global properties the mgmt app has
> > > > > set prior to the 'migrate' command being run. We've just tried to get
> > > > > away from that model by passing everything as parameters to the
> > > > > migrate command, so I'm loathe to see us invent a new way to have
> > > > > global state properties changing migration behaviour.
> > > > >
> > > > > This 'backend-transfer' device property is not really a device property,
> > > > > it is an indirect parameter to the 'migrate' command.
> > >
> > > I was not seeing it like that.
> > >
> > > I was treating per-device parameter to be a flag showing whether the device
> > > is capable of passing over FDs, which is more like a device attribute.
>
> Whether a backend is technically capable of transfer shouldn't require a
> user specified property - there should be an internal API to query whether
> the current backend configuration is transferrable or not, based on the
> code implementation. Allowing a mgmt app to specify this can only lead
> to mistakes, because they don't know the internal constraints of the
> implementation.
>
> The mgmt app should only be concerned with whether they want to transfer
> a backend or not which is a time-of-use decision rather than launch time
> decision.
IMHO the per-device property, when available and turned ON, should always
mean that the device fully supports the feature.
I also think the above statement matches exactly how I see it. I never
expected mgmt to toggle the per-device properties, as I just said in
another reply.
That's also why I think the global backend-transfer should be the only
thing exposed to mgmt. So even if the device properties exist, they
should only be used as compat properties in upstream QEMU.
They're still needed, and will be helpful when other devices introduce
similar concepts to support fd passing: when the global feature is
enabled, QEMU will automatically do fd passing for some devices and not
for others, based on the machine type.
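To make the two-level model concrete, here is a small sketch that mirrors the condition v8 uses in virtio_net_is_tap_mig(): the global migration parameter, the per-device property and the backend type must all agree. NicModel and uses_backend_transfer() are hypothetical names for illustration, and the "tap" string stands in for NET_CLIENT_DRIVER_TAP.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NicModel:
    """Hypothetical stand-in for a virtio-net device and its peer netdev."""
    backend_transfer_prop: bool      # per-device property, fixed at launch
    peer_type: Optional[str] = None  # e.g. "tap"; None when no peer is attached


def uses_backend_transfer(global_param: bool, nic: NicModel) -> bool:
    """True only when the global switch, the device property and the backend agree."""
    return (
        global_param
        and nic.backend_transfer_prop
        and nic.peer_type == "tap"
    )
```

Under this model the admin only flips global_param at migration time; a device whose property was pinned to OFF by an old machine type never takes part, whatever the global value is.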
Thanks,
--
Peter Xu
* Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter
2025-10-16 19:29 ` Peter Xu
@ 2025-10-16 19:57 ` Daniel P. Berrangé
2025-10-16 20:28 ` Peter Xu
0 siblings, 1 reply; 51+ messages in thread
From: Daniel P. Berrangé @ 2025-10-16 19:57 UTC (permalink / raw)
To: Peter Xu
Cc: Vladimir Sementsov-Ogievskiy, mst, jasowang, farosas, sw, eblake,
armbru, thuth, philmd, qemu-devel, michael.roth, steven.sistare,
leiyang, davydov-max, yc-core, raphael.s.norwitz
On Thu, Oct 16, 2025 at 03:29:27PM -0400, Peter Xu wrote:
> On Thu, Oct 16, 2025 at 07:51:42PM +0100, Daniel P. Berrangé wrote:
> > On Thu, Oct 16, 2025 at 02:40:58PM -0400, Peter Xu wrote:
> > > On Thu, Oct 16, 2025 at 12:23:35PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > On 16.10.25 11:32, Daniel P. Berrangé wrote:
> > > > > On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > > 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
> > > > > > we are moving to another host. So, we don't enable "backend-transfer". We don't
> > > > > > transfer the backend, we have to initialize new backend on another host.
> > > > > >
> > > > > > 2. Local migration to update QEMU, with minimal freeze-time and minimal
> > > > > > extra actions: use "backend-transfer", exactly to keep the backends
> > > > > > (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
> > > > > > as is.
> > > > > >
> > > > > > 3. Local migration, but we want to reconfigure some backend, or switch
> > > > > > to another backend. We disable "backend-transfer" for one device.
> > > > >
> > > > > This implies that you're changing 'backend-transfer' against the
> > > > > device at time of each migration.
> > > > >
> > > > > This takes us back to the situation we've had historically where the
> > > > > behaviour of migration depends on global properties the mgmt app has
> > > > > set prior to the 'migrate' command being run. We've just tried to get
> > > > > away from that model by passing everything as parameters to the
> > > > > migrate command, so I'm loathe to see us invent a new way to have
> > > > > global state properties changing migration behaviour.
> > > > >
> > > > > This 'backend-transfer' device property is not really a device property,
> > > > > it is an indirect parameter to the 'migrate' command.
> > >
> > > I was not seeing it like that.
> > >
> > > I was treating per-device parameter to be a flag showing whether the device
> > > is capable of passing over FDs, which is more like a device attribute.
> > >
> > > Those things (after set by machine type) should never change, and the only
> > > thing to be changed is the global "backend-transfer" boolean that can be
> > > set in the "migrate" QMP command, and should be decided by the admin when
> > > one wants to initiate the migration process.
> > >
> > > > >
> > > > > Ergo, if we need the ability to selectively migrate the backend state
> > > > > of individal devices, then instead of a property on the device, we
> > > > > should pass a list of device IDs as a parameter to the migrate
> > > > > command in QMP.
> > >
> > > I doubt whether we would really need that in reality.
> > >
> > > Likely the admin should only worry about whether setting the global
> > > "backend-transfer", the admin may not even need to know which device, and
> > > how many devices, will be beneficial to this feature enabled.
> > >
> > > It just says, "we're doing local migration and via unix sockets, so
> > > whatever devices can try to reuse their backends if possible".
> >
> > An individual device can only use backend transfer if both the old and
> > new QEMU agree that it can be done. At the time we start the origin
> > QEMU we know which set of devices are capable of doing an outgoing
> > backend transfer, but we don't know what set of devices are capable
> > of doing an incoming backend transfer.
> >
> > If we don't have a per-device toggle at time of migration, then we
> > have to assume that the target QEMU can always support at least the
> > same set of incoming backends as the src QEMU outgoing backend. This
> > feels like a potentially risky assumption.
>
> When using machine properties, these things should already be set by the
> machine types.
Errm, machine types apply to devices, but this is about transferring
backends which are outside the scope of machine types.
> E.g. if this is a new QEMU with an old machine type, we should have this
> per-device property set to OFF forever when booting the VM, and should keep
> it like that after any rounds of migrations. Because any VM using the old
> machine type _might_ be migrated back to an older QEMU that won't support
> it. So IIUC that strictly follows how we use versioned machine types.
That makes no conceptual sense. Whether or not a particular backend
can be transferred is determined by the choice of backend and its
configuration. A "backend-transfer" property against the device
frontend cannot be set from the machine type definition, as the
machine type has no knowledge of what backend configuration will
be used.
> In general, I would prefer avoiding to introduce any form of list of
> devices into the migration system if ever possible. I agree if we must
> introduce that it should at least be a list of IDs rather than adhoc array
> of strings. However I still want to see whether we can completely avoid
> it.
Yes, anything in the migrate API would have to directly correspond
to an ID of a device frontend or backend.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter
2025-10-16 19:39 ` Peter Xu
@ 2025-10-16 20:00 ` Daniel P. Berrangé
0 siblings, 0 replies; 51+ messages in thread
From: Daniel P. Berrangé @ 2025-10-16 20:00 UTC (permalink / raw)
To: Peter Xu
Cc: Vladimir Sementsov-Ogievskiy, mst, jasowang, farosas, sw, eblake,
armbru, thuth, philmd, qemu-devel, michael.roth, steven.sistare,
leiyang, davydov-max, yc-core, raphael.s.norwitz
On Thu, Oct 16, 2025 at 03:39:03PM -0400, Peter Xu wrote:
> On Thu, Oct 16, 2025 at 08:19:37PM +0100, Daniel P. Berrangé wrote:
> > On Thu, Oct 16, 2025 at 07:51:42PM +0100, Daniel P. Berrangé wrote:
> > > On Thu, Oct 16, 2025 at 02:40:58PM -0400, Peter Xu wrote:
> > > > On Thu, Oct 16, 2025 at 12:23:35PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > On 16.10.25 11:32, Daniel P. Berrangé wrote:
> > > > > > On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> > > > > > > 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
> > > > > > > we are moving to another host. So, we don't enable "backend-transfer". We don't
> > > > > > > transfer the backend, we have to initialize new backend on another host.
> > > > > > >
> > > > > > > 2. Local migration to update QEMU, with minimal freeze-time and minimal
> > > > > > > extra actions: use "backend-transfer", exactly to keep the backends
> > > > > > > (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
> > > > > > > as is.
> > > > > > >
> > > > > > > 3. Local migration, but we want to reconfigure some backend, or switch
> > > > > > > to another backend. We disable "backend-transfer" for one device.
> > > > > >
> > > > > > This implies that you're changing 'backend-transfer' against the
> > > > > > device at time of each migration.
> > > > > >
> > > > > > This takes us back to the situation we've had historically where the
> > > > > > behaviour of migration depends on global properties the mgmt app has
> > > > > > set prior to the 'migrate' command being run. We've just tried to get
> > > > > > away from that model by passing everything as parameters to the
> > > > > > migrate command, so I'm loathe to see us invent a new way to have
> > > > > > global state properties changing migration behaviour.
> > > > > >
> > > > > > This 'backend-transfer' device property is not really a device property,
> > > > > > it is an indirect parameter to the 'migrate' command.
> > > >
> > > > I was not seeing it like that.
> > > >
> > > > I was treating per-device parameter to be a flag showing whether the device
> > > > is capable of passing over FDs, which is more like a device attribute.
> >
> > Whether a backend is technically capable of transfer shouldn't require a
> > user specified property - there should be an internal API to query whether
> > the current backend configuration is transferrable or not, based on the
> > code implementation. Allowing a mgmt app to specify this can only lead
> > to mistakes, because they don't know the internal constraints of the
> > implementation.
> >
> > The mgmt app should only be concerned with whether they want to transfer
> > a backend or not which is a time-of-use decision rather than launch time
> > decision.
>
> IMHO the per-device property, when available, should always mean it fully
> support the feature, when it is turned ON.
That can't be expressed in a property in the device.
Consider the virtio-net device. The backend transfer is only
possible if the virtio-net is associated with a netdev using
the vhost-user backend, and the vhost-user backend must be
using a chardev with a socket backend, and the socket backend
must not have TLS or websockets enabled.
Migratability of the backend requires an API against the
NetClientInfo object, which will in turn require calling
out to an API against the Chardev object.
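A rough sketch of what such a chained query could look like, using the vhost-user example above. All class and function names here are hypothetical models, not existing QEMU types or functions; the point is only that each layer answers for itself and delegates to the layer below.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChardevModel:
    """Hypothetical model of a chardev backend configuration."""
    backend: str             # "socket", "pty", ...
    tls: bool = False
    websocket: bool = False


@dataclass
class NetdevModel:
    """Hypothetical model of a netdev and the chardev it sits on."""
    type: str                # "vhost-user", "tap", ...
    chardev: Optional[ChardevModel] = None


def chardev_can_transfer(chardev: ChardevModel) -> bool:
    # Only a plain socket, without TLS or websockets, can be handed over.
    return chardev.backend == "socket" and not chardev.tls and not chardev.websocket


def netdev_can_transfer(netdev: NetdevModel) -> bool:
    # The netdev decides for its own type, then asks the chardev below it.
    return (
        netdev.type == "vhost-user"
        and netdev.chardev is not None
        and chardev_can_transfer(netdev.chardev)
    )


def frontend_can_transfer(peer: Optional[NetdevModel]) -> bool:
    # The frontend has no knowledge of its own; it only forwards the question.
    return peer is not None and netdev_can_transfer(peer)
```

In this shape the answer is computed from the current backend configuration at the time of asking, so no user-settable property on the frontend is involved.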
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
* Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter
2025-10-16 9:23 ` Vladimir Sementsov-Ogievskiy
2025-10-16 10:38 ` Vladimir Sementsov-Ogievskiy
2025-10-16 18:40 ` Peter Xu
@ 2025-10-16 20:26 ` Vladimir Sementsov-Ogievskiy
2025-10-16 20:30 ` Vladimir Sementsov-Ogievskiy
2 siblings, 1 reply; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-16 20:26 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Peter Xu, mst, jasowang, farosas, sw, eblake, armbru, thuth,
philmd, qemu-devel, michael.roth, leiyang, davydov-max, yc-core,
raphael.s.norwitz
On 16.10.25 12:23, Vladimir Sementsov-Ogievskiy wrote:
> On 16.10.25 11:32, Daniel P. Berrangé wrote:
>> On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>> On 15.10.25 23:07, Peter Xu wrote:
>>>> On Wed, Oct 15, 2025 at 10:02:14PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>>> On 15.10.25 21:19, Peter Xu wrote:
>>>>>> On Wed, Oct 15, 2025 at 04:21:32PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>>>>> This parameter enables backend-transfer feature: all devices
>>>>>>> which support it will migrate their backends (for example a TAP
>>>>>>> device, by passing open file descriptor to migration channel).
>>>>>>>
>>>>>>> Currently no such devices, so the new parameter is a noop.
>>>>>>>
>>>>>>> Next commit will add support for virtio-net, to migrate its
>>>>>>> TAP backend.
>>>>>>>
>>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
>>>>>>> ---
>>>>>
>>>>> [..]
>>>>>
>>>>>>> --- a/qapi/migration.json
>>>>>>> +++ b/qapi/migration.json
>>>>>>> @@ -951,9 +951,16 @@
>>>>>>> # is @cpr-exec. The first list element is the program's filename,
>>>>>>> # the remainder its arguments. (Since 10.2)
>>>>>>> #
>>>>>>> +# @backend-transfer: Enable backend-transfer feature for devices that
>>>>>>> +# supports it. In general that means that backend state and its
>>>>>>> +# file descriptors are passed to the destination in the migraton
>>>>>>> +# channel (which must be a UNIX socket). Individual devices
>>>>>>> +# declare the support for backend-transfer by per-device
>>>>>>> +# backend-transfer option. (Since 10.2)
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> I still prefer the name "fd-passing" or anything more explicit than
>>>>>> "backend-transfer". Maybe the current name is fine for TAP, only because
>>>>>> TAP doesn't have its own VMSD to transfer?
>>>>>>
>>>>>> Consider a device that would be a backend that supports VMSDs already to be
>>>>>> migrated, then if it starts to allow fd-passing, this name will stop being
>>>>>> suitable there, because it used to "transfer backend" already, now it's
>>>>>> just started to "fd-passing".
>>>>>>
>>>>>> Meanwhile, consider another example - what if a device is not a backend at
>>>>>> all (e.g. vfio?), has its own VMSD, then want to do fd-passing?
>>>>>
>>>>> Reasonable.
>>>>>
>>>>> But consider also the discussion with Fabiano in v5, where he argues against fds
>>>>> (reasonable too):
>>>>>
>>>>> https://lore.kernel.org/qemu-devel/87y0qatqoa.fsf@suse.de/
>>>>>
>>>>> (still, they were against my "fds" name for the parameter, which is
>>>>> really too generic, fd-passing is not)
>>>>>
>>>>> and the arguments for backend-transfer (to read similar with cpr-transfer)
>>>>>
>>>>> https://lore.kernel.org/qemu-devel/87ms6qtlgf.fsf@suse.de/
>>>>>
>>>>>
>>>>>>
>>>>>> In general, I think "fd" is really a core concept of this whole thing.
>>>>>
>>>>> I think, we can call "backend" any external object, linked by the fd.
>>>>>
>>>>> Still, backend/frontend terminology is so misleading, when applied to
>>>>> complex systems (for me, at least), that I don't really like "-backend"
>>>>> word here.
>>>>>
>>>>> fd-passing is OK for me, I can resend with it, if arguments by Fabiano
>>>>> not change your mind.
>>>>
>>>> Ah, I didn't notice the name has been discussed.
>>>>
>>>> I think it means you can vote for your own preference now because we have
>>>> one vote for each. :) Let's also see whether Fabiano will come up with
>>>> something better than both.
>>>>
>>>> You mentioned explicitly the file descriptors in the qapi doc, that's what
>>>> I would strongly request for. The other thing is the unix socket check, it
>>>> looks all good below now with it, thanks. No strong feelings on the names.
>>>>
>>>
>>> After a bit more thinking, I leaning towards keeping backend-transfer. I think
>>> it's more meaningful for the user:
>>>
>>> If we call it "fd-passing", user may ask:
>>>
>>> Ok, what is it? Allow QEMU to pass some fds through migration stream, if it
>>> supports fds? Which fds? Why to pass them? Finally, why QEMU can't just check
>>> is it unix socket or not, and pass any fds it wants if it is?
>>>
>>> Logical question is, why not just drop the global capability, and check only
>>> is it unix socket or not? (OK, relying only on socket type is wrong anyway,
>>> as it may be some complex tunneling, which includes unix sockets, but still
>>> can't pass fds, but I think now about feature naming)
>>>
>>> But we really want an explicit switch for the feature. As qemu-update is
>>> not the only case of local migration. The another case is changing the
>>> backend. So for the user's choice is:
>>>
>>> 1. Remote migration: we can't reuse backends (files, sockets, host devices), as
>>> we are moving to another host. So, we don't enable "backend-transfer". We don't
>>> transfer the backend, we have to initialize new backend on another host.
>>>
>>> 2. Local migration to update QEMU, with minimal freeze-time and minimal
>>> extra actions: use "backend-transfer", exactly to keep the backends
>>> (vhost-user-server, TAP device in kernel, in-kernel vfio device state, etc)
>>> as is.
>>>
>>> 3. Local migration, but we want to reconfigure some backend, or switch
>>> to another backend. We disable "backend-transfer" for one device.
>>
>> This implies that you're changing 'backend-transfer' against the
>> device at time of each migration.
>>
>> This takes us back to the situation we've had historically where the
>> behaviour of migration depends on global properties the mgmt app has
>> set prior to the 'migrate' command being run. We've just tried to get
>> away from that model by passing everything as parameters to the
>> migrate command, so I'm loathe to see us invent a new way to have
>> global state properties changing migration behaviour.
>>
>> This 'backend-transfer' device property is not really a device property,
>> it is an indirect parameter to the 'migrate' command.
>>
>> Ergo, if we need the ability to selectively migrate the backend state
>> of individal devices, then instead of a property on the device, we
>> should pass a list of device IDs as a parameter to the migrate
>> command in QMP.
>
> Understand.
>
> So, it will look like
>
> # @backend-transfer: List of devices IDs or QOM paths, to enable
> # backend-transfer for. In general that means that backend
> # states and their file descriptors are passed to the destination
> # in the migration channel (which must be a UNIX socket), and
> # management tool doesn't have to configure new backends for
> # target QEMU (like vhost-user server, or TAP device in the kernel).
> # Default is no backend-transfer migration (Since 10.2)
>
RFC diff on top of this series, to switch the API to a list of IDs:
diff --git a/hw/core/machine.c b/hw/core/machine.c
index a3d77f5604..681adbb7ac 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -40,7 +40,6 @@
GlobalProperty hw_compat_10_1[] = {
{ TYPE_ACPI_GED, "x-has-hest-addr", "false" },
- { TYPE_VIRTIO_NET, "backend-transfer", "false" },
};
const size_t hw_compat_10_1_len = G_N_ELEMENTS(hw_compat_10_1);
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 5f9711dee7..a895b26e5d 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3638,7 +3638,7 @@ static bool virtio_net_is_tap_mig(void *opaque, int version_id)
nc = qemu_get_queue(n->nic);
- return migrate_backend_transfer() && n->backend_transfer && nc->peer &&
+ return migrate_backend_transfer(DEVICE(n)) && nc->peer &&
nc->peer->info->type == NET_CLIENT_DRIVER_TAP;
}
@@ -4461,7 +4461,6 @@ static const Property virtio_net_properties[] = {
host_features_ex,
VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM,
false),
- DEFINE_PROP_BOOL("backend-transfer", VirtIONet, backend_transfer, true),
};
static void virtio_net_class_init(ObjectClass *klass, const void *data)
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index a7bfb10dc7..0f3b7aa55e 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -1160,4 +1160,7 @@ typedef enum MachineInitPhase {
bool phase_check(MachineInitPhase phase);
void phase_advance(MachineInitPhase phase);
+bool migrate_backend_transfer(DeviceState *dev);
+bool migrate_backend_transfer_check_list(const strList *list, Error **errp);
+
#endif
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index bf07f8a4cb..5b8ab7bda7 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -231,7 +231,6 @@ struct VirtIONet {
struct EBPFRSSContext ebpf_rss;
uint32_t nr_ebpf_rss_fds;
char **ebpf_rss_fds;
- bool backend_transfer;
};
size_t virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
diff --git a/include/migration/misc.h b/include/migration/misc.h
index 592b93021e..7f931bed17 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -152,4 +152,6 @@ bool multifd_device_state_save_thread_should_exit(void);
void multifd_abort_device_state_save_threads(void);
bool multifd_join_device_state_save_threads(void);
+const strList *migrate_backend_transfer_list(void);
+
#endif
diff --git a/migration/options.c b/migration/options.c
index a461b07b54..1644728ed7 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -13,6 +13,7 @@
#include "qemu/osdep.h"
#include "qemu/error-report.h"
+#include "qapi/util.h"
#include "exec/target_page.h"
#include "qapi/clone-visitor.h"
#include "qapi/error.h"
@@ -24,6 +25,7 @@
#include "migration/colo.h"
#include "migration/cpr.h"
#include "migration/misc.h"
+#include "migration/options.h"
#include "migration.h"
#include "migration-stats.h"
#include "qemu-file.h"
@@ -262,7 +264,7 @@ bool migrate_mapped_ram(void)
return s->capabilities[MIGRATION_CAPABILITY_MAPPED_RAM];
}
-bool migrate_backend_transfer(void)
+const strList *migrate_backend_transfer_list(void)
{
MigrationState *s = migrate_get_current();
return s->parameters.backend_transfer;
@@ -969,8 +971,11 @@ MigrationParameters *qmp_query_migrate_parameters(Error **errp)
params->cpr_exec_command = QAPI_CLONE(strList,
s->parameters.cpr_exec_command);
- params->has_backend_transfer = true;
- params->backend_transfer = s->parameters.backend_transfer;
+ if (s->parameters.backend_transfer) {
+ params->has_backend_transfer = true;
+ params->backend_transfer = QAPI_CLONE(strList,
+ s->parameters.backend_transfer);
+ }
return params;
}
@@ -1193,6 +1198,11 @@ bool migrate_params_check(MigrationParameters *params, Error **errp)
return false;
}
+ if (params->has_backend_transfer &&
+ !migrate_backend_transfer_check_list(params->backend_transfer, errp)) {
+ return false;
+ }
+
return true;
}
@@ -1459,7 +1469,10 @@ static void migrate_params_apply(MigrateSetParameters *params, Error **errp)
}
if (params->has_backend_transfer) {
- s->parameters.backend_transfer = params->backend_transfer;
+ qapi_free_strList(s->parameters.backend_transfer);
+
+ s->parameters.backend_transfer = QAPI_CLONE(strList,
+ params->backend_transfer);
}
}
diff --git a/migration/options.h b/migration/options.h
index 755ba1c024..82d839709e 100644
--- a/migration/options.h
+++ b/migration/options.h
@@ -87,8 +87,6 @@ const char *migrate_tls_hostname(void);
uint64_t migrate_xbzrle_cache_size(void);
ZeroPageDetection migrate_zero_page_detection(void);
-bool migrate_backend_transfer(void);
-
/* parameters helpers */
bool migrate_params_check(MigrationParameters *params, Error **errp);
diff --git a/qapi/migration.json b/qapi/migration.json
index 35601a1f87..9478c4ddab 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -951,12 +951,11 @@
# is @cpr-exec. The first list element is the program's filename,
# the remainder its arguments. (Since 10.2)
#
-# @backend-transfer: Enable backend-transfer feature for devices that
-# supports it. In general that means that backend state and its
-# file descriptors are passed to the destination in the migraton
-# channel (which must be a UNIX socket). Individual devices
-# declare the support for backend-transfer by per-device
-# backend-transfer option. (Since 10.2)
+# @backend-transfer: List of devices (IDs or QOM paths) for
+# backend-transfer migration. When enabled, device backends
+# including opened fds will be passed to the destination in the
+# migration channel (which must be a UNIX domain socket). Default
+# is no backend-transfer migration. (Since 10.2)
#
# Features:
#
@@ -1145,12 +1144,11 @@
# is @cpr-exec. The first list element is the program's filename,
# the remainder its arguments. (Since 10.2)
#
-# @backend-transfer: Enable backend-transfer feature for devices that
-# supports it. In general that means that backend state and its
-# file descriptors are passed to the destination in the migraton
-# channel (which must be a UNIX socket). Individual devices
-# declare the support for backend-transfer by per-device
-# backend-transfer option. (Since 10.2)
+# @backend-transfer: List of devices (IDs or QOM paths) for
+# backend-transfer migration. When enabled, device backends
+# including opened fds will be passed to the destination in the
+# migration channel (which must be a UNIX domain socket). Default
+# is no backend-transfer migration. (Since 10.2)
#
# Features:
#
@@ -1195,7 +1193,7 @@
'*zero-page-detection': 'ZeroPageDetection',
'*direct-io': 'bool',
'*cpr-exec-command': [ 'str' ],
- '*backend-transfer': { 'type': 'bool',
+ '*backend-transfer': { 'type': [ 'str' ],
'features': [ 'unstable' ] } } }
##
@@ -1369,12 +1367,11 @@
# is @cpr-exec. The first list element is the program's filename,
# the remainder its arguments. (Since 10.2)
#
-# @backend-transfer: Enable backend-transfer feature for devices that
-# supports it. In general that means that backend state and its
-# file descriptors are passed to the destination in the migraton
-# channel (which must be a UNIX socket). Individual devices
-# declare the support for backend-transfer by per-device
-# backend-transfer option. (Since 10.2)
+# @backend-transfer: List of devices (IDs or QOM paths) for
+# backend-transfer migration. When enabled, device backends
+# including opened fds will be passed to the destination in the
+# migration channel (which must be a UNIX domain socket). Default
+# is no backend-transfer migration. (Since 10.2)
#
# Features:
#
@@ -1416,7 +1413,7 @@
'*zero-page-detection': 'ZeroPageDetection',
'*direct-io': 'bool',
'*cpr-exec-command': [ 'str' ],
- '*backend-transfer': { 'type': 'bool',
+ '*backend-transfer': { 'type': [ 'str' ],
'features': [ 'unstable' ] } } }
##
diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
index 2ac92d0a07..b4a1a88992 100644
--- a/system/qdev-monitor.c
+++ b/system/qdev-monitor.c
@@ -939,6 +939,32 @@ void qmp_device_del(const char *id, Error **errp)
}
}
+bool migrate_backend_transfer(DeviceState *dev)
+{
+ const strList *el = migrate_backend_transfer_list();
+
+ for ( ; el; el = el->next) {
+ if (find_device_state(el->value, false, NULL) == dev) {
+ return true;
+ }
+ }
+
+ return false;
+}
+
+bool migrate_backend_transfer_check_list(const strList *list, Error **errp)
+{
+ const strList *el = list;
+
+ for ( ; el; el = el->next) {
+ if (!find_device_state(el->value, false, errp)) {
+ return false;
+ }
+ }
+
+ return true;
+}
+
int qdev_sync_config(DeviceState *dev, Error **errp)
{
DeviceClass *dc = DEVICE_GET_CLASS(dev);
diff --git a/tests/functional/test_x86_64_tap_migration.py b/tests/functional/test_x86_64_tap_migration.py
index 1f88ff174c..a324b0f374 100644
--- a/tests/functional/test_x86_64_tap_migration.py
+++ b/tests/functional/test_x86_64_tap_migration.py
@@ -254,17 +254,16 @@ def prepare_and_launch_vm(
self.log.info(f"Launching {vm_s} VM")
vm.launch()
- self.set_migration_capabilities(vm, backend_transfer)
-
if not backend_transfer:
tap_name = TAP_ID2 if incoming else TAP_ID
else:
tap_name = TAP_ID
- self.add_virtio_net(vm, vhost, tap_name, backend_transfer)
+ self.add_virtio_net(vm, vhost, tap_name)
+
+ self.set_migration_capabilities(vm, backend_transfer)
- def add_virtio_net(self, vm, vhost: bool, tap_name: str,
- backend_transfer: bool):
+ def add_virtio_net(self, vm, vhost: bool, tap_name: str = "tap0"):
netdev_params = {
"id": "netdev.1",
"vhost": vhost,
@@ -289,17 +288,19 @@ def add_virtio_net(self, vm, vhost: bool, tap_name: str,
bus="pci.1",
mac=GUEST_MAC,
disable_legacy="off",
- backend_transfer=backend_transfer,
)
def set_migration_capabilities(self, vm, backend_transfer=True):
- vm.cmd("migrate-set-capabilities", { "capabilities": [
+ capabilities = [
{"capability": "events", "state": True},
{"capability": "x-ignore-shared", "state": True},
- ]})
- vm.cmd("migrate-set-parameters", {
- "backend-transfer": backend_transfer
- })
+ ]
+ vm.cmd("migrate-set-capabilities", {"capabilities": capabilities})
+ if backend_transfer:
+ vm.cmd(
+ "migrate-set-parameters",
+ {"backend-transfer": ["/machine/peripheral/vnet.1/virtio-backend"]},
+ )
def setup_guest_network(self) -> None:
exec_command_and_wait_for_pattern(self, "ip addr", "# ")
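The lookup logic in the diff above can be modelled in a few lines. This is a sketch under simplifying assumptions: the device tree is a plain dict keyed by QOM path, a bare ID is taken to live under /machine/peripheral/, and find_device(), check_list() and backend_transfer_enabled() are illustrative stand-ins for find_device_state(), migrate_backend_transfer_check_list() and migrate_backend_transfer().

```python
class UnknownDevice(Exception):
    """Raised when a list entry does not resolve to any device."""


def find_device(devices, ref):
    """Resolve an ID or an absolute QOM path against a {path: device} dict."""
    if ref.startswith("/"):
        return devices.get(ref)
    return devices.get("/machine/peripheral/" + ref)


def check_list(devices, refs):
    """Validate the parameter up front: every entry must resolve."""
    for ref in refs:
        if find_device(devices, ref) is None:
            raise UnknownDevice(ref)


def backend_transfer_enabled(devices, refs, dev):
    """True if some list entry resolves to exactly this device object."""
    return any(find_device(devices, ref) is dev for ref in refs)
```

Note that the match is by object identity. In this model an entry naming the PCI proxy (for example the bare ID vnet.1) does not enable the feature for the virtio-net child device, which is presumably why the test passes the .../virtio-backend path rather than the ID.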
--
Best regards,
Vladimir
* Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter
2025-10-16 19:57 ` Daniel P. Berrangé
@ 2025-10-16 20:28 ` Peter Xu
2025-10-17 6:51 ` Vladimir Sementsov-Ogievskiy
2025-10-17 8:10 ` Daniel P. Berrangé
0 siblings, 2 replies; 51+ messages in thread
From: Peter Xu @ 2025-10-16 20:28 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Vladimir Sementsov-Ogievskiy, mst, jasowang, farosas, sw, eblake,
armbru, thuth, philmd, qemu-devel, michael.roth, steven.sistare,
leiyang, davydov-max, yc-core, raphael.s.norwitz
On Thu, Oct 16, 2025 at 08:57:18PM +0100, Daniel P. Berrangé wrote:
> Errm, machine types apply to devices, but this is about transferring
> backends which are outside the scope of machine types.
Ah.. I didn't notice that net backends are not inherited by default from
qdev, hence not applicable to machine type properties.
Is it possible we enable it somehow, so that backends can have compat
properties similarly to frontends?
If we go with a list of devices in the migration parameters, to me it'll
only be a way to work around the lack of such a capability in net backends.
Meanwhile, the admin will need to manage the list of devices even if the
admin doesn't really need to, IMHO.
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter
2025-10-16 20:26 ` Vladimir Sementsov-Ogievskiy
@ 2025-10-16 20:30 ` Vladimir Sementsov-Ogievskiy
0 siblings, 0 replies; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-16 20:30 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Peter Xu, mst, jasowang, farosas, sw, eblake, armbru, thuth,
philmd, qemu-devel, michael.roth, leiyang, davydov-max, yc-core,
raphael.s.norwitz
On 16.10.25 23:26, Vladimir Sementsov-Ogievskiy wrote:
> On 16.10.25 12:23, Vladimir Sementsov-Ogievskiy wrote:
>> On 16.10.25 11:32, Daniel P. Berrangé wrote:
>>> On Thu, Oct 16, 2025 at 12:02:45AM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>> On 15.10.25 23:07, Peter Xu wrote:
[..]
>>>> 3. Local migration, but we want to reconfigure some backend, or switch
>>>> to another backend. We disable "backend-transfer" for one device.
>>>
>>> This implies that you're changing 'backend-transfer' against the
>>> device at time of each migration.
>>>
>>> This takes us back to the situation we've had historically where the
>>> behaviour of migration depends on global properties the mgmt app has
>>> set prior to the 'migrate' command being run. We've just tried to get
>>> away from that model by passing everything as parameters to the
>>> migrate command, so I'm loathe to see us invent a new way to have
>>> global state properties changing migration behaviour.
>>>
>>> This 'backend-transfer' device property is not really a device property,
>>> it is an indirect parameter to the 'migrate' command.
>>>
>>> Ergo, if we need the ability to selectively migrate the backend state
>>> of individal devices, then instead of a property on the device, we
>>> should pass a list of device IDs as a parameter to the migrate
>>> command in QMP.
>>
>> Understand.
>>
>> So, it will look like
>>
>> # @backend-transfer: List of devices IDs or QOM paths, to enable
>> # backend-transfer for. In general that means that backend
>> # states and their file descriptors are passed to the destination
>> # in the migration channel (which must be a UNIX socket), and
>> # management tool doesn't have to configure new backends for
>> # target QEMU (like vhost-user server, or TAP device in the kernel).
>> # Default is no backend-transfer migration (Since 10.2)
>>
>
>
> RFC diff to these series, to switch the API to list of IDs:
>
[..]
> @@ -1193,6 +1198,11 @@ bool migrate_params_check(MigrationParameters *params, Error **errp)
> return false;
> }
>
> + if (params->has_backend_transfer &&
> + !migrate_backend_transfer_check_list(params->backend_transfer, errp)) {
> + return false;
> + }
This made me move the capabilities setup after device add in the test. Not a problem.
> +
> return true;
> }
>
[..]
> - vm.cmd("migrate-set-parameters", {
> - "backend-transfer": backend_transfer
> - })
> + ]
> + vm.cmd("migrate-set-capabilities", {"capabilities": capabilities})
> + if backend_transfer:
> + vm.cmd(
> + "migrate-set-parameters",
> + {"backend-transfer": ["/machine/peripheral/vnet.1/virtio-backend"]},
If I write just "vnet.1", it doesn't work, of course. Is there some way to get a pointer to
the proxy device from virtio-net.c? But maybe it's OK as is.
> + )
>
> def setup_guest_network(self) -> None:
> exec_command_and_wait_for_pattern(self, "ip addr", "# ")
>
>
>
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter
2025-10-16 20:28 ` Peter Xu
@ 2025-10-17 6:51 ` Vladimir Sementsov-Ogievskiy
2025-10-17 15:55 ` Peter Xu
2025-10-17 8:10 ` Daniel P. Berrangé
1 sibling, 1 reply; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-17 6:51 UTC (permalink / raw)
To: Peter Xu, Daniel P. Berrangé
Cc: mst, jasowang, farosas, sw, eblake, armbru, thuth, philmd,
qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
yc-core, raphael.s.norwitz
On 16.10.25 23:28, Peter Xu wrote:
> On Thu, Oct 16, 2025 at 08:57:18PM +0100, Daniel P. Berrangé wrote:
>> Errm, machine types apply to devices, but this is about transferring
>> backends which are outside the scope of machine types.
>
> Ah.. I didn't notice that net backends are not inherited by default from
> qdev, hence not applicable to machine type properties.
>
> Is it possible we enable it somehow, so that backends can have compat
> properties similarly to frontends?
But that would mean that we can't reconfigure a backend during live migration.
In my understanding, machine type properties are visible to the guest,
and that's why we can't change them for a running VM, even during live
migration.
Bringing in another type of property, one which we _can_ change for a
running VM (even if changing it is not very comfortable for the admin),
would be like tying our own hands.
And yes, there is a way to change any property with qom-set. But that
lies outside the paradigm of machine types, and normally we can't change
most properties in flight.
Or in other words: if we _can_ get by with migration parameters only,
that actually shows that what we are talking about is definitely a
property of the migration, not a property of the device.
And a final note: if we can use one mechanism instead of two,
it makes the architecture twice as simple. Trying to get by with _only_
device properties would mean running a bunch of qom-set commands before
every migration (as we have to distinguish local and remote migrations
anyway), which looks bad. On the other hand, getting by with _only_ a
migration parameter is feasible and looks better.
And a very final note: with a global parameter + per-device parameters,
the global parameter actually becomes a workaround for the fact that we
don't want to run a bunch of qom-set commands. So, the global parameter is
an additional API to hide the inconvenience of the main API.
>
> If we go with a list of devices in the migration parameters, to me it'll
> only be a way to workaround the missing of such capability of net backends.
> Meanwhile, the admin will need to manage the list of devices even if the
> admin doesn't really needed to, IMHO.
>
> Thanks,
>
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter
2025-10-16 20:28 ` Peter Xu
2025-10-17 6:51 ` Vladimir Sementsov-Ogievskiy
@ 2025-10-17 8:10 ` Daniel P. Berrangé
2025-10-17 8:26 ` Vladimir Sementsov-Ogievskiy
` (2 more replies)
1 sibling, 3 replies; 51+ messages in thread
From: Daniel P. Berrangé @ 2025-10-17 8:10 UTC (permalink / raw)
To: Peter Xu
Cc: Vladimir Sementsov-Ogievskiy, mst, jasowang, farosas, sw, eblake,
armbru, thuth, philmd, qemu-devel, michael.roth, steven.sistare,
leiyang, davydov-max, yc-core, raphael.s.norwitz
On Thu, Oct 16, 2025 at 04:28:10PM -0400, Peter Xu wrote:
> On Thu, Oct 16, 2025 at 08:57:18PM +0100, Daniel P. Berrangé wrote:
> > Errm, machine types apply to devices, but this is about transferring
> > backends which are outside the scope of machine types.
>
> Ah.. I didn't notice that net backends are not inherited by default from
> qdev, hence not applicable to machine type properties.
>
> Is it possible we enable it somehow, so that backends can have compat
> properties similarly to frontends?
That is a technical limitation, but the problem here is bigger than
just the lack of qdev. It is a conceptual one - where a device is
implemented, its behaviour is determined exclusively by the QEMU
code. There are some rare exceptions, like host PCI device assignment
where functionality is partly in the host hardware, or external
device backends where impl is offloaded to an external process, but
most pure QEMU impls are able to be made always migratable and compat
can be easily ensured long term via machine types props.
With backends, a lot of behaviour is offloaded to either the host
OS, or to external libraries or services. Certain narrow configs
may be able to transfer state, but there will always be configs
where state transfer is impossible. There can be no coarse rule
that a backend is migratable or not - it will usually be highly
dependent on the particular configuration choices of the backend
in use. Machine types props can't magically make all backend
config scenarios migratable. We need to be able to interrogate
backends at the time migration is required.
> If we go with a list of devices in the migration parameters, to me it'll
> only be a way to workaround the missing of such capability of net backends.
> Meanwhile, the admin will need to manage the list of devices even if the
> admin doesn't really needed to, IMHO.
We shouldn't need to list devices in every scenario. We need to focus on
the internal API design. We need to have suitable APIs exposed by backends
to allow us to query migratability and process vmstate; a mere property
'backend-transfer' is insufficient, whether set by QEMU code, or set by
the mgmt app.
If we have proper APIs each device should be able to query whether its
backend can be transferred, and so "do the right thing" if backend
transfer is requested by migration. The ability to list devices in the
migrate command is only needed to be able to exclude some backends if
the purpose of migration is to change a backend.
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter
2025-10-17 8:10 ` Daniel P. Berrangé
@ 2025-10-17 8:26 ` Vladimir Sementsov-Ogievskiy
2025-10-17 8:50 ` Daniel P. Berrangé
2025-10-17 8:39 ` Vladimir Sementsov-Ogievskiy
2025-10-17 16:08 ` Peter Xu
2 siblings, 1 reply; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-17 8:26 UTC (permalink / raw)
To: Daniel P. Berrangé, Peter Xu
Cc: mst, jasowang, farosas, sw, eblake, armbru, thuth, philmd,
qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
yc-core, raphael.s.norwitz
On 17.10.25 11:10, Daniel P. Berrangé wrote:
>> Meanwhile, the admin will need to manage the list of devices even if the
>> admin doesn't really needed to, IMHO.
> We shouldn't need to list devices in every scenario.
Do you mean we may make it a union,
backend-transfer = true | false | [list of IDs]
Where true means, enable backend-transfer for all supporting devices?
So that normally, we'll not list all devices, but just set it to true?
But this way, migration will fail if the target version doesn't support
backend-transfer for some of the devices in use, or supports it for some
other device where the source lacks the support. So that's a way to create
a situation where two QEMUs, with the same device options, same machine
types, same configurations and same migration parameters / capabilities,
define incompatible migration states.
> We need to focus on
> the internal API design. We need to have suitable APIs exposed by backends
> to allow us to query migratability and process vmstate a mere property
> 'backend-transfer' is insufficient, whether set by QEMU code, or set by
> the mgmt app.
>
> If we have proper APIs each device should be able to query whether its
> backend can be transferred, and so "do the right thing" if backend
> transfer is requested by migration. The ability to list devices in the
> migrate command is only needed to be able to exclude some backends if
> the purpose of migration is to change a backend
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter
2025-10-17 8:10 ` Daniel P. Berrangé
2025-10-17 8:26 ` Vladimir Sementsov-Ogievskiy
@ 2025-10-17 8:39 ` Vladimir Sementsov-Ogievskiy
2025-10-17 16:08 ` Peter Xu
2 siblings, 0 replies; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-17 8:39 UTC (permalink / raw)
To: Daniel P. Berrangé, Peter Xu
Cc: mst, jasowang, farosas, sw, eblake, armbru, thuth, philmd,
qemu-devel, michael.roth, steven.sistare, leiyang, davydov-max,
yc-core, raphael.s.norwitz
On 17.10.25 11:10, Daniel P. Berrangé wrote:
>> If we go with a list of devices in the migration parameters, to me it'll
>> only be a way to workaround the missing of such capability of net backends.
>> Meanwhile, the admin will need to manage the list of devices even if the
>> admin doesn't really needed to, IMHO.
> We shouldn't need to list devices in every scenario. We need to focus on
> the internal API design. We need to have suitable APIs exposed by backends
> to allow us to query migratability and process vmstate a mere property
> 'backend-transfer' is insufficient, whether set by QEMU code, or set by
> the mgmt app.
I now imagine the following:
I already need an additional .pre_incoming migration handler for the feature,
see patch
[PATCH v8 14/19] migration: introduce .pre_incoming() vmsd handler.
I can add a boolean backend_transfer parameter to that handler, so that it
informs the device that it should get the backend state from the migration
stream. And that's a good point to fail if the device doesn't support backend
transfer in its current configuration.
If so, it seems logical to add a symmetrical .pre_outgoing() vmsd handler,
with the same backend_transfer parameter, to inform source devices (or get
errors from them).
Or, otherwise, make a separate VMSD handler .supports_backend_transfer(),
which should be called at the start of incoming and outgoing migrations to
check the specified list of IDs; we can also call it on
migrate-set-parameters, to get an earlier failure. And have the devices
call some migrate_backend_transfer(dev) to learn whether they should
do backend-transfer or not (like in the diff which I sent yesterday
in this thread).
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter
2025-10-17 8:26 ` Vladimir Sementsov-Ogievskiy
@ 2025-10-17 8:50 ` Daniel P. Berrangé
2025-10-17 9:18 ` Vladimir Sementsov-Ogievskiy
0 siblings, 1 reply; 51+ messages in thread
From: Daniel P. Berrangé @ 2025-10-17 8:50 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy
Cc: Peter Xu, mst, jasowang, farosas, sw, eblake, armbru, thuth,
philmd, qemu-devel, michael.roth, steven.sistare, leiyang,
davydov-max, yc-core, raphael.s.norwitz
On Fri, Oct 17, 2025 at 11:26:59AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 17.10.25 11:10, Daniel P. Berrangé wrote:
> > > Meanwhile, the admin will need to manage the list of devices even if the
> > > admin doesn't really needed to, IMHO.
> > We shouldn't need to list devices in every scenario.
>
> Do you mean, we may make union,
>
> backend-transfer = true | false | [list of IDs]
>
> Where true means, enable backend-transfer for all supporting devices?
> So that normally, we'll not list all devices, but just set it to true?
Well I was thinking separate parameters
backend-transfer: bool
backend-transfer-devices: [str] (optional list of IDs)
but it amounts to the same thing
> But this way, migration will fail, if target version doesn't support
> backend-transfer for some of used devices, or support for some
> another, where source lack the support. So that's a way to create a
> situation, where two QEMUs, with same device options, same machine
> types, same configurations and same migration parameters / capabilities
> define incompatible migration states..
It is worse - the backend on both sides may support transfer,
but may nonetheless be incompatible due to changed configuration,
so this needs mgmt app input too.
The challenge we have is that determining whether or not a backend
supports transfer requires fairly detailed knowledge of QEMU and the
specific configuration of the backend. It is pretty undesirable for mgmt
apps to have to hold that knowledge, as the matrix of possibilities
is quite large and liable to change over time.
If we consider 'backend transfer' to be a performance optimization,
then really we want QEMU to "do the right thing" as much as is
possible.
Source and dst QEMUs don't have a bi-directional channel though,
so they can't negotiate the common subset of backends they both
support - it'll need help from the mgmt app.
One possibility is a new QMP command "query-migratable-backends"
which lists all device IDs, whose current backend configuration
is reporting the ability to transfer state. The mgmt app could
run that on both sides of the migration, take the intersection
of the two lists, and then further subtract any devices where
it has deliberately changed the backend configuration on the dst.
If we had that, then we could always pass the ID list to the
migrate command, while also avoiding hardcoding knowledge of
QEMU backend impl details - it would largely "just work".
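As a rough illustration of the query / intersect / subtract flow described
above, here is a minimal sketch from the mgmt app's point of view. It is
only a sketch under assumptions: "query-migratable-backends" is the
proposed command and does not exist today, and the helper name and the
sample QOM paths are hypothetical.

```python
def pick_backend_transfer_ids(src_ids, dst_ids, reconfigured_ids=()):
    """Return the sorted device IDs for which backend transfer can be requested.

    src_ids / dst_ids stand for the replies of the proposed (hypothetical)
    "query-migratable-backends" command on the source and destination QEMU.
    reconfigured_ids are devices whose backend the mgmt app deliberately
    changed on the destination, so their backend state must not be transferred.
    """
    common = set(src_ids) & set(dst_ids)
    return sorted(common - set(reconfigured_ids))


# Hypothetical replies from both sides of the migration.
src = [
    "/machine/peripheral/vnet.1/virtio-backend",
    "/machine/peripheral/vnet.2/virtio-backend",
]
dst = [
    "/machine/peripheral/vnet.1/virtio-backend",
    "/machine/peripheral/vnet.2/virtio-backend",
    "/machine/peripheral/vnet.3/virtio-backend",
]
# The mgmt app switched the backend of vnet.2 on the destination.
changed = ["/machine/peripheral/vnet.2/virtio-backend"]

selected = pick_backend_transfer_ids(src, dst, changed)
```

The resulting list is what would be passed to the migrate command; the
mgmt app treats the IDs as opaque strings and needs no knowledge of which
backend configurations are transferable.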
> > We need to focus on
> > the internal API design. We need to have suitable APIs exposed by backends
> > to allow us to query migratability and process vmstate a mere property
> > 'backend-transfer' is insufficient, whether set by QEMU code, or set by
> > the mgmt app.
> >
> > If we have proper APIs each device should be able to query whether its
> > backend can be transferred, and so "do the right thing" if backend
> > transfer is requested by migration. The ability to list devices in the
> > migrate command is only needed to be able to exclude some backends if
> > the purpose of migration is to change a backend
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter
2025-10-17 8:50 ` Daniel P. Berrangé
@ 2025-10-17 9:18 ` Vladimir Sementsov-Ogievskiy
0 siblings, 0 replies; 51+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2025-10-17 9:18 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Peter Xu, mst, jasowang, farosas, sw, eblake, armbru, thuth,
philmd, qemu-devel, michael.roth, steven.sistare, leiyang,
davydov-max, yc-core, raphael.s.norwitz
On 17.10.25 11:50, Daniel P. Berrangé wrote:
> On Fri, Oct 17, 2025 at 11:26:59AM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> On 17.10.25 11:10, Daniel P. Berrangé wrote:
>>>> Meanwhile, the admin will need to manage the list of devices even if the
>>>> admin doesn't really needed to, IMHO.
>>> We shouldn't need to list devices in every scenario.
>>
>> Do you mean, we may make union,
>>
>> backend-transfer = true | false | [list of IDs]
>>
>> Where true means, enable backend-transfer for all supporting devices?
>> So that normally, we'll not list all devices, but just set it to true?
>
> Well I was thinking separate parameters
>
> backend-transfer: bool
> backend-transfer-devices: [str] (optional list of IDs)
>
> but it amounts to the same thing
>
>> But this way, migration will fail, if target version doesn't support
>> backend-transfer for some of used devices, or support for some
>> another, where source lack the support. So that's a way to create a
>> situation, where two QEMUs, with same device options, same machine
>> types, same configurations and same migration parameters / capabilities
>> define incompatible migration states..
>
> It is worse - the backend on both sides may support transfer,
> but may none the less be incompatible due to changed configuration,
> so this needs mgmt app input too.
>
> The challenge we have is that whether or not a backend supports
> transfer requires fairly detailed know of QEMU and the specific
> configuration of the backend. It is pretty undesirable for mgmt
> apps to have to that knowledge, as the matrix of possibilities
> is quite large and liable to change over time.
>
> If we consider 'backend transfer' to be a performance optimization,
> then really we want QEMU to "do the right thing" as much as is
> possible.
>
> Source and dst QEMUs don't have a bi-directional channel though,
> so they can't negotiate the common subset of backends they both
> support - it'll need help from the mgmt app.
As I heard from Peter, there are future plans to create such a channel:
https://wiki.qemu.org/ToDo/LiveMigration#Migration_handshake
>
> One possibility is a new QMP command "query-migratable-backends"
> which lists all device IDs, whose current backend configuration
> is reporting the ability to transfer state. The mgmt app could
> run that on both sides of the migration, take the intersection
> of the two lists, and then further subtract any devices where
> it has delibrately changed the backend configuration on the dst.
>
> If we had that, then we could always pass the ID list to the
> migrate command, while also avoiding hardcoding knowledge of
> QEMU backend impl details - it would largely "just work".
Yes, "query + get intersection + set the list" works well for me.
That's abstract enough; the management app should not even care
what these IDs are.
And if the migration handshake is implemented, this (like many other
parameters) may be simplified. We may finally have
backend-transfer = "off" | "auto" | [list of IDs]
, where "auto" means exactly: negotiate with the target the maximal set
of devices for which we can do backend-transfer.
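To make the intended semantics of the three forms concrete, here is a
small sketch of how such a value could be resolved into a device list.
This is an assumption for illustration, not existing QEMU code: the
function name is hypothetical, and failing on an unsupported explicit ID
is one possible policy.

```python
def resolve_backend_transfer(value, local_ids, remote_ids):
    """Map a backend-transfer value to the concrete list of device IDs.

    value is "off", "auto" or an explicit list of device IDs.
    local_ids / remote_ids are the IDs each side reports as transferable;
    with "auto" they would come from the future migration handshake.
    """
    if value == "off":
        return []
    negotiable = set(local_ids) & set(remote_ids)
    if value == "auto":
        # Maximal set of devices both sides can transfer.
        return sorted(negotiable)
    if isinstance(value, str):
        raise ValueError(f"unknown backend-transfer mode: {value!r}")
    # Explicit list: fail early if any requested device cannot be transferred.
    unsupported = sorted(set(value) - negotiable)
    if unsupported:
        raise ValueError(f"backend-transfer not possible for: {unsupported}")
    return list(value)


auto_ids = resolve_backend_transfer("auto", ["vnet.1", "vnet.2"], ["vnet.2", "vnet.3"])
off_ids = resolve_backend_transfer("off", ["vnet.1"], ["vnet.1"])
```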
>
>>> We need to focus on
>>> the internal API design. We need to have suitable APIs exposed by backends
>>> to allow us to query migratability and process vmstate a mere property
>>> 'backend-transfer' is insufficient, whether set by QEMU code, or set by
>>> the mgmt app.
>>>
>>> If we have proper APIs each device should be able to query whether its
>>> backend can be transferred, and so "do the right thing" if backend
>>> transfer is requested by migration. The ability to list devices in the
>>> migrate command is only needed to be able to exclude some backends if
>>> the purpose of migration is to change a backend
>
> With regards,
> Daniel
--
Best regards,
Vladimir
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter
2025-10-17 6:51 ` Vladimir Sementsov-Ogievskiy
@ 2025-10-17 15:55 ` Peter Xu
0 siblings, 0 replies; 51+ messages in thread
From: Peter Xu @ 2025-10-17 15:55 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy
Cc: Daniel P. Berrangé, mst, jasowang, farosas, sw, eblake,
armbru, thuth, philmd, qemu-devel, michael.roth, steven.sistare,
leiyang, davydov-max, yc-core, raphael.s.norwitz
On Fri, Oct 17, 2025 at 09:51:26AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 16.10.25 23:28, Peter Xu wrote:
> > On Thu, Oct 16, 2025 at 08:57:18PM +0100, Daniel P. Berrangé wrote:
> > > Errm, machine types apply to devices, but this is about transferring
> > > backends which are outside the scope of machine types.
> >
> > Ah.. I didn't notice that net backends are not inherited by default from
> > qdev, hence not applicable to machine type properties.
> >
> > Is it possible we enable it somehow, so that backends can have compat
> > properties similarly to frontends?
>
> But that would mean, that we can't reconfigure a backend during live migration.
>
> In my understanding, machine type properties are visible to the guest,
> and that's why we can't change them for running vm, even during live
> migration.
IIUC machine type properties may or may not be visible to the guest. It
should depend on whether it is relevant to a guest-visible behavior. Here
a flag showing "whether TAP, as a backend, can migrate" shouldn't be
exposed to guest.
I was indeed expecting that one will need to qom-set it for each device if
you want to get rid of versioned machine types. It's not as ideal an
interface as what Dan was looking for, but it should still work so far, and
I think it might still be fair if it's only needed without machine type
versioning.
>
> Bringing here another type of properties, which we _can_ change for
> running vm (even if changing is not very comfortable for admin), will
> be like tying ourselves hands.
>
> And yes, there is a way to change any properties by qom-set. But it
> lays out of paradigm of machine types, and normally we can't change
> most of properties in flight.
>
>
> Or in other words: if we _can_ go on only with migration parameters,
> that actually shows, that what we are talking about is definitely
> property of migration, not property of device.
>
>
> And final note: if we can use one mechanism instead of two mechanisms,
> it makes the architecture twice simpler. Trying to go on with _only_
> device properties would mean run a bench of qom-set commands before
> every migration (as we have to distinguish local and remote migrations
> anyway), that looks bad. On the other hand, go on with _only_ migration
> parameter is feasible and looks better.
>
>
> And very final note: making global parameter + per-device parameters,
> actually, global parameter become a workaround to the fact that we
> don't want run a bench of qom-set commands. So, global parameter is
> an additional API to hide inconvenience of the main API.
IMHO it's not a workaround. To me, it's a better way of abstraction,
because the migration side provides the capability of passing FDs, and
whatever is generic about that should be attached to the global knob.
Migration shouldn't care about behavior or attributes of a specific device.
Listing the devices in any way in migration's QAPI is a workaround instead.
But I agree I do not know whether it's easy to have net backends support
machine type properties. I think it still makes sense logically that a
net backend is a TYPE_DEVICE, even if it's a backend device which is not
directly visible to the guest.
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter
2025-10-17 8:10 ` Daniel P. Berrangé
2025-10-17 8:26 ` Vladimir Sementsov-Ogievskiy
2025-10-17 8:39 ` Vladimir Sementsov-Ogievskiy
@ 2025-10-17 16:08 ` Peter Xu
2 siblings, 0 replies; 51+ messages in thread
From: Peter Xu @ 2025-10-17 16:08 UTC (permalink / raw)
To: Daniel P. Berrangé
Cc: Vladimir Sementsov-Ogievskiy, mst, jasowang, farosas, sw, eblake,
armbru, thuth, philmd, qemu-devel, michael.roth, steven.sistare,
leiyang, davydov-max, yc-core, raphael.s.norwitz
On Fri, Oct 17, 2025 at 09:10:38AM +0100, Daniel P. Berrangé wrote:
> On Thu, Oct 16, 2025 at 04:28:10PM -0400, Peter Xu wrote:
> > On Thu, Oct 16, 2025 at 08:57:18PM +0100, Daniel P. Berrangé wrote:
> > > Errm, machine types apply to devices, but this is about transferring
> > > backends which are outside the scope of machine types.
> >
> > Ah.. I didn't notice that net backends are not inherited by default from
> > qdev, hence not applicable to machine type properties.
> >
> > Is it possible we enable it somehow, so that backends can have compat
> > properties similarly to frontends?
>
> That is a technical limitation, but the problem here is bigger than
> just the lack of qdev. It is a conceptual one - where a device is
> implemented, its behaviour is determined exclusively by the QEMU
> code. There are some rare exceptions, like host PCI device assignment
> where functionality is partly in the host hardware, or external
> device backends where impl is offloaded to an external process, but
> most pure QEMU impls are able to be made always migratable and compat
> can be easily ensured long term via machine types props.
>
> With backends, alot of behaviour is offloaded to either the host
> OS, or to external libraries or services. Certain narrow configs
> may be able to transfer state, but there will always be configs
> were state transfer is impossible. There can be no coarse rule
> that a backend is migratable or not - it will usually be highly
> dependent on the particular configuration choices of the backend
> in use. Machine types props can't magically make all backend
> config scenarios migratable. We need to be able to interrogate
> backends at the time migration is required.
I believe we have similar things already, like USO, which relies on the
kernel feature set that QEMU runs on. What we do right now, afaiu, is we
make it a per-device property ON/OFF. Then when unknown remote information
is required, we make it ON/OFF/AUTO. When it's AUTO, it may prefer ON and
probe the kernel, dynamically decide the value on realize.
I didn't check the code if it's explicitly done like that, but I think
that's doable at least when a backend relies on such remote information.
>
> > If we go with a list of devices in the migration parameters, to me it'll
> > only be a way to workaround the missing of such capability of net backends.
> > Meanwhile, the admin will need to manage the list of devices even if the
> > admin doesn't really needed to, IMHO.
>
> We shouldn't need to list devices in every scenario. We need to focus on
> the internal API design. We need to have suitable APIs exposed by backends
> to allow us to query migratability and process vmstate a mere property
> 'backend-transfer' is insufficient, whether set by QEMU code, or set by
> the mgmt app.
>
> If we have proper APIs each device should be able to query whether its
> backend can be transferred, and so "do the right thing" if backend
> transfer is requested by migration. The ability to list devices in the
> migrate command is only needed to be able to exclude some backends if
> the purpose of migration is to change a backend
IIUC, it is a proposal of using exclude-list, which should in most cases be
empty.
Yes, I agree it's at least better than querying all the devices and having
mgmt specify each backend to enable backend-transfer.
However IIUC it also means the query API will be internal, so that
migration will need to be able to query that from device.
Then we have a similar issue with what happens if we migrate from a new QEMU
to an old QEMU: the new QEMU (when the migration module queries TAP) reports
per-device ON, however it won't actually work because the dest QEMU is OFF.
IOW, we're still missing the functionality that we leverage from machine
type properties..
Or if we make the query visible to QMP / mgmt, then it'll at least
need to be an include-list, not an exclude-list.
Then, we're literally bypassing the machine type versioning mechanism,
offloading all of this to mgmt.
It should work, I agree. But it also means we're reinventing the
wheel of what machine type properties were designed for... because if we
expose all these caps on all devices (as long as mutable after device
realize), we do not need machine type properties anymore. They're
fundamentally solving the same problem, IMHO, on providing a working value
for migration no matter what the dest QEMU binary is.
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 51+ messages in thread
* Re: [PATCH v8 00/19] virtio-net: live-TAP local migration
2025-10-15 13:21 [PATCH v8 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
` (18 preceding siblings ...)
2025-10-15 13:21 ` [PATCH v8 19/19] tests/functional: add test_x86_64_tap_migration Vladimir Sementsov-Ogievskiy
@ 2025-10-18 15:38 ` Lei Yang
19 siblings, 0 replies; 51+ messages in thread
From: Lei Yang @ 2025-10-18 15:38 UTC (permalink / raw)
To: Vladimir Sementsov-Ogievskiy
Cc: mst, jasowang, peterx, farosas, sw, eblake, armbru, thuth, philmd,
berrange, qemu-devel, michael.roth, steven.sistare, davydov-max,
yc-core, raphael.s.norwitz
Tested this series of patches with virtio-net regression tests;
everything works fine.
Tested-by: Lei Yang <leiyang@redhat.com>
On Wed, Oct 15, 2025 at 9:21 PM Vladimir Sementsov-Ogievskiy
<vsementsov@yandex-team.ru> wrote:
>
> Hi all!
>
> Here is a new migration parameter backend-transfer, which, being
> assisted by new device property backend-transfer, allows to
> enable local migration of TAP virtio-net backend, including its
> properties and open fds.
>
> With this new option, management software doesn't need to
> initialize new TAP and do a switch to it. Nothing should be
> done around virtio-net in local migration: it just migrates
> and continues to use same TAP device. So we avoid extra logic
> in management software, extra allocations in kernel (for new TAP),
> and corresponding extra delay in migration downtime.
>
> v8:
> 14: add a-b by Peter
> 16: rework to one boolean parameter
> 17: rework to use per-device property
> 19: update to use new API
>
> Vladimir Sementsov-Ogievskiy (19):
> net/tap: net_init_tap_one(): drop extra error propagation
> net/tap: net_init_tap_one(): move parameter checking earlier
> net/tap: rework net_tap_init()
> net/tap: pass NULL to net_init_tap_one() in cases when scripts are
> NULL
> net/tap: rework scripts handling
> net/tap: setup exit notifier only when needed
> net/tap: split net_tap_fd_init()
> net/tap: tap_set_sndbuf(): add return value
> net/tap: rework tap_set_sndbuf()
> net/tap: rework sndbuf handling
> net/tap: introduce net_tap_setup()
> net/tap: move vhost fd initialization to net_tap_new()
> net/tap: finalize net_tap_set_fd() logic
> migration: introduce .pre_incoming() vmsd handler
> net/tap: postpone tap setup to pre-incoming
> qapi: introduce backend-transfer migration parameter
> virtio-net: support backend-transfer migration for virtio-net/tap
> tests/functional: add skipWithoutSudo() decorator
> tests/functional: add test_x86_64_tap_migration
>
> hw/core/machine.c | 1 +
> hw/net/virtio-net.c | 151 ++++++-
> include/hw/virtio/virtio-net.h | 1 +
> include/migration/vmstate.h | 1 +
> include/net/tap.h | 5 +
> migration/migration.c | 4 +
> migration/options.c | 18 +
> migration/options.h | 2 +
> migration/savevm.c | 15 +
> migration/savevm.h | 1 +
> net/tap-bsd.c | 3 +-
> net/tap-linux.c | 19 +-
> net/tap-solaris.c | 3 +-
> net/tap-stub.c | 3 +-
> net/tap-win32.c | 11 +
> net/tap.c | 425 +++++++++++++-----
> net/tap_int.h | 3 +-
> qapi/migration.json | 38 +-
> tests/functional/qemu_test/decorators.py | 16 +
> tests/functional/test_x86_64_tap_migration.py | 395 ++++++++++++++++
> 20 files changed, 984 insertions(+), 131 deletions(-)
> create mode 100644 tests/functional/test_x86_64_tap_migration.py
>
> --
> 2.48.1
>
Thread overview: 51+ messages
2025-10-15 13:21 [PATCH v8 00/19] virtio-net: live-TAP local migration Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 01/19] net/tap: net_init_tap_one(): drop extra error propagation Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 02/19] net/tap: net_init_tap_one(): move parameter checking earlier Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 03/19] net/tap: rework net_tap_init() Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 04/19] net/tap: pass NULL to net_init_tap_one() in cases when scripts are NULL Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 05/19] net/tap: rework scripts handling Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 06/19] net/tap: setup exit notifier only when needed Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 07/19] net/tap: split net_tap_fd_init() Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 08/19] net/tap: tap_set_sndbuf(): add return value Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 09/19] net/tap: rework tap_set_sndbuf() Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 10/19] net/tap: rework sndbuf handling Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 11/19] net/tap: introduce net_tap_setup() Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 12/19] net/tap: move vhost fd initialization to net_tap_new() Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 13/19] net/tap: finalize net_tap_set_fd() logic Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 14/19] migration: introduce .pre_incoming() vmsd handler Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 15/19] net/tap: postpone tap setup to pre-incoming Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 16/19] qapi: introduce backend-transfer migration parameter Vladimir Sementsov-Ogievskiy
2025-10-15 18:19 ` Peter Xu
2025-10-15 19:02 ` Vladimir Sementsov-Ogievskiy
2025-10-15 20:07 ` Peter Xu
2025-10-15 21:02 ` Vladimir Sementsov-Ogievskiy
2025-10-16 8:32 ` Daniel P. Berrangé
2025-10-16 9:23 ` Vladimir Sementsov-Ogievskiy
2025-10-16 10:38 ` Vladimir Sementsov-Ogievskiy
2025-10-16 10:55 ` Daniel P. Berrangé
2025-10-16 18:40 ` Peter Xu
2025-10-16 18:51 ` Daniel P. Berrangé
2025-10-16 19:19 ` Daniel P. Berrangé
2025-10-16 19:39 ` Peter Xu
2025-10-16 20:00 ` Daniel P. Berrangé
2025-10-16 19:29 ` Peter Xu
2025-10-16 19:57 ` Daniel P. Berrangé
2025-10-16 20:28 ` Peter Xu
2025-10-17 6:51 ` Vladimir Sementsov-Ogievskiy
2025-10-17 15:55 ` Peter Xu
2025-10-17 8:10 ` Daniel P. Berrangé
2025-10-17 8:26 ` Vladimir Sementsov-Ogievskiy
2025-10-17 8:50 ` Daniel P. Berrangé
2025-10-17 9:18 ` Vladimir Sementsov-Ogievskiy
2025-10-17 8:39 ` Vladimir Sementsov-Ogievskiy
2025-10-17 16:08 ` Peter Xu
2025-10-16 20:26 ` Vladimir Sementsov-Ogievskiy
2025-10-16 20:30 ` Vladimir Sementsov-Ogievskiy
2025-10-16 10:56 ` Markus Armbruster
2025-10-16 12:07 ` Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 17/19] virtio-net: support backend-transfer migration for virtio-net/tap Vladimir Sementsov-Ogievskiy
2025-10-16 8:23 ` Daniel P. Berrangé
2025-10-16 9:15 ` Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 18/19] tests/functional: add skipWithoutSudo() decorator Vladimir Sementsov-Ogievskiy
2025-10-15 13:21 ` [PATCH v8 19/19] tests/functional: add test_x86_64_tap_migration Vladimir Sementsov-Ogievskiy
2025-10-18 15:38 ` [PATCH v8 00/19] virtio-net: live-TAP local migration Lei Yang