* [PATCH v4 0/8] Live update: tap and vhost
@ 2026-01-28 20:39 Ben Chaney
  2026-01-28 20:39 ` [PATCH v4 1/8] migration: stop vm earlier for cpr Ben Chaney
                   ` (8 more replies)
  0 siblings, 9 replies; 22+ messages in thread
From: Ben Chaney @ 2026-01-28 20:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Michael S. Tsirkin, Stefano Garzarella,
	Jason Wang, Alex Williamson, Cédric Le Goater, Eric Blake,
	Markus Armbruster, Stefan Weil, Daniel P. Berrangé,
	Paolo Bonzini, Hamza Khan, Mark Kanda, Joshua Hunt, Max Tottenham,
	Ben Chaney, Steve Sistare, Vladimir Sementsov-Ogievskiy

Tap and vhost devices can be preserved during cpr-transfer using
traditional live migration methods, wherein the management layer
creates new interfaces for the target and fiddles with 'ip link'
to deactivate the old interface and activate the new.

However, CPR can simply send the file descriptors to new QEMU,
with no special management actions required.  The user enables
this behavior by specifying '-netdev tap,cpr=on'.  The default
is cpr=off.
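
For readers unfamiliar with descriptor passing, here is a minimal sketch
of the generic mechanism: handing an open fd to another process over an
AF_UNIX socket with SCM_RIGHTS.  It is not the QEMU code; send_fd() and
recv_fd() are hypothetical helper names used only for illustration, and
the sketch assumes a connected stream socket.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send one descriptor over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
    char payload = 'F';     /* one data byte travels with the ancillary data */
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;           /* forces suitable alignment */
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor; returns the new fd or -1. */
static int recv_fd(int sock)
{
    char payload;
    struct iovec iov = { .iov_base = &payload, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The received descriptor refers to the same open file as the one sent,
which is why the receiver needs no new interface and no 'ip link' step.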

Signed-off-by: Ben Chaney <bchaney@akamai.com>
---
Changes in v4:
- change the name of cpr_get_fd_param as it is no longer used
  exclusively during cpr transfer
- clarify documentation
- Do not require fd=-1 if fds will be provided by cpr
- Do not interleave tap and vhost fds
- Do not check cpr state in qio_channel_handle_fds
- Link to v3: https://lore.kernel.org/qemu-devel/20251203-cpr-tap-v3-0-3c12e0a61f8e@akamai.com

---
Ben Chaney (2):
      tap: cpr support
      tap: cpr fixes

Steve Sistare (6):
      migration: stop vm earlier for cpr
      migration: cpr setup notifier
      vhost: reset vhost devices for cpr
      cpr: delete all fds
      tap: common return label
      tap: postload fix for cpr

 hw/net/virtio-net.c               |  26 +++++++
 hw/vfio/device.c                  |   2 +-
 hw/virtio/vhost-backend.c         |   6 ++
 hw/virtio/vhost.c                 |  32 +++++++++
 include/hw/virtio/vhost-backend.h |   1 +
 include/hw/virtio/vhost.h         |   1 +
 include/migration/cpr.h           |   5 +-
 include/net/tap.h                 |   1 +
 migration/cpr.c                   |  32 ++++++---
 migration/migration.c             |  69 ++++++++++++++----
 net/tap-win32.c                   |   5 ++
 net/tap.c                         | 148 +++++++++++++++++++++++++++++---------
 qapi/net.json                     |   6 +-
 stubs/cpr.c                       |   8 +++
 stubs/meson.build                 |   1 +
 15 files changed, 283 insertions(+), 60 deletions(-)
---
base-commit: 2339d0a1cfac6ecc667e6e062a593865c1541c35
change-id: 20251203-cpr-tap-04fd811ace03

Best regards,
-- 
Ben Chaney <bchaney@akamai.com>




* [PATCH v4 1/8] migration: stop vm earlier for cpr
  2026-01-28 20:39 [PATCH v4 0/8] Live update: tap and vhost Ben Chaney
@ 2026-01-28 20:39 ` Ben Chaney
  2026-02-02 13:31   ` Fabiano Rosas
  2026-01-28 20:39 ` [PATCH v4 2/8] migration: cpr setup notifier Ben Chaney
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Ben Chaney @ 2026-01-28 20:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Michael S. Tsirkin, Stefano Garzarella,
	Jason Wang, Alex Williamson, Cédric Le Goater, Eric Blake,
	Markus Armbruster, Stefan Weil, Daniel P. Berrangé,
	Paolo Bonzini, Hamza Khan, Mark Kanda, Joshua Hunt, Max Tottenham,
	Ben Chaney, Steve Sistare

From: Steve Sistare <steven.sistare@oracle.com>

Stop the vm earlier for cpr, before cpr_state_save, which causes new QEMU
to proceed and initialize devices.  We must guarantee devices are stopped
in old QEMU, and all source notifiers called, before they are initialized
in new QEMU.
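
The intended control flow can be modelled roughly as below.  This is a
toy sketch, not the migration code: struct toy_source and the toy_*
functions are invented for illustration, and the save step is reduced
to a boolean.  It only shows the "stop first, resume on any later
failure" shape.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the source side state; all names are hypothetical. */
struct toy_source {
    bool vm_running;
    bool state_sent;
};

static int toy_stop_vm(struct toy_source *s)
{
    s->vm_running = false;
    return 0;
}

static void toy_resume_vm(struct toy_source *s)
{
    s->vm_running = true;
}

/* Stop the VM before handing state to the target; undo on failure. */
static int toy_start_cpr(struct toy_source *s, bool save_ok)
{
    bool stopped = false;
    int ret;

    ret = toy_stop_vm(s);
    if (ret < 0) {
        goto out;
    }
    stopped = true;

    if (!save_ok) {             /* models the state save step failing */
        ret = -1;
        goto out;
    }
    s->state_sent = true;       /* from here on the target may proceed */

out:
    if (ret < 0 && stopped) {
        toy_resume_vm(s);       /* source keeps running after an error */
    }
    return ret;
}
```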

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Signed-off-by: Ben Chaney <bchaney@akamai.com>
---
 migration/migration.c | 57 +++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 48 insertions(+), 9 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 1bcde301f7..f36e59d9e8 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1654,6 +1654,7 @@ void migration_cancel(void)
                           MIGRATION_STATUS_CANCELLED);
         cpr_state_close();
         migrate_hup_delete(s);
+        vm_resume(s->vm_old_state);
     }
 }
 
@@ -2212,6 +2213,7 @@ void qmp_migrate(const char *uri, bool has_channels,
     MigrationAddress *addr = NULL;
     MigrationChannel *channelv[MIGRATION_CHANNEL_TYPE__MAX] = { NULL };
     MigrationChannel *cpr_channel = NULL;
+    bool stopped = false;
 
     /*
      * Having preliminary checks for uri and channel
@@ -2264,6 +2266,46 @@ void qmp_migrate(const char *uri, bool has_channels,
         return;
     }
 
+    /*
+     * CPR-transfer ordering:
+     *
+     *   SOURCE                              TARGET
+     *   ------                              ------
+     *                                       cpr_state_load() blocks
+     *   |                                        |
+     *   |  1. migration_stop_vm()                |
+     *   |     VM stopped, devices quiesced       |
+     *   |                                        | Waiting for
+     *   |  2. notifiers (PRECOPY_SETUP)          | FDs from source
+     *   |     vhost_reset_owner() releases       |
+     *   |     device ownership                   |
+     *   |                                        |
+     *   |  3. cpr_state_save() ---- FDs -------> |
+     *   |                                        |
+     *   v                                        v
+     *   postmigrate                         Device init begins
+     *                                       - cpr_find_fd()
+     *                                       - vhost_dev_init()
+     *                                       - VHOST_SET_OWNER
+     *
+     * Step 3 is the synchronization/cut-over point. Target proceeds immediately
+     * upon receiving FDs, so steps 1-2 must complete first; otherwise:
+     * - Target's VHOST_SET_OWNER fails with -EBUSY (source still owns)
+     * - Race between source I/O and target device init
+     *
+     * We stop the VM early (before FD transfer) to prevent this race.
+     * Unlike regular migration, CPR-transfer passes memory via FD (memfd)
+     * rather than copying RAM, so early VM stop should have minimal downtime.
+     */
+    if (migrate_mode_is_cpr(s)) {
+        int ret = migration_stop_vm(s, RUN_STATE_FINISH_MIGRATE);
+        if (ret < 0) {
+            error_setg(&local_err, "migration_stop_vm failed, error %d", -ret);
+            goto out;
+        }
+        stopped = true;
+    }
+
     if (!cpr_state_save(cpr_channel, &local_err)) {
         goto out;
     }
@@ -2290,6 +2332,9 @@ out:
     if (local_err) {
         migration_connect_error_propagate(s, error_copy(local_err));
         error_propagate(errp, local_err);
+        if (stopped) {
+            vm_resume(s->vm_old_state);
+        }
     }
 }
 
@@ -2334,6 +2379,9 @@ static void qmp_migrate_finish(MigrationAddress *addr, bool resume_requested,
         }
         migration_connect_error_propagate(s, error_copy(local_err));
         error_propagate(errp, local_err);
+        if (migrate_mode_is_cpr(s)) {
+            vm_resume(s->vm_old_state);
+        }
         return;
     }
 }
@@ -4017,7 +4065,6 @@ void migration_connect(MigrationState *s, Error *error_in)
     Error *local_err = NULL;
     uint64_t rate_limit;
     bool resume = (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP);
-    int ret;
 
     /*
      * If there's a previous error, free it and prepare for another one.
@@ -4088,14 +4135,6 @@ void migration_connect(MigrationState *s, Error *error_in)
         return;
     }
 
-    if (migrate_mode_is_cpr(s)) {
-        ret = migration_stop_vm(s, RUN_STATE_FINISH_MIGRATE);
-        if (ret < 0) {
-            error_setg(&local_err, "migration_stop_vm failed, error %d", -ret);
-            goto fail;
-        }
-    }
-
     /*
      * Take a refcount to make sure the migration object won't get freed by
      * the main thread already in migration_shutdown().

-- 
2.34.1




* [PATCH v4 2/8] migration: cpr setup notifier
  2026-01-28 20:39 [PATCH v4 0/8] Live update: tap and vhost Ben Chaney
  2026-01-28 20:39 ` [PATCH v4 1/8] migration: stop vm earlier for cpr Ben Chaney
@ 2026-01-28 20:39 ` Ben Chaney
  2026-02-02 14:01   ` Fabiano Rosas
  2026-01-28 20:39 ` [PATCH v4 3/8] vhost: reset vhost devices for cpr Ben Chaney
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Ben Chaney @ 2026-01-28 20:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Michael S. Tsirkin, Stefano Garzarella,
	Jason Wang, Alex Williamson, Cédric Le Goater, Eric Blake,
	Markus Armbruster, Stefan Weil, Daniel P. Berrangé,
	Paolo Bonzini, Hamza Khan, Mark Kanda, Joshua Hunt, Max Tottenham,
	Ben Chaney, Steve Sistare

From: Steve Sistare <steven.sistare@oracle.com>

Call MIG_EVENT_PRECOPY_SETUP earlier, before CPR starts.  An early notifier
is needed for resetting vhost devices, as explained in the next patch.
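
As a rough illustration of the resulting event order, the sketch below
models a notifier chain that is called with a setup event before the
state is saved, and with a failure event if the save does not succeed.
All toy_* names are hypothetical; this is not the QEMU notifier API.

```c
#include <assert.h>
#include <stddef.h>

enum toy_event { TOY_SETUP, TOY_FAILED };

typedef int (*toy_notify_fn)(enum toy_event ev, void *opaque);

struct toy_notifier {
    toy_notify_fn fn;
    void *opaque;
};

/* Call every notifier in order; stop at the first non-zero return. */
static int toy_call_notifiers(const struct toy_notifier *list, size_t n,
                              enum toy_event ev)
{
    for (size_t i = 0; i < n; i++) {
        int ret = list[i].fn(ev, list[i].opaque);
        if (ret) {
            return ret;
        }
    }
    return 0;
}

/* Example callback: record the events it sees, up to four of them. */
struct toy_log {
    enum toy_event seen[4];
    size_t count;
};

static int toy_record(enum toy_event ev, void *opaque)
{
    struct toy_log *log = opaque;

    if (log->count < 4) {
        log->seen[log->count++] = ev;
    }
    return 0;
}

/* Setup notifiers run before the save; a failed save is reported back. */
static int toy_migrate(const struct toy_notifier *list, size_t n, int save_ok)
{
    if (toy_call_notifiers(list, n, TOY_SETUP)) {
        return -1;
    }
    if (!save_ok) {
        toy_call_notifiers(list, n, TOY_FAILED);
        return -1;
    }
    return 0;
}
```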

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Signed-off-by: Ben Chaney <bchaney@akamai.com>
---
 migration/migration.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index f36e59d9e8..191a34f667 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2306,7 +2306,14 @@ void qmp_migrate(const char *uri, bool has_channels,
         stopped = true;
     }
 
+    /* Notify before starting migration thread, and before starting cpr */
+    if (!resume_requested &&
+        migration_call_notifiers(s, MIG_EVENT_PRECOPY_SETUP, &local_err)) {
+        goto out;
+    }
+
     if (!cpr_state_save(cpr_channel, &local_err)) {
+        migration_call_notifiers(s, MIG_EVENT_PRECOPY_FAILED, NULL);
         goto out;
     }
 
@@ -4097,11 +4104,6 @@ void migration_connect(MigrationState *s, Error *error_in)
     } else {
         /* This is a fresh new migration */
         rate_limit = migrate_max_bandwidth();
-
-        /* Notify before starting migration thread */
-        if (migration_call_notifiers(s, MIG_EVENT_PRECOPY_SETUP, &local_err)) {
-            goto fail;
-        }
     }
 
     migration_rate_set(rate_limit);

-- 
2.34.1




* [PATCH v4 3/8] vhost: reset vhost devices for cpr
  2026-01-28 20:39 [PATCH v4 0/8] Live update: tap and vhost Ben Chaney
  2026-01-28 20:39 ` [PATCH v4 1/8] migration: stop vm earlier for cpr Ben Chaney
  2026-01-28 20:39 ` [PATCH v4 2/8] migration: cpr setup notifier Ben Chaney
@ 2026-01-28 20:39 ` Ben Chaney
  2026-01-28 20:39 ` [PATCH v4 4/8] cpr: delete all fds Ben Chaney
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Ben Chaney @ 2026-01-28 20:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Michael S. Tsirkin, Stefano Garzarella,
	Jason Wang, Alex Williamson, Cédric Le Goater, Eric Blake,
	Markus Armbruster, Stefan Weil, Daniel P. Berrangé,
	Paolo Bonzini, Hamza Khan, Mark Kanda, Joshua Hunt, Max Tottenham,
	Ben Chaney, Steve Sistare

From: Steve Sistare <steven.sistare@oracle.com>

When preserving a vhost fd using CPR, call VHOST_RESET_OWNER prior to CPR
in old QEMU.  Otherwise, new QEMU will fail when it calls VHOST_SET_OWNER
during vhost_dev_init.
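
The failure mode can be pictured with a toy ownership model.  The code
below is only a sketch under the assumption that a vhost device has at
most one owner at a time and rejects a second claim with -EBUSY; it
does not issue real ioctls, and struct toy_vhost and the toy_*
functions are made up for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Toy device: owner == 0 means unowned, otherwise an owner id. */
struct toy_vhost {
    int owner;
};

/* Models a set-owner request: refused while someone else owns it. */
static int toy_set_owner(struct toy_vhost *d, int who)
{
    if (d->owner != 0) {
        return -EBUSY;
    }
    d->owner = who;
    return 0;
}

/* Models a reset-owner request: only the current owner may release. */
static int toy_reset_owner(struct toy_vhost *d, int who)
{
    if (d->owner != who) {
        return -EPERM;
    }
    d->owner = 0;
    return 0;
}
```

With old QEMU as owner 1 and new QEMU as owner 2, a claim by 2 fails
until 1 has released the device, which is the ordering the notifier
enforces.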

Signed-off-by: Mark Kanda <mark.kanda@oracle.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Signed-off-by: Ben Chaney <bchaney@akamai.com>
---
 hw/virtio/vhost-backend.c         |  6 ++++++
 hw/virtio/vhost.c                 | 32 ++++++++++++++++++++++++++++++++
 include/hw/virtio/vhost-backend.h |  1 +
 include/hw/virtio/vhost.h         |  1 +
 4 files changed, 40 insertions(+)

diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
index 4367db0d95..1447d12963 100644
--- a/hw/virtio/vhost-backend.c
+++ b/hw/virtio/vhost-backend.c
@@ -261,6 +261,11 @@ static int vhost_kernel_set_owner(struct vhost_dev *dev)
     return vhost_kernel_call(dev, VHOST_SET_OWNER, NULL);
 }
 
+static int vhost_kernel_reset_owner(struct vhost_dev *dev)
+{
+    return vhost_kernel_call(dev, VHOST_RESET_OWNER, NULL);
+}
+
 static int vhost_kernel_get_vq_index(struct vhost_dev *dev, int idx)
 {
     assert(idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs);
@@ -385,6 +390,7 @@ const VhostOps kernel_ops = {
         .vhost_get_features_ex = vhost_kernel_get_features,
         .vhost_set_backend_cap = vhost_kernel_set_backend_cap,
         .vhost_set_owner = vhost_kernel_set_owner,
+        .vhost_reset_owner = vhost_kernel_reset_owner,
         .vhost_get_vq_index = vhost_kernel_get_vq_index,
         .vhost_vsock_set_guest_cid = vhost_kernel_vsock_set_guest_cid,
         .vhost_vsock_set_running = vhost_kernel_vsock_set_running,
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 31e9704cdc..beec547e46 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -24,6 +24,7 @@
 #include "standard-headers/linux/vhost_types.h"
 #include "hw/virtio/virtio-bus.h"
 #include "hw/mem/memory-device.h"
+#include "migration/misc.h"
 #include "migration/blocker.h"
 #include "migration/qemu-file-types.h"
 #include "system/dma.h"
@@ -1549,6 +1550,32 @@ static int vhost_dev_get_features(struct vhost_dev *hdev,
     return r;
 }
 
+static int vhost_cpr_notifier(NotifierWithReturn *notifier,
+                              MigrationEvent *e, Error **errp)
+{
+    struct vhost_dev *dev;
+    int r;
+
+    dev = container_of(notifier, struct vhost_dev, cpr_transfer_notifier);
+
+    if (dev->vhost_ops->backend_type != VHOST_BACKEND_TYPE_KERNEL) {
+        return 0;
+    }
+
+    if (e->type == MIG_EVENT_PRECOPY_SETUP) {
+        r = dev->vhost_ops->vhost_reset_owner(dev);
+        if (r < 0) {
+            VHOST_OPS_DEBUG(r, "vhost_reset_owner failed");
+        }
+    } else if (e->type == MIG_EVENT_PRECOPY_FAILED) {
+        r = dev->vhost_ops->vhost_set_owner(dev);
+        if (r < 0) {
+            VHOST_OPS_DEBUG(r, "vhost_set_owner failed");
+        }
+    }
+    return 0;
+}
+
 int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
                    VhostBackendType backend_type, uint32_t busyloop_timeout,
                    Error **errp)
@@ -1559,6 +1586,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
 
     hdev->vdev = NULL;
     hdev->migration_blocker = NULL;
+    hdev->cpr_transfer_notifier.notify = NULL;
 
     r = vhost_set_backend_type(hdev, backend_type);
     assert(r >= 0);
@@ -1659,6 +1687,9 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
     hdev->log_enabled = false;
     hdev->started = false;
     memory_listener_register(&hdev->memory_listener, &address_space_memory);
+    migration_add_notifier_mode(&hdev->cpr_transfer_notifier,
+                                vhost_cpr_notifier,
+                                MIG_MODE_CPR_TRANSFER);
     QLIST_INSERT_HEAD(&vhost_devices, hdev, entry);
 
     /*
@@ -1711,6 +1742,7 @@ void vhost_dev_cleanup(struct vhost_dev *hdev)
         QLIST_REMOVE(hdev, entry);
     }
     migrate_del_blocker(&hdev->migration_blocker);
+    migration_remove_notifier(&hdev->cpr_transfer_notifier);
     g_free(hdev->mem);
     g_free(hdev->mem_sections);
     if (hdev->vhost_ops) {
diff --git a/include/hw/virtio/vhost-backend.h b/include/hw/virtio/vhost-backend.h
index ff94fa1734..18ce5ea9a0 100644
--- a/include/hw/virtio/vhost-backend.h
+++ b/include/hw/virtio/vhost-backend.h
@@ -196,6 +196,7 @@ typedef struct VhostOps {
     vhost_get_features_op vhost_get_features;
     vhost_set_backend_cap_op vhost_set_backend_cap;
     vhost_set_owner_op vhost_set_owner;
+    vhost_set_owner_op vhost_reset_owner;
     vhost_reset_device_op vhost_reset_device;
     vhost_get_vq_index_op vhost_get_vq_index;
     vhost_set_vring_enable_op vhost_set_vring_enable;
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 08bbb4dfe9..5d11a97e43 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -133,6 +133,7 @@ struct vhost_dev {
     QLIST_ENTRY(vhost_dev) logdev_entry;
     QLIST_HEAD(, vhost_iommu) iommu_list;
     IOMMUNotifier n;
+    NotifierWithReturn cpr_transfer_notifier;
     const VhostDevConfigOps *config_ops;
 };
 

-- 
2.34.1




* [PATCH v4 4/8] cpr: delete all fds
  2026-01-28 20:39 [PATCH v4 0/8] Live update: tap and vhost Ben Chaney
                   ` (2 preceding siblings ...)
  2026-01-28 20:39 ` [PATCH v4 3/8] vhost: reset vhost devices for cpr Ben Chaney
@ 2026-01-28 20:39 ` Ben Chaney
  2026-01-28 20:39 ` [PATCH v4 5/8] tap: common return label Ben Chaney
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Ben Chaney @ 2026-01-28 20:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Michael S. Tsirkin, Stefano Garzarella,
	Jason Wang, Alex Williamson, Cédric Le Goater, Eric Blake,
	Markus Armbruster, Stefan Weil, Daniel P. Berrangé,
	Paolo Bonzini, Hamza Khan, Mark Kanda, Joshua Hunt, Max Tottenham,
	Ben Chaney, Steve Sistare

From: Steve Sistare <steven.sistare@oracle.com>

Add the cpr_delete_fd_all function to delete all fds associated with a
device.
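
To show the idea without the QLIST macros, here is a self-contained
sketch that removes every entry with a matching name from a plain
singly linked list while walking it.  struct toy_fd and the toy_*
functions are hypothetical and only mirror the shape of the CPR fd
list; returning a count is an addition for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy list element: (name, id) identifies a saved descriptor. */
struct toy_fd {
    char *name;
    int id;
    int fd;
    struct toy_fd *next;
};

/* Prepend a new entry; aborts on allocation failure to keep it short. */
static void toy_save_fd(struct toy_fd **head, const char *name, int id, int fd)
{
    struct toy_fd *e = malloc(sizeof(*e));
    size_t len = strlen(name) + 1;

    if (!e || !(e->name = malloc(len))) {
        abort();
    }
    memcpy(e->name, name, len);
    e->id = id;
    e->fd = fd;
    e->next = *head;
    *head = e;
}

/* Remove every entry named @name, for any id; returns how many went away. */
static int toy_delete_fd_all(struct toy_fd **head, const char *name)
{
    struct toy_fd **link = head;
    int removed = 0;

    while (*link) {
        struct toy_fd *e = *link;

        if (strcmp(e->name, name) == 0) {
            *link = e->next;        /* unlink before freeing */
            free(e->name);
            free(e);
            removed++;
        } else {
            link = &e->next;
        }
    }
    return removed;
}
```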

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Signed-off-by: Ben Chaney <bchaney@akamai.com>
---
 include/migration/cpr.h |  1 +
 migration/cpr.c         | 13 +++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/include/migration/cpr.h b/include/migration/cpr.h
index 027cb98073..d585fadc5b 100644
--- a/include/migration/cpr.h
+++ b/include/migration/cpr.h
@@ -29,6 +29,7 @@ extern CprState cpr_state;
 
 void cpr_save_fd(const char *name, int id, int fd);
 void cpr_delete_fd(const char *name, int id);
+void cpr_delete_fd_all(const char *name);
 int cpr_find_fd(const char *name, int id);
 void cpr_resave_fd(const char *name, int id, int fd);
 int cpr_open_fd(const char *path, int flags, const char *name, int id,
diff --git a/migration/cpr.c b/migration/cpr.c
index adee2a919a..c0bf93a7ba 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -85,6 +85,19 @@ void cpr_delete_fd(const char *name, int id)
     trace_cpr_delete_fd(name, id);
 }
 
+void cpr_delete_fd_all(const char *name)
+{
+    CprFd *elem, *next_elem;
+
+    QLIST_FOREACH_SAFE(elem, &cpr_state.fds, next, next_elem) {
+        if (!strcmp(elem->name, name)) {
+            QLIST_REMOVE(elem, next);
+            g_free(elem->name);
+            g_free(elem);
+        }
+    }
+}
+
 int cpr_find_fd(const char *name, int id)
 {
     CprFd *elem = find_fd(&cpr_state.fds, name, id);

-- 
2.34.1




* [PATCH v4 5/8] tap: common return label
  2026-01-28 20:39 [PATCH v4 0/8] Live update: tap and vhost Ben Chaney
                   ` (3 preceding siblings ...)
  2026-01-28 20:39 ` [PATCH v4 4/8] cpr: delete all fds Ben Chaney
@ 2026-01-28 20:39 ` Ben Chaney
  2026-01-28 20:39 ` [PATCH v4 6/8] tap: cpr support Ben Chaney
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 22+ messages in thread
From: Ben Chaney @ 2026-01-28 20:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Michael S. Tsirkin, Stefano Garzarella,
	Jason Wang, Alex Williamson, Cédric Le Goater, Eric Blake,
	Markus Armbruster, Stefan Weil, Daniel P. Berrangé,
	Paolo Bonzini, Hamza Khan, Mark Kanda, Joshua Hunt, Max Tottenham,
	Ben Chaney, Steve Sistare

From: Steve Sistare <steven.sistare@oracle.com>

Modify net_init_tap so every return branches to a common label, for
common cleanup in a subsequent patch.  No functional change.
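
The pattern in miniature, as a sketch only: every error path sets ret
and jumps to one label, so cleanup added there later runs on all exits.
toy_init() and toy_cleanup_calls are invented names, and the counter
merely stands in for whatever cleanup a later patch adds.

```c
#include <assert.h>
#include <stdbool.h>

/* Counts how often the common exit path ran; a stand-in for cleanup. */
static int toy_cleanup_calls;

static int toy_init(bool bad_options, bool open_fails)
{
    int ret = 0;

    if (bad_options) {
        ret = -1;
        goto out;
    }

    if (open_fails) {
        ret = -1;
        goto out;
    }

out:
    toy_cleanup_calls++;    /* single place for shared cleanup */
    return ret;
}
```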

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Signed-off-by: Ben Chaney <bchaney@akamai.com>
---
 net/tap.c | 55 +++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 37 insertions(+), 18 deletions(-)

diff --git a/net/tap.c b/net/tap.c
index bfba3fd7a7..1847167e4f 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -829,7 +829,8 @@ int net_init_tap(const Netdev *netdev, const char *name,
      * For -netdev, peer is always NULL. */
     if (peer && (tap->has_queues || tap->fds || tap->vhostfds)) {
         error_setg(errp, "Multiqueue tap cannot be used with hubs");
-        return -1;
+        ret = -1;
+        goto out;
     }
 
     if (tap->fd) {
@@ -839,23 +840,27 @@ int net_init_tap(const Netdev *netdev, const char *name,
             error_setg(errp, "ifname=, script=, downscript=, vnet_hdr=, "
                        "helper=, queues=, fds=, and vhostfds= "
                        "are invalid with fd=");
-            return -1;
+            ret = -1;
+            goto out;
         }
 
         fd = monitor_fd_param(monitor_cur(), tap->fd, errp);
         if (fd == -1) {
-            return -1;
+            ret = -1;
+            goto out;
         }
 
         if (!qemu_set_blocking(fd, false, errp)) {
             close(fd);
-            return -1;
+            ret = -1;
+            goto out;
         }
 
         vnet_hdr = tap_probe_vnet_hdr(fd, errp);
         if (vnet_hdr < 0) {
             close(fd);
-            return -1;
+            ret = -1;
+            goto out;
         }
 
         net_init_tap_one(tap, peer, "tap", name, NULL,
@@ -864,7 +869,8 @@ int net_init_tap(const Netdev *netdev, const char *name,
         if (err) {
             error_propagate(errp, err);
             close(fd);
-            return -1;
+            ret = -1;
+            goto out;
         }
     } else if (tap->fds) {
         char **fds;
@@ -877,7 +883,8 @@ int net_init_tap(const Netdev *netdev, const char *name,
             error_setg(errp, "ifname=, script=, downscript=, vnet_hdr=, "
                        "helper=, queues=, and vhostfd= "
                        "are invalid with fds=");
-            return -1;
+            ret = -1;
+            goto out;
         }
 
         fds = g_new0(char *, MAX_TAP_QUEUES);
@@ -939,29 +946,35 @@ free_fail:
         }
         g_free(fds);
         g_free(vhost_fds);
-        return ret;
+        goto out;
+
     } else if (tap->helper) {
         if (tap->ifname || tap->script || tap->downscript ||
             tap->has_vnet_hdr || tap->has_queues || tap->vhostfds) {
             error_setg(errp, "ifname=, script=, downscript=, vnet_hdr=, "
                        "queues=, and vhostfds= are invalid with helper=");
-            return -1;
+            ret = -1;
+            goto out;
         }
 
         fd = net_bridge_run_helper(tap->helper,
                                    tap->br ?: DEFAULT_BRIDGE_INTERFACE,
                                    errp);
         if (fd == -1) {
-            return -1;
+            ret = -1;
+            goto out;
         }
 
         if (!qemu_set_blocking(fd, false, errp)) {
-            return -1;
+            close(fd);
+            ret = -1;
+            goto out;
         }
         vnet_hdr = tap_probe_vnet_hdr(fd, errp);
         if (vnet_hdr < 0) {
             close(fd);
-            return -1;
+            ret = -1;
+            goto out;
         }
 
         net_init_tap_one(tap, peer, "bridge", name, ifname,
@@ -970,14 +983,16 @@ free_fail:
         if (err) {
             error_propagate(errp, err);
             close(fd);
-            return -1;
+            ret = -1;
+            goto out;
         }
     } else {
         g_autofree char *default_script = NULL;
         g_autofree char *default_downscript = NULL;
         if (tap->vhostfds) {
             error_setg(errp, "vhostfds= is invalid if fds= wasn't specified");
-            return -1;
+            ret = -1;
+            goto out;
         }
 
         if (!script) {
@@ -998,14 +1013,16 @@ free_fail:
             fd = net_tap_init(tap, &vnet_hdr, i >= 1 ? "no" : script,
                               ifname, sizeof ifname, queues > 1, errp);
             if (fd == -1) {
-                return -1;
+                ret = -1;
+                goto out;
             }
 
             if (queues > 1 && i == 0 && !tap->ifname) {
                 if (tap_fd_get_ifname(fd, ifname)) {
                     error_setg(errp, "Fail to get ifname");
                     close(fd);
-                    return -1;
+                    ret = -1;
+                    goto out;
                 }
             }
 
@@ -1016,12 +1033,14 @@ free_fail:
             if (err) {
                 error_propagate(errp, err);
                 close(fd);
-                return -1;
+                ret = -1;
+                goto out;
             }
         }
     }
 
-    return 0;
+out:
+    return ret;
 }
 
 int tap_enable(NetClientState *nc)

-- 
2.34.1




* [PATCH v4 6/8] tap: cpr support
  2026-01-28 20:39 [PATCH v4 0/8] Live update: tap and vhost Ben Chaney
                   ` (4 preceding siblings ...)
  2026-01-28 20:39 ` [PATCH v4 5/8] tap: common return label Ben Chaney
@ 2026-01-28 20:39 ` Ben Chaney
  2026-02-04 13:05   ` Markus Armbruster
  2026-01-28 20:39 ` [PATCH v4 7/8] tap: postload fix for cpr Ben Chaney
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 22+ messages in thread
From: Ben Chaney @ 2026-01-28 20:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Michael S. Tsirkin, Stefano Garzarella,
	Jason Wang, Alex Williamson, Cédric Le Goater, Eric Blake,
	Markus Armbruster, Stefan Weil, Daniel P. Berrangé,
	Paolo Bonzini, Hamza Khan, Mark Kanda, Joshua Hunt, Max Tottenham,
	Ben Chaney, Steve Sistare

Provide the cpr=on option to preserve TAP and vhost descriptors during
cpr-transfer, so the management layer does not need to create a new
device for the target.

Save all tap fds in order, with the tap device fds saved first and
the vhost fds saved after them.

Example:

-netdev tap,id=hostnet2,cpr=on
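
The index layout can be checked with a small sketch.  The two macros
are the ones this patch adds to net/tap.c; toy_indices_disjoint() is a
hypothetical helper written only for this example.

```c
#include <assert.h>

/* CPR fd indices: tap fds occupy [0, n), vhost fds occupy [n, 2n). */
#define TAP_FD_INDEX(queue)                   ((queue))
#define TAP_VHOSTFD_INDEX(queue, total_fds)   ((queue) + (total_fds))

/* Returns 1 if no tap index equals any vhost index for n queues. */
static int toy_indices_disjoint(int n)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            if (TAP_FD_INDEX(i) == TAP_VHOSTFD_INDEX(j, n)) {
                return 0;
            }
        }
    }
    return 1;
}
```

For a single queue the tap fd is saved at index 0 and the vhost fd at
index 1; for four queues the vhost fd of queue 2 lands at index 6.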

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Signed-off-by: Ben Chaney <bchaney@akamai.com>
---
 hw/vfio/device.c        |  2 +-
 include/migration/cpr.h |  4 +--
 migration/cpr.c         | 19 +++++++------
 net/tap.c               | 74 +++++++++++++++++++++++++++++++++++++++----------
 qapi/net.json           |  6 +++-
 5 files changed, 77 insertions(+), 28 deletions(-)

diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index 086f20f676..cbc8db6a67 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -363,7 +363,7 @@ void vfio_device_free_name(VFIODevice *vbasedev)
 
 void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp)
 {
-    vbasedev->fd = cpr_get_fd_param(vbasedev->dev->id, str, 0, errp);
+    vbasedev->fd = get_fd_param(vbasedev->dev->id, str, 0, true, errp);
 }
 
 static VFIODeviceIOOps vfio_device_io_ops_ioctl;
diff --git a/include/migration/cpr.h b/include/migration/cpr.h
index d585fadc5b..ded6ceff7c 100644
--- a/include/migration/cpr.h
+++ b/include/migration/cpr.h
@@ -48,8 +48,8 @@ void cpr_state_close(void);
 struct QIOChannel *cpr_state_ioc(void);
 
 bool cpr_incoming_needed(void *opaque);
-int cpr_get_fd_param(const char *name, const char *fdname, int index,
-                     Error **errp);
+int get_fd_param(const char *cpr_name, const char *fdname, int index, bool cpr,
+                 Error **errp);
 
 QEMUFile *cpr_transfer_output(MigrationChannel *channel, Error **errp);
 QEMUFile *cpr_transfer_input(MigrationChannel *channel, Error **errp);
diff --git a/migration/cpr.c b/migration/cpr.c
index c0bf93a7ba..f2c40eeba5 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -311,11 +311,12 @@ bool cpr_incoming_needed(void *opaque)
 }
 
 /*
- * cpr_get_fd_param: find a descriptor and return its value.
+ * get_fd_param: find a descriptor and return its value.
  *
- * @name: CPR name for the descriptor
+ * @cpr_name: CPR name for the descriptor
  * @fdname: An integer-valued string, or a name passed to a getfd command
  * @index: CPR index of the descriptor
+ * @cpr: cpr is enabled on the associated device
  * @errp: returned error message
  *
  * If CPR is not being performed, then use @fdname to find the fd.
@@ -324,23 +325,23 @@ bool cpr_incoming_needed(void *opaque)
  *
  * On success returns the fd value, else returns -1.
  */
-int cpr_get_fd_param(const char *name, const char *fdname, int index,
-                     Error **errp)
+int get_fd_param(const char *cpr_name, const char *fdname, int index,
+                 bool cpr, Error **errp)
 {
     ERRP_GUARD();
     int fd;
 
-    if (cpr_is_incoming()) {
-        fd = cpr_find_fd(name, index);
+    if (cpr && cpr_is_incoming()) {
+        fd = cpr_find_fd(cpr_name, index);
         if (fd < 0) {
             error_setg(errp, "cannot find saved value for fd %s", fdname);
         }
     } else {
         fd = monitor_fd_param(monitor_cur(), fdname, errp);
-        if (fd >= 0) {
-            cpr_save_fd(name, index, fd);
-        } else {
+        if (fd < 0) {
             error_prepend(errp, "Could not parse object fd %s:", fdname);
+        } else if (cpr) {
+            cpr_save_fd(cpr_name, index, fd);
         }
     }
     return fd;
diff --git a/net/tap.c b/net/tap.c
index 1847167e4f..8875498434 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -35,6 +35,7 @@
 #include "net/eth.h"
 #include "net/net.h"
 #include "clients.h"
+#include "migration/cpr.h"
 #include "monitor/monitor.h"
 #include "system/system.h"
 #include "qapi/error.h"
@@ -80,6 +81,7 @@ typedef struct TAPState {
     bool has_uso;
     bool has_tunnel;
     bool enabled;
+    bool cpr;
     VHostNetState *vhost_net;
     unsigned host_vnet_hdr_len;
     Notifier exit;
@@ -323,6 +325,9 @@ static void tap_cleanup(NetClientState *nc)
 {
     TAPState *s = DO_UPCAST(TAPState, nc, nc);
 
+    if (s->cpr) {
+        cpr_delete_fd_all(nc->name);
+    }
     if (s->vhost_net) {
         vhost_net_cleanup(s->vhost_net);
         g_free(s->vhost_net);
@@ -690,18 +695,24 @@ static int net_tap_init(const NetdevTapOptions *tap, int *vnet_hdr,
     return fd;
 }
 
+/* CPR fd's for each queue are saved at these indices */
+#define TAP_FD_INDEX(queue)                   ((queue))
+#define TAP_VHOSTFD_INDEX(queue, total_fds)   ((queue) + (total_fds))
+
 #define MAX_TAP_QUEUES 1024
 
 static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
                              const char *model, const char *name,
                              const char *ifname, const char *script,
                              const char *downscript, const char *vhostfdname,
-                             int vnet_hdr, int fd, Error **errp)
+                             int vnet_hdr, int fd, int index, Error **errp)
 {
     Error *err = NULL;
     TAPState *s = net_tap_fd_init(peer, model, name, fd, vnet_hdr);
+    bool cpr = tap->has_cpr ? tap->cpr : false;
     int vhostfd;
 
+    s->cpr = cpr;
     tap_set_sndbuf(s->fd, tap, &err);
     if (err) {
         error_propagate(errp, err);
@@ -736,7 +747,7 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
         }
 
         if (vhostfdname) {
-            vhostfd = monitor_fd_param(monitor_cur(), vhostfdname, &err);
+            vhostfd = get_fd_param(name, vhostfdname, index, cpr, &err);
             if (vhostfd == -1) {
                 error_propagate(errp, err);
                 goto failed;
@@ -745,12 +756,21 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
                 goto failed;
             }
         } else {
-            vhostfd = open("/dev/vhost-net", O_RDWR);
+            vhostfd = cpr ? cpr_find_fd(name, index) : -1;
+            if (vhostfd < 0) {
+                vhostfd = open("/dev/vhost-net", O_RDWR);
+                if (cpr && vhostfd >= 0) {
+                    cpr_save_fd(name, index, vhostfd);
+                }
+            }
             if (vhostfd < 0) {
                 error_setg_file_open(errp, errno, "/dev/vhost-net");
                 goto failed;
             }
             if (!qemu_set_blocking(vhostfd, false, errp)) {
+                if (!cpr) {
+                    close(vhostfd);
+                }
                 goto failed;
             }
         }
@@ -776,6 +796,9 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
     return;
 
 failed:
+    if (cpr) {
+        cpr_delete_fd_all(name);
+    }
     qemu_del_net_client(&s->nc);
 }
 
@@ -808,7 +831,8 @@ static int get_fds(char *str, char *fds[], int max)
 int net_init_tap(const Netdev *netdev, const char *name,
                  NetClientState *peer, Error **errp)
 {
-    const NetdevTapOptions *tap;
+    const NetdevTapOptions *tap = &netdev->u.tap;
+    bool cpr = tap->has_cpr ? tap->cpr : false;
     int fd, vnet_hdr = 0, i = 0, queues;
     /* for the no-fd, no-helper case */
     const char *script;
@@ -844,7 +868,7 @@ int net_init_tap(const Netdev *netdev, const char *name,
             goto out;
         }
 
-        fd = monitor_fd_param(monitor_cur(), tap->fd, errp);
+        fd = get_fd_param(name, tap->fd, TAP_FD_INDEX(0), cpr, errp);
         if (fd == -1) {
             ret = -1;
             goto out;
@@ -865,13 +889,15 @@ int net_init_tap(const Netdev *netdev, const char *name,
 
         net_init_tap_one(tap, peer, "tap", name, NULL,
                          script, downscript,
-                         vhostfdname, vnet_hdr, fd, &err);
+                         vhostfdname, vnet_hdr, fd,
+                         TAP_VHOSTFD_INDEX(0, 1), &err);
         if (err) {
             error_propagate(errp, err);
             close(fd);
             ret = -1;
             goto out;
         }
+
     } else if (tap->fds) {
         char **fds;
         char **vhost_fds;
@@ -902,7 +928,7 @@ int net_init_tap(const Netdev *netdev, const char *name,
         }
 
         for (i = 0; i < nfds; i++) {
-            fd = monitor_fd_param(monitor_cur(), fds[i], errp);
+            fd = get_fd_param(name, fds[i], TAP_FD_INDEX(i), cpr, errp);
             if (fd == -1) {
                 ret = -1;
                 goto free_fail;
@@ -929,7 +955,7 @@ int net_init_tap(const Netdev *netdev, const char *name,
             net_init_tap_one(tap, peer, "tap", name, ifname,
                              script, downscript,
                              tap->vhostfds ? vhost_fds[i] : NULL,
-                             vnet_hdr, fd, &err);
+                             vnet_hdr, fd, TAP_VHOSTFD_INDEX(i, nfds), &err);
             if (err) {
                 error_propagate(errp, err);
                 ret = -1;
@@ -957,9 +983,15 @@ free_fail:
             goto out;
         }
 
-        fd = net_bridge_run_helper(tap->helper,
-                                   tap->br ?: DEFAULT_BRIDGE_INTERFACE,
-                                   errp);
+        fd = cpr ? cpr_find_fd(name, TAP_FD_INDEX(0)) : -1;
+        if (fd < 0) {
+            fd = net_bridge_run_helper(tap->helper,
+                                    tap->br ?: DEFAULT_BRIDGE_INTERFACE,
+                                    errp);
+            if (cpr && fd >= 0) {
+                cpr_save_fd(name, TAP_FD_INDEX(0), fd);
+            }
+        }
         if (fd == -1) {
             ret = -1;
             goto out;
@@ -979,13 +1011,14 @@ free_fail:
 
         net_init_tap_one(tap, peer, "bridge", name, ifname,
                          script, downscript, vhostfdname,
-                         vnet_hdr, fd, &err);
+                         vnet_hdr, fd, TAP_VHOSTFD_INDEX(0, 1), &err);
         if (err) {
             error_propagate(errp, err);
             close(fd);
             ret = -1;
             goto out;
         }
+
     } else {
         g_autofree char *default_script = NULL;
         g_autofree char *default_downscript = NULL;
@@ -1010,8 +1043,14 @@ free_fail:
         }
 
         for (i = 0; i < queues; i++) {
-            fd = net_tap_init(tap, &vnet_hdr, i >= 1 ? "no" : script,
-                              ifname, sizeof ifname, queues > 1, errp);
+            fd = cpr ? cpr_find_fd(name, TAP_FD_INDEX(i)) : -1;
+            if (fd < 0) {
+                fd = net_tap_init(tap, &vnet_hdr, i >= 1 ? "no" : script,
+                                ifname, sizeof ifname, queues > 1, errp);
+                if (cpr && fd >= 0) {
+                    cpr_save_fd(name, TAP_FD_INDEX(i), fd);
+                }
+            }
             if (fd == -1) {
                 ret = -1;
                 goto out;
@@ -1029,7 +1068,9 @@ free_fail:
             net_init_tap_one(tap, peer, "tap", name, ifname,
                              i >= 1 ? "no" : script,
                              i >= 1 ? "no" : downscript,
-                             vhostfdname, vnet_hdr, fd, &err);
+                             vhostfdname, vnet_hdr,
+                             fd, TAP_VHOSTFD_INDEX(i, queues),
+                             &err);
             if (err) {
                 error_propagate(errp, err);
                 close(fd);
@@ -1040,6 +1081,9 @@ free_fail:
     }
 
 out:
+    if (ret && cpr) {
+        cpr_delete_fd_all(name);
+    }
     return ret;
 }
 
diff --git a/qapi/net.json b/qapi/net.json
index 118bd34965..4b12fca94b 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -355,6 +355,9 @@
 # @poll-us: maximum number of microseconds that could be spent on busy
 #     polling for tap (since 2.7)
 #
+# @cpr: preserve the state of this device and its associated file
+#     descriptors during cpr-transfer for reduced migration downtime
+#
 # Since: 1.2
 ##
 { 'struct': 'NetdevTapOptions',
@@ -373,7 +376,8 @@
     '*vhostfds':   'str',
     '*vhostforce': 'bool',
     '*queues':     'uint32',
-    '*poll-us':    'uint32'} }
+    '*poll-us':    'uint32',
+    '*cpr':        'bool'} }
 
 ##
 # @NetdevSocketOptions:

-- 
2.34.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread
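
[Editor's sketch, not part of the posted patch.] The patch above stores
the tap fds for a netdev at indices [0, n) and the vhost fds at indices
[n, 2n), and on each path it first looks for a preserved fd and only
opens a new one (and records it) when none is found. The following is a
minimal model of that layout and lookup pattern. The two index macros
are copied from the patch; everything else (struct fd_table,
fd_table_find, fd_table_save, find_or_create, fake_open) is a
hypothetical stand-in and is not the QEMU CPR API.

```c
#include <assert.h>
#include <stddef.h>

/* Same definitions as in the patch: tap fds first, vhost fds after them. */
#define TAP_FD_INDEX(queue)                   ((queue))
#define TAP_VHOSTFD_INDEX(queue, total_fds)   ((queue) + (total_fds))

#define SLOT_COUNT 16

/* Hypothetical per-netdev table standing in for the saved CPR fds. */
struct fd_table {
    int fd[SLOT_COUNT];
};

static void fd_table_init(struct fd_table *t)
{
    for (size_t i = 0; i < SLOT_COUNT; i++) {
        t->fd[i] = -1;                  /* -1 means "nothing preserved" */
    }
}

static int fd_table_find(const struct fd_table *t, int index)
{
    return (index >= 0 && index < SLOT_COUNT) ? t->fd[index] : -1;
}

static void fd_table_save(struct fd_table *t, int index, int fd)
{
    if (index >= 0 && index < SLOT_COUNT) {
        t->fd[index] = fd;
    }
}

/*
 * Reuse a preserved fd if one exists at this index; otherwise call the
 * opener and remember the result so that a later lookup can find it.
 */
static int find_or_create(struct fd_table *t, int index, int (*opener)(void))
{
    int fd = fd_table_find(t, index);

    if (fd < 0) {
        fd = opener();
        if (fd >= 0) {
            fd_table_save(t, index, fd);
        }
    }
    return fd;
}

/* Hypothetical opener: hands out fake fd numbers and counts its calls. */
static int fake_open_calls;

static int fake_open(void)
{
    return 100 + fake_open_calls++;
}
```

With four queues, queue 1 uses slot 1 for its tap fd and slot 5 for its
vhost fd, so the two kinds of fd are not interleaved. A second
find_or_create() call on the same slot returns the saved fd without
calling the opener again.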

* [PATCH v4 7/8] tap: postload fix for cpr
  2026-01-28 20:39 [PATCH v4 0/8] Live update: tap and vhost Ben Chaney
                   ` (5 preceding siblings ...)
  2026-01-28 20:39 ` [PATCH v4 6/8] tap: cpr support Ben Chaney
@ 2026-01-28 20:39 ` Ben Chaney
  2026-01-28 20:39 ` [PATCH v4 8/8] tap: cpr fixes Ben Chaney
  2026-01-29 13:58 ` [PATCH v4 0/8] Live update: tap and vhost Vladimir Sementsov-Ogievskiy
  8 siblings, 0 replies; 22+ messages in thread
From: Ben Chaney @ 2026-01-28 20:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Michael S. Tsirkin, Stefano Garzarella,
	Jason Wang, Alex Williamson, Cédric Le Goater, Eric Blake,
	Markus Armbruster, Stefan Weil, Daniel P. Berrangé,
	Paolo Bonzini, Hamza Khan, Mark Kanda, Joshua Hunt, Max Tottenham,
	Ben Chaney, Steve Sistare

From: Steve Sistare <steven.sistare@oracle.com>

After cpr of a multi-queue NIC, if any queues are unused, then the
corresponding tap is marked enabled in userland, but it is disabled in the
kernel for the fd that was preserved.  One cannot call tap_disable() during
postload, because that eventually calls IFF_DETACH_QUEUE, which fails
because the queue is already detached.  Define tap_disable_postload to
avoid IFF_DETACH_QUEUE.

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Signed-off-by: Ben Chaney <bchaney@akamai.com>
---
 hw/net/virtio-net.c | 20 ++++++++++++++++++++
 include/net/tap.h   |  1 +
 net/tap-win32.c     |  5 +++++
 net/tap.c           | 17 +++++++++++++++++
 4 files changed, 43 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 317f1ad23b..bb7c8d9b78 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -765,6 +765,25 @@ static int peer_detach(VirtIONet *n, int index)
     return tap_disable(nc->peer);
 }
 
+/*
+ * Set the disabled flag on unused queue pairs after vmstate load, without
+ * calling IFF_DETACH_QUEUE, which fails because the queue is already detached.
+ */
+static void virtio_net_postload_queue_pairs(VirtIONet *n)
+{
+    int i;
+    MigMode mode = migrate_mode();
+
+    if (mode == MIG_MODE_CPR_TRANSFER) {
+        for (i = n->curr_queue_pairs; i < n->max_queue_pairs; i++) {
+            NetClientState *nc = qemu_get_subqueue(n->nic, i);
+            if (nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_TAP) {
+                tap_disable_postload(nc->peer);
+            }
+        }
+    }
+}
+
 static void virtio_net_set_queue_pairs(VirtIONet *n)
 {
     int i;
@@ -3210,6 +3229,7 @@ static int virtio_net_post_load_device(void *opaque, int version_id)
      */
     n->saved_guest_offloads = n->curr_guest_offloads;
 
+    virtio_net_postload_queue_pairs(n);
     virtio_net_set_queue_pairs(n);
 
     /* Find the first multicast entry in the saved MAC filter */
diff --git a/include/net/tap.h b/include/net/tap.h
index 6f34f13eae..934131f551 100644
--- a/include/net/tap.h
+++ b/include/net/tap.h
@@ -30,6 +30,7 @@
 
 int tap_enable(NetClientState *nc);
 int tap_disable(NetClientState *nc);
+void tap_disable_postload(NetClientState *nc);
 
 int tap_get_fd(NetClientState *nc);
 
diff --git a/net/tap-win32.c b/net/tap-win32.c
index 38baf90e0b..efe81c54ee 100644
--- a/net/tap-win32.c
+++ b/net/tap-win32.c
@@ -766,3 +766,8 @@ int tap_disable(NetClientState *nc)
 {
     abort();
 }
+
+void tap_disable_postload(NetClientState *nc)
+{
+    abort();
+}
diff --git a/net/tap.c b/net/tap.c
index 8875498434..2961607cda 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -1121,3 +1121,20 @@ int tap_disable(NetClientState *nc)
         return ret;
     }
 }
+
+/*
+ * On cpr restart, the tap is marked enabled in userland, but it might be
+ * disabled in the kernel, and IFF_DETACH_QUEUE will fail because it is
+ * already detached.  This function disables without calling IFF_DETACH_QUEUE.
+ */
+void tap_disable_postload(NetClientState *nc)
+{
+    TAPState *s = DO_UPCAST(TAPState, nc, nc);
+
+    if (!s->cpr || s->enabled == 0) {
+        return;
+    } else {
+        s->enabled = false;
+        tap_update_fd_handler(s);
+    }
+}

-- 
2.34.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread
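
[Editor's sketch, not part of the posted patch.] The difference between
tap_disable() and the new tap_disable_postload() is that the latter only
updates userland state and never asks the kernel to detach the queue.
The model below illustrates that difference with a reduced, hypothetical
struct mock_tap; the counters stand in for the detach request and for
tap_update_fd_handler(), and none of these names exist in QEMU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced stand-in for TAPState. */
struct mock_tap {
    bool cpr;             /* netdev was created with cpr=on */
    bool enabled;         /* userland view of the queue */
    int detach_calls;     /* simulated detach requests sent to the kernel */
    int handler_updates;  /* simulated fd handler refreshes */
};

/* Simplified model of the normal disable path: it talks to the kernel. */
static int mock_tap_disable(struct mock_tap *s)
{
    if (!s->enabled) {
        return 0;
    }
    s->detach_calls++;    /* this is the request that fails after cpr */
    s->enabled = false;
    s->handler_updates++;
    return 0;
}

/* Model of the postload path: userland bookkeeping only, no detach. */
static void mock_tap_disable_postload(struct mock_tap *s)
{
    if (!s->cpr || !s->enabled) {
        return;
    }
    s->enabled = false;
    s->handler_updates++;
}
```

The postload variant is a no-op for taps without cpr=on, matching the
early return in the patch.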

* [PATCH v4 8/8] tap: cpr fixes
  2026-01-28 20:39 [PATCH v4 0/8] Live update: tap and vhost Ben Chaney
                   ` (6 preceding siblings ...)
  2026-01-28 20:39 ` [PATCH v4 7/8] tap: postload fix for cpr Ben Chaney
@ 2026-01-28 20:39 ` Ben Chaney
  2026-01-29 13:58 ` [PATCH v4 0/8] Live update: tap and vhost Vladimir Sementsov-Ogievskiy
  8 siblings, 0 replies; 22+ messages in thread
From: Ben Chaney @ 2026-01-28 20:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Michael S. Tsirkin, Stefano Garzarella,
	Jason Wang, Alex Williamson, Cédric Le Goater, Eric Blake,
	Markus Armbruster, Stefan Weil, Daniel P. Berrangé,
	Paolo Bonzini, Hamza Khan, Mark Kanda, Joshua Hunt, Max Tottenham,
	Ben Chaney, Vladimir Sementsov-Ogievskiy, Steve Sistare

Fix "virtio_net_set_queue_pairs: Assertion `!r' failed."
Fix "virtio-net: saved image requires vnet_hdr=on"

Reported-by: Ben Chaney <bchaney@akamai.com>
Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
Signed-off-by: Ben Chaney <bchaney@akamai.com>
---
 hw/net/virtio-net.c | 6 ++++++
 net/tap.c           | 2 ++
 stubs/cpr.c         | 8 ++++++++
 stubs/meson.build   | 1 +
 4 files changed, 17 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index bb7c8d9b78..7e809c37fc 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -37,6 +37,7 @@
 #include "qapi/qapi-types-migration.h"
 #include "qapi/qapi-events-migration.h"
 #include "hw/virtio/virtio-access.h"
+#include "migration/cpr.h"
 #include "migration/misc.h"
 #include "standard-headers/linux/ethtool.h"
 #include "system/system.h"
@@ -789,6 +790,11 @@ static void virtio_net_set_queue_pairs(VirtIONet *n)
     int i;
     int r;
 
+    if (cpr_is_incoming()) {
+        /* peers are already attached, do nothing */
+        return;
+    }
+
     if (n->nic->peer_deleted) {
         return;
     }
diff --git a/net/tap.c b/net/tap.c
index 2961607cda..b58072db99 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -1050,6 +1050,8 @@ free_fail:
                 if (cpr && fd >= 0) {
                     cpr_save_fd(name, TAP_FD_INDEX(i), fd);
                 }
+            } else {
+                vnet_hdr = tap->has_vnet_hdr ? tap->vnet_hdr : 1;
             }
             if (fd == -1) {
                 ret = -1;
diff --git a/stubs/cpr.c b/stubs/cpr.c
new file mode 100644
index 0000000000..1a4dbbb2d7
--- /dev/null
+++ b/stubs/cpr.c
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#include "qemu/osdep.h"
+#include "migration/cpr.h"
+
+bool cpr_is_incoming(void)
+{
+    return false;
+}
diff --git a/stubs/meson.build b/stubs/meson.build
index 2b5fd8a88a..19c1932bbf 100644
--- a/stubs/meson.build
+++ b/stubs/meson.build
@@ -10,6 +10,7 @@ stub_ss.add(files('is-daemonized.c'))
 stub_ss.add(files('monitor-core.c'))
 stub_ss.add(files('replay-mode.c'))
 stub_ss.add(files('trace-control.c'))
+stub_ss.add(files('cpr.c'))
 
 if have_block
   stub_ss.add(files('bdrv-next-monitor-owned.c'))

-- 
2.34.1



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 0/8] Live update: tap and vhost
  2026-01-28 20:39 [PATCH v4 0/8] Live update: tap and vhost Ben Chaney
                   ` (7 preceding siblings ...)
  2026-01-28 20:39 ` [PATCH v4 8/8] tap: cpr fixes Ben Chaney
@ 2026-01-29 13:58 ` Vladimir Sementsov-Ogievskiy
  2026-02-02 14:06   ` Peter Xu
  8 siblings, 1 reply; 22+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2026-01-29 13:58 UTC (permalink / raw)
  To: Ben Chaney, qemu-devel
  Cc: Peter Xu, Fabiano Rosas, Michael S. Tsirkin, Stefano Garzarella,
	Jason Wang, Alex Williamson, Cédric Le Goater, Eric Blake,
	Markus Armbruster, Stefan Weil, Daniel P. Berrangé,
	Paolo Bonzini, Hamza Khan, Mark Kanda, Joshua Hunt, Max Tottenham,
	Steve Sistare

On 28.01.26 23:39, Ben Chaney wrote:
> Tap and vhost devices can be preserved during cpr-transfer using
> traditional live migration methods, wherein the management layer
> creates new interfaces for the target and fiddles with 'ip link'
> to deactivate the old interface and activate the new.
> 
> However, CPR can simply send the file descriptors to new QEMU,
> with no special management actions required.  The user enables
> this behavior by specifying '-netdev tap,cpr=on'.  The default
> is cpr=off.
> 
> Signed-off-by: Ben Chaney <bchaney@akamai.com>
Hi!

I'd like to note again that I'm working on an alternative solution for live-updating
virtio-net+TAP, passing FDs through a Unix domain socket, which:

1. Doesn't require a second migration channel
2. Doesn't use CPR: the whole TAP state, including negotiated parameters
and the open FDs, is passed natively as an ordinary migration state structure.
(see https://lore.kernel.org/qemu-devel/20251030203116.870742-7-vsementsov@yandex-team.ru/ )
3. Should still be compatible with CPR, and may be used in the context of a CPR update

The latest version was

[PATCH v9 0/8] virtio-net: live-TAP local migration
https://lore.kernel.org/qemu-devel/20251030203116.870742-1-vsementsov@yandex-team.ru/

and I plan to post v10 soon.


-- 
Best regards,
Vladimir


^ permalink raw reply	[flat|nested] 22+ messages in thread
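
[Editor's sketch, not taken from either series.] For readers unfamiliar
with the mechanism mentioned above, passing an open file descriptor over
a Unix domain socket is done with an SCM_RIGHTS control message. The
helpers below (send_fd and recv_fd, both hypothetical names) show the
generic POSIX/Linux pattern only; they say nothing about how QEMU's
migration channel or the CPR channel actually frame their data.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one fd over a connected AF_UNIX socket; 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';                      /* at least one data byte is sent */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;             /* forces correct alignment */
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one fd; returns the new descriptor, or -1 on error. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The received descriptor is a new number in the receiving process that
refers to the same open file description, which is why kernel-side state
such as an attached tap queue survives the handover.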

* Re: [PATCH v4 1/8] migration: stop vm earlier for cpr
  2026-01-28 20:39 ` [PATCH v4 1/8] migration: stop vm earlier for cpr Ben Chaney
@ 2026-02-02 13:31   ` Fabiano Rosas
  0 siblings, 0 replies; 22+ messages in thread
From: Fabiano Rosas @ 2026-02-02 13:31 UTC (permalink / raw)
  To: Ben Chaney, qemu-devel
  Cc: Peter Xu, Michael S. Tsirkin, Stefano Garzarella, Jason Wang,
	Alex Williamson, Cédric Le Goater, Eric Blake,
	Markus Armbruster, Stefan Weil, Daniel P. Berrangé,
	Paolo Bonzini, Hamza Khan, Mark Kanda, Joshua Hunt, Max Tottenham,
	Ben Chaney, Steve Sistare

Ben Chaney <bchaney@akamai.com> writes:

> From: Steve Sistare <steven.sistare@oracle.com>
>
> Stop the vm earlier for cpr, before cpr_save_state which causes new QEMU
> to proceed and initialize devices.  We must guarantee devices are stopped
> in old QEMU, and all source notifiers called, before they are initialized
> in new QEMU.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> Signed-off-by: Ben Chaney <bchaney@akamai.com>
> ---
>  migration/migration.c | 57 +++++++++++++++++++++++++++++++++++++++++++--------
>  1 file changed, 48 insertions(+), 9 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 1bcde301f7..f36e59d9e8 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -1654,6 +1654,7 @@ void migration_cancel(void)
>                            MIGRATION_STATUS_CANCELLED);
>          cpr_state_close();
>          migrate_hup_delete(s);
> +        vm_resume(s->vm_old_state);
>      }
>  }
>  
> @@ -2212,6 +2213,7 @@ void qmp_migrate(const char *uri, bool has_channels,
>      MigrationAddress *addr = NULL;
>      MigrationChannel *channelv[MIGRATION_CHANNEL_TYPE__MAX] = { NULL };
>      MigrationChannel *cpr_channel = NULL;
> +    bool stopped = false;
>  

This is not needed anymore; all cleanup now happens after
migration_connect_error_propagate -> migration_cleanup.

>      /*
>       * Having preliminary checks for uri and channel
> @@ -2264,6 +2266,46 @@ void qmp_migrate(const char *uri, bool has_channels,
>          return;
>      }
>  
> +    /*
> +     * CPR-transfer  ordering:
> +     *
> +     *   SOURCE                              TARGET
> +     *   ------                              ------
> +     *                                       cpr_state_load() blocks
> +     *   |                                        |
> +     *   |  1. migration_stop_vm()                |
> +     *   |     VM stopped, devices quiesced       |
> +     *   |                                        | Waiting for
> +     *   |  2. notifiers (PRECOPY_SETUP)          | FDs from source
> +     *   |     vhost_reset_owner() releases       |
> +     *   |     device ownership                   |

Patch 2 should come before, then. Otherwise this is not accurate.

> +     *   |                                        |
> +     *   |  3. cpr_state_save() ---- FDs -------> |
> +     *   |                                        |
> +     *   v                                        v
> +     *   postmigrate                         Device init begins
> +     *                                       - cpr_find_fd()
> +     *                                       - vhost_dev_init()
> +     *                                       - VHOST_SET_OWNER
> +     *
> +     * Step 3 is the synchronization/cut-over point. Target proceeds immediately
> +     * upon receiving FDs, so steps 1-2 must complete otherwise:
> +     * - Target's VHOST_SET_OWNER fails with -EBUSY (source still owns)
> +     * - Race between source I/O and target device init
> +     *
> +     *  We stop the VM early (before FD transfer) to prevent this race.
> +     *  Unlike regular migration, CPR-transfer passes memory via FD (memfd)
> +     *  rather than copying RAM, so early VM stop should have minimal downtime.
> +     */
> +    if (migrate_mode_is_cpr(s)) {
> +        int ret = migration_stop_vm(s, RUN_STATE_FINISH_MIGRATE);
> +        if (ret < 0) {
> +            error_setg(&local_err, "migration_stop_vm failed, error %d", -ret);
> +            goto out;
> +        }
> +        stopped = true;
> +    }
> +
>      if (!cpr_state_save(cpr_channel, &local_err)) {
>          goto out;
>      }
> @@ -2290,6 +2332,9 @@ out:
>      if (local_err) {
>          migration_connect_error_propagate(s, error_copy(local_err));
>          error_propagate(errp, local_err);
> +        if (stopped) {
> +            vm_resume(s->vm_old_state);
> +        }

This resume should now happen at migration_cleanup.

>      }
>  }
>  
> @@ -2334,6 +2379,9 @@ static void qmp_migrate_finish(MigrationAddress *addr, bool resume_requested,
>          }
>          migration_connect_error_propagate(s, error_copy(local_err));
>          error_propagate(errp, local_err);
> +        if (migrate_mode_is_cpr(s)) {
> +            vm_resume(s->vm_old_state);
> +        }

Same here.

>          return;
>      }
>  }
> @@ -4017,7 +4065,6 @@ void migration_connect(MigrationState *s, Error *error_in)
>      Error *local_err = NULL;
>      uint64_t rate_limit;
>      bool resume = (s->state == MIGRATION_STATUS_POSTCOPY_RECOVER_SETUP);
> -    int ret;
>  
>      /*
>       * If there's a previous error, free it and prepare for another one.
> @@ -4088,14 +4135,6 @@ void migration_connect(MigrationState *s, Error *error_in)
>          return;
>      }
>  
> -    if (migrate_mode_is_cpr(s)) {
> -        ret = migration_stop_vm(s, RUN_STATE_FINISH_MIGRATE);
> -        if (ret < 0) {
> -            error_setg(&local_err, "migration_stop_vm failed, error %d", -ret);
> -            goto fail;
> -        }
> -    }
> -
>      /*
>       * Take a refcount to make sure the migration object won't get freed by
>       * the main thread already in migration_shutdown().


^ permalink raw reply	[flat|nested] 22+ messages in thread
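
[Editor's sketch, not part of the thread.] The ordering described in the
quoted comment (stop the VM, run the setup notifiers, and only then send
the CPR state with the FDs) can be modelled as a small state machine
that rejects any step taken out of order. The enum and cpr_src_advance()
are hypothetical names used only for illustration; this is not how
migration.c tracks its state.

```c
#include <assert.h>

/* Source-side steps, in the order required by the quoted comment. */
enum cpr_src_step {
    CPR_SRC_RUNNING,      /* VM still running on the source */
    CPR_SRC_VM_STOPPED,   /* step 1: VM stopped, devices quiesced */
    CPR_SRC_NOTIFIED,     /* step 2: setup notifiers have run */
    CPR_SRC_STATE_SENT    /* step 3: CPR state and FDs sent to target */
};

/*
 * Move to 'next' only if it is the immediate successor of the current
 * step. Returns 0 on success and -1 (leaving *cur unchanged) otherwise.
 */
static int cpr_src_advance(enum cpr_src_step *cur, enum cpr_src_step next)
{
    if ((int)next != (int)*cur + 1) {
        return -1;
    }
    *cur = next;
    return 0;
}
```

In this model, sending the state while the VM is still running is
refused, which corresponds to the race the comment warns about: the
target would start device init while the source still owns the devices.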

* Re: [PATCH v4 2/8] migration: cpr setup notifier
  2026-01-28 20:39 ` [PATCH v4 2/8] migration: cpr setup notifier Ben Chaney
@ 2026-02-02 14:01   ` Fabiano Rosas
  0 siblings, 0 replies; 22+ messages in thread
From: Fabiano Rosas @ 2026-02-02 14:01 UTC (permalink / raw)
  To: Ben Chaney, qemu-devel
  Cc: Peter Xu, Michael S. Tsirkin, Stefano Garzarella, Jason Wang,
	Alex Williamson, Cédric Le Goater, Eric Blake,
	Markus Armbruster, Stefan Weil, Daniel P. Berrangé,
	Paolo Bonzini, Hamza Khan, Mark Kanda, Joshua Hunt, Max Tottenham,
	Ben Chaney, Steve Sistare

Ben Chaney <bchaney@akamai.com> writes:

> From: Steve Sistare <steven.sistare@oracle.com>
>
> Call MIG_EVENT_PRECOPY_SETUP earlier, before CPR starts.  An early notifier
> is needed for resetting vhost devices, as explained in the next patch.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> Signed-off-by: Ben Chaney <bchaney@akamai.com>
> ---
>  migration/migration.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index f36e59d9e8..191a34f667 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -2306,7 +2306,14 @@ void qmp_migrate(const char *uri, bool has_channels,
>          stopped = true;
>      }
>  
> +    /* Notify before starting migration thread, and before starting cpr */
> +    if (!resume_requested &&
> +        migration_call_notifiers(s, MIG_EVENT_PRECOPY_SETUP, &local_err)) {
> +        goto out;
> +    }

Probably better at the end of migrate_init() or along with the state
change to SETUP.

Also note that this will emit the event even before connections have
been established. I didn't spot any issues with that.

> +
>      if (!cpr_state_save(cpr_channel, &local_err)) {
> +        migration_call_notifiers(s, MIG_EVENT_PRECOPY_FAILED, NULL);

Due to changes on master, this now needs to go into
migration_connect_error_propagate() so that all paths that saw
MIG_EVENT_PRECOPY_SETUP also see MIG_EVENT_PRECOPY_FAILED.

  migrate_prepare() -> MIG_EVENT_PRECOPY_SETUP
    migration_connect_error_propagate() -> MIG_EVENT_PRECOPY_FAILED
    migration_iteration_finish() -> MIG_EVENT_PRECOPY_FAILED
  migration_cleanup() -> MIG_EVENT_PRECOPY_DONE

And due to changes queued [0], migration_cleanup() no longer emits the
FAILED event, so migration_connect_error_propagate() needs to send the
event for both FAILED and CANCELLING states, you can use the
migration_has_failed() helper.
https://lore.kernel.org/r/20260126213614.3815900-1-peterx@redhat.com

The special case at migration_cancel() will also need
MIG_EVENT_PRECOPY_FAILED before resuming the VM.

>          goto out;
>      }
>  
> @@ -4097,11 +4104,6 @@ void migration_connect(MigrationState *s, Error *error_in)
>      } else {
>          /* This is a fresh new migration */
>          rate_limit = migrate_max_bandwidth();
> -
> -        /* Notify before starting migration thread */
> -        if (migration_call_notifiers(s, MIG_EVENT_PRECOPY_SETUP, &local_err)) {
> -            goto fail;
> -        }
>      }
>  
>      migration_rate_set(rate_limit);


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 0/8] Live update: tap and vhost
  2026-01-29 13:58 ` [PATCH v4 0/8] Live update: tap and vhost Vladimir Sementsov-Ogievskiy
@ 2026-02-02 14:06   ` Peter Xu
  2026-02-02 15:42     ` Chaney, Ben
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Xu @ 2026-02-02 14:06 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: Ben Chaney, qemu-devel, Fabiano Rosas, Michael S. Tsirkin,
	Stefano Garzarella, Jason Wang, Alex Williamson,
	Cédric Le Goater, Eric Blake, Markus Armbruster, Stefan Weil,
	Daniel P. Berrangé, Paolo Bonzini, Hamza Khan, Mark Kanda,
	Joshua Hunt, Max Tottenham, Steve Sistare

On Thu, Jan 29, 2026 at 04:58:41PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 28.01.26 23:39, Ben Chaney wrote:
> > Tap and vhost devices can be preserved during cpr-transfer using
> > traditional live migration methods, wherein the management layer
> > creates new interfaces for the target and fiddles with 'ip link'
> > to deactivate the old interface and activate the new.
> > 
> > However, CPR can simply send the file descriptors to new QEMU,
> > with no special management actions required.  The user enables
> > this behavior by specifying '-netdev tap,cpr=on'.  The default
> > is cpr=off.
> > 
> > Signed-off-by: Ben Chaney <bchaney@akamai.com>
> Hi!
> 
> I'd like to note again that I'm working on an alternative solution for live-updating
> virtio-net+TAP, passing FDs through a Unix domain socket, which:
> 
> 1. Doesn't require a second migration channel
> 2. Doesn't use CPR: the whole TAP state, including negotiated parameters
> and the open FDs, is passed natively as an ordinary migration state structure.
> (see https://lore.kernel.org/qemu-devel/20251030203116.870742-7-vsementsov@yandex-team.ru/ )
> 3. Should still be compatible with CPR, and may be used in the context of a CPR update
> 
> The latest version was
> 
> [PATCH v9 0/8] virtio-net: live-TAP local migration
> https://lore.kernel.org/qemu-devel/20251030203116.870742-1-vsementsov@yandex-team.ru/
> 
> and I plan to post v10 soon.

Yes, thanks for re-raising this.  If we have similar features being
proposed, we should always discuss whether we should stick with one of them
if that'll work for all.

IIUC Vladimir's solution looks indeed superior in that it has fewer
constraints, and also works for CPR mode.

Thanks,

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 0/8] Live update: tap and vhost
  2026-02-02 14:06   ` Peter Xu
@ 2026-02-02 15:42     ` Chaney, Ben
  2026-02-03  9:57       ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 22+ messages in thread
From: Chaney, Ben @ 2026-02-02 15:42 UTC (permalink / raw)
  To: Peter Xu, Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel@nongnu.org, Fabiano Rosas, Michael S. Tsirkin,
	Stefano Garzarella, Jason Wang, Alex Williamson,
	Cédric Le Goater, Eric Blake, Markus Armbruster, Stefan Weil,
	Daniel P. Berrangé, Paolo Bonzini, Hamza Khan, Mark Kanda,
	Hunt, Joshua, Tottenham, Max, Steve Sistare



On 2/2/26, 9:07 AM, "Peter Xu" <peterx@redhat.com> wrote:

> > The latest version was
> >
> > [PATCH v9 0/8] virtio-net: live-TAP local migration
> > https://lore.kernel.org/qemu-devel/20251030203116.870742-1-vsementsov@yandex-team.ru/
> >
> > and I plan to post v10 soon.


> Yes, thanks for re-raising this. If we have similar features being
> proposed, we should always discuss whether we should stick with one of them
> if that'll work for all.


> IIUC Vladimir's solution looks indeed superior in that it has fewer
> constraints, and also works for CPR mode.


This was previously discussed here: https://lore.kernel.org/all/ef7fd47a-f7c0-4bca-823c-07005c5f1959@yandex-team.ru/

My impression from that discussion is that

1. Vladimir's solution has some extra complexity
2. We are trying to standardize cpr as the primary method for local migration,
so the benefit of supporting non-cpr local transfers is slightly double-edged

That said, both solutions seem valid and Vladimir is doing good work in this area,
so I'd be happy with either outcome in this case.

Thanks,
     Ben


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v4 0/8] Live update: tap and vhost
  2026-02-02 15:42     ` Chaney, Ben
@ 2026-02-03  9:57       ` Vladimir Sementsov-Ogievskiy
  2026-02-03 19:17         ` Peter Xu
  0 siblings, 1 reply; 22+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2026-02-03  9:57 UTC (permalink / raw)
  To: Chaney, Ben, Peter Xu
  Cc: qemu-devel@nongnu.org, Fabiano Rosas, Michael S. Tsirkin,
	Stefano Garzarella, Jason Wang, Alex Williamson,
	Cédric Le Goater, Eric Blake, Markus Armbruster, Stefan Weil,
	Daniel P. Berrangé, Paolo Bonzini, Hamza Khan, Mark Kanda,
	Hunt, Joshua, Tottenham, Max, Steve Sistare

On 02.02.26 18:42, Chaney, Ben wrote:
> 
> 
> On 2/2/26, 9:07 AM, "Peter Xu" <peterx@redhat.com> wrote:
> 
>>> The latest version was
>>>
>>> [PATCH v9 0/8] virtio-net: live-TAP local migration
>>> https://lore.kernel.org/qemu-devel/20251030203116.870742-1-vsementsov@yandex-team.ru/
>>>
>>> and I plan to post v10 soon.
> 
> 
>> Yes, thanks for re-raising this. If we have similar features being
>> proposed, we should always discuss whether we should stick with one of them
>> if that'll work for all.
> 
> 
>> IIUC Vladimir's solution looks indeed superior in that it has fewer
>> constraints, and also works for CPR mode.
> 
> 
> This was previously discussed here: https://lore.kernel.org/all/ef7fd47a-f7c0-4bca-823c-07005c5f1959@yandex-team.ru/
> 
> My impression from that discussion is that
> 
> 1. Vladimir's solution has some extra complexity
> 2. We are trying to standardize cpr as the primary method for local migration,

I believe that we can do local migration of devices with FDs natively,
through one migration channel, without CPR.

In my opinion, CPR breaks the migration architecture by creating additional
state that owns mixed pieces of different devices (and sometimes
not only FDs, I have heard).

Instead, we can keep all device state in the device state description,
including FDs if needed.

Also, the second migration channel, and the fact that on the target we can't
access QMP until we say "migrate" on the source, seem to me an unnecessary
burden on the user and management software; we can avoid this.

Next, as I understand it, the only reason we use CPR for devices is to
avoid reworking the initialization code of some devices, which want to
have FDs at an early stage. But that approach can't be applied
everywhere. An example is vhost-user-blk: you have to rework the
initialization code anyway, because if you simply pass FDs to the target in
CPR state while the source is still running, the target will simply break
the source by touching the FDs. And if we can't touch FDs until the source
stops, it's actually a usual migration, and we can pass FDs through the main
migration channel, doing the necessary things in pre-save and post-load,
as usual.

Hmm, looking at patch 01 here, do I understand correctly that virtio-net/TAP
suffers from the same problem? That we actually must not use the passed FDs
on the target while the source is still running? But stopping the source
earlier means increased freeze time. I think that if we can avoid it (and
we can), we should avoid it.

> so the benefit of supporting non-cpr local transfers is slightly double edged
> 

So, I think, if we plan that there would be more and more devices,
supporting FDs local migration, and we have any chance of fitting them
into the "old" migration architecture (without CPR), we should try it.

--

Hm, I don't have a full picture of CPR, it's not only device migration,
but also some other things? Interesting, how feasible is it to move
all these things into main migration channel. That's the question I
can't answer now. But even if keep CPR for some non-device things, it
seems still good to keep the whole state description for a device in
one place - in device code, like it was historically.

-- 
Best regards,
Vladimir



* Re: [PATCH v4 0/8] Live update: tap and vhost
  2026-02-03  9:57       ` Vladimir Sementsov-Ogievskiy
@ 2026-02-03 19:17         ` Peter Xu
  2026-02-03 19:46           ` Chaney, Ben
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Xu @ 2026-02-03 19:17 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: Chaney, Ben, qemu-devel@nongnu.org, Fabiano Rosas,
	Michael S. Tsirkin, Stefano Garzarella, Jason Wang,
	Alex Williamson, Cédric Le Goater, Eric Blake,
	Markus Armbruster, Stefan Weil, Daniel P. Berrangé,
	Paolo Bonzini, Hamza Khan, Mark Kanda, Hunt, Joshua,
	Tottenham, Max, Steve Sistare

On Tue, Feb 03, 2026 at 12:57:16PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On 02.02.26 18:42, Chaney, Ben wrote:
> > 
> > 
> > On 2/2/26, 9:07 AM, "Peter Xu" <peterx@redhat.com> wrote:
> > 
> > > > The latest version was
> > > > 
> > > > [PATCH v9 0/8] virtio-net: live-TAP local migration
> > > > https://lore.kernel.org/qemu-devel/20251030203116.870742-1-vsementsov@yandex-team.ru/
> > > > 
> > > > and I plan to post v10 soon.
> > 
> > 
> > > Yes, thanks for re-raising this. If we have similar features being
> > > proposed, we should always discuss whether we should stick with one of them
> > > if that'll work for all.
> > 
> > 
> > > IIUC Vladimir's solution looks indeed superior in that it has less
> > > constraints, and also works for CPR mode.
> > 
> > 
> > This was previously discussed here: https://lore.kernel.org/all/ef7fd47a-f7c0-4bca-823c-07005c5f1959@yandex-team.ru/
> > 
> > My impression from that discussion is that
> > 
> > 1. Vladimir's solution has some extra complexity
> > 2. We are trying to standardize cpr as the primary method for local migration,
> 
> I believe, that we may do local migration of devices with FDs natively,
> through one migration channel, without CPR.
> 
> In my opinion, CPR breaks migration architecture, creating additional
> state, which owns mixed pieces of different devices (and sometimes,
> not only FDs, I heard).
> 
> Instead we can keep device state all in device state description,
> including FDs if needed.
> 
> Also, second migration channel, and the fact that on target we can't
> access QMP until we say "migrate" on source seems to me an unnecessary
> load on the user and management software, we can avoid this.
> 
> Next, as I understand, the only point, why we use CPR for devices, is
> avoiding rework of initialization code of some devices, which wants to
> have FDs at early stage. But that approach can't be applied
> everywhere. An example is vhost-user-blk: you have to rework
> initialization code anyway, as if you simply pass FDs to the target in
> CPR state, when source is still running, target will simply break the
> source, touching the FDs. And, if we can't touch FDs until source stop
> - it's actually a usual migration, and we can pass FDs through main
> migration channel, doing necessary things in pre-save and post-load,
> as usual.
> 
> Hmm, looking at patch 01 here, I understand, that virtio-net/TAP does
> suffer from same problem? That we actually must not use passed FDs on
> target, when source is still running? But stopping source earlier
> means increase freeze-time. I think, if we can avoid it (and we can)
> we should avoid it.
> 
> > so the benefit of supporting non-cpr local transfers is slightly double edged
> > 
> 
> So, I think, if we plan that there would be more and more devices,
> supporting FDs local migration, and we have any chance of fitting them
> into the "old" migration architecture (without CPR), we should try it.

Well explained, thank you Vladimir.  I wish some day we can move all at
least cpr-transfer users to local-migration and deprecate CPR if ever
possible.  The uncertainty to me is cpr-exec, but I really don't know how
much mgmt is adopting cpr-exec..  cpr-reboot also looks pretty special and
may not be relevant.

The core idea (originated from Steve..) is really about fd sharing, and
it's great if we can do it in a cleaner way.

> 
> --
> 
> Hm, I don't have a full picture of CPR, it's not only device migration,
> but also some other things? Interesting, how feasible is it to move
> all these things into main migration channel. That's the question I
> can't answer now. But even if keep CPR for some non-device things, it
> seems still good to keep the whole state description for a device in
> one place - in device code, like it was historically.

My understanding is there're some special mgmt (Oracle's?) that may depend
on cpr-exec; I'm not sure how far that went in any downstream deployment.

That should be able to reuse mgmt channels too (relevant to chardev fd
sharing, perhaps?) instead of requiring e.g. all monitor ports to reconnect
to a new QEMU after migration.  That said, I always assumed re-connect is
fine, and most mgmt supports live migration so the mgmt should have that
infrastructure there already.  Maybe Ben would know better.

Thanks,

-- 
Peter Xu




* Re: [PATCH v4 0/8] Live update: tap and vhost
  2026-02-03 19:17         ` Peter Xu
@ 2026-02-03 19:46           ` Chaney, Ben
  2026-02-03 20:04             ` Mark Kanda
  0 siblings, 1 reply; 22+ messages in thread
From: Chaney, Ben @ 2026-02-03 19:46 UTC (permalink / raw)
  To: Peter Xu, Vladimir Sementsov-Ogievskiy, Mark Kanda
  Cc: qemu-devel@nongnu.org, Fabiano Rosas, Michael S. Tsirkin,
	Stefano Garzarella, Jason Wang, Alex Williamson,
	Cédric Le Goater, Eric Blake, Markus Armbruster, Stefan Weil,
	Daniel P. Berrangé, Paolo Bonzini, Hamza Khan, Hunt, Joshua,
	Tottenham, Max


> Well explained, thank you Vladimir. I wish some day we can move all at
> least cpr-transfer users to local-migration and deprecate CPR if ever
> possible. The uncertainty to me is cpr-exec, but I really don't know how
> much mgmt is adopting cpr-exec.. cpr-reboot also looks pretty special and
> may not be relevant.


> The core idea (originated from Steve..) is really about fd sharing, and
> it's great if we can do it in a cleaner way.

Thanks for the clarification. If that is the case then probably Vladimir's
solution is preferable. I sent some comments on the prerequisite
refactoring patch set. I'll try to review the main set soon.


> My understanding is there're some special mgmt (Oracle's?) that may depend
> on cpr-exec; I'm not sure how far that went in any downstream deployment.


> That should be able to reuse mgmt channels too (relevant to chardev fd
> sharing, perhaps?) instead of requiring e.g. all monitor ports to reconnect
> to a new QEMU after migration. That said, I always assumed re-connect is
> fine, and most mgmt supports live migration so the mgmt should have that
> infrastructure there already. Maybe Ben would know better.

I'm missing a lot of context here as I only became more involved in this project
recently. @Mark Kanda Can you provide any context about Steve's work before
he retired, or Oracle's usage of these features?

Thanks,
        Ben



* Re: [PATCH v4 0/8] Live update: tap and vhost
  2026-02-03 19:46           ` Chaney, Ben
@ 2026-02-03 20:04             ` Mark Kanda
  2026-02-03 20:47               ` Peter Xu
  0 siblings, 1 reply; 22+ messages in thread
From: Mark Kanda @ 2026-02-03 20:04 UTC (permalink / raw)
  To: Chaney, Ben, Peter Xu, Vladimir Sementsov-Ogievskiy
  Cc: qemu-devel@nongnu.org, Fabiano Rosas, Michael S. Tsirkin,
	Stefano Garzarella, Jason Wang, Alex Williamson,
	Cédric Le Goater, Eric Blake, Markus Armbruster, Stefan Weil,
	Daniel P. Berrangé, Paolo Bonzini, Hamza Khan, Hunt, Joshua,
	Tottenham, Max


On 2/3/26 1:46 PM, Chaney, Ben wrote:
>> Well explained, thank you Vladimir. I wish some day we can move all at
>> least cpr-transfer users to local-migration and deprecate CPR if ever
>> possible. The uncertainty to me is cpr-exec, but I really don't know how
>> much mgmt is adopting cpr-exec.. cpr-reboot also looks pretty special and
>> may not be relevant.
>> The core idea (originated from Steve..) is really about fd sharing, and
>> it's great if we can do it in a cleaner way.
> Thanks for the clarification. If that is the case then probably Vladimir's
> solution is preferable. I sent some comments on the prerequisite
> refactoring patch set. I'll try to review the main set soon.
>
>> My understanding is there're some special mgmt (Oracle's?) that may depend
>> on cpr-exec; I'm not sure how far that went in any downstream deployment.
>> That should be able to reuse mgmt channels too (relevant to chardev fd
>> sharing, perhaps?) instead of requiring e.g. all monitor ports to reconnect
>> to a new QEMU after migration. That said, I always assumed re-connect is
>> fine, and most mgmt supports live migration so the mgmt should have that
>> infrastructure there already. Maybe Ben would know better.
>
> I'm missing a lot of context here as I only became more involved in this project
> recently. @Mark Kanda Can you provide any context about Steve's work before
> he retired, or Oracle's usage of these features?

We (Oracle) have an internal VM manager which relies on cpr-exec, and would
like to continue supporting it.

Thanks/regards,
-Mark



* Re: [PATCH v4 0/8] Live update: tap and vhost
  2026-02-03 20:04             ` Mark Kanda
@ 2026-02-03 20:47               ` Peter Xu
  2026-02-04  7:56                 ` Vladimir Sementsov-Ogievskiy
  0 siblings, 1 reply; 22+ messages in thread
From: Peter Xu @ 2026-02-03 20:47 UTC (permalink / raw)
  To: Mark Kanda
  Cc: Chaney, Ben, Vladimir Sementsov-Ogievskiy, qemu-devel@nongnu.org,
	Fabiano Rosas, Michael S. Tsirkin, Stefano Garzarella, Jason Wang,
	Alex Williamson, Cédric Le Goater, Eric Blake,
	Markus Armbruster, Stefan Weil, Daniel P. Berrangé,
	Paolo Bonzini, Hamza Khan, Hunt, Joshua, Tottenham, Max

On Tue, Feb 03, 2026 at 02:04:21PM -0600, Mark Kanda wrote:
> On 2/3/26 1:46 PM, Chaney, Ben wrote:
> >> Well explained, thank you Vladimir. I wish some day we can move all at
> >> least cpr-transfer users to local-migration and deprecate CPR if ever
> >> possible. The uncertainty to me is cpr-exec, but I really don't know how
> >> much mgmt is adopting cpr-exec.. cpr-reboot also looks pretty special and
> >> may not be relevant.
> >> The core idea (originated from Steve..) is really about fd sharing, and
> >> it's great if we can do it in a cleaner way.
> > Thanks for the clarification. If that is the case then probably Vladimir's
> > solution is preferable. I sent some comments on the prerequisite
> > refactoring patch set. I'll try to review the main set soon.
> >
> >> My understanding is there're some special mgmt (Oracle's?) that may depend
> >> on cpr-exec; I'm not sure how far that went in any downstream deployment.
> >> That should be able to reuse mgmt channels too (relevant to chardev fd
> >> sharing, perhaps?) instead of requiring e.g. all monitor ports to reconnect
> >> to a new QEMU after migration. That said, I always assumed re-connect is
> >> fine, and most mgmt supports live migration so the mgmt should have that
> >> infrastructure there already. Maybe Ben would know better.
> >
> > I'm missing a lot of context here as I only became more involved in this project
> > recently. @Mark Kanda Can you provide any context about Steve's work before
> > he retired, or Oracle's usage of these features?
> 
> We (Oracle) have an internal VM manager which relies on cpr-exec, and would
> like to continue supporting it.

IIUC that means QEMU upstream needs to start merging two solutions for not
only migration but also vhost, and maybe more in the future.  Or we reject
Vladimir's work, but frankly I think that's indeed superior and less hacky
since it does not need the 2nd channel.

Can we try to reduce the duplicated logic between the two solutions?

For example, is it possible to rebase this series on top of Vladimir's
work, reusing logic as much as possible so that those FDs can also be
preserved via execve() and reused in the after-exec world (likely reused
not during fd open, but instead during loadvm)?

Thanks,

-- 
Peter Xu




* Re: [PATCH v4 0/8] Live update: tap and vhost
  2026-02-03 20:47               ` Peter Xu
@ 2026-02-04  7:56                 ` Vladimir Sementsov-Ogievskiy
  2026-02-04 16:34                   ` Peter Xu
  0 siblings, 1 reply; 22+ messages in thread
From: Vladimir Sementsov-Ogievskiy @ 2026-02-04  7:56 UTC (permalink / raw)
  To: Peter Xu, Mark Kanda
  Cc: Chaney, Ben, qemu-devel@nongnu.org, Fabiano Rosas,
	Michael S. Tsirkin, Stefano Garzarella, Jason Wang,
	Alex Williamson, Cédric Le Goater, Eric Blake,
	Markus Armbruster, Stefan Weil, Daniel P. Berrangé,
	Paolo Bonzini, Hamza Khan, Hunt, Joshua, Tottenham, Max

On 03.02.26 23:47, Peter Xu wrote:
> On Tue, Feb 03, 2026 at 02:04:21PM -0600, Mark Kanda wrote:
>> On 2/3/26 1:46 PM, Chaney, Ben wrote:
>>>> Well explained, thank you Vladimir. I wish some day we can move all at
>>>> least cpr-transfer users to local-migration and deprecate CPR if ever
>>>> possible. The uncertainty to me is cpr-exec, but I really don't know how
>>>> much mgmt is adopting cpr-exec.. cpr-reboot also looks pretty special and
>>>> may not be relevant.
>>>> The core idea (originated from Steve..) is really about fd sharing, and
>>>> it's great if we can do it in a cleaner way.
>>> Thanks for the clarification. If that is the case then probably Vladimir's
>>> solution is preferable. I sent some comments on the prerequisite
>>> refactoring patch set. I'll try to review the main set soon.
>>>
>>>> My understanding is there're some special mgmt (Oracle's?) that may depend
>>>> on cpr-exec; I'm not sure how far that went in any downstream deployment.
>>>> That should be able to reuse mgmt channels too (relevant to chardev fd
>>>> sharing, perhaps?) instead of requiring e.g. all monitor ports to reconnect
>>>> to a new QEMU after migration. That said, I always assumed re-connect is
>>>> fine, and most mgmt supports live migration so the mgmt should have that
>>>> infrastructure there already. Maybe Ben would know better.
>>>
>>> I'm missing a lot of context here as I only became more involved in this project
>>> recently. @Mark Kanda Can you provide any context about Steve's work before
>>> he retired, or Oracle's usage of these features?
>>
>> We (Oracle) have an internal VM manager which relies on cpr-exec, and would
>> like to continue supporting it.
> 
> IIUC that means QEMU upstream needs to start merging two solutions for not
> only migration but also vhost, and maybe more in the future.  Or we reject
> Vladimir's work, but frankly I think that's indeed superior and less hacky
> since it does not need the 2nd channel.
> 
> Can we try to reduce the duplicated logic between the two solutions?
> 
> For example, is it possible to rebase this series on top of Vladimir's
> work, reusing logic as much as possible so that those FDs can also be
> preserved via execve() and reused in the after-exec world (likely reused
> not during fd open, but instead during loadvm)?
> 

I think, rebasing is not necessary, as local-fd migration (this series)
should work together with CPR, and even with CPR-exec. So Oracle can use cpr-exec,
and simply enable backend-transfer for virtio-net/tap, and it should work, I
remember we discussed this with Steve. I haven't tested it yet, but I'll try to
add a test for such setup.

-- 
Best regards,
Vladimir



* Re: [PATCH v4 6/8] tap: cpr support
  2026-01-28 20:39 ` [PATCH v4 6/8] tap: cpr support Ben Chaney
@ 2026-02-04 13:05   ` Markus Armbruster
  0 siblings, 0 replies; 22+ messages in thread
From: Markus Armbruster @ 2026-02-04 13:05 UTC (permalink / raw)
  To: Ben Chaney
  Cc: qemu-devel, Peter Xu, Fabiano Rosas, Michael S. Tsirkin,
	Stefano Garzarella, Jason Wang, Alex Williamson,
	Cédric Le Goater, Eric Blake, Stefan Weil,
	Daniel P. Berrangé, Paolo Bonzini, Hamza Khan, Mark Kanda,
	Joshua Hunt, Max Tottenham, Steve Sistare

Ben Chaney <bchaney@akamai.com> writes:

> Provide the cpr=on option to preserve TAP and vhost descriptors during
> cpr-transfer, so the management layer does not need to create a new
> device for the target.
>
> Save all tap fd's in order with the tap device fds saved first,
> and the vhostfd saved after.
>
> Example:
>
> -netdev tap,id=hostnet2,cpr=on
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> Signed-off-by: Ben Chaney <bchaney@akamai.com>

[...]

> diff --git a/qapi/net.json b/qapi/net.json
> index 118bd34965..4b12fca94b 100644
> --- a/qapi/net.json
> +++ b/qapi/net.json
> @@ -355,6 +355,9 @@
>  # @poll-us: maximum number of microseconds that could be spent on busy
>  #     polling for tap (since 2.7)
>  #
> +# @cpr: preserve the state of this device and its associated file
> +#     descriptors during cpr-transfer for reduced migration downtime

(default: false) (since 11.0)

> +#
>  # Since: 1.2
>  ##
>  { 'struct': 'NetdevTapOptions',
> @@ -373,7 +376,8 @@
>      '*vhostfds':   'str',
>      '*vhostforce': 'bool',
>      '*queues':     'uint32',
> -    '*poll-us':    'uint32'} }
> +    '*poll-us':    'uint32',
> +    '*cpr':        'bool'} }
>  
>  ##
>  # @NetdevSocketOptions:

With that, QAPI schema
Acked-by: Markus Armbruster <armbru@redhat.com>




* Re: [PATCH v4 0/8] Live update: tap and vhost
  2026-02-04  7:56                 ` Vladimir Sementsov-Ogievskiy
@ 2026-02-04 16:34                   ` Peter Xu
  0 siblings, 0 replies; 22+ messages in thread
From: Peter Xu @ 2026-02-04 16:34 UTC (permalink / raw)
  To: Vladimir Sementsov-Ogievskiy
  Cc: Mark Kanda, Chaney, Ben, qemu-devel@nongnu.org, Fabiano Rosas,
	Michael S. Tsirkin, Stefano Garzarella, Jason Wang,
	Alex Williamson, Cédric Le Goater, Eric Blake,
	Markus Armbruster, Stefan Weil, Daniel P. Berrangé,
	Paolo Bonzini, Hamza Khan, Hunt, Joshua, Tottenham, Max

On Wed, Feb 04, 2026 at 10:56:58AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> I think, rebasing is not necessary, as local-fd migration (this series)
> should work together with CPR, and even with CPR-exec. So Oracle can use cpr-exec,
> and simply enable backend-transfer for virtio-net/tap, and it should work, I
> remember we discussed this with Steve. I haven't tested it yet, but I'll try to
> add a test for such setup.

IIUC it should work for cpr-transfer, but likely not cpr-exec.  cpr-exec
requires removal of FD_CLOEXEC for fds to be persisted.  See cpr_exec_cb()
where it invokes cpr_exec_preserve_fds() (only on top of cpr saved fds).

-- 
Peter Xu




end of thread, other threads:[~2026-02-04 16:35 UTC | newest]

Thread overview: 22+ messages
2026-01-28 20:39 [PATCH v4 0/8] Live update: tap and vhost Ben Chaney
2026-01-28 20:39 ` [PATCH v4 1/8] migration: stop vm earlier for cpr Ben Chaney
2026-02-02 13:31   ` Fabiano Rosas
2026-01-28 20:39 ` [PATCH v4 2/8] migration: cpr setup notifier Ben Chaney
2026-02-02 14:01   ` Fabiano Rosas
2026-01-28 20:39 ` [PATCH v4 3/8] vhost: reset vhost devices for cpr Ben Chaney
2026-01-28 20:39 ` [PATCH v4 4/8] cpr: delete all fds Ben Chaney
2026-01-28 20:39 ` [PATCH v4 5/8] tap: common return label Ben Chaney
2026-01-28 20:39 ` [PATCH v4 6/8] tap: cpr support Ben Chaney
2026-02-04 13:05   ` Markus Armbruster
2026-01-28 20:39 ` [PATCH v4 7/8] tap: postload fix for cpr Ben Chaney
2026-01-28 20:39 ` [PATCH v4 8/8] tap: cpr fixes Ben Chaney
2026-01-29 13:58 ` [PATCH v4 0/8] Live update: tap and vhost Vladimir Sementsov-Ogievskiy
2026-02-02 14:06   ` Peter Xu
2026-02-02 15:42     ` Chaney, Ben
2026-02-03  9:57       ` Vladimir Sementsov-Ogievskiy
2026-02-03 19:17         ` Peter Xu
2026-02-03 19:46           ` Chaney, Ben
2026-02-03 20:04             ` Mark Kanda
2026-02-03 20:47               ` Peter Xu
2026-02-04  7:56                 ` Vladimir Sementsov-Ogievskiy
2026-02-04 16:34                   ` Peter Xu
