From: "Cédric Le Goater" <clg@redhat.com>
To: Ben Chaney <bchaney@akamai.com>, qemu-devel@nongnu.org
Cc: "Peter Xu" <peterx@redhat.com>, "Fabiano Rosas" <farosas@suse.de>,
"Michael S. Tsirkin" <mst@redhat.com>,
"Stefano Garzarella" <sgarzare@redhat.com>,
"Jason Wang" <jasowang@redhat.com>,
"Alex Williamson" <alex@shazbot.org>,
"Eric Blake" <eblake@redhat.com>,
"Markus Armbruster" <armbru@redhat.com>,
"Stefan Weil" <sw@weilnetz.de>,
"Daniel P. Berrangé" <berrange@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Hamza Khan" <hamza.khan@nutanix.com>,
"Mark Kanda" <mark.kanda@oracle.com>,
"Joshua Hunt" <johunt@akamai.com>,
"Max Tottenham" <mtottenh@akamai.com>,
"Steve Sistare" <steven.sistare@oracle.com>
Subject: Re: [PATCH v3 6/8] tap: cpr support
Date: Thu, 4 Dec 2025 18:46:21 +0100
Message-ID: <fbc8007b-2667-42c6-9fdf-56147cae664d@redhat.com>
In-Reply-To: <20251203-cpr-tap-v3-6-3c12e0a61f8e@akamai.com>
On 12/3/25 19:51, Ben Chaney wrote:
> From: Steve Sistare <steven.sistare@oracle.com>
>
> Provide the cpr=on option to preserve TAP and vhost descriptors during
> cpr-transfer, so the management layer does not need to create a new
> device for the target.
>
> Save all tap fd's in canonical order, leveraging the index argument of
> cpr_save_fd. For the i'th queue, the tap device fd is saved at index 2*i,
> and the vhostfd (if any) at index 2*i+1.
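
For reference, the index layout described above can be sketched as a small standalone fragment. The two macros mirror the ones the patch adds to net/tap.c; the two inline helpers are hypothetical, added here only to show the inverse mapping, and are not part of the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Same layout as the macros added in net/tap.c by this patch. */
#define TAP_FD_INDEX(queue)      (2 * (queue) + 0)
#define TAP_VHOSTFD_INDEX(queue) (2 * (queue) + 1)

/* Hypothetical helper (not in the patch): queue that owns a saved index. */
static inline int tap_index_to_queue(int index)
{
    return index / 2;
}

/* Hypothetical helper (not in the patch): odd indices hold vhost fds. */
static inline bool tap_index_is_vhost(int index)
{
    return index % 2 == 1;
}
```

With this layout, queue 3 stores its tap fd at index 6 and its vhostfd at index 7, so all fds of one netdev share a single CPR name and differ only by index.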
>
> tap and vhost fd's are passed by name to the monitor when a NIC is hot
> plugged, but the name is not known to qemu after cpr. Allow the manager
> to pass -1 for the fd "name" in the new qemu args to indicate that QEMU
> should search for a saved value. Example:
>
> -netdev tap,id=hostnet2,fds=-1:-1,vhostfds=-1:-1,cpr=on
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> Signed-off-by: Ben Chaney <bchaney@akamai.com>
> ---
> hw/vfio/device.c | 2 +-
> include/migration/cpr.h | 2 +-
> migration/cpr.c | 11 ++++----
> net/tap.c | 73 +++++++++++++++++++++++++++++++++++++++----------
> qapi/net.json | 5 +++-
> 5 files changed, 70 insertions(+), 23 deletions(-)
>
> diff --git a/hw/vfio/device.c b/hw/vfio/device.c
> index 76869828fc..73e622f7b5 100644
> --- a/hw/vfio/device.c
> +++ b/hw/vfio/device.c
> @@ -362,7 +362,7 @@ void vfio_device_free_name(VFIODevice *vbasedev)
>
> void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp)
> {
> - vbasedev->fd = cpr_get_fd_param(vbasedev->dev->id, str, 0, errp);
> + vbasedev->fd = cpr_get_fd_param(vbasedev->dev->id, str, 0, true, errp);

This looks weird to me.

This calls a cpr* routine with a 'cpr' bool argument that toggles
CPR on or off. It looks a bit hacky. Could you clarify the intention?

C.

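
To make the concern concrete, one possible alternative is sketched below, purely as an illustration and not as something the series proposes: leave the CPR-aware lookup with its original signature and let the tap code decide which lookup to call. Every function and variable here is a stand-in written for this sketch (the *_stub names and tap_get_fd are hypothetical); none of it is the real QEMU implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Stand-ins for QEMU internals; NOT the real implementations. */
static bool stub_incoming;      /* models the result of cpr_is_incoming() */
static int stub_saved_fd = -1;  /* models a one-slot CPR fd table */

static bool cpr_is_incoming_stub(void) { return stub_incoming; }
static int cpr_find_fd_stub(void) { return stub_saved_fd; }
static void cpr_save_fd_stub(int fd) { stub_saved_fd = fd; }

/* Models monitor_fd_param() for integer-valued strings only. */
static int monitor_fd_param_stub(const char *fdname)
{
    return atoi(fdname);
}

/* CPR-aware lookup with the same shape as the pre-patch routine. */
static int get_fd_cpr(const char *fdname)
{
    int fd;

    if (cpr_is_incoming_stub()) {
        return cpr_find_fd_stub();
    }
    fd = monitor_fd_param_stub(fdname);
    if (fd >= 0) {
        cpr_save_fd_stub(fd);
    }
    return fd;
}

/* Hypothetical tap-side wrapper: the cpr=on/off choice stays in the caller. */
static int tap_get_fd(const char *fdname, bool cpr)
{
    return cpr ? get_fd_cpr(fdname) : monitor_fd_param_stub(fdname);
}
```

In this shape the VFIO caller would not need to change at all, and only net/tap.c would carry the cpr=on/off decision. Whether that fits the rest of the series is an open question.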
> }
>
> static VFIODeviceIOOps vfio_device_io_ops_ioctl;
> diff --git a/include/migration/cpr.h b/include/migration/cpr.h
> index d585fadc5b..68424b4b03 100644
> --- a/include/migration/cpr.h
> +++ b/include/migration/cpr.h
> @@ -48,7 +48,7 @@ void cpr_state_close(void);
> struct QIOChannel *cpr_state_ioc(void);
>
> bool cpr_incoming_needed(void *opaque);
> -int cpr_get_fd_param(const char *name, const char *fdname, int index,
> +int cpr_get_fd_param(const char *name, const char *fdname, int index, bool cpr,
> Error **errp);
>
> QEMUFile *cpr_transfer_output(MigrationChannel *channel, Error **errp);
> diff --git a/migration/cpr.c b/migration/cpr.c
> index c0bf93a7ba..19bd56339d 100644
> --- a/migration/cpr.c
> +++ b/migration/cpr.c
> @@ -316,6 +316,7 @@ bool cpr_incoming_needed(void *opaque)
> * @name: CPR name for the descriptor
> * @fdname: An integer-valued string, or a name passed to a getfd command
> * @index: CPR index of the descriptor
> + * @cpr: use cpr
> * @errp: returned error message
> *
> * If CPR is not being performed, then use @fdname to find the fd.
> @@ -325,22 +326,22 @@ bool cpr_incoming_needed(void *opaque)
> * On success returns the fd value, else returns -1.
> */
> int cpr_get_fd_param(const char *name, const char *fdname, int index,
> - Error **errp)
> + bool cpr, Error **errp)
> {
> ERRP_GUARD();
> int fd;
>
> - if (cpr_is_incoming()) {
> + if (cpr && cpr_is_incoming()) {
> fd = cpr_find_fd(name, index);
> if (fd < 0) {
> error_setg(errp, "cannot find saved value for fd %s", fdname);
> }
> } else {
> fd = monitor_fd_param(monitor_cur(), fdname, errp);
> - if (fd >= 0) {
> - cpr_save_fd(name, index, fd);
> - } else {
> + if (fd < 0) {
> error_prepend(errp, "Could not parse object fd %s:", fdname);
> + } else if (cpr) {
> + cpr_save_fd(name, index, fd);
> }
> }
> return fd;
> diff --git a/net/tap.c b/net/tap.c
> index 9d480574c3..79e29addd1 100644
> --- a/net/tap.c
> +++ b/net/tap.c
> @@ -35,6 +35,7 @@
> #include "net/eth.h"
> #include "net/net.h"
> #include "clients.h"
> +#include "migration/cpr.h"
> #include "monitor/monitor.h"
> #include "system/system.h"
> #include "qapi/error.h"
> @@ -80,6 +81,7 @@ typedef struct TAPState {
> bool has_uso;
> bool has_tunnel;
> bool enabled;
> + bool cpr;
> VHostNetState *vhost_net;
> unsigned host_vnet_hdr_len;
> Notifier exit;
> @@ -323,6 +325,9 @@ static void tap_cleanup(NetClientState *nc)
> {
> TAPState *s = DO_UPCAST(TAPState, nc, nc);
>
> + if (s->cpr) {
> + cpr_delete_fd_all(nc->name);
> + }
> if (s->vhost_net) {
> vhost_net_cleanup(s->vhost_net);
> g_free(s->vhost_net);
> @@ -690,18 +695,24 @@ static int net_tap_init(const NetdevTapOptions *tap, int *vnet_hdr,
> return fd;
> }
>
> +/* CPR fd's for each queue are saved at these indices */
> +#define TAP_FD_INDEX(queue) (2 * (queue) + 0)
> +#define TAP_VHOSTFD_INDEX(queue) (2 * (queue) + 1)
> +
> #define MAX_TAP_QUEUES 1024
>
> static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
> const char *model, const char *name,
> const char *ifname, const char *script,
> const char *downscript, const char *vhostfdname,
> - int vnet_hdr, int fd, Error **errp)
> + int vnet_hdr, int fd, int index, Error **errp)
> {
> Error *err = NULL;
> TAPState *s = net_tap_fd_init(peer, model, name, fd, vnet_hdr);
> + bool cpr = tap->has_cpr ? tap->cpr : false;
> int vhostfd;
>
> + s->cpr = cpr;
> tap_set_sndbuf(s->fd, tap, &err);
> if (err) {
> error_propagate(errp, err);
> @@ -736,7 +747,7 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
> }
>
> if (vhostfdname) {
> - vhostfd = monitor_fd_param(monitor_cur(), vhostfdname, &err);
> + vhostfd = cpr_get_fd_param(name, vhostfdname, index, cpr, &err);
> if (vhostfd == -1) {
> error_propagate(errp, err);
> goto failed;
> @@ -745,13 +756,22 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
> goto failed;
> }
> } else {
> - vhostfd = open("/dev/vhost-net", O_RDWR);
> + vhostfd = cpr ? cpr_find_fd(name, index) : -1;
> + if (vhostfd < 0) {
> + vhostfd = open("/dev/vhost-net", O_RDWR);
> + if (cpr && vhostfd >= 0) {
> + cpr_save_fd(name, index, vhostfd);
> + }
> + }
> if (vhostfd < 0) {
> error_setg_errno(errp, errno,
> "tap: open vhost char device failed");
> goto failed;
> }
> if (!qemu_set_blocking(vhostfd, false, errp)) {
> + if (!cpr) {
> + close(vhostfd);
> + }
> goto failed;
> }
> }
> @@ -777,6 +797,9 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
> return;
>
> failed:
> + if (cpr) {
> + cpr_delete_fd_all(name);
> + }
> qemu_del_net_client(&s->nc);
> }
>
> @@ -809,7 +832,8 @@ static int get_fds(char *str, char *fds[], int max)
> int net_init_tap(const Netdev *netdev, const char *name,
> NetClientState *peer, Error **errp)
> {
> - const NetdevTapOptions *tap;
> + const NetdevTapOptions *tap = &netdev->u.tap;
> + bool cpr = tap->has_cpr ? tap->cpr : false;
> int fd, vnet_hdr = 0, i = 0, queues;
> /* for the no-fd, no-helper case */
> const char *script;
> @@ -845,7 +869,7 @@ int net_init_tap(const Netdev *netdev, const char *name,
> goto out;
> }
>
> - fd = monitor_fd_param(monitor_cur(), tap->fd, errp);
> + fd = cpr_get_fd_param(name, tap->fd, TAP_FD_INDEX(0), cpr, errp);
> if (fd == -1) {
> ret = -1;
> goto out;
> @@ -866,13 +890,14 @@ int net_init_tap(const Netdev *netdev, const char *name,
>
> net_init_tap_one(tap, peer, "tap", name, NULL,
> script, downscript,
> - vhostfdname, vnet_hdr, fd, &err);
> + vhostfdname, vnet_hdr, fd, TAP_VHOSTFD_INDEX(0), &err);
> if (err) {
> error_propagate(errp, err);
> close(fd);
> ret = -1;
> goto out;
> }
> +
> } else if (tap->fds) {
> char **fds;
> char **vhost_fds;
> @@ -903,7 +928,7 @@ int net_init_tap(const Netdev *netdev, const char *name,
> }
>
> for (i = 0; i < nfds; i++) {
> - fd = monitor_fd_param(monitor_cur(), fds[i], errp);
> + fd = cpr_get_fd_param(name, fds[i], TAP_FD_INDEX(i), cpr, errp);
> if (fd == -1) {
> ret = -1;
> goto free_fail;
> @@ -930,7 +955,7 @@ int net_init_tap(const Netdev *netdev, const char *name,
> net_init_tap_one(tap, peer, "tap", name, ifname,
> script, downscript,
> tap->vhostfds ? vhost_fds[i] : NULL,
> - vnet_hdr, fd, &err);
> + vnet_hdr, fd, TAP_VHOSTFD_INDEX(i), &err);
> if (err) {
> error_propagate(errp, err);
> ret = -1;
> @@ -958,9 +983,15 @@ free_fail:
> goto out;
> }
>
> - fd = net_bridge_run_helper(tap->helper,
> - tap->br ?: DEFAULT_BRIDGE_INTERFACE,
> - errp);
> + fd = cpr ? cpr_find_fd(name, TAP_FD_INDEX(0)) : -1;
> + if (fd < 0) {
> + fd = net_bridge_run_helper(tap->helper,
> + tap->br ?: DEFAULT_BRIDGE_INTERFACE,
> + errp);
> + if (cpr && fd >= 0) {
> + cpr_save_fd(name, TAP_FD_INDEX(0), fd);
> + }
> + }
> if (fd == -1) {
> ret = -1;
> goto out;
> @@ -980,13 +1011,14 @@ free_fail:
>
> net_init_tap_one(tap, peer, "bridge", name, ifname,
> script, downscript, vhostfdname,
> - vnet_hdr, fd, &err);
> + vnet_hdr, fd, TAP_VHOSTFD_INDEX(0), &err);
> if (err) {
> error_propagate(errp, err);
> close(fd);
> ret = -1;
> goto out;
> }
> +
> } else {
> g_autofree char *default_script = NULL;
> g_autofree char *default_downscript = NULL;
> @@ -1011,8 +1043,14 @@ free_fail:
> }
>
> for (i = 0; i < queues; i++) {
> - fd = net_tap_init(tap, &vnet_hdr, i >= 1 ? "no" : script,
> - ifname, sizeof ifname, queues > 1, errp);
> + fd = cpr ? cpr_find_fd(name, TAP_FD_INDEX(i)) : -1;
> + if (fd < 0) {
> + fd = net_tap_init(tap, &vnet_hdr, i >= 1 ? "no" : script,
> + ifname, sizeof ifname, queues > 1, errp);
> + if (cpr && fd >= 0) {
> + cpr_save_fd(name, TAP_FD_INDEX(i), fd);
> + }
> + }
> if (fd == -1) {
> ret = -1;
> goto out;
> @@ -1030,7 +1068,9 @@ free_fail:
> net_init_tap_one(tap, peer, "tap", name, ifname,
> i >= 1 ? "no" : script,
> i >= 1 ? "no" : downscript,
> - vhostfdname, vnet_hdr, fd, &err);
> + vhostfdname, vnet_hdr,
> + fd, TAP_VHOSTFD_INDEX(i),
> + &err);
> if (err) {
> error_propagate(errp, err);
> close(fd);
> @@ -1041,6 +1081,9 @@ free_fail:
> }
>
> out:
> + if (ret && cpr) {
> + cpr_delete_fd_all(name);
> + }
> return ret;
> }
>
> diff --git a/qapi/net.json b/qapi/net.json
> index 118bd34965..264213b5d9 100644
> --- a/qapi/net.json
> +++ b/qapi/net.json
> @@ -355,6 +355,8 @@
> # @poll-us: maximum number of microseconds that could be spent on busy
> # polling for tap (since 2.7)
> #
> +# @cpr: preserve fds and vhostfds during cpr-transfer.
> +#
> # Since: 1.2
> ##
> { 'struct': 'NetdevTapOptions',
> @@ -373,7 +375,8 @@
> '*vhostfds': 'str',
> '*vhostforce': 'bool',
> '*queues': 'uint32',
> - '*poll-us': 'uint32'} }
> + '*poll-us': 'uint32',
> + '*cpr': 'bool'} }
>
> ##
> # @NetdevSocketOptions:
>