* [RFC V1 1/6] Revert "vhost-backend: remove vhost_kernel_reset_device()"
2024-08-30 11:56 [RFC V1 0/6] Live Update: tap and vhost Steve Sistare
@ 2024-08-30 11:56 ` Steve Sistare
2024-09-03 10:44 ` Euan Turner
2024-08-30 11:56 ` [RFC V1 2/6] tap: common return label Steve Sistare
` (4 subsequent siblings)
5 siblings, 1 reply; 10+ messages in thread
From: Steve Sistare @ 2024-08-30 11:56 UTC (permalink / raw)
To: qemu-devel
Cc: Michael S. Tsirkin, Jason Wang, Peter Xu, Fabiano Rosas,
Euan Turner, Steve Sistare
This reverts commit e6383293eb01928692047e617665a742cca87e23.
The reset function is needed for CPR.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
hw/virtio/vhost-backend.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
index 833804d..9b75141 100644
--- a/hw/virtio/vhost-backend.c
+++ b/hw/virtio/vhost-backend.c
@@ -221,6 +221,11 @@ static int vhost_kernel_set_owner(struct vhost_dev *dev)
return vhost_kernel_call(dev, VHOST_SET_OWNER, NULL);
}
+static int vhost_kernel_reset_device(struct vhost_dev *dev)
+{
+ return vhost_kernel_call(dev, VHOST_RESET_OWNER, NULL);
+}
+
static int vhost_kernel_get_vq_index(struct vhost_dev *dev, int idx)
{
assert(idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs);
@@ -345,6 +350,7 @@ const VhostOps kernel_ops = {
.vhost_get_features = vhost_kernel_get_features,
.vhost_set_backend_cap = vhost_kernel_set_backend_cap,
.vhost_set_owner = vhost_kernel_set_owner,
+ .vhost_reset_device = vhost_kernel_reset_device,
.vhost_get_vq_index = vhost_kernel_get_vq_index,
.vhost_vsock_set_guest_cid = vhost_kernel_vsock_set_guest_cid,
.vhost_vsock_set_running = vhost_kernel_vsock_set_running,
--
1.8.3.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [RFC V1 1/6] Revert "vhost-backend: remove vhost_kernel_reset_device()"
2024-08-30 11:56 ` [RFC V1 1/6] Revert "vhost-backend: remove vhost_kernel_reset_device()" Steve Sistare
@ 2024-09-03 10:44 ` Euan Turner
2024-09-03 19:55 ` Steven Sistare
0 siblings, 1 reply; 10+ messages in thread
From: Euan Turner @ 2024-09-03 10:44 UTC (permalink / raw)
To: Steve Sistare, qemu-devel
Cc: Michael S. Tsirkin, Jason Wang, Peter Xu, Fabiano Rosas
Hi Steve,
On 30/08/2024 12:56, Steve Sistare wrote:
> This reverts commit e6383293eb01928692047e617665a742cca87e23.
> The reset function is needed for CPR.
>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
> hw/virtio/vhost-backend.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
> index 833804d..9b75141 100644
> --- a/hw/virtio/vhost-backend.c
> +++ b/hw/virtio/vhost-backend.c
> @@ -221,6 +221,11 @@ static int vhost_kernel_set_owner(struct vhost_dev *dev)
> return vhost_kernel_call(dev, VHOST_SET_OWNER, NULL);
> }
>
> +static int vhost_kernel_reset_device(struct vhost_dev *dev)
> +{
> + return vhost_kernel_call(dev, VHOST_RESET_OWNER, NULL);
> +}
> +
How does this series avoid falling foul of
c0c4f147291f37765a5275aa24c3e1195468903b (which follows the commit
reverted here)?
I've been playing around with this patch series a bit, in the context of
cpr-transfer, and am seeing the issues highlighted in that c0c4...
commit message:
Since vhost-kernel now has a reset_device, this is called in
virtio_reset as part of qemu_machine_creation_done. (I have the full
backtrace if it's helpful). Subsequent ioctls then fail (with ownership
errors) due to the RESET_OWNER:
2024-09-02T15:40:56.860541Z qemu-kvm: vhost_set_vring_call failed 1
2024-09-02T15:40:56.860908Z qemu-kvm: vhost_set_vring_call failed 1
2024-09-02T15:40:56.861253Z qemu-kvm: vhost_set_mem_table failed:
Operation not permitted (1)
2024-09-02T15:40:56.861586Z qemu-kvm: vhost_set_vring_call failed 1
2024-09-02T15:40:56.861831Z qemu-kvm: vhost_set_vring_call failed 1
2024-09-02T15:40:56.862199Z qemu-kvm: unable to start vhost net: 1:
falling back on userspace virtio
For me the NIC then fails during the migration, although the migration
as a whole appears to succeed. (At least, prior the the migration, I
could ssh into the VM and ping out to 8.8.8.8, but then I lose the ssh
connection during the migration, and cannot ssh back in again afterwards
on the new QEMU).
Do you think this could be because of QEMU falling back from the vhost
backend to use virtio?
It may be down to some misconfiguration on my part, here's the netdev
command line I had for reference:
On the source QEMU:
-netdev
'{"type":"tap","fd":"39","vhost":true,"vhostfd":"40","id":"hostua-43bc0eaf-ff55-44e6-87ec-a4798f592db1"}'
\
-device
'{"driver":"virtio-net-pci","rx_queue_size":256,"netdev":"hostua-43bc0eaf-ff55-44e6-87ec-a4798f592db1","id":"ua-43bc0eaf-ff55-44e6-87ec-a4798f592db1","mac":"50:6b:8d:0c:03:e0","bus":"pci.1","addr":"0x0"}'
\
On the destination QEMU:
-netdev
'{"type":"tap","fd":"-1","vhostfd":"-1","id":"hostua-43bc0eaf-ff55-44e6-87ec-a4798f592db1"}'
\
-device
'{"driver":"virtio-net-pci","rx_queue_size":256,"netdev":"hostua-43bc0eaf-ff55-44e6-87ec-a4798f592db1","id":"ua-43bc0eaf-ff55-44e6-87ec-a4798f592db1","mac":"50:6b:8d:0c:03:e0","bus":"pci.1","addr":"0x0"}'
\
> static int vhost_kernel_get_vq_index(struct vhost_dev *dev, int idx)
> {
> assert(idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs);
> @@ -345,6 +350,7 @@ const VhostOps kernel_ops = {
> .vhost_get_features = vhost_kernel_get_features,
> .vhost_set_backend_cap = vhost_kernel_set_backend_cap,
> .vhost_set_owner = vhost_kernel_set_owner,
> + .vhost_reset_device = vhost_kernel_reset_device,
> .vhost_get_vq_index = vhost_kernel_get_vq_index,
> .vhost_vsock_set_guest_cid = vhost_kernel_vsock_set_guest_cid,
> .vhost_vsock_set_running = vhost_kernel_vsock_set_running,
Thanks,
Euan
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [RFC V1 1/6] Revert "vhost-backend: remove vhost_kernel_reset_device()"
2024-09-03 10:44 ` Euan Turner
@ 2024-09-03 19:55 ` Steven Sistare
0 siblings, 0 replies; 10+ messages in thread
From: Steven Sistare @ 2024-09-03 19:55 UTC (permalink / raw)
To: Euan Turner, qemu-devel
Cc: Michael S. Tsirkin, Jason Wang, Peter Xu, Fabiano Rosas
[-- Attachment #1: Type: text/plain, Size: 4400 bytes --]
On 9/3/2024 6:44 AM, Euan Turner wrote:
> Hi Steve,
>
> On 30/08/2024 12:56, Steve Sistare wrote:
>> This reverts commit e6383293eb01928692047e617665a742cca87e23.
>> The reset function is needed for CPR.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>> hw/virtio/vhost-backend.c | 6 ++++++
>> 1 file changed, 6 insertions(+)
>>
>> diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.c
>> index 833804d..9b75141 100644
>> --- a/hw/virtio/vhost-backend.c
>> +++ b/hw/virtio/vhost-backend.c
>> @@ -221,6 +221,11 @@ static int vhost_kernel_set_owner(struct vhost_dev *dev)
>> return vhost_kernel_call(dev, VHOST_SET_OWNER, NULL);
>> }
>> +static int vhost_kernel_reset_device(struct vhost_dev *dev)
>> +{
>> + return vhost_kernel_call(dev, VHOST_RESET_OWNER, NULL);
>> +}
>> +
> How does this series avoid falling foul of c0c4f147291f37765a5275aa24c3e1195468903b (which follows the commit reverted here)?
>
> I've been playing around with this patch series a bit, in the context of cpr-transfer, and am seeing the issues highlighted in that c0c4... commit message:
> Since vhost-kernel now has a reset_device, this is called in virtio_reset as part of qemu_machine_creation_done. (I have the full backtrace if it's helpful). Subsequent ioctls then fail (with ownership errors) due to the RESET_OWNER:
>
> 2024-09-02T15:40:56.860541Z qemu-kvm: vhost_set_vring_call failed 1
> 2024-09-02T15:40:56.860908Z qemu-kvm: vhost_set_vring_call failed 1
> 2024-09-02T15:40:56.861253Z qemu-kvm: vhost_set_mem_table failed: Operation not permitted (1)
> 2024-09-02T15:40:56.861586Z qemu-kvm: vhost_set_vring_call failed 1
> 2024-09-02T15:40:56.861831Z qemu-kvm: vhost_set_vring_call failed 1
> 2024-09-02T15:40:56.862199Z qemu-kvm: unable to start vhost net: 1: falling back on userspace virtio
>
> For me the NIC then fails during the migration, although the migration as a whole appears to succeed. (At least, prior the the migration, I could ssh into the VM and ping out to 8.8.8.8, but then I lose the ssh connection during the migration, and cannot ssh back in again afterwards on the new QEMU).
>
> Do you think this could be because of QEMU falling back from the vhost backend to use virtio?
>
> It may be down to some misconfiguration on my part, here's the netdev command line I had for reference:
> On the source QEMU:
>
> -netdev '{"type":"tap","fd":"39","vhost":true,"vhostfd":"40","id":"hostua-43bc0eaf-ff55-44e6-87ec-a4798f592db1"}' \
> -device '{"driver":"virtio-net-pci","rx_queue_size":256,"netdev":"hostua-43bc0eaf-ff55-44e6-87ec-a4798f592db1","id":"ua-43bc0eaf-ff55-44e6-87ec-a4798f592db1","mac":"50:6b:8d:0c:03:e0","bus":"pci.1","addr":"0x0"}' \
>
> On the destination QEMU:
> -netdev '{"type":"tap","fd":"-1","vhostfd":"-1","id":"hostua-43bc0eaf-ff55-44e6-87ec-a4798f592db1"}' \
> -device '{"driver":"virtio-net-pci","rx_queue_size":256,"netdev":"hostua-43bc0eaf-ff55-44e6-87ec-a4798f592db1","id":"ua-43bc0eaf-ff55-44e6-87ec-a4798f592db1","mac":"50:6b:8d:0c:03:e0","bus":"pci.1","addr":"0x0"}' \
>
>> static int vhost_kernel_get_vq_index(struct vhost_dev *dev, int idx)
>> {
>> assert(idx >= dev->vq_index && idx < dev->vq_index + dev->nvqs);
>> @@ -345,6 +350,7 @@ const VhostOps kernel_ops = {
>> .vhost_get_features = vhost_kernel_get_features,
>> .vhost_set_backend_cap = vhost_kernel_set_backend_cap,
>> .vhost_set_owner = vhost_kernel_set_owner,
>> + .vhost_reset_device = vhost_kernel_reset_device,
>> .vhost_get_vq_index = vhost_kernel_get_vq_index,
>> .vhost_vsock_set_guest_cid = vhost_kernel_vsock_set_guest_cid,
>> .vhost_vsock_set_running = vhost_kernel_vsock_set_running,
>
The 6 patches in this series are only sufficient for cpr-exec mode, but I have
attached the 2 additional patches you need for cpr-transfer mode. That mode
works fine for me with those patches. When I run a similar test, vhost_reset_device
is not called on target QEMU because vhost_started is false:
qemu_machine_creation_done()
virtio_reset()
if (vdev->vhost_started && k->get_vhost)
vhost_reset_device(k->get_vhost(vdev));
I don't know why you are seeing the vhost_set_vring_call failures. I don't see those,
with or without the 2 additional patches.
- Steve
[-- Attachment #2: 0001-migration-cpr-setup-notifier.patch --]
[-- Type: text/plain, Size: 2136 bytes --]
From 4f719179546e9327cba3b74e86d2a403e36c5d71 Mon Sep 17 00:00:00 2001
From: Steve Sistare <steven.sistare@oracle.com>
Date: Tue, 20 Aug 2024 05:36:15 -0700
Subject: [PATCH 1/2] migration: cpr setup notifier
Add a cpr-setup notification point which is called before general setup.
It is needed for resetting vhost devices, as explained in the next patch.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
include/migration/misc.h | 7 ++++---
migration/migration.c | 5 +++++
2 files changed, 9 insertions(+), 3 deletions(-)
diff --git a/include/migration/misc.h b/include/migration/misc.h
index bcb6f35..bd05fe2 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -60,6 +60,7 @@ bool migration_thread_is_self(void);
bool migration_is_setup_or_active(void);
typedef enum MigrationEventType {
+ MIG_EVENT_PRECOPY_CPR_SETUP,
MIG_EVENT_PRECOPY_SETUP,
MIG_EVENT_PRECOPY_DONE,
MIG_EVENT_PRECOPY_FAILED,
@@ -71,9 +72,9 @@ typedef struct MigrationEvent {
} MigrationEvent;
/*
- * A MigrationNotifyFunc may return an error code and an Error object,
- * but only when @e->type is MIG_EVENT_PRECOPY_SETUP. The code is an int
- * to allow for different failure modes and recovery actions.
+ * A MigrationNotifyFunc may return an error code and an Error object, but
+ * only when @e->type is MIG_EVENT_PRECOPY_SETUP or MIG_EVENT_PRECOPY_CPR_SETUP.
+ * The code is an int to allow for different failure modes and recovery actions.
*/
typedef int (*MigrationNotifyFunc)(NotifierWithReturn *notify,
MigrationEvent *e, Error **errp);
diff --git a/migration/migration.c b/migration/migration.c
index 15ac8f5..8bc1975 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2201,7 +2201,12 @@ void qmp_migrate(const char *uri, bool has_channels,
stopped = true;
}
+ if (migration_call_notifiers(s, MIG_EVENT_PRECOPY_CPR_SETUP, &local_err)) {
+ goto out;
+ }
+
if (cpr_state_save(&local_err)) {
+ migration_call_notifiers(s, MIG_EVENT_PRECOPY_FAILED, NULL);
goto out;
}
--
1.8.3.1
[-- Attachment #3: 0002-vhost-cpr-transfer-support.patch --]
[-- Type: text/plain, Size: 3281 bytes --]
From 0f09781df215086f095e40b42662dd2f3378ca7f Mon Sep 17 00:00:00 2001
From: Steve Sistare <steven.sistare@oracle.com>
Date: Tue, 20 Aug 2024 05:48:08 -0700
Subject: [PATCH 2/2] vhost cpr-transfer support
For cpr-transfer mode, a vhost device must be reset before CPR state
is sent to new QEMU, because after state is sent, new QEMU initializes
the device and calls set_owner. Install a cpr-setup notifier to do so.
This still works for cpr-exec mode.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
hw/virtio/vhost.c | 17 +++++++++++++++--
include/hw/virtio/vhost.h | 1 +
2 files changed, 16 insertions(+), 2 deletions(-)
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 54c5ec7..31c72d1 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1513,16 +1513,24 @@ static int vhost_cpr_notifier(NotifierWithReturn *notifier,
struct vhost_dev *dev;
int r;
- dev = container_of(notifier, struct vhost_dev, cpr_exec_notifier);
+ dev = (migrate_mode() == MIG_MODE_CPR_EXEC) ?
+ container_of(notifier, struct vhost_dev, cpr_exec_notifier) :
+ container_of(notifier, struct vhost_dev, cpr_transfer_notifier);
+
if (dev->vhost_ops->backend_type != VHOST_BACKEND_TYPE_KERNEL) {
return 0;
}
- if (e->type == MIG_EVENT_PRECOPY_DONE) {
+ if (e->type == MIG_EVENT_PRECOPY_CPR_SETUP) {
r = dev->vhost_ops->vhost_reset_device(dev);
if (r < 0) {
VHOST_OPS_DEBUG(r, "vhost_reset_device failed");
}
+ } else if (e->type == MIG_EVENT_PRECOPY_FAILED) {
+ r = dev->vhost_ops->vhost_set_owner(dev);
+ if (r < 0) {
+ VHOST_OPS_DEBUG(r, "vhost_set_owner failed");
+ }
}
return 0;
}
@@ -1538,6 +1546,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
hdev->vdev = NULL;
hdev->migration_blocker = NULL;
hdev->cpr_exec_notifier.notify = NULL;
+ hdev->cpr_transfer_notifier.notify = NULL;
r = vhost_set_backend_type(hdev, backend_type);
assert(r >= 0);
@@ -1641,6 +1650,9 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
migration_add_notifier_mode(&hdev->cpr_exec_notifier,
vhost_cpr_notifier,
MIG_MODE_CPR_EXEC);
+ migration_add_notifier_mode(&hdev->cpr_transfer_notifier,
+ vhost_cpr_notifier,
+ MIG_MODE_CPR_TRANSFER);
QLIST_INSERT_HEAD(&vhost_devices, hdev, entry);
/*
@@ -1698,6 +1710,7 @@ void vhost_dev_cleanup(struct vhost_dev *hdev)
}
migrate_del_blocker(&hdev->migration_blocker);
migration_remove_notifier(&hdev->cpr_exec_notifier);
+ migration_remove_notifier(&hdev->cpr_transfer_notifier);
g_free(hdev->mem);
g_free(hdev->mem_sections);
if (hdev->vhost_ops) {
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index e9aca11..ab7e874 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -133,6 +133,7 @@ struct vhost_dev {
QLIST_HEAD(, vhost_iommu) iommu_list;
IOMMUNotifier n;
NotifierWithReturn cpr_exec_notifier;
+ NotifierWithReturn cpr_transfer_notifier;
const VhostDevConfigOps *config_ops;
};
--
1.8.3.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [RFC V1 2/6] tap: common return label
2024-08-30 11:56 [RFC V1 0/6] Live Update: tap and vhost Steve Sistare
2024-08-30 11:56 ` [RFC V1 1/6] Revert "vhost-backend: remove vhost_kernel_reset_device()" Steve Sistare
@ 2024-08-30 11:56 ` Steve Sistare
2024-08-30 11:56 ` [RFC V1 3/6] tap: fix net_init_tap() return code Steve Sistare
` (3 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Steve Sistare @ 2024-08-30 11:56 UTC (permalink / raw)
To: qemu-devel
Cc: Michael S. Tsirkin, Jason Wang, Peter Xu, Fabiano Rosas,
Euan Turner, Steve Sistare
Modify net_init_tap so every return branches to a common label, for
common cleanup in a subsequent patch. No functional change.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
net/tap.c | 54 ++++++++++++++++++++++++++++++++++++------------------
1 file changed, 36 insertions(+), 18 deletions(-)
diff --git a/net/tap.c b/net/tap.c
index 51f7aec..8deabcb 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -774,7 +774,8 @@ int net_init_tap(const Netdev *netdev, const char *name,
* For -netdev, peer is always NULL. */
if (peer && (tap->has_queues || tap->fds || tap->vhostfds)) {
error_setg(errp, "Multiqueue tap cannot be used with hubs");
- return -1;
+ ret = -1;
+ goto out;
}
if (tap->fd) {
@@ -784,25 +785,29 @@ int net_init_tap(const Netdev *netdev, const char *name,
error_setg(errp, "ifname=, script=, downscript=, vnet_hdr=, "
"helper=, queues=, fds=, and vhostfds= "
"are invalid with fd=");
- return -1;
+ ret = -1;
+ goto out;
}
fd = monitor_fd_param(monitor_cur(), tap->fd, errp);
if (fd == -1) {
- return -1;
+ ret = -1;
+ goto out;
}
if (!g_unix_set_fd_nonblocking(fd, true, NULL)) {
error_setg_errno(errp, errno, "%s: Can't use file descriptor %d",
name, fd);
close(fd);
- return -1;
+ ret = -1;
+ goto out;
}
vnet_hdr = tap_probe_vnet_hdr(fd, errp);
if (vnet_hdr < 0) {
close(fd);
- return -1;
+ ret = -1;
+ goto out;
}
net_init_tap_one(tap, peer, "tap", name, NULL,
@@ -811,7 +816,8 @@ int net_init_tap(const Netdev *netdev, const char *name,
if (err) {
error_propagate(errp, err);
close(fd);
- return -1;
+ ret = -1;
+ goto out;
}
} else if (tap->fds) {
char **fds;
@@ -824,7 +830,8 @@ int net_init_tap(const Netdev *netdev, const char *name,
error_setg(errp, "ifname=, script=, downscript=, vnet_hdr=, "
"helper=, queues=, and vhostfd= "
"are invalid with fds=");
- return -1;
+ ret = -1;
+ goto out;
}
fds = g_new0(char *, MAX_TAP_QUEUES);
@@ -888,30 +895,35 @@ free_fail:
}
g_free(fds);
g_free(vhost_fds);
- return ret;
+ goto out;
+
} else if (tap->helper) {
if (tap->ifname || tap->script || tap->downscript ||
tap->has_vnet_hdr || tap->has_queues || tap->vhostfds) {
error_setg(errp, "ifname=, script=, downscript=, vnet_hdr=, "
"queues=, and vhostfds= are invalid with helper=");
- return -1;
+ ret = -1;
+ goto out;
}
fd = net_bridge_run_helper(tap->helper,
tap->br ?: DEFAULT_BRIDGE_INTERFACE,
errp);
if (fd == -1) {
- return -1;
+ ret = -1;
+ goto out;
}
if (!g_unix_set_fd_nonblocking(fd, true, NULL)) {
error_setg_errno(errp, errno, "Failed to set FD nonblocking");
- return -1;
+ ret = -1;
+ goto out;
}
vnet_hdr = tap_probe_vnet_hdr(fd, errp);
if (vnet_hdr < 0) {
close(fd);
- return -1;
+ ret = -1;
+ goto out;
}
net_init_tap_one(tap, peer, "bridge", name, ifname,
@@ -920,14 +932,16 @@ free_fail:
if (err) {
error_propagate(errp, err);
close(fd);
- return -1;
+ ret = -1;
+ goto out;
}
} else {
g_autofree char *default_script = NULL;
g_autofree char *default_downscript = NULL;
if (tap->vhostfds) {
error_setg(errp, "vhostfds= is invalid if fds= wasn't specified");
- return -1;
+ ret = -1;
+ goto out;
}
if (!script) {
@@ -948,14 +962,16 @@ free_fail:
fd = net_tap_init(tap, &vnet_hdr, i >= 1 ? "no" : script,
ifname, sizeof ifname, queues > 1, errp);
if (fd == -1) {
- return -1;
+ ret = -1;
+ goto out;
}
if (queues > 1 && i == 0 && !tap->ifname) {
if (tap_fd_get_ifname(fd, ifname)) {
error_setg(errp, "Fail to get ifname");
close(fd);
- return -1;
+ ret = -1;
+ goto out;
}
}
@@ -966,12 +982,14 @@ free_fail:
if (err) {
error_propagate(errp, err);
close(fd);
- return -1;
+ ret = -1;
+ goto out;
}
}
}
- return 0;
+out:
+ return ret;
}
VHostNetState *tap_get_vhost_net(NetClientState *nc)
--
1.8.3.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [RFC V1 3/6] tap: fix net_init_tap() return code
2024-08-30 11:56 [RFC V1 0/6] Live Update: tap and vhost Steve Sistare
2024-08-30 11:56 ` [RFC V1 1/6] Revert "vhost-backend: remove vhost_kernel_reset_device()" Steve Sistare
2024-08-30 11:56 ` [RFC V1 2/6] tap: common return label Steve Sistare
@ 2024-08-30 11:56 ` Steve Sistare
2024-08-30 11:56 ` [RFC V1 4/6] migration: cpr_get_fd_param helper Steve Sistare
` (2 subsequent siblings)
5 siblings, 0 replies; 10+ messages in thread
From: Steve Sistare @ 2024-08-30 11:56 UTC (permalink / raw)
To: qemu-devel
Cc: Michael S. Tsirkin, Jason Wang, Peter Xu, Fabiano Rosas,
Euan Turner, Steve Sistare
When net_init_tap() succeeds for a multi-queue device, it returns a
non-zero ret=1 code to its caller, because of this code where ret becomes
1 when g_unix_set_fd_nonblocking succeeds. Luckily, the only current call
site checks for negative, rather than non-zero.
ret = g_unix_set_fd_nonblocking(fd, true, NULL);
if (!ret) {
...
goto free_fail;
Also, if g_unix_set_fd_nonblocking fails (though unlikely), ret=0 is
returned, and the caller will use a broken interface.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
net/tap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/tap.c b/net/tap.c
index 8deabcb..20e4dae 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -855,8 +855,8 @@ int net_init_tap(const Netdev *netdev, const char *name,
goto free_fail;
}
- ret = g_unix_set_fd_nonblocking(fd, true, NULL);
- if (!ret) {
+ if (!g_unix_set_fd_nonblocking(fd, true, NULL)) {
+ ret = -1;
error_setg_errno(errp, errno, "%s: Can't use file descriptor %d",
name, fd);
goto free_fail;
--
1.8.3.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [RFC V1 4/6] migration: cpr_get_fd_param helper
2024-08-30 11:56 [RFC V1 0/6] Live Update: tap and vhost Steve Sistare
` (2 preceding siblings ...)
2024-08-30 11:56 ` [RFC V1 3/6] tap: fix net_init_tap() return code Steve Sistare
@ 2024-08-30 11:56 ` Steve Sistare
2024-09-03 15:38 ` Fabiano Rosas
2024-08-30 11:56 ` [RFC V1 5/6] tap: cpr support Steve Sistare
2024-08-30 11:56 ` [RFC V1 6/6] tap: postload fix for cpr Steve Sistare
5 siblings, 1 reply; 10+ messages in thread
From: Steve Sistare @ 2024-08-30 11:56 UTC (permalink / raw)
To: qemu-devel
Cc: Michael S. Tsirkin, Jason Wang, Peter Xu, Fabiano Rosas,
Euan Turner, Steve Sistare
Add the helper function cpr_get_fd_param, for use by tap and vdpa.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
include/migration/cpr.h | 2 ++
migration/cpr.c | 26 ++++++++++++++++++++++++++
2 files changed, 28 insertions(+)
diff --git a/include/migration/cpr.h b/include/migration/cpr.h
index 9a76365..b5fa264 100644
--- a/include/migration/cpr.h
+++ b/include/migration/cpr.h
@@ -27,6 +27,8 @@ void cpr_set_cpr_uri(const char *uri);
int cpr_state_save(Error **errp);
int cpr_state_load(Error **errp);
bool cpr_needed_for_reuse(void *opaque);
+int cpr_get_fd_param(const char *name, const char *fdname,
+ int index, Error **errp);
QEMUFile *cpr_exec_output(Error **errp);
QEMUFile *cpr_exec_input(Error **errp);
diff --git a/migration/cpr.c b/migration/cpr.c
index 096d684..0940ee2 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -13,6 +13,7 @@
#include "migration/qemu-file.h"
#include "migration/savevm.h"
#include "migration/vmstate.h"
+#include "monitor/monitor.h"
#include "sysemu/runstate.h"
#include "trace.h"
@@ -268,3 +269,28 @@ bool cpr_needed_for_reuse(void *opaque)
MigMode mode = migrate_mode();
return mode == MIG_MODE_CPR_EXEC || mode == MIG_MODE_CPR_TRANSFER;
}
+
+int cpr_get_fd_param(const char *name, const char *fdname, int index,
+ Error **errp)
+{
+ int fd;
+
+ /*
+ * fd's are passed by name to the monitor when a NIC is hot plugged, but
+ * the name is not known to qemu after cpr. The manager can pass -1 for
+ * the fd "name" in the new qemu args to indicate that qemu should search
+ * for a saved name.
+ */
+ if (!strcmp(fdname, "-1")) {
+ fd = cpr_find_fd(name, index);
+ if (fd < 0) {
+ error_setg(errp, "cannot find saved value for fd %s", fdname);
+ }
+ } else {
+ fd = monitor_fd_param(monitor_cur(), fdname, errp);
+ }
+ if (fd >= 0) {
+ cpr_resave_fd(name, index, fd);
+ }
+ return fd;
+}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [RFC V1 5/6] tap: cpr support
2024-08-30 11:56 [RFC V1 0/6] Live Update: tap and vhost Steve Sistare
` (3 preceding siblings ...)
2024-08-30 11:56 ` [RFC V1 4/6] migration: cpr_get_fd_param helper Steve Sistare
@ 2024-08-30 11:56 ` Steve Sistare
2024-08-30 11:56 ` [RFC V1 6/6] tap: postload fix for cpr Steve Sistare
5 siblings, 0 replies; 10+ messages in thread
From: Steve Sistare @ 2024-08-30 11:56 UTC (permalink / raw)
To: qemu-devel
Cc: Michael S. Tsirkin, Jason Wang, Peter Xu, Fabiano Rosas,
Euan Turner, Steve Sistare
Save all tap fd's in canonical order, leveraging the index argument of
cpr_save_fd. For the i'th queue, the tap device fd is saved at index 2*i,
and the vhostfd (if any) at index 2*i+1.
tap and vhost fd's are passed by name to the monitor when a NIC is hot
plugged, but the name is not known to qemu after cpr-exec. Allow the
manager to pass -1 for the fd "name" in the new qemu args to indicate that
qemu should search for a saved value. Example:
-netdev tap,id=hostnet2,fds=-1:-1,vhostfds=-1:-1
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
net/tap.c | 55 +++++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 41 insertions(+), 14 deletions(-)
diff --git a/net/tap.c b/net/tap.c
index 20e4dae..4d57a53 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -35,6 +35,7 @@
#include "net/eth.h"
#include "net/net.h"
#include "clients.h"
+#include "migration/cpr.h"
#include "monitor/monitor.h"
#include "sysemu/sysemu.h"
#include "qapi/error.h"
@@ -290,6 +291,8 @@ static void tap_cleanup(NetClientState *nc)
{
TAPState *s = DO_UPCAST(TAPState, nc, nc);
+ cpr_delete_fd_all(nc->name);
+
if (s->vhost_net) {
vhost_net_cleanup(s->vhost_net);
g_free(s->vhost_net);
@@ -636,13 +639,17 @@ static int net_tap_init(const NetdevTapOptions *tap, int *vnet_hdr,
return fd;
}
+/* CPR fd's for each queue are saved at these indices */
+#define TAP_FD_INDEX(queue) (2 * (queue) + 0)
+#define TAP_VHOSTFD_INDEX(queue) (2 * (queue) + 1)
+
#define MAX_TAP_QUEUES 1024
static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
const char *model, const char *name,
const char *ifname, const char *script,
const char *downscript, const char *vhostfdname,
- int vnet_hdr, int fd, Error **errp)
+ int vnet_hdr, int fd, int index, Error **errp)
{
Error *err = NULL;
TAPState *s = net_tap_fd_init(peer, model, name, fd, vnet_hdr);
@@ -682,7 +689,7 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
}
if (vhostfdname) {
- vhostfd = monitor_fd_param(monitor_cur(), vhostfdname, &err);
+ vhostfd = cpr_get_fd_param(name, vhostfdname, index, &err);
if (vhostfd == -1) {
error_propagate(errp, err);
goto failed;
@@ -693,7 +700,10 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
goto failed;
}
} else {
- vhostfd = open("/dev/vhost-net", O_RDWR);
+ vhostfd = cpr_find_fd(name, index);
+ if (vhostfd < 0) {
+ vhostfd = open("/dev/vhost-net", O_RDWR);
+ }
if (vhostfd < 0) {
error_setg_errno(errp, errno,
"tap: open vhost char device failed");
@@ -703,6 +713,7 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
error_setg_errno(errp, errno, "Failed to set FD nonblocking");
goto failed;
}
+ cpr_resave_fd(name, index, vhostfd);
}
options.opaque = (void *)(uintptr_t)vhostfd;
options.nvqs = 2;
@@ -721,6 +732,7 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
return;
failed:
+ cpr_delete_fd_all(name);
qemu_del_net_client(&s->nc);
}
@@ -789,7 +801,7 @@ int net_init_tap(const Netdev *netdev, const char *name,
goto out;
}
- fd = monitor_fd_param(monitor_cur(), tap->fd, errp);
+ fd = cpr_get_fd_param(name, tap->fd, TAP_FD_INDEX(0), errp);
if (fd == -1) {
ret = -1;
goto out;
@@ -812,13 +824,14 @@ int net_init_tap(const Netdev *netdev, const char *name,
net_init_tap_one(tap, peer, "tap", name, NULL,
script, downscript,
- vhostfdname, vnet_hdr, fd, &err);
+ vhostfdname, vnet_hdr, fd, TAP_VHOSTFD_INDEX(0), &err);
if (err) {
error_propagate(errp, err);
close(fd);
ret = -1;
goto out;
}
+
} else if (tap->fds) {
char **fds;
char **vhost_fds;
@@ -849,7 +862,7 @@ int net_init_tap(const Netdev *netdev, const char *name,
}
for (i = 0; i < nfds; i++) {
- fd = monitor_fd_param(monitor_cur(), fds[i], errp);
+ fd = cpr_get_fd_param(name, fds[i], TAP_FD_INDEX(i), errp);
if (fd == -1) {
ret = -1;
goto free_fail;
@@ -878,7 +891,7 @@ int net_init_tap(const Netdev *netdev, const char *name,
net_init_tap_one(tap, peer, "tap", name, ifname,
script, downscript,
tap->vhostfds ? vhost_fds[i] : NULL,
- vnet_hdr, fd, &err);
+ vnet_hdr, fd, TAP_VHOSTFD_INDEX(i), &err);
if (err) {
error_propagate(errp, err);
ret = -1;
@@ -906,9 +919,12 @@ free_fail:
goto out;
}
- fd = net_bridge_run_helper(tap->helper,
- tap->br ?: DEFAULT_BRIDGE_INTERFACE,
- errp);
+ fd = cpr_find_fd(name, TAP_FD_INDEX(0));
+ if (fd < 0) {
+ fd = net_bridge_run_helper(tap->helper,
+ tap->br ?: DEFAULT_BRIDGE_INTERFACE,
+ errp);
+ }
if (fd == -1) {
ret = -1;
goto out;
@@ -928,13 +944,15 @@ free_fail:
net_init_tap_one(tap, peer, "bridge", name, ifname,
script, downscript, vhostfdname,
- vnet_hdr, fd, &err);
+ vnet_hdr, fd, TAP_VHOSTFD_INDEX(0), &err);
if (err) {
error_propagate(errp, err);
close(fd);
ret = -1;
goto out;
}
+ cpr_resave_fd(name, TAP_FD_INDEX(0), fd);
+
} else {
g_autofree char *default_script = NULL;
g_autofree char *default_downscript = NULL;
@@ -959,8 +977,11 @@ free_fail:
}
for (i = 0; i < queues; i++) {
- fd = net_tap_init(tap, &vnet_hdr, i >= 1 ? "no" : script,
- ifname, sizeof ifname, queues > 1, errp);
+ fd = cpr_find_fd(name, TAP_FD_INDEX(i));
+ if (fd < 0) {
+ fd = net_tap_init(tap, &vnet_hdr, i >= 1 ? "no" : script,
+ ifname, sizeof ifname, queues > 1, errp);
+ }
if (fd == -1) {
ret = -1;
goto out;
@@ -978,17 +999,23 @@ free_fail:
net_init_tap_one(tap, peer, "tap", name, ifname,
i >= 1 ? "no" : script,
i >= 1 ? "no" : downscript,
- vhostfdname, vnet_hdr, fd, &err);
+ vhostfdname, vnet_hdr,
+ fd, TAP_VHOSTFD_INDEX(i),
+ &err);
if (err) {
error_propagate(errp, err);
close(fd);
ret = -1;
goto out;
}
+ cpr_resave_fd(name, TAP_FD_INDEX(i), fd);
}
}
out:
+ if (ret) {
+ cpr_delete_fd_all(name);
+ }
return ret;
}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [RFC V1 6/6] tap: postload fix for cpr
2024-08-30 11:56 [RFC V1 0/6] Live Update: tap and vhost Steve Sistare
` (4 preceding siblings ...)
2024-08-30 11:56 ` [RFC V1 5/6] tap: cpr support Steve Sistare
@ 2024-08-30 11:56 ` Steve Sistare
5 siblings, 0 replies; 10+ messages in thread
From: Steve Sistare @ 2024-08-30 11:56 UTC (permalink / raw)
To: qemu-devel
Cc: Michael S. Tsirkin, Jason Wang, Peter Xu, Fabiano Rosas,
Euan Turner, Steve Sistare
After cpr-exec of a multi-queue NIC, if any queues are unused, then
the corresponding tap is marked enabled in userland, but it is disabled
in the kernel for the fd that was preserved across exec. One cannot call
tap_disable() during postload, because that eventually calls
IFF_DETACH_QUEUE, which fails because the queue is already detached.
Define tap_disable_postload to avoid IFF_DETACH_QUEUE.
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
hw/net/virtio-net.c | 20 ++++++++++++++++++++
include/net/tap.h | 1 +
net/tap.c | 17 +++++++++++++++++
3 files changed, 38 insertions(+)
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 8f30972..8dc0a02 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -728,6 +728,25 @@ static int peer_detach(VirtIONet *n, int index)
return tap_disable(nc->peer);
}
+/*
+ * Set the disabled flag on unused queue pairs after vmstate load, without
+ * calling IFF_DETACH_QUEUE, which fails because the queue is already detached.
+ */
+static void virtio_net_postload_queue_pairs(VirtIONet *n)
+{
+ int i;
+ MigMode mode = migrate_mode();
+
+ if (mode == MIG_MODE_CPR_EXEC || mode == MIG_MODE_CPR_TRANSFER) {
+ for (i = n->curr_queue_pairs; i < n->max_queue_pairs; i++) {
+ NetClientState *nc = qemu_get_subqueue(n->nic, i);
+ if (nc->peer && nc->peer->info->type == NET_CLIENT_DRIVER_TAP) {
+ tap_disable_postload(nc->peer);
+ }
+ }
+ }
+}
+
static void virtio_net_set_queue_pairs(VirtIONet *n)
{
int i;
@@ -3056,6 +3075,7 @@ static int virtio_net_post_load_device(void *opaque, int version_id)
*/
n->saved_guest_offloads = n->curr_guest_offloads;
+ virtio_net_postload_queue_pairs(n);
virtio_net_set_queue_pairs(n);
/* Find the first multicast entry in the saved MAC filter */
diff --git a/include/net/tap.h b/include/net/tap.h
index 5d58551..9456abe 100644
--- a/include/net/tap.h
+++ b/include/net/tap.h
@@ -30,6 +30,7 @@
int tap_enable(NetClientState *nc);
int tap_disable(NetClientState *nc);
+void tap_disable_postload(NetClientState *nc);
int tap_get_fd(NetClientState *nc);
diff --git a/net/tap.c b/net/tap.c
index 4d57a53..abc4994 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -1060,3 +1060,20 @@ int tap_disable(NetClientState *nc)
return ret;
}
}
+
+/*
+ * On cpr restart, the tap is marked enabled in userland, but it might be
+ * disabled in the kernel, and IFF_DETACH_QUEUE will fail because it is
+ * already detached. This function disables without calling IFF_DETACH_QUEUE.
+ */
+void tap_disable_postload(NetClientState *nc)
+{
+ TAPState *s = DO_UPCAST(TAPState, nc, nc);
+
+ if (s->enabled == 0) {
+ return;
+ } else {
+ s->enabled = false;
+ tap_update_fd_handler(s);
+ }
+}
--
1.8.3.1
^ permalink raw reply related [flat|nested] 10+ messages in thread