qemu-devel.nongnu.org archive mirror
* [RFC V2 0/8] Live update: tap and vhost
@ 2025-07-17 18:39 Steve Sistare
  2025-07-17 18:39 ` [RFC V2 1/8] migration: stop vm earlier for cpr Steve Sistare
                   ` (10 more replies)
  0 siblings, 11 replies; 28+ messages in thread
From: Steve Sistare @ 2025-07-17 18:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Jason Wang, Michael S. Tsirkin, Stefano Garzarella, Peter Xu,
	Fabiano Rosas, Hamza Khan, Steve Sistare

Tap and vhost devices can be preserved during cpr-transfer using
traditional live migration methods, wherein the management layer
creates new interfaces for the target and fiddles with 'ip link'
to deactivate the old interface and activate the new.

However, CPR can simply send the file descriptors to new QEMU,
with no special management actions required.  The user enables
this behavior by specifying '-netdev tap,cpr=on'.  The default
is cpr=off.
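Handing an open descriptor to another process is ordinary UNIX-domain fd passing with SCM_RIGHTS. That is what lets the new QEMU keep using the same tap and vhost devices without any 'ip link' steps. The sketch below is a generic illustration. It assumes the two processes share a connected AF_UNIX socket, as QEMU's cpr-transfer channel does. send_fd() and recv_fd() are hypothetical helper names, not QEMU functions.

```c
/* Generic SCM_RIGHTS fd passing over a connected AF_UNIX socket.
 * Hypothetical helpers for illustration; not QEMU's implementation. */
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one descriptor. Returns 0 on success, -1 on error. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';                    /* at least one data byte must accompany the fd */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;           /* forces correct alignment of buf */
    } u;
    memset(&u, 0, sizeof(u));

    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = u.buf,
        .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;       /* ask the kernel to duplicate the fd into the peer */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor sent by send_fd(). Returns the new fd, or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = u.buf,
        .msg_controllen = sizeof(u.buf),
    };

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;                      /* no descriptor attached */
    }
    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

In this sketch the received descriptor gets a new number in the receiving process but refers to the same open file description as the sender's. A tap fd passed this way therefore keeps the interface's queue attached, rather than requiring a new interface to be created.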

Steve Sistare (8):
  migration: stop vm earlier for cpr
  migration: cpr setup notifier
  vhost: reset vhost devices for cpr
  cpr: delete all fds
  Revert "vhost-backend: remove vhost_kernel_reset_device()"
  tap: common return label
  tap: cpr support
  tap: postload fix for cpr

 qapi/net.json             |   5 +-
 include/hw/virtio/vhost.h |   1 +
 include/migration/cpr.h   |   3 +-
 include/net/tap.h         |   1 +
 hw/net/virtio-net.c       |  20 +++++++
 hw/vfio/device.c          |   2 +-
 hw/virtio/vhost-backend.c |   6 ++
 hw/virtio/vhost.c         |  32 +++++++++++
 migration/cpr.c           |  24 ++++++--
 migration/migration.c     |  38 ++++++++-----
 net/tap-win32.c           |   5 ++
 net/tap.c                 | 141 +++++++++++++++++++++++++++++++++++-----------
 12 files changed, 223 insertions(+), 55 deletions(-)

-- 
1.8.3.1



* Re: [RFC V2 0/8] Live update: tap and vhost
@ 2025-08-18 15:04 Chaney, Ben
  2025-08-22 18:26 ` Steven Sistare
  0 siblings, 1 reply; 28+ messages in thread
From: Chaney, Ben @ 2025-08-18 15:04 UTC (permalink / raw)
  To: Steven Sistare
  Cc: qemu-devel@nongnu.org, jasowang@redhat.com, mst@redhat.com,
	sgarzare@redhat.com, peterx@redhat.com, farosas@suse.de,
	hamza.khan@nutanix.com, Hunt, Joshua, Tottenham, Max,
	Glasgall, Anna, Harnett, Dan

From: Steve Sistare <steven.sistare@oracle.com>

> Tap and vhost devices can be preserved during cpr-transfer using
> traditional live migration methods, wherein the management layer
> creates new interfaces for the target and fiddles with 'ip link'
> to deactivate the old interface and activate the new.
>
> However, CPR can simply send the file descriptors to new QEMU,
> with no special management actions required. The user enables
> this behavior by specifying '-netdev tap,cpr=on'. The default
> is cpr=off.


Hi Steve,

Thank you for sending this patch set. I tried testing it, and
the migration fails with the following error on the destination:


2025-08-07T18:14:30.564323Z qemu-system-x86_64: could not disable queue
qemu-system-x86_64: ../hw/net/virtio-net.c:767: virtio_net_set_queue_pairs: Assertion `!r' failed.


And the following error on the source:

vhost_reset_device failed: Operation not permitted (1)
vhost_reset_device failed: Operation not permitted (1)
2025-08-15T14:50:16.028494Z qemu-system-x86_64: Failed to connect to 'main.sock': Connection refused
2025-08-15T14:50:16.028552Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)
2025-08-15T14:50:16.028565Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)
2025-08-15T14:50:16.028578Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)
2025-08-15T14:50:16.028590Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)
2025-08-15T14:50:16.028604Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)
2025-08-15T14:50:16.028629Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)
2025-08-15T14:50:16.028641Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)
2025-08-15T14:50:16.028844Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)
2025-08-15T14:50:16.028856Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)
2025-08-15T14:50:16.028868Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)
2025-08-15T14:50:16.028880Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)
2025-08-15T14:50:16.028893Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)
2025-08-15T14:50:16.028904Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)
2025-08-15T14:50:16.028916Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)
2025-08-15T14:50:16.028928Z qemu-system-x86_64: vhost_set_owner failed: Device or resource busy (16)

I suspect the issue may be related to the fact that we are dropping
privileges (-run-with user=$USERNAME), since cpr-transfer has run
into other issues with that in the past, but I haven't found anything
concrete there yet.

Some other information:

The full qemu arguments used for networking are:

-netdev tap,id=net0,ifname=tap.79874411_0,script=no,downscript=no,vhost=on,queues=8,cpr=on
-device virtio-net-pci,netdev=net0,id=netpci0,mac=$mac1,vectors=18,mq=on
-netdev tap,id=net1,ifname=tap.79874411_1,script=no,downscript=no,vhost=on,queues=8,cpr=on
-device virtio-net-pci,netdev=net1,id=netpci1,mac=$mac2,vectors=18,mq=on
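The cover letter says tap cpr stays off unless the netdev options contain cpr=on, as in the command lines above. The hypothetical C helper below applies that rule to a single -netdev option string. It is a simplified sketch, not QEMU's option parser, and it ignores ',,' escaping and other boolean spellings such as yes/no.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: does a -netdev option string such as
 * "tap,id=net0,vhost=on,queues=8,cpr=on" request cpr?
 * Absent "cpr=", the result is off, matching the stated default. */
static bool netdev_cpr_enabled(const char *opts)
{
    bool cpr = false;                   /* default is cpr=off */
    const char *p = opts;

    while (p && *p) {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        /* exact token match; in this sketch a later cpr= overrides an earlier one */
        if (len == strlen("cpr=on") && strncmp(p, "cpr=on", len) == 0) {
            cpr = true;
        } else if (len == strlen("cpr=off") && strncmp(p, "cpr=off", len) == 0) {
            cpr = false;
        }
        p = end ? end + 1 : NULL;       /* advance past the comma, or stop */
    }
    return cpr;
}
```

For example, netdev_cpr_enabled("tap,id=net0,vhost=on,queues=8") returns false, while the net0 and net1 option strings above both return true.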

I applied your patch set on top of commit 7136352b40631b058dd0fe731a0d404e761e799f.
I also applied the pending arm interrupt fix.

Thanks,
        Ben




end of thread, other threads:[~2025-09-05 16:18 UTC | newest]

Thread overview: 28+ messages
2025-07-17 18:39 [RFC V2 0/8] Live update: tap and vhost Steve Sistare
2025-07-17 18:39 ` [RFC V2 1/8] migration: stop vm earlier for cpr Steve Sistare
2025-07-17 18:39 ` [RFC V2 2/8] migration: cpr setup notifier Steve Sistare
2025-07-17 18:39 ` [RFC V2 3/8] vhost: reset vhost devices for cpr Steve Sistare
2025-08-27 11:29   ` Vladimir Sementsov-Ogievskiy
2025-08-27 18:38     ` Steven Sistare
2025-07-17 18:39 ` [RFC V2 4/8] cpr: delete all fds Steve Sistare
2025-07-17 18:39 ` [RFC V2 5/8] Revert "vhost-backend: remove vhost_kernel_reset_device()" Steve Sistare
2025-08-22 18:26   ` Steven Sistare
2025-07-17 18:39 ` [RFC V2 6/8] tap: common return label Steve Sistare
2025-07-17 18:39 ` [RFC V2 7/8] tap: cpr support Steve Sistare
2025-07-17 18:39 ` [RFC V2 8/8] tap: postload fix for cpr Steve Sistare
2025-07-18  8:48 ` [RFC V2 0/8] Live update: tap and vhost Lei Yang
2025-07-18 17:31   ` Steven Sistare
2025-07-24  5:46   ` Lei Yang
2025-08-05 13:54 ` Fabiano Rosas
2025-08-05 19:53   ` Steven Sistare
2025-08-06 15:51     ` Peter Xu
2025-08-11 18:24     ` Steven Sistare
2025-08-23 21:53 ` Vladimir Sementsov-Ogievskiy
2025-08-28 15:48   ` Steven Sistare
2025-08-29 19:37     ` Steven Sistare
2025-09-01 11:44       ` Vladimir Sementsov-Ogievskiy
2025-09-02 15:33         ` Steven Sistare
2025-09-02 17:09           ` Vladimir Sementsov-Ogievskiy
2025-09-05 16:16             ` Peter Xu
  -- strict thread matches above, loose matches on Subject: below --
2025-08-18 15:04 Chaney, Ben
2025-08-22 18:26 ` Steven Sistare
