qemu-devel.nongnu.org archive mirror
* sr-iov live migration with NET_FAILOVER
@ 2025-05-22 22:27 Paul B. Henson
From: Paul B. Henson @ 2025-05-22 22:27 UTC
  To: qemu-devel

I'm running libvirtd under Debian 12 and trying to set up live migration
of a Linux VM that's using an SR-IOV VF as its primary Ethernet device.
I have that device and the corresponding virtio backup device properly
configured in libvirt, and when the VM starts up everything looks good:

2: nic0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
     link/ether 52:54:00:a1:e0:38 brd ff:ff:ff:ff:ff:ff
3: enp8s0nsby: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel master nic0 state DOWN mode DEFAULT group default qlen 1000
     link/ether 52:54:00:a1:e0:38 brd ff:ff:ff:ff:ff:ff
5: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master nic0 state UP mode DEFAULT group default qlen 1000
     link/ether 52:54:00:a1:e0:38 brd ff:ff:ff:ff:ff:ff
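
A rough Python sketch for reading that output, assuming the guest's `ip link` text above as input (one header line per interface). It lists the interfaces enslaved to the failover master nic0 and whether each has carrier, which makes it easy to see that the standby (enp8s0nsby) is down while the VF (enp1s0) is up. The helper names (parse_links, has_carrier, slaves_of) are made up for illustration, and the parsing only handles the header-line format shown here, not every `ip link` variant.

```python
import re

# The guest's `ip link` output quoted above, one header line per interface.
SAMPLE = """\
2: nic0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:a1:e0:38 brd ff:ff:ff:ff:ff:ff
3: enp8s0nsby: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel master nic0 state DOWN mode DEFAULT group default qlen 1000
    link/ether 52:54:00:a1:e0:38 brd ff:ff:ff:ff:ff:ff
5: enp1s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master nic0 state UP mode DEFAULT group default qlen 1000
    link/ether 52:54:00:a1:e0:38 brd ff:ff:ff:ff:ff:ff
"""

# Matches header lines like "3: enp8s0nsby: <FLAGS> mtu ..."; "@parent" suffixes are skipped.
HEADER = re.compile(r"^\d+: (?P<name>[^:@\s]+)(?:@\S+)?: <(?P<flags>[^>]*)>(?P<rest>.*)$")


def value_after(tokens, key):
    """Return the token following `key`, or None if `key` is absent."""
    if key in tokens:
        i = tokens.index(key)
        if i + 1 < len(tokens):
            return tokens[i + 1]
    return None


def parse_links(text):
    """Parse `ip link` header lines into {name: {"flags", "master", "state"}}."""
    links = {}
    for line in text.splitlines():
        m = HEADER.match(line)
        if not m:
            continue  # link/ether continuation lines carry no link state
        tokens = m.group("rest").split()
        links[m.group("name")] = {
            "flags": set(m.group("flags").split(",")),
            "master": value_after(tokens, "master"),
            "state": value_after(tokens, "state"),
        }
    return links


def has_carrier(link):
    # LOWER_UP means the driver reports carrier; NO-CARRIER means it does not.
    return "LOWER_UP" in link["flags"] and "NO-CARRIER" not in link["flags"]


def slaves_of(links, master):
    """Sorted names of the interfaces enslaved to `master`."""
    return sorted(name for name, link in links.items() if link["master"] == master)


if __name__ == "__main__":
    links = parse_links(SAMPLE)
    for name in slaves_of(links, "nic0"):
        print(name, "carrier" if has_carrier(links[name]) else "no carrier")
```

Run against the sample, it reports enp1s0 with carrier and enp8s0nsby without, i.e. only the VF is carrying traffic before migration.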

The problem I am having is that when I live-migrate the VM, link on the
standby virtio interface does not come up when the VF is unplugged, so
all network traffic to the system is dropped between the source
unplugging the VF and the destination plugging it back in.

It's not clear to me who is responsible for bringing that link up, but
from what I can tell, it seems like it should be QEMU?

Per the documentation:

https://www.kernel.org/doc/html/latest/networking/net_failover.html

The sequence of events should be:

* bring up link on the standby device
* detach the SR-IOV device
* migrate the VM
* attach the SR-IOV device
* bring down link on the standby device

If I do that manually, using virsh to bring up the standby link and
detach the VF before migration, then reattaching the VF and bringing the
standby link back down afterward, no traffic is lost at all during the
migration.
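
A Python sketch of that manual sequence expressed as virsh commands, in the order listed in the net_failover documentation. All names are hypothetical placeholders: the domain "vm1", the destination URI, the host-side target names of the virtio standby interface ("vnet0" before and "vnet3" after migration, since the name can change; `virsh domiflist` shows the current one), and the XML files describing the VF on each host. The function only builds argv lists and prints them; nothing is executed.

```python
import shlex


def manual_failover_migration(domain, dest_uri, standby_src, standby_dst,
                              vf_xml_src, vf_xml_dst):
    """Return the manual failover-safe migration as ordered virsh argv lists.

    Every argument is a hypothetical placeholder:
      standby_src / standby_dst: host-side target name of the virtio standby
        interface on the source and destination host,
      vf_xml_src / vf_xml_dst: device XML describing the VF on each host.
    """
    on_dest = ["virsh", "-c", dest_uri]
    return [
        # 1. bring up link on the standby (virtio) device
        ["virsh", "domif-setlink", domain, standby_src, "up"],
        # 2. hot-unplug the SR-IOV VF from the running guest
        ["virsh", "detach-device", domain, vf_xml_src, "--live"],
        # 3. live-migrate the VM to the destination host
        ["virsh", "migrate", "--live", domain, dest_uri],
        # 4. hot-plug a VF on the destination host
        on_dest + ["attach-device", domain, vf_xml_dst, "--live"],
        # 5. bring down link on the standby device again
        on_dest + ["domif-setlink", domain, standby_dst, "down"],
    ]


if __name__ == "__main__":
    plan = manual_failover_migration(
        "vm1", "qemu+ssh://dest-host/system", "vnet0", "vnet3",
        "vf-src.xml", "vf-dst.xml")
    for argv in plan:
        print(shlex.join(argv))  # print only; nothing is executed
```

Keeping the plan as data makes the ordering easy to check; passing each argv to `subprocess.run(argv, check=True)` would run the steps in order and stop at the first failure.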

I initially thought perhaps libvirt was supposed to be doing it, but
reviewing the debug logs and the QMP commands, it neither detaches nor
reattaches the VF. It just tells QEMU there's a failover pair, and QEMU
does the detach/attach while migrating:

-device {"driver":"virtio-net-pci","failover":true,"netdev":"hostua-sr-iov-backup","id":"ua-sr-iov-backup","mac":"52:54:00:a1:e0:38","bus":"pci.7","addr":"0x0"}

-device {"driver":"vfio-pci","host":"0000:37:10.0","id":"hostdev0","failover_pair_id":"ua-sr-iov-backup","bus":"pci.1","addr":"0x0"}
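
In these two arguments, the pair is tied together by failover_pair_id on the vfio-pci primary, which names the id of the virtio-net-pci device created with failover enabled. A small Python sketch that checks that linkage, assuming the two -device JSON strings above as input; the function name failover_pairs is made up, and it only validates the pairing, not anything about migration behavior.

```python
import json

# The two -device arguments from the QEMU command line quoted above.
DEVICE_ARGS = [
    '{"driver":"virtio-net-pci","failover":true,"netdev":"hostua-sr-iov-backup",'
    '"id":"ua-sr-iov-backup","mac":"52:54:00:a1:e0:38","bus":"pci.7","addr":"0x0"}',
    '{"driver":"vfio-pci","host":"0000:37:10.0","id":"hostdev0",'
    '"failover_pair_id":"ua-sr-iov-backup","bus":"pci.1","addr":"0x0"}',
]


def failover_pairs(device_args):
    """Map each primary device id to its standby id, validating each pairing."""
    devices = [json.loads(arg) for arg in device_args]
    by_id = {dev["id"]: dev for dev in devices if "id" in dev}
    pairs = {}
    for dev in devices:
        standby_id = dev.get("failover_pair_id")
        if standby_id is None:
            continue  # not a failover primary
        standby = by_id.get(standby_id)
        if standby is None:
            raise ValueError(f"{dev.get('id')}: no device with id {standby_id!r}")
        if standby.get("driver") != "virtio-net-pci" or standby.get("failover") is not True:
            raise ValueError(f"{standby_id!r} is not a virtio-net-pci device with failover enabled")
        pairs[dev["id"]] = standby_id
    return pairs


if __name__ == "__main__":
    print(failover_pairs(DEVICE_ARGS))
```

For the arguments above it maps hostdev0 to ua-sr-iov-backup, so the pairing libvirt passes to QEMU looks well-formed.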


I tried manually bringing up link on the standby device before migrating
the system and letting QEMU handle the VF detach/attach, but that
resulted in even more lost traffic than simply leaving the standby
device down during the migration (the network started sending packets
to the standby device, but the failover device didn't forward them to
the virtual NIC as long as the VF existed).

Am I missing something? Is there some other configuration I'm supposed 
to do? Any insight on this issue would be most appreciated.

Debian 12 has QEMU 7.2 in stable, but I also tried the 9.2 backport
with the same apparent behavior.

Thanks much...


