From: Jens Freimann <jfreimann@redhat.com>
To: qemu-devel@nongnu.org
Cc: pkrempa@redhat.com, berrange@redhat.com, ehabkost@redhat.com,
mst@redhat.com, aadam@redhat.com, dgilbert@redhat.com,
alex.williamson@redhat.com, laine@redhat.com, ailan@redhat.com,
parav@mellanox.com
Subject: [PATCH v4 0/11] add failover feature for assigned network devices
Date: Fri, 18 Oct 2019 22:20:29 +0200 [thread overview]
Message-ID: <20191018202040.30349-1-jfreimann@redhat.com> (raw)
This is implementing the host side of the net_failover concept
(https://www.kernel.org/doc/html/latest/networking/net_failover.html)
Changes since v3:
* Patch 1, make return values of qdev_should_hide_device() more clear
* Patch 1, clarify comment about new should_be_hidden DeviceListener
* Patch 2 new patch, add net_failover_option_id to PCIDevice, only
allow PCIExpress devices for now
* Patch 8, only go into wait_unplug state when failover devices are present
* Patch 8, add new state to migration_is_setup_or_active, tested cancelling
while migration is in this state
* Patch 8, simplify handling of wait_unplug state, don't cancel migration after
timeout, let upper layer do this, get rid of retry counter (dgilbert)
* Patch 11, move net_failover_pair_id to PCIDev, move check for pci class
to PCI code, only allow PCIe devices for now as we only support
hotplugging these devices (aw)
* verified qemu make check tests ran, checked that docker-test-quick@centos7 runs
successful, tested migration with/without failover, without vfio-pci
* this now allows only PCIe devices because that's the only hotplug
controller that supports the partial unplug as of now. I'll work on
making it discoverable for libvirt or on support for the
other hotplug controllers in a follow-on patch set
The general idea is that we have a pair of devices, a vfio-pci and a
virtio-net device. Before migration the vfio device is unplugged and data
flows to the virtio-net device, on the target side another vfio-pci device
is plugged in to take over the data-path. In the guest the net_failover
module will pair net devices with the same MAC address.
* Patch 1 adds the infrastructure to hide the device for the qbus and qdev APIs
* Patch 2 adds checks to PCIDevice for only allowing ethernet devices as
failover primary and only PCIExpress capable devices
* Patch 3 sets a new flag for PCIDevice 'partially_hotplugged' which we
use to skip the unrealize code path when doing a unplug of the primary
device
* Patch 4 sets the pending_deleted_event before triggering the guest
unplug request
* Patch 5 and 6 add new qmp events, one sends the device id of a device
that was just requested to be unplugged from the guest and another one
to let libvirt know if VIRTIO_NET_F_STANDBY was negotiated
* Patch 7 make sure that we can unplug the vfio-device before
migration starts
* Patch 8 adds a new migration state that is entered while we wait for
devices to be unplugged by guest OS
* Patch 9 just adds the new migration state to a check in libqos code
* Patch 10 In the second patch the virtio-net uses the API to defer adding the vfio
device until the VIRTIO_NET_F_STANDBY feature is acked. It also
implements the migration handler to unplug the device from the guest and
re-plug in case of migration failure
* Patch 11 allows migration for failover vfio-pci devices
Previous discussion:
RFC v1 https://patchwork.ozlabs.org/cover/989098/
RFC v2 https://www.mail-archive.com/qemu-devel@nongnu.org/msg606906.html
v1: https://lists.gnu.org/archive/html/qemu-devel/2019-05/msg03968.html
v2: https://www.mail-archive.com/qemu-devel@nongnu.org/msg635214.html
v3: https://patchew.org/QEMU/20191011112015.11785-1-jfreimann@redhat.com/
To summarize concerns/feedback from previous discussion:
1.- guest OS can reject or worse _delay_ unplug by any amount of time.
Migration might get stuck for unpredictable time with unclear reason.
This approach combines two tricky things, hot/unplug and migration.
-> We need to let libvirt know what's happening. Add new qmp events
and a new migration state. When a primary device is (partially)
unplugged (only from guest) we send a qmp event with the device id. When
it is unplugged from the guest the DEVICE_DELETED event is sent.
Migration will enter the wait-unplug state while waiting for the guest
os to unplug all primary devices and then move on with migration.
2. PCI devices are a precious ressource. The primary device should never
be added to QEMU if it won't be used by guest instead of hiding it in
QEMU.
-> We only hotplug the device when the standby feature bit was
negotiated. We save the device cmdline options until we need it for
qdev_device_add()
Hiding a device can be a useful concept to model. For example a
pci device in a powered-off slot could be marked as hidden until the slot is
powered on (mst).
3. Management layer software should handle this. Open Stack already has
components/code to handle unplug/replug VFIO devices and metadata to
provide to the guest for detecting which devices should be paired.
-> An approach that includes all software from firmware to
higher-level management software wasn't tried in the last years. This is
an attempt to keep it simple and contained in QEMU as much as possible.
One of the problems that stopped management software and libvirt from
implementing this idea is that it can't be sure that it's possible to
re-plug the primary device. By not freeing the devices resources in QEMU
and only asking the guest OS to unplug it is possible to re-plug the
device in case of a migration failure.
4. Hotplugging a device and then making it part of a failover setup is
not possible
-> addressed by extending qdev hotplug functions to check for hidden
attribute, so e.g. device_add can be used to plug a device.
I have tested this with a mlx5 and igb NIC and was able to migrate the VM.
Command line example:
qemu-system-x86_64 -enable-kvm -m 3072 -smp 3 \
-machine q35,kernel-irqchip=split -cpu host \
-serial stdio \
-net none \
-qmp unix:/tmp/qmp.socket,server,nowait \
-monitor telnet:127.0.0.1:5555,server,nowait \
-device pcie-root-port,id=root0,multifunction=on,chassis=0,addr=0xa \
-device pcie-root-port,id=root1,bus=pcie.0,chassis=1 \
-device pcie-root-port,id=root2,bus=pcie.0,chassis=2 \
-netdev tap,script=/root/bin/bridge.sh,downscript=no,id=hostnet1,vhost=on \
-device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc,bus=root2,failover=on \
-device vfio-pci,host=5e:00.2,id=hostdev0,bus=root1,net_failover_pair_id =net1 \
/root/rhel-guest-image-8.0-1781.x86_64.qcow2
I'm grateful for any remarks or ideas!
Thanks!
regards,
Jens
Jens Freimann (10):
qdev/qbus: add hidden device support
pci: mark devices partially unplugged
pci: mark device having guest unplug request pending
qapi: add unplug primary event
qapi: add failover negotiated event
migration: allow unplug during migration for failover devices
migration: add new migration state wait-unplug
libqos: tolerate wait-unplug migration state
net/virtio: add failover support
vfio: unplug failover primary device before migration
hw/core/qdev.c | 20 +++
hw/net/virtio-net.c | 267 +++++++++++++++++++++++++++++++++
hw/pci/pci.c | 2 +
hw/pci/pcie.c | 6 +
hw/vfio/pci.c | 35 ++++-
hw/vfio/pci.h | 2 +
include/hw/pci/pci.h | 1 +
include/hw/qdev-core.h | 10 ++
include/hw/virtio/virtio-net.h | 12 ++
include/hw/virtio/virtio.h | 1 +
include/migration/vmstate.h | 2 +
migration/migration.c | 34 +++++
migration/migration.h | 3 +
migration/savevm.c | 36 +++++
migration/savevm.h | 1 +
qapi/migration.json | 24 ++-
qapi/net.json | 16 ++
qdev-monitor.c | 43 +++++-
tests/libqos/libqos.c | 3 +-
vl.c | 6 +-
20 files changed, 515 insertions(+), 9 deletions(-)
--
2.21.0
*** BLURB HERE ***
Jens Freimann (11):
qdev/qbus: add hidden device support
pci: add option for net failover
pci: mark devices partially unplugged
pci: mark device having guest unplug request pending
qapi: add unplug primary event
qapi: add failover negotiated event
migration: allow unplug during migration for failover devices
migration: add new migration state wait-unplug
libqos: tolerate wait-unplug migration state
net/virtio: add failover support
vfio: unplug failover primary device before migration
hw/core/qdev.c | 24 +++
hw/net/virtio-net.c | 282 +++++++++++++++++++++++++++++++++
hw/pci/pci.c | 17 ++
hw/pci/pcie.c | 6 +
hw/vfio/pci.c | 26 ++-
hw/vfio/pci.h | 1 +
include/hw/pci/pci.h | 4 +
include/hw/qdev-core.h | 9 ++
include/hw/virtio/virtio-net.h | 12 ++
include/hw/virtio/virtio.h | 1 +
include/migration/vmstate.h | 2 +
migration/migration.c | 22 +++
migration/migration.h | 3 +
migration/savevm.c | 36 +++++
migration/savevm.h | 1 +
qapi/migration.json | 24 ++-
qapi/net.json | 16 ++
qdev-monitor.c | 38 ++++-
tests/libqos/libqos.c | 3 +-
vl.c | 6 +-
20 files changed, 519 insertions(+), 14 deletions(-)
--
2.21.0
Jens Freimann (11):
qdev/qbus: add hidden device support
pci: add option for net failover
pci: mark devices partially unplugged
pci: mark device having guest unplug request pending
qapi: add unplug primary event
qapi: add failover negotiated event
migration: allow unplug during migration for failover devices
migration: add new migration state wait-unplug
libqos: tolerate wait-unplug migration state
net/virtio: add failover support
vfio: unplug failover primary device before migration
hw/core/qdev.c | 24 +++
hw/net/virtio-net.c | 282 +++++++++++++++++++++++++++++++++
hw/pci/pci.c | 17 ++
hw/pci/pcie.c | 6 +
hw/vfio/pci.c | 26 ++-
hw/vfio/pci.h | 1 +
include/hw/pci/pci.h | 4 +
include/hw/qdev-core.h | 9 ++
include/hw/virtio/virtio-net.h | 12 ++
include/hw/virtio/virtio.h | 1 +
include/migration/vmstate.h | 2 +
migration/migration.c | 21 +++
migration/migration.h | 3 +
migration/savevm.c | 36 +++++
migration/savevm.h | 2 +
qapi/migration.json | 24 ++-
qapi/net.json | 16 ++
qdev-monitor.c | 38 ++++-
tests/libqos/libqos.c | 3 +-
vl.c | 6 +-
20 files changed, 519 insertions(+), 14 deletions(-)
--
2.21.0
Jens Freimann (11):
qdev/qbus: add hidden device support
pci: add option for net failover
pci: mark devices partially unplugged
pci: mark device having guest unplug request pending
qapi: add unplug primary event
qapi: add failover negotiated event
migration: allow unplug during migration for failover devices
migration: add new migration state wait-unplug
libqos: tolerate wait-unplug migration state
net/virtio: add failover support
vfio: unplug failover primary device before migration
hw/core/qdev.c | 24 +++
hw/net/virtio-net.c | 282 +++++++++++++++++++++++++++++++++
hw/pci/pci.c | 17 ++
hw/pci/pcie.c | 6 +
hw/vfio/pci.c | 31 +++-
hw/vfio/pci.h | 1 +
include/hw/pci/pci.h | 4 +
include/hw/qdev-core.h | 9 ++
include/hw/virtio/virtio-net.h | 12 ++
include/hw/virtio/virtio.h | 1 +
include/migration/vmstate.h | 2 +
migration/migration.c | 21 +++
migration/migration.h | 3 +
migration/savevm.c | 36 +++++
migration/savevm.h | 2 +
qapi/migration.json | 24 ++-
qapi/net.json | 16 ++
qdev-monitor.c | 38 ++++-
tests/libqos/libqos.c | 3 +-
vl.c | 6 +-
20 files changed, 524 insertions(+), 14 deletions(-)
--
2.21.0
next reply other threads:[~2019-10-18 20:22 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-10-18 20:20 Jens Freimann [this message]
2019-10-18 20:20 ` [PATCH 01/11] qdev/qbus: add hidden device support Jens Freimann
2019-10-21 12:44 ` Cornelia Huck
2019-10-21 12:52 ` Jens Freimann
2019-10-21 12:53 ` Peter Maydell
2019-10-21 13:00 ` Jens Freimann
2019-10-18 20:20 ` [PATCH 02/11] pci: add option for net failover Jens Freimann
2019-10-21 14:58 ` Alex Williamson
2019-10-21 18:45 ` Jens Freimann
2019-10-21 19:01 ` Alex Williamson
2019-10-21 20:28 ` Jens Freimann
2019-10-21 20:48 ` Alex Williamson
2019-10-18 20:20 ` [PATCH 03/11] pci: mark devices partially unplugged Jens Freimann
2019-10-18 20:20 ` [PATCH 04/11] pci: mark device having guest unplug request pending Jens Freimann
2019-10-18 20:20 ` [PATCH 05/11] qapi: add unplug primary event Jens Freimann
2019-10-18 20:28 ` Eric Blake
2019-10-21 7:23 ` Jens Freimann
2019-10-18 20:20 ` [PATCH 06/11] qapi: add failover negotiated event Jens Freimann
2019-10-18 20:20 ` [PATCH 07/11] migration: allow unplug during migration for failover devices Jens Freimann
2019-10-18 20:20 ` [PATCH 08/11] migration: add new migration state wait-unplug Jens Freimann
2019-10-18 20:20 ` [PATCH 09/11] libqos: tolerate wait-unplug migration state Jens Freimann
2019-10-18 20:20 ` [PATCH 10/11] net/virtio: add failover support Jens Freimann
2019-10-18 20:20 ` [PATCH 11/11] vfio: unplug failover primary device before migration Jens Freimann
2019-10-21 13:45 ` Cornelia Huck
2019-10-21 14:09 ` Jens Freimann
2019-10-21 15:19 ` Alex Williamson
2019-10-21 19:16 ` Jens Freimann
2019-10-19 15:12 ` [PATCH v4 0/11] add failover feature for assigned network devices no-reply
2019-10-21 7:30 ` Jens Freimann
2019-10-19 15:15 ` no-reply
2019-10-21 14:18 ` Cornelia Huck
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20191018202040.30349-1-jfreimann@redhat.com \
--to=jfreimann@redhat.com \
--cc=aadam@redhat.com \
--cc=ailan@redhat.com \
--cc=alex.williamson@redhat.com \
--cc=berrange@redhat.com \
--cc=dgilbert@redhat.com \
--cc=ehabkost@redhat.com \
--cc=laine@redhat.com \
--cc=mst@redhat.com \
--cc=parav@mellanox.com \
--cc=pkrempa@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).