qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/3] Support for datapath switching during live migration
@ 2018-12-10 16:15 Venu Busireddy
  2018-12-10 16:15 ` [Qemu-devel] [PATCH 1/3] virtio_net: Add VIRTIO_NET_F_STANDBY feature bit Venu Busireddy
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Venu Busireddy @ 2018-12-10 16:15 UTC (permalink / raw)
  To: venu.busireddy, Michael S. Tsirkin, Marcel Apfelbaum
  Cc: virtio-dev, qemu-devel

Implement the infrastructure to support datapath switching during live
migration involving SR-IOV devices.

1. This patch is based off on the current VIRTIO_NET_F_STANDBY feature
   bit and MAC address device pairing.

2. This set of events will be consumed by userspace management software
   to orchestrate all the hot plug and datapath switching activities.
   This scheme has the least QEMU modifications while allowing userspace
   software to build its own intelligence to control the whole process
   of SR-IOV live migration.

3. While the hidden device model (viz. coupled device model) is still
   being explored for automatic hot plugging (QEMU) and automatic datapath
   switching (host-kernel), this series provides a supplemental set
   of interfaces if management software wants to drive the SR-IOV live
   migration on its own. It should not conflict with the hidden device
   model but just offers simplicity of implementation.

Sridhar Samudrala (1):
  virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.

Venu Busireddy (1):
  virtio_net: Add support for "Data Path Switching" during Live Migration.

Si-Wei Liu (1):
  vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover

 hw/acpi/pcihp.c                | 27 ++++++++++++++++
 hw/net/virtio-net.c            | 25 +++++++++++++++
 hw/vfio/pci.c                  | 60 ++++++++++++++++++++++++++++++++++++
 hw/vfio/pci.h                  |  1 +
 include/hw/pci/pci.h           |  1 +
 include/hw/virtio/virtio-net.h |  4 +++
 qapi/net.json                  | 70 ++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 188 insertions(+)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH 1/3] virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.
  2018-12-10 16:15 [Qemu-devel] [PATCH 0/3] Support for datapath switching during live migration Venu Busireddy
@ 2018-12-10 16:15 ` Venu Busireddy
  2018-12-10 16:15 ` [Qemu-devel] [PATCH 2/3] virtio_net: Add support for "Data Path Switching" during Live Migration Venu Busireddy
  2018-12-10 16:15 ` [Qemu-devel] [PATCH 3/3] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover Venu Busireddy
  2 siblings, 0 replies; 12+ messages in thread
From: Venu Busireddy @ 2018-12-10 16:15 UTC (permalink / raw)
  To: venu.busireddy, Michael S. Tsirkin, Marcel Apfelbaum
  Cc: Sridhar Samudrala, virtio-dev, qemu-devel

From: Sridhar Samudrala <sridhar.samudrala@intel.com>

This feature bit can be used by a hypervisor to indicate to the virtio_net
device that it can act as a standby for another device with the same MAC
address.

Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
---
 hw/net/virtio-net.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 385b1a0..411f8fb 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
                      true),
     DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
     DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
+    DEFINE_PROP_BIT64("standby", VirtIONet, host_features, VIRTIO_NET_F_STANDBY,
+                      false),
     DEFINE_PROP_END_OF_LIST(),
 };
 

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH 2/3] virtio_net: Add support for "Data Path Switching" during Live Migration.
  2018-12-10 16:15 [Qemu-devel] [PATCH 0/3] Support for datapath switching during live migration Venu Busireddy
  2018-12-10 16:15 ` [Qemu-devel] [PATCH 1/3] virtio_net: Add VIRTIO_NET_F_STANDBY feature bit Venu Busireddy
@ 2018-12-10 16:15 ` Venu Busireddy
  2018-12-10 17:41   ` Michael S. Tsirkin
  2018-12-10 19:28   ` Eric Blake
  2018-12-10 16:15 ` [Qemu-devel] [PATCH 3/3] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover Venu Busireddy
  2 siblings, 2 replies; 12+ messages in thread
From: Venu Busireddy @ 2018-12-10 16:15 UTC (permalink / raw)
  To: venu.busireddy, Michael S. Tsirkin, Marcel Apfelbaum
  Cc: virtio-dev, qemu-devel

Added two new events, FAILOVER_PLUG_PRIMARY and FAILOVER_UNPLUG_PRIMARY.
The first is emitted when the guest negotiates the F_STANDBY feature
bit. The second is emitted when the virtio_net driver is removed, either
manually or as a result of guest reboot. Management stack can use these
events to determine when to plug/unplug the VF device to/from the guest.

Also, the Virtual Functions will be automatically removed from the guest
if the guest is rebooted. To properly identify the VFIO devices that
must be removed, a new property named "x-failover-primary" is added to
the vfio-pci devices. Only the vfio-pci devices that have this property
enabled are removed from the guest upon reboot.

Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
---
 hw/acpi/pcihp.c                | 27 ++++++++++++++++++++++++++
 hw/net/virtio-net.c            | 23 ++++++++++++++++++++++
 hw/vfio/pci.c                  |  3 +++
 hw/vfio/pci.h                  |  1 +
 include/hw/pci/pci.h           |  1 +
 include/hw/virtio/virtio-net.h |  4 ++++
 qapi/net.json                  | 44 ++++++++++++++++++++++++++++++++++++++++++
 7 files changed, 103 insertions(+)

diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
index 80d42e1..2a3ffd3 100644
--- a/hw/acpi/pcihp.c
+++ b/hw/acpi/pcihp.c
@@ -176,6 +176,25 @@ static void acpi_pcihp_eject_slot(AcpiPciHpState *s, unsigned bsel, unsigned slo
     }
 }
 
+static void acpi_pcihp_cleanup_failover_primary(AcpiPciHpState *s, int bsel)
+{
+    BusChild *kid, *next;
+    PCIBus *bus = acpi_pcihp_find_hotplug_bus(s, bsel);
+
+    if (!bus) {
+        return;
+    }
+    QTAILQ_FOREACH_SAFE(kid, &bus->qbus.children, sibling, next) {
+        DeviceState *qdev = kid->child;
+        PCIDevice *pdev = PCI_DEVICE(qdev);
+        int slot = PCI_SLOT(pdev->devfn);
+
+        if (pdev->failover_primary) {
+            s->acpi_pcihp_pci_status[bsel].down |= (1U << slot);
+        }
+    }
+}
+
 static void acpi_pcihp_update_hotplug_bus(AcpiPciHpState *s, int bsel)
 {
     BusChild *kid, *next;
@@ -207,6 +226,14 @@ static void acpi_pcihp_update(AcpiPciHpState *s)
     int i;
 
     for (i = 0; i < ACPI_PCIHP_MAX_HOTPLUG_BUS; ++i) {
+        /*
+         * Set the acpi_pcihp_pci_status[].down bits of all the
+         * failover_primary devices so that the devices are ejected
+         * from the guest. We can't use the qdev_unplug() as well as the
+         * hotplug_handler to unplug the devices, because the guest may
+         * not be in a state to cooperate.
+         */
+        acpi_pcihp_cleanup_failover_primary(s, i);
         acpi_pcihp_update_hotplug_bus(s, i);
     }
 }
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 411f8fb..d37f33c 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -248,6 +248,28 @@ static void virtio_net_drop_tx_queue_data(VirtIODevice *vdev, VirtQueue *vq)
     }
 }
 
+static void virtio_net_failover_notify_event(VirtIONet *n, uint8_t status)
+{
+    VirtIODevice *vdev = VIRTIO_DEVICE(n);
+
+    if (virtio_has_feature(vdev->guest_features, VIRTIO_NET_F_STANDBY)) {
+        gchar *path = object_get_canonical_path(OBJECT(n->qdev));
+        /*
+         * Emit the FAILOVER_PLUG_PRIMARY event
+         *   when the status transitions from 0 to VIRTIO_CONFIG_S_DRIVER_OK
+         * Emit the FAILOVER_UNPLUG_PRIMARY event
+         *   when the status transitions from VIRTIO_CONFIG_S_DRIVER_OK to 0
+         */
+        if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
+                (!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK))) {
+            FAILOVER_NOTIFY_EVENT(plug, n, path);
+        } else if ((!(status & VIRTIO_CONFIG_S_DRIVER_OK)) &&
+                (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
+            FAILOVER_NOTIFY_EVENT(unplug, n, path);
+        }
+    }
+}
+
 static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
 {
     VirtIONet *n = VIRTIO_NET(vdev);
@@ -256,6 +278,7 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
     uint8_t queue_status;
 
     virtio_net_vnet_endian_status(n, status);
+    virtio_net_failover_notify_event(n, status);
     virtio_net_vhost_status(n, status);
 
     for (i = 0; i < n->max_queues; i++) {
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 5c7bd96..ce1f33c 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3077,6 +3077,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
     vfio_register_err_notifier(vdev);
     vfio_register_req_notifier(vdev);
     vfio_setup_resetfn_quirk(vdev);
+    pdev->failover_primary = vdev->failover_primary;
 
     return;
 
@@ -3219,6 +3220,8 @@ static Property vfio_pci_dev_properties[] = {
                                    qdev_prop_nv_gpudirect_clique, uint8_t),
     DEFINE_PROP_OFF_AUTO_PCIBAR("x-msix-relocation", VFIOPCIDevice, msix_relo,
                                 OFF_AUTOPCIBAR_OFF),
+    DEFINE_PROP_BOOL("x-failover-primary", VFIOPCIDevice,
+                     failover_primary, false),
     /*
      * TODO - support passed fds... is this necessary?
      * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index b1ae4c0..06ca661 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -167,6 +167,7 @@ typedef struct VFIOPCIDevice {
     bool no_vfio_ioeventfd;
     bool enable_ramfb;
     VFIODisplay *dpy;
+    bool failover_primary;
 } VFIOPCIDevice;
 
 uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index e6514bb..b0111d1 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -351,6 +351,7 @@ struct PCIDevice {
     MSIVectorUseNotifier msix_vector_use_notifier;
     MSIVectorReleaseNotifier msix_vector_release_notifier;
     MSIVectorPollNotifier msix_vector_poll_notifier;
+    bool failover_primary;
 };
 
 void pci_register_bar(PCIDevice *pci_dev, int region_num,
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index 4d7f3c8..a697903 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -22,6 +22,10 @@
 #define VIRTIO_NET(obj) \
         OBJECT_CHECK(VirtIONet, (obj), TYPE_VIRTIO_NET)
 
+#define FAILOVER_NOTIFY_EVENT(type, n, path) \
+        qapi_event_send_failover_##type##_primary \
+                (!!n->netclient_name, n->netclient_name, path)
+
 #define TX_TIMER_INTERVAL 150000 /* 150 us */
 
 /* Limit the number of packets that can be sent via a single flush
diff --git a/qapi/net.json b/qapi/net.json
index 8f99fd9..04a9de9 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -683,3 +683,47 @@
 ##
 { 'event': 'NIC_RX_FILTER_CHANGED',
   'data': { '*name': 'str', 'path': 'str' } }
+
+##
+# @FAILOVER_PLUG_PRIMARY:
+#
+# Emitted when the guest successfully loads the driver after the STANDBY
+# feature bit is negotiated.
+#
+# @device: Indicates the virtio_net device.
+#
+# @path: Indicates the device path.
+#
+# Since: 3.0
+#
+# Example:
+#
+# <- {"timestamp": {"seconds": 1432121972, "microseconds": 744001},
+#     "event": "FAILOVER_PLUG_PRIMARY",
+#     "data": {"device": "net0", "path": "/machine/peripheral/net0/virtio-backend"} }
+#
+##
+{ 'event': 'FAILOVER_PLUG_PRIMARY',
+  'data': {'*device': 'str', 'path': 'str'} }
+
+##
+# @FAILOVER_UNPLUG_PRIMARY:
+#
+# Emitted when the guest resets the virtio_net driver.
+# The reset can be the result of either unloading the driver or a reboot.
+#
+# @device: Indicates the virtio_net device.
+#
+# @path: Indicates the device path.
+#
+# Since: 3.0
+#
+# Example:
+#
+# <- {"timestamp": {"seconds": 1432121972, "microseconds": 744001},
+#     "event": "FAILOVER_UNPLUG_PRIMARY",
+#     "data": {"device": "net0", "path": "/machine/peripheral/net0/virtio-backend"} }
+#
+##
+{ 'event': 'FAILOVER_UNPLUG_PRIMARY',
+  'data': {'*device': 'str', 'path': 'str'} }

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH 3/3] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover
  2018-12-10 16:15 [Qemu-devel] [PATCH 0/3] Support for datapath switching during live migration Venu Busireddy
  2018-12-10 16:15 ` [Qemu-devel] [PATCH 1/3] virtio_net: Add VIRTIO_NET_F_STANDBY feature bit Venu Busireddy
  2018-12-10 16:15 ` [Qemu-devel] [PATCH 2/3] virtio_net: Add support for "Data Path Switching" during Live Migration Venu Busireddy
@ 2018-12-10 16:15 ` Venu Busireddy
  2018-12-10 17:31   ` Michael S. Tsirkin
  2 siblings, 1 reply; 12+ messages in thread
From: Venu Busireddy @ 2018-12-10 16:15 UTC (permalink / raw)
  To: venu.busireddy, Michael S. Tsirkin, Marcel Apfelbaum
  Cc: Si-Wei Liu, virtio-dev, qemu-devel

From: Si-Wei Liu <si-wei.liu@oracle.com>

When a VF is hotplugged into the guest, datapath switching will be
performed immediately, which is sub-optimal in terms of timing, and
could end up with substantial network downtime. One of ways to shorten
this downtime is to switch the datapath only after the VF is seen to get
enabled by guest, indicated by the bus master bit in VF's PCI config
space getting enabled. The FAILOVER_PRIMARY_CHANGED event is emitted
at that time to indicate this condition. Then management stack can kick
off datapath switching upon receiving the event.

Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
---
 hw/vfio/pci.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qapi/net.json | 26 ++++++++++++++++++++++++++
 2 files changed, 83 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index ce1f33c..ea24ca2 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -34,6 +34,7 @@
 #include "pci.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "qapi/qapi-events-net.h"
 
 #define MSIX_CAP_LENGTH 12
 
@@ -42,6 +43,7 @@
 
 static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
 static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
+static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status);
 
 /*
  * Disabling BAR mmaping can be slow, but toggling it around INTx can
@@ -1170,6 +1172,8 @@ void vfio_pci_write_config(PCIDevice *pdev,
 {
     VFIOPCIDevice *vdev = PCI_VFIO(pdev);
     uint32_t val_le = cpu_to_le32(val);
+    bool may_notify = false;
+    bool master_was = false;
 
     trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
 
@@ -1180,6 +1184,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
                      __func__, vdev->vbasedev.name, addr, val, len);
     }
 
+    /* Bus Master Enabling/Disabling */
+    if (pdev->failover_primary && current_cpu &&
+        range_covers_byte(addr, len, PCI_COMMAND)) {
+        master_was = !!(pci_get_word(pdev->config + PCI_COMMAND) &
+                        PCI_COMMAND_MASTER);
+        may_notify = true;
+    }
+
     /* MSI/MSI-X Enabling/Disabling */
     if (pdev->cap_present & QEMU_PCI_CAP_MSI &&
         ranges_overlap(addr, len, pdev->msi_cap, vdev->msi_cap_size)) {
@@ -1235,6 +1247,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
         /* Write everything to QEMU to keep emulated bits correct */
         pci_default_write_config(pdev, addr, val, len);
     }
+
+    if (may_notify) {
+        bool master_now = !!(pci_get_word(pdev->config + PCI_COMMAND) &
+                             PCI_COMMAND_MASTER);
+        if (master_was != master_now) {
+            vfio_failover_notify(vdev, master_now);
+        }
+    }
 }
 
 /*
@@ -2801,6 +2821,17 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
     vdev->req_enabled = false;
 }
 
+static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status)
+{
+    PCIDevice *pdev = &vdev->pdev;
+    const char *n;
+    gchar *path;
+
+    n = pdev->qdev.id ? pdev->qdev.id : vdev->vbasedev.name;
+    path = object_get_canonical_path(OBJECT(vdev));
+    qapi_event_send_failover_primary_changed(!!n, n, path, status);
+}
+
 static void vfio_realize(PCIDevice *pdev, Error **errp)
 {
     VFIOPCIDevice *vdev = PCI_VFIO(pdev);
@@ -3109,10 +3140,36 @@ static void vfio_instance_finalize(Object *obj)
     vfio_put_group(group);
 }
 
+static void vfio_exit_failover_notify(VFIOPCIDevice *vdev)
+{
+    PCIDevice *pdev = &vdev->pdev;
+
+    /*
+     * Guest driver may not get the chance to disable bus mastering
+     * before the device object gets to be unrealized. In that event,
+     * send out a "disabled" notification on behalf of guest driver.
+     */
+    if (pdev->failover_primary &&
+        pdev->bus_master_enable_region.enabled) {
+        vfio_failover_notify(vdev, false);
+    }
+}
+
 static void vfio_exitfn(PCIDevice *pdev)
 {
     VFIOPCIDevice *vdev = PCI_VFIO(pdev);
 
+    /*
+     * During the guest reboot sequence, it is sometimes possible that
+     * the guest may not get sufficient time to complete the entire driver
+     * removal sequence, near the end of which a PCI config space write to
+     * disable bus mastering can be intercepted by device. In such cases,
+     * the FAILOVER_PRIMARY_CHANGED "disable" event will not be emitted. It
+     * is imperative to generate the event on the guest's behalf if the
+     * guest fails to make it.
+     */
+    vfio_exit_failover_notify(vdev);
+
     vfio_unregister_req_notifier(vdev);
     vfio_unregister_err_notifier(vdev);
     pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
diff --git a/qapi/net.json b/qapi/net.json
index 04a9de9..e5992c8 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -727,3 +727,29 @@
 ##
 { 'event': 'FAILOVER_UNPLUG_PRIMARY',
   'data': {'*device': 'str', 'path': 'str'} }
+
+##
+# @FAILOVER_PRIMARY_CHANGED:
+#
+# Emitted whenever the driver of failover primary is loaded or unloaded
+# by the guest.
+#
+# @device: device name
+#
+# @path: device path
+#
+# @enabled: true if driver is loaded thus device is enabled in guest
+#
+# Since: 3.0
+#
+# Example:
+#
+# <- { "event": "FAILOVER_PRIMARY_CHANGED",
+#      "data": { "device": "vfio-0",
+#                "path": "/machine/peripheral/vfio-0" },
+#                "enabled": "true" },
+#      "timestamp": { "seconds": 1539935213, "microseconds": 753529 } }
+#
+##
+{ 'event': 'FAILOVER_PRIMARY_CHANGED',
+  'data': { '*device': 'str', 'path': 'str', 'enabled': 'bool' } }

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover
  2018-12-10 16:15 ` [Qemu-devel] [PATCH 3/3] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover Venu Busireddy
@ 2018-12-10 17:31   ` Michael S. Tsirkin
  2018-12-10 20:23     ` si-wei liu
  2019-01-07 18:01     ` [Qemu-devel] [virtio-dev] " Venu Busireddy
  0 siblings, 2 replies; 12+ messages in thread
From: Michael S. Tsirkin @ 2018-12-10 17:31 UTC (permalink / raw)
  To: Venu Busireddy; +Cc: Marcel Apfelbaum, Si-Wei Liu, virtio-dev, qemu-devel

On Mon, Dec 10, 2018 at 11:15:48AM -0500, Venu Busireddy wrote:
> From: Si-Wei Liu <si-wei.liu@oracle.com>
> 
> When a VF is hotplugged into the guest, datapath switching will be
> performed immediately, which is sub-optimal in terms of timing, and
> could end up with substantial network downtime. One of ways to shorten
> this downtime is to switch the datapath only after the VF is seen to get
> enabled by guest, indicated by the bus master bit in VF's PCI config
> space getting enabled. The FAILOVER_PRIMARY_CHANGED event is emitted
> at that time to indicate this condition. Then management stack can kick
> off datapath switching upon receiving the event.
> 
> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>

As management stacks can lose events, it's necessary
to also have a query command to check device status.


> ---
>  hw/vfio/pci.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  qapi/net.json | 26 ++++++++++++++++++++++++++
>  2 files changed, 83 insertions(+)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index ce1f33c..ea24ca2 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -34,6 +34,7 @@
>  #include "pci.h"
>  #include "trace.h"
>  #include "qapi/error.h"
> +#include "qapi/qapi-events-net.h"
>  
>  #define MSIX_CAP_LENGTH 12
>  
> @@ -42,6 +43,7 @@
>  
>  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
>  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
> +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status);
>  
>  /*
>   * Disabling BAR mmaping can be slow, but toggling it around INTx can
> @@ -1170,6 +1172,8 @@ void vfio_pci_write_config(PCIDevice *pdev,
>  {
>      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
>      uint32_t val_le = cpu_to_le32(val);
> +    bool may_notify = false;
> +    bool master_was = false;
>  
>      trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
>  
> @@ -1180,6 +1184,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
>                       __func__, vdev->vbasedev.name, addr, val, len);
>      }
>  
> +    /* Bus Master Enabling/Disabling */
> +    if (pdev->failover_primary && current_cpu &&
> +        range_covers_byte(addr, len, PCI_COMMAND)) {
> +        master_was = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> +                        PCI_COMMAND_MASTER);
> +        may_notify = true;
> +    }
> +
>      /* MSI/MSI-X Enabling/Disabling */
>      if (pdev->cap_present & QEMU_PCI_CAP_MSI &&
>          ranges_overlap(addr, len, pdev->msi_cap, vdev->msi_cap_size)) {
> @@ -1235,6 +1247,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
>          /* Write everything to QEMU to keep emulated bits correct */
>          pci_default_write_config(pdev, addr, val, len);
>      }
> +
> +    if (may_notify) {
> +        bool master_now = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> +                             PCI_COMMAND_MASTER);
> +        if (master_was != master_now) {
> +            vfio_failover_notify(vdev, master_now);
> +        }
> +    }
>  }
>  
>  /*

It's very easy to have guest trigger a high load of events by playing
with the bus master enable bits.  How about instead sending an event
that just says "something changed" without the current status and have
management issue a query command to check the status. QEMU then does not
need to re-send an event until management issues a query command.


> @@ -2801,6 +2821,17 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
>      vdev->req_enabled = false;
>  }
>  
> +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status)
> +{
> +    PCIDevice *pdev = &vdev->pdev;
> +    const char *n;
> +    gchar *path;
> +
> +    n = pdev->qdev.id ? pdev->qdev.id : vdev->vbasedev.name;
> +    path = object_get_canonical_path(OBJECT(vdev));
> +    qapi_event_send_failover_primary_changed(!!n, n, path, status);
> +}
> +
>  static void vfio_realize(PCIDevice *pdev, Error **errp)
>  {
>      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> @@ -3109,10 +3140,36 @@ static void vfio_instance_finalize(Object *obj)
>      vfio_put_group(group);
>  }
>  
> +static void vfio_exit_failover_notify(VFIOPCIDevice *vdev)
> +{
> +    PCIDevice *pdev = &vdev->pdev;
> +
> +    /*
> +     * Guest driver may not get the chance to disable bus mastering
> +     * before the device object gets to be unrealized. In that event,
> +     * send out a "disabled" notification on behalf of guest driver.
> +     */
> +    if (pdev->failover_primary &&
> +        pdev->bus_master_enable_region.enabled) {
> +        vfio_failover_notify(vdev, false);
> +    }
> +}
> +

With the idea above this won't be necessary anymore.


>  static void vfio_exitfn(PCIDevice *pdev)
>  {
>      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
>  
> +    /*
> +     * During the guest reboot sequence, it is sometimes possible that
> +     * the guest may not get sufficient time to complete the entire driver
> +     * removal sequence, near the end of which a PCI config space write to
> +     * disable bus mastering can be intercepted by device. In such cases,
> +     * the FAILOVER_PRIMARY_CHANGED "disable" event will not be emitted. It
> +     * is imperative to generate the event on the guest's behalf if the
> +     * guest fails to make it.
> +     */
> +    vfio_exit_failover_notify(vdev);
> +
>      vfio_unregister_req_notifier(vdev);
>      vfio_unregister_err_notifier(vdev);
>      pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
> diff --git a/qapi/net.json b/qapi/net.json
> index 04a9de9..e5992c8 100644
> --- a/qapi/net.json
> +++ b/qapi/net.json
> @@ -727,3 +727,29 @@
>  ##
>  { 'event': 'FAILOVER_UNPLUG_PRIMARY',
>    'data': {'*device': 'str', 'path': 'str'} }
> +
> +##
> +# @FAILOVER_PRIMARY_CHANGED:
> +#
> +# Emitted whenever the driver of failover primary is loaded or unloaded
> +# by the guest.
> +#
> +# @device: device name
> +#
> +# @path: device path
> +#
> +# @enabled: true if driver is loaded thus device is enabled in guest
> +#
> +# Since: 3.0
> +#
> +# Example:
> +#
> +# <- { "event": "FAILOVER_PRIMARY_CHANGED",


This event doesn't seem to be failover specific.
How about a more generic name?



> +#      "data": { "device": "vfio-0",
> +#                "path": "/machine/peripheral/vfio-0" },
> +#                "enabled": "true" },
> +#      "timestamp": { "seconds": 1539935213, "microseconds": 753529 } }
> +#
> +##
> +{ 'event': 'FAILOVER_PRIMARY_CHANGED',
> +  'data': { '*device': 'str', 'path': 'str', 'enabled': 'bool' } }

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] virtio_net: Add support for "Data Path Switching" during Live Migration.
  2018-12-10 16:15 ` [Qemu-devel] [PATCH 2/3] virtio_net: Add support for "Data Path Switching" during Live Migration Venu Busireddy
@ 2018-12-10 17:41   ` Michael S. Tsirkin
  2018-12-10 20:22     ` si-wei liu
  2018-12-10 19:28   ` Eric Blake
  1 sibling, 1 reply; 12+ messages in thread
From: Michael S. Tsirkin @ 2018-12-10 17:41 UTC (permalink / raw)
  To: Venu Busireddy; +Cc: Marcel Apfelbaum, virtio-dev, qemu-devel

On Mon, Dec 10, 2018 at 11:15:47AM -0500, Venu Busireddy wrote:
> Added two new events, FAILOVER_PLUG_PRIMARY and FAILOVER_UNPLUG_PRIMARY.
> The first is emitted when the guest negotiates the F_STANDBY feature
> bit. The second is emitted when the virtio_net driver is removed, either
> manually or as a result of guest reboot.

So the names mean "should plug" and "should unplug" right?

It seems inelegant to tell upper layers what they should do.
After all they are upper layers because presumably
there is a policy question as opposed to a mechanism.
Can we expose information without telling managmenent what to do?

Alternatively, is there a real need to unplug the device completely?
Would it work to just hide the device from guest instead?  QEMU can do
it itself and this is the direction that Sameeh has been taking.

> Management stack can use these
> events to determine when to plug/unplug the VF device to/from the guest.
> 
> Also, the Virtual Functions will be automatically removed from the guest
> if the guest is rebooted. To properly identify the VFIO devices that
> must be removed, a new property named "x-failover-primary" is added to
> the vfio-pci devices. Only the vfio-pci devices that have this property
> enabled are removed from the guest upon reboot.

If this property needs to be set by management then
it must not start with "x-" - that prefix means
"unstable do not use from external tools".

> 
> Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>



> ---
>  hw/acpi/pcihp.c                | 27 ++++++++++++++++++++++++++
>  hw/net/virtio-net.c            | 23 ++++++++++++++++++++++
>  hw/vfio/pci.c                  |  3 +++
>  hw/vfio/pci.h                  |  1 +
>  include/hw/pci/pci.h           |  1 +
>  include/hw/virtio/virtio-net.h |  4 ++++
>  qapi/net.json                  | 44 ++++++++++++++++++++++++++++++++++++++++++
>  7 files changed, 103 insertions(+)
> 
> diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
> index 80d42e1..2a3ffd3 100644
> --- a/hw/acpi/pcihp.c
> +++ b/hw/acpi/pcihp.c
> @@ -176,6 +176,25 @@ static void acpi_pcihp_eject_slot(AcpiPciHpState *s, unsigned bsel, unsigned slo
>      }
>  }
>  
> +static void acpi_pcihp_cleanup_failover_primary(AcpiPciHpState *s, int bsel)
> +{
> +    BusChild *kid, *next;
> +    PCIBus *bus = acpi_pcihp_find_hotplug_bus(s, bsel);
> +
> +    if (!bus) {
> +        return;
> +    }
> +    QTAILQ_FOREACH_SAFE(kid, &bus->qbus.children, sibling, next) {
> +        DeviceState *qdev = kid->child;
> +        PCIDevice *pdev = PCI_DEVICE(qdev);
> +        int slot = PCI_SLOT(pdev->devfn);
> +
> +        if (pdev->failover_primary) {
> +            s->acpi_pcihp_pci_status[bsel].down |= (1U << slot);
> +        }
> +    }
> +}
> +
>  static void acpi_pcihp_update_hotplug_bus(AcpiPciHpState *s, int bsel)
>  {
>      BusChild *kid, *next;
> @@ -207,6 +226,14 @@ static void acpi_pcihp_update(AcpiPciHpState *s)
>      int i;
>  
>      for (i = 0; i < ACPI_PCIHP_MAX_HOTPLUG_BUS; ++i) {
> +        /*
> +         * Set the acpi_pcihp_pci_status[].down bits of all the
> +         * failover_primary devices so that the devices are ejected
> +         * from the guest. We can't use the qdev_unplug() as well as the
> +         * hotplug_handler to unplug the devices, because the guest may
> +         * not be in a state to cooperate.
> +         */
> +        acpi_pcihp_cleanup_failover_primary(s, i);
>          acpi_pcihp_update_hotplug_bus(s, i);
>      }
>  }
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 411f8fb..d37f33c 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -248,6 +248,28 @@ static void virtio_net_drop_tx_queue_data(VirtIODevice *vdev, VirtQueue *vq)
>      }
>  }
>  
> +static void virtio_net_failover_notify_event(VirtIONet *n, uint8_t status)
> +{
> +    VirtIODevice *vdev = VIRTIO_DEVICE(n);
> +
> +    if (virtio_has_feature(vdev->guest_features, VIRTIO_NET_F_STANDBY)) {
> +        gchar *path = object_get_canonical_path(OBJECT(n->qdev));
> +        /*
> +         * Emit the FAILOVER_PLUG_PRIMARY event
> +         *   when the status transitions from 0 to VIRTIO_CONFIG_S_DRIVER_OK
> +         * Emit the FAILOVER_UNPLUG_PRIMARY event
> +         *   when the status transitions from VIRTIO_CONFIG_S_DRIVER_OK to 0
> +         */
> +        if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
> +                (!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK))) {
> +            FAILOVER_NOTIFY_EVENT(plug, n, path);
> +        } else if ((!(status & VIRTIO_CONFIG_S_DRIVER_OK)) &&
> +                (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
> +            FAILOVER_NOTIFY_EVENT(unplug, n, path);
> +        }
> +    }
> +}
> +
>  static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
>  {
>      VirtIONet *n = VIRTIO_NET(vdev);
> @@ -256,6 +278,7 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
>      uint8_t queue_status;
>  
>      virtio_net_vnet_endian_status(n, status);
> +    virtio_net_failover_notify_event(n, status);
>      virtio_net_vhost_status(n, status);
>  
>      for (i = 0; i < n->max_queues; i++) {
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 5c7bd96..ce1f33c 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -3077,6 +3077,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>      vfio_register_err_notifier(vdev);
>      vfio_register_req_notifier(vdev);
>      vfio_setup_resetfn_quirk(vdev);
> +    pdev->failover_primary = vdev->failover_primary;
>  
>      return;
>  
> @@ -3219,6 +3220,8 @@ static Property vfio_pci_dev_properties[] = {
>                                     qdev_prop_nv_gpudirect_clique, uint8_t),
>      DEFINE_PROP_OFF_AUTO_PCIBAR("x-msix-relocation", VFIOPCIDevice, msix_relo,
>                                  OFF_AUTOPCIBAR_OFF),
> +    DEFINE_PROP_BOOL("x-failover-primary", VFIOPCIDevice,
> +                     failover_primary, false),
>      /*
>       * TODO - support passed fds... is this necessary?
>       * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index b1ae4c0..06ca661 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -167,6 +167,7 @@ typedef struct VFIOPCIDevice {
>      bool no_vfio_ioeventfd;
>      bool enable_ramfb;
>      VFIODisplay *dpy;
> +    bool failover_primary;
>  } VFIOPCIDevice;
>  
>  uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index e6514bb..b0111d1 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -351,6 +351,7 @@ struct PCIDevice {
>      MSIVectorUseNotifier msix_vector_use_notifier;
>      MSIVectorReleaseNotifier msix_vector_release_notifier;
>      MSIVectorPollNotifier msix_vector_poll_notifier;
> +    bool failover_primary;
>  };
>  
>  void pci_register_bar(PCIDevice *pci_dev, int region_num,
> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> index 4d7f3c8..a697903 100644
> --- a/include/hw/virtio/virtio-net.h
> +++ b/include/hw/virtio/virtio-net.h
> @@ -22,6 +22,10 @@
>  #define VIRTIO_NET(obj) \
>          OBJECT_CHECK(VirtIONet, (obj), TYPE_VIRTIO_NET)
>  
> +#define FAILOVER_NOTIFY_EVENT(type, n, path) \
> +        qapi_event_send_failover_##type##_primary \
> +                (!!n->netclient_name, n->netclient_name, path)
> +
>  #define TX_TIMER_INTERVAL 150000 /* 150 us */
>  
>  /* Limit the number of packets that can be sent via a single flush
> diff --git a/qapi/net.json b/qapi/net.json
> index 8f99fd9..04a9de9 100644
> --- a/qapi/net.json
> +++ b/qapi/net.json
> @@ -683,3 +683,47 @@
>  ##
>  { 'event': 'NIC_RX_FILTER_CHANGED',
>    'data': { '*name': 'str', 'path': 'str' } }
> +
> +##
> +# @FAILOVER_PLUG_PRIMARY:
> +#
> +# Emitted when the guest successfully loads the driver after the STANDBY
> +# feature bit is negotiated.
> +#
> +# @device: Indicates the virtio_net device.
> +#
> +# @path: Indicates the device path.
> +#
> +# Since: 3.0
> +#
> +# Example:
> +#
> +# <- {"timestamp": {"seconds": 1432121972, "microseconds": 744001},
> +#     "event": "FAILOVER_PLUG_PRIMARY",
> +#     "data": {"device": "net0", "path": "/machine/peripheral/net0/virtio-backend"} }
> +#
> +##
> +{ 'event': 'FAILOVER_PLUG_PRIMARY',
> +  'data': {'*device': 'str', 'path': 'str'} }
> +
> +##
> +# @FAILOVER_UNPLUG_PRIMARY:
> +#
> +# Emitted when the guest resets the virtio_net driver.
> +# The reset can be the result of either unloading the driver or a reboot.
> +#
> +# @device: Indicates the virtio_net device.
> +#
> +# @path: Indicates the device path.
> +#
> +# Since: 3.0
> +#
> +# Example:
> +#
> +# <- {"timestamp": {"seconds": 1432121972, "microseconds": 744001},
> +#     "event": "FAILOVER_UNPLUG_PRIMARY",
> +#     "data": {"device": "net0", "path": "/machine/peripheral/net0/virtio-backend"} }
> +#
> +##
> +{ 'event': 'FAILOVER_UNPLUG_PRIMARY',
> +  'data': {'*device': 'str', 'path': 'str'} }

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] virtio_net: Add support for "Data Path Switching" during Live Migration.
  2018-12-10 16:15 ` [Qemu-devel] [PATCH 2/3] virtio_net: Add support for "Data Path Switching" during Live Migration Venu Busireddy
  2018-12-10 17:41   ` Michael S. Tsirkin
@ 2018-12-10 19:28   ` Eric Blake
  2019-01-07 17:58     ` Venu Busireddy
  1 sibling, 1 reply; 12+ messages in thread
From: Eric Blake @ 2018-12-10 19:28 UTC (permalink / raw)
  To: Venu Busireddy, Michael S. Tsirkin, Marcel Apfelbaum
  Cc: virtio-dev, qemu-devel

On 12/10/18 10:15 AM, Venu Busireddy wrote:
> Added two new events, FAILOVER_PLUG_PRIMARY and FAILOVER_UNPLUG_PRIMARY.
> The first is emitted when the guest negotiates the F_STANDBY feature
> bit. The second is emitted when the virtio_net driver is removed, either
> manually or as a result of guest reboot. Management stack can use these
> events to determine when to plug/unplug the VF device to/from the guest.
> 
> Also, the Virtual Functions will be automatically removed from the guest
> if the guest is rebooted. To properly identify the VFIO devices that
> must be removed, a new property named "x-failover-primary" is added to
> the vfio-pci devices. Only the vfio-pci devices that have this property
> enabled are removed from the guest upon reboot.
> 
> Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
> ---

> +++ b/qapi/net.json
> @@ -683,3 +683,47 @@
>   ##
>   { 'event': 'NIC_RX_FILTER_CHANGED',
>     'data': { '*name': 'str', 'path': 'str' } }
> +
> +##
> +# @FAILOVER_PLUG_PRIMARY:
> +#
> +# Emitted when the guest successfully loads the driver after the STANDBY
> +# feature bit is negotiated.
> +#
> +# @device: Indicates the virtio_net device.
> +#
> +# @path: Indicates the device path.
> +#
> +# Since: 3.0

You've missed 3.0, and even 3.1.  This should be 4.0.

> +#
> +# Example:
> +#
> +# <- {"timestamp": {"seconds": 1432121972, "microseconds": 744001},
> +#     "event": "FAILOVER_PLUG_PRIMARY",
> +#     "data": {"device": "net0", "path": "/machine/peripheral/net0/virtio-backend"} }
> +#
> +##
> +{ 'event': 'FAILOVER_PLUG_PRIMARY',
> +  'data': {'*device': 'str', 'path': 'str'} }
> +
> +##
> +# @FAILOVER_UNPLUG_PRIMARY:
> +#
> +# Emitted when the guest resets the virtio_net driver.
> +# The reset can be the result of either unloading the driver or a reboot.
> +#
> +# @device: Indicates the virtio_net device.
> +#
> +# @path: Indicates the device path.
> +#
> +# Since: 3.0

and again


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] virtio_net: Add support for "Data Path Switching" during Live Migration.
  2018-12-10 17:41   ` Michael S. Tsirkin
@ 2018-12-10 20:22     ` si-wei liu
  2018-12-10 23:07       ` Michael S. Tsirkin
  0 siblings, 1 reply; 12+ messages in thread
From: si-wei liu @ 2018-12-10 20:22 UTC (permalink / raw)
  To: Michael S. Tsirkin, Venu Busireddy
  Cc: Marcel Apfelbaum, virtio-dev, qemu-devel, Liran Alon



On 12/10/2018 9:41 AM, Michael S. Tsirkin wrote:
> On Mon, Dec 10, 2018 at 11:15:47AM -0500, Venu Busireddy wrote:
>> Added two new events, FAILOVER_PLUG_PRIMARY and FAILOVER_UNPLUG_PRIMARY.
>> The first is emitted when the guest negotiates the F_STANDBY feature
>> bit. The second is emitted when the virtio_net driver is removed, either
>> manually or as a result of guest reboot.
> So the names mean "should plug" and "should unplug" right?
Sounds about right, but management stack may ignore the message if not 
going to plug in the primary device.

> It seems inelegant to tell upper layers what they should do.
> After all they are upper layers because presumably
> there is a policy question as opposed to a mechanism.
> Can we expose information without telling managmenent what to do?
Is it just the name itself that is offensive? The information exposed to 
upper layer is just that time is ready to plug/unplug the paired device. 
Would the names FAILOVER_STANDY_SET and FAILOVER_STANDY_CLEAR fit your 
expectation?

> Alternatively, is there a real need to unplug the device completely?
> Would it work to just hide the device from guest instead?  QEMU can do
> it itself and this is the direction that Sameeh has been taking.
See below in the cover letter:

3. While the hidden device model (viz. coupled device model) is still
    being explored for automatic hot plugging (QEMU) and automatic datapath
    switching (host-kernel), this series provides a supplemental set
    of interfaces if management software wants to drive the SR-IOV live
    migration on its own. It should not conflict with the hidden device
    model but just offers simplicity of implementation.

As said this is a supplemental implementation that involves and 
leverages management software before we can converge to the ideal 
hidden/coupled device model.

As I understand it, we're still exploring the best way to handle the 
datapath (MAC filter) switching problem for the hidden (coupled) device 
model, right. Even if we can hide the device there's still need to 
inform upper layer for datapath switching that is currently being 
handled in userspace management stack. I think the best fit for the 
hidden (coupled) model is that all datapath switching can be handled 
automatically and transparently in the kernel without needing to notify 
userspace and depend on userspace reaction.

>
>> Management stack can use these
>> events to determine when to plug/unplug the VF device to/from the guest.
>>
>> Also, the Virtual Functions will be automatically removed from the guest
>> if the guest is rebooted. To properly identify the VFIO devices that
>> must be removed, a new property named "x-failover-primary" is added to
>> the vfio-pci devices. Only the vfio-pci devices that have this property
>> enabled are removed from the guest upon reboot.
> If this property needs to be set by management then
> it must not start with "x-" - that prefix means
> "unstable do not use from external tools".

Sure, will remove "x-" (sorry, missed to correct it before sending out 
publicly).

Thanks,
-Siwei

>
>> Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
>
>
>> ---
>>   hw/acpi/pcihp.c                | 27 ++++++++++++++++++++++++++
>>   hw/net/virtio-net.c            | 23 ++++++++++++++++++++++
>>   hw/vfio/pci.c                  |  3 +++
>>   hw/vfio/pci.h                  |  1 +
>>   include/hw/pci/pci.h           |  1 +
>>   include/hw/virtio/virtio-net.h |  4 ++++
>>   qapi/net.json                  | 44 ++++++++++++++++++++++++++++++++++++++++++
>>   7 files changed, 103 insertions(+)
>>
>> diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
>> index 80d42e1..2a3ffd3 100644
>> --- a/hw/acpi/pcihp.c
>> +++ b/hw/acpi/pcihp.c
>> @@ -176,6 +176,25 @@ static void acpi_pcihp_eject_slot(AcpiPciHpState *s, unsigned bsel, unsigned slo
>>       }
>>   }
>>   
>> +static void acpi_pcihp_cleanup_failover_primary(AcpiPciHpState *s, int bsel)
>> +{
>> +    BusChild *kid, *next;
>> +    PCIBus *bus = acpi_pcihp_find_hotplug_bus(s, bsel);
>> +
>> +    if (!bus) {
>> +        return;
>> +    }
>> +    QTAILQ_FOREACH_SAFE(kid, &bus->qbus.children, sibling, next) {
>> +        DeviceState *qdev = kid->child;
>> +        PCIDevice *pdev = PCI_DEVICE(qdev);
>> +        int slot = PCI_SLOT(pdev->devfn);
>> +
>> +        if (pdev->failover_primary) {
>> +            s->acpi_pcihp_pci_status[bsel].down |= (1U << slot);
>> +        }
>> +    }
>> +}
>> +
>>   static void acpi_pcihp_update_hotplug_bus(AcpiPciHpState *s, int bsel)
>>   {
>>       BusChild *kid, *next;
>> @@ -207,6 +226,14 @@ static void acpi_pcihp_update(AcpiPciHpState *s)
>>       int i;
>>   
>>       for (i = 0; i < ACPI_PCIHP_MAX_HOTPLUG_BUS; ++i) {
>> +        /*
>> +         * Set the acpi_pcihp_pci_status[].down bits of all the
>> +         * failover_primary devices so that the devices are ejected
>> +         * from the guest. We can't use the qdev_unplug() as well as the
>> +         * hotplug_handler to unplug the devices, because the guest may
>> +         * not be in a state to cooperate.
>> +         */
>> +        acpi_pcihp_cleanup_failover_primary(s, i);
>>           acpi_pcihp_update_hotplug_bus(s, i);
>>       }
>>   }
>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>> index 411f8fb..d37f33c 100644
>> --- a/hw/net/virtio-net.c
>> +++ b/hw/net/virtio-net.c
>> @@ -248,6 +248,28 @@ static void virtio_net_drop_tx_queue_data(VirtIODevice *vdev, VirtQueue *vq)
>>       }
>>   }
>>   
>> +static void virtio_net_failover_notify_event(VirtIONet *n, uint8_t status)
>> +{
>> +    VirtIODevice *vdev = VIRTIO_DEVICE(n);
>> +
>> +    if (virtio_has_feature(vdev->guest_features, VIRTIO_NET_F_STANDBY)) {
>> +        gchar *path = object_get_canonical_path(OBJECT(n->qdev));
>> +        /*
>> +         * Emit the FAILOVER_PLUG_PRIMARY event
>> +         *   when the status transitions from 0 to VIRTIO_CONFIG_S_DRIVER_OK
>> +         * Emit the FAILOVER_UNPLUG_PRIMARY event
>> +         *   when the status transitions from VIRTIO_CONFIG_S_DRIVER_OK to 0
>> +         */
>> +        if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
>> +                (!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK))) {
>> +            FAILOVER_NOTIFY_EVENT(plug, n, path);
>> +        } else if ((!(status & VIRTIO_CONFIG_S_DRIVER_OK)) &&
>> +                (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
>> +            FAILOVER_NOTIFY_EVENT(unplug, n, path);
>> +        }
>> +    }
>> +}
>> +
>>   static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
>>   {
>>       VirtIONet *n = VIRTIO_NET(vdev);
>> @@ -256,6 +278,7 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
>>       uint8_t queue_status;
>>   
>>       virtio_net_vnet_endian_status(n, status);
>> +    virtio_net_failover_notify_event(n, status);
>>       virtio_net_vhost_status(n, status);
>>   
>>       for (i = 0; i < n->max_queues; i++) {
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index 5c7bd96..ce1f33c 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -3077,6 +3077,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>>       vfio_register_err_notifier(vdev);
>>       vfio_register_req_notifier(vdev);
>>       vfio_setup_resetfn_quirk(vdev);
>> +    pdev->failover_primary = vdev->failover_primary;
>>   
>>       return;
>>   
>> @@ -3219,6 +3220,8 @@ static Property vfio_pci_dev_properties[] = {
>>                                      qdev_prop_nv_gpudirect_clique, uint8_t),
>>       DEFINE_PROP_OFF_AUTO_PCIBAR("x-msix-relocation", VFIOPCIDevice, msix_relo,
>>                                   OFF_AUTOPCIBAR_OFF),
>> +    DEFINE_PROP_BOOL("x-failover-primary", VFIOPCIDevice,
>> +                     failover_primary, false),
>>       /*
>>        * TODO - support passed fds... is this necessary?
>>        * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
>> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
>> index b1ae4c0..06ca661 100644
>> --- a/hw/vfio/pci.h
>> +++ b/hw/vfio/pci.h
>> @@ -167,6 +167,7 @@ typedef struct VFIOPCIDevice {
>>       bool no_vfio_ioeventfd;
>>       bool enable_ramfb;
>>       VFIODisplay *dpy;
>> +    bool failover_primary;
>>   } VFIOPCIDevice;
>>   
>>   uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
>> index e6514bb..b0111d1 100644
>> --- a/include/hw/pci/pci.h
>> +++ b/include/hw/pci/pci.h
>> @@ -351,6 +351,7 @@ struct PCIDevice {
>>       MSIVectorUseNotifier msix_vector_use_notifier;
>>       MSIVectorReleaseNotifier msix_vector_release_notifier;
>>       MSIVectorPollNotifier msix_vector_poll_notifier;
>> +    bool failover_primary;
>>   };
>>   
>>   void pci_register_bar(PCIDevice *pci_dev, int region_num,
>> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
>> index 4d7f3c8..a697903 100644
>> --- a/include/hw/virtio/virtio-net.h
>> +++ b/include/hw/virtio/virtio-net.h
>> @@ -22,6 +22,10 @@
>>   #define VIRTIO_NET(obj) \
>>           OBJECT_CHECK(VirtIONet, (obj), TYPE_VIRTIO_NET)
>>   
>> +#define FAILOVER_NOTIFY_EVENT(type, n, path) \
>> +        qapi_event_send_failover_##type##_primary \
>> +                (!!n->netclient_name, n->netclient_name, path)
>> +
>>   #define TX_TIMER_INTERVAL 150000 /* 150 us */
>>   
>>   /* Limit the number of packets that can be sent via a single flush
>> diff --git a/qapi/net.json b/qapi/net.json
>> index 8f99fd9..04a9de9 100644
>> --- a/qapi/net.json
>> +++ b/qapi/net.json
>> @@ -683,3 +683,47 @@
>>   ##
>>   { 'event': 'NIC_RX_FILTER_CHANGED',
>>     'data': { '*name': 'str', 'path': 'str' } }
>> +
>> +##
>> +# @FAILOVER_PLUG_PRIMARY:
>> +#
>> +# Emitted when the guest successfully loads the driver after the STANDBY
>> +# feature bit is negotiated.
>> +#
>> +# @device: Indicates the virtio_net device.
>> +#
>> +# @path: Indicates the device path.
>> +#
>> +# Since: 3.0
>> +#
>> +# Example:
>> +#
>> +# <- {"timestamp": {"seconds": 1432121972, "microseconds": 744001},
>> +#     "event": "FAILOVER_PLUG_PRIMARY",
>> +#     "data": {"device": "net0", "path": "/machine/peripheral/net0/virtio-backend"} }
>> +#
>> +##
>> +{ 'event': 'FAILOVER_PLUG_PRIMARY',
>> +  'data': {'*device': 'str', 'path': 'str'} }
>> +
>> +##
>> +# @FAILOVER_UNPLUG_PRIMARY:
>> +#
>> +# Emitted when the guest resets the virtio_net driver.
>> +# The reset can be the result of either unloading the driver or a reboot.
>> +#
>> +# @device: Indicates the virtio_net device.
>> +#
>> +# @path: Indicates the device path.
>> +#
>> +# Since: 3.0
>> +#
>> +# Example:
>> +#
>> +# <- {"timestamp": {"seconds": 1432121972, "microseconds": 744001},
>> +#     "event": "FAILOVER_UNPLUG_PRIMARY",
>> +#     "data": {"device": "net0", "path": "/machine/peripheral/net0/virtio-backend"} }
>> +#
>> +##
>> +{ 'event': 'FAILOVER_UNPLUG_PRIMARY',
>> +  'data': {'*device': 'str', 'path': 'str'} }

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover
  2018-12-10 17:31   ` Michael S. Tsirkin
@ 2018-12-10 20:23     ` si-wei liu
  2019-01-07 18:01     ` [Qemu-devel] [virtio-dev] " Venu Busireddy
  1 sibling, 0 replies; 12+ messages in thread
From: si-wei liu @ 2018-12-10 20:23 UTC (permalink / raw)
  To: Michael S. Tsirkin, Venu Busireddy
  Cc: Marcel Apfelbaum, qemu-devel, virtio-dev



On 12/10/2018 9:31 AM, Michael S. Tsirkin wrote:
> On Mon, Dec 10, 2018 at 11:15:48AM -0500, Venu Busireddy wrote:
>> From: Si-Wei Liu <si-wei.liu@oracle.com>
>>
>> When a VF is hotplugged into the guest, datapath switching will be
>> performed immediately, which is sub-optimal in terms of timing, and
>> could end up with substantial network downtime. One of ways to shorten
>> this downtime is to switch the datapath only after the VF is seen to get
>> enabled by guest, indicated by the bus master bit in VF's PCI config
>> space getting enabled. The FAILOVER_PRIMARY_CHANGED event is emitted
>> at that time to indicate this condition. Then management stack can kick
>> off datapath switching upon receiving the event.
>>
>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>> Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
> As management stacks can lose events, it's necessary
> to also have a query command to check device status.

Hmm, makes sense. Will do.

-Siwei
>
>> ---
>>   hw/vfio/pci.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   qapi/net.json | 26 ++++++++++++++++++++++++++
>>   2 files changed, 83 insertions(+)
>>
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index ce1f33c..ea24ca2 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -34,6 +34,7 @@
>>   #include "pci.h"
>>   #include "trace.h"
>>   #include "qapi/error.h"
>> +#include "qapi/qapi-events-net.h"
>>   
>>   #define MSIX_CAP_LENGTH 12
>>   
>> @@ -42,6 +43,7 @@
>>   
>>   static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
>>   static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
>> +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status);
>>   
>>   /*
>>    * Disabling BAR mmaping can be slow, but toggling it around INTx can
>> @@ -1170,6 +1172,8 @@ void vfio_pci_write_config(PCIDevice *pdev,
>>   {
>>       VFIOPCIDevice *vdev = PCI_VFIO(pdev);
>>       uint32_t val_le = cpu_to_le32(val);
>> +    bool may_notify = false;
>> +    bool master_was = false;
>>   
>>       trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
>>   
>> @@ -1180,6 +1184,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
>>                        __func__, vdev->vbasedev.name, addr, val, len);
>>       }
>>   
>> +    /* Bus Master Enabling/Disabling */
>> +    if (pdev->failover_primary && current_cpu &&
>> +        range_covers_byte(addr, len, PCI_COMMAND)) {
>> +        master_was = !!(pci_get_word(pdev->config + PCI_COMMAND) &
>> +                        PCI_COMMAND_MASTER);
>> +        may_notify = true;
>> +    }
>> +
>>       /* MSI/MSI-X Enabling/Disabling */
>>       if (pdev->cap_present & QEMU_PCI_CAP_MSI &&
>>           ranges_overlap(addr, len, pdev->msi_cap, vdev->msi_cap_size)) {
>> @@ -1235,6 +1247,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
>>           /* Write everything to QEMU to keep emulated bits correct */
>>           pci_default_write_config(pdev, addr, val, len);
>>       }
>> +
>> +    if (may_notify) {
>> +        bool master_now = !!(pci_get_word(pdev->config + PCI_COMMAND) &
>> +                             PCI_COMMAND_MASTER);
>> +        if (master_was != master_now) {
>> +            vfio_failover_notify(vdev, master_now);
>> +        }
>> +    }
>>   }
>>   
>>   /*
> It's very easy to have guest trigger a high load of events by playing
> with the bus master enable bits.  How about instead sending an event
> that just says "something changed" without the current status and have
> management issue a query command to check the status. QEMU then does not
> need to re-send an event until management issues a query command.
>
>
>> @@ -2801,6 +2821,17 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
>>       vdev->req_enabled = false;
>>   }
>>   
>> +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status)
>> +{
>> +    PCIDevice *pdev = &vdev->pdev;
>> +    const char *n;
>> +    gchar *path;
>> +
>> +    n = pdev->qdev.id ? pdev->qdev.id : vdev->vbasedev.name;
>> +    path = object_get_canonical_path(OBJECT(vdev));
>> +    qapi_event_send_failover_primary_changed(!!n, n, path, status);
>> +}
>> +
>>   static void vfio_realize(PCIDevice *pdev, Error **errp)
>>   {
>>       VFIOPCIDevice *vdev = PCI_VFIO(pdev);
>> @@ -3109,10 +3140,36 @@ static void vfio_instance_finalize(Object *obj)
>>       vfio_put_group(group);
>>   }
>>   
>> +static void vfio_exit_failover_notify(VFIOPCIDevice *vdev)
>> +{
>> +    PCIDevice *pdev = &vdev->pdev;
>> +
>> +    /*
>> +     * Guest driver may not get the chance to disable bus mastering
>> +     * before the device object gets to be unrealized. In that event,
>> +     * send out a "disabled" notification on behalf of guest driver.
>> +     */
>> +    if (pdev->failover_primary &&
>> +        pdev->bus_master_enable_region.enabled) {
>> +        vfio_failover_notify(vdev, false);
>> +    }
>> +}
>> +
> With the idea above this won't be necessary anymore.
>
>
>>   static void vfio_exitfn(PCIDevice *pdev)
>>   {
>>       VFIOPCIDevice *vdev = PCI_VFIO(pdev);
>>   
>> +    /*
>> +     * During the guest reboot sequence, it is sometimes possible that
>> +     * the guest may not get sufficient time to complete the entire driver
>> +     * removal sequence, near the end of which a PCI config space write to
>> +     * disable bus mastering can be intercepted by device. In such cases,
>> +     * the FAILOVER_PRIMARY_CHANGED "disable" event will not be emitted. It
>> +     * is imperative to generate the event on the guest's behalf if the
>> +     * guest fails to make it.
>> +     */
>> +    vfio_exit_failover_notify(vdev);
>> +
>>       vfio_unregister_req_notifier(vdev);
>>       vfio_unregister_err_notifier(vdev);
>>       pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
>> diff --git a/qapi/net.json b/qapi/net.json
>> index 04a9de9..e5992c8 100644
>> --- a/qapi/net.json
>> +++ b/qapi/net.json
>> @@ -727,3 +727,29 @@
>>   ##
>>   { 'event': 'FAILOVER_UNPLUG_PRIMARY',
>>     'data': {'*device': 'str', 'path': 'str'} }
>> +
>> +##
>> +# @FAILOVER_PRIMARY_CHANGED:
>> +#
>> +# Emitted whenever the driver of failover primary is loaded or unloaded
>> +# by the guest.
>> +#
>> +# @device: device name
>> +#
>> +# @path: device path
>> +#
>> +# @enabled: true if driver is loaded thus device is enabled in guest
>> +#
>> +# Since: 3.0
>> +#
>> +# Example:
>> +#
>> +# <- { "event": "FAILOVER_PRIMARY_CHANGED",
>
> This event doesn't seem to be failover specific.
> How about a more generic name?
>
>
>
>> +#      "data": { "device": "vfio-0",
>> +#                "path": "/machine/peripheral/vfio-0" },
>> +#                "enabled": "true" },
>> +#      "timestamp": { "seconds": 1539935213, "microseconds": 753529 } }
>> +#
>> +##
>> +{ 'event': 'FAILOVER_PRIMARY_CHANGED',
>> +  'data': { '*device': 'str', 'path': 'str', 'enabled': 'bool' } }

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] virtio_net: Add support for "Data Path Switching" during Live Migration.
  2018-12-10 20:22     ` si-wei liu
@ 2018-12-10 23:07       ` Michael S. Tsirkin
  0 siblings, 0 replies; 12+ messages in thread
From: Michael S. Tsirkin @ 2018-12-10 23:07 UTC (permalink / raw)
  To: si-wei liu
  Cc: Venu Busireddy, Marcel Apfelbaum, virtio-dev, qemu-devel,
	Liran Alon

On Mon, Dec 10, 2018 at 12:22:37PM -0800, si-wei liu wrote:
> 
> 
> On 12/10/2018 9:41 AM, Michael S. Tsirkin wrote:
> > On Mon, Dec 10, 2018 at 11:15:47AM -0500, Venu Busireddy wrote:
> > > Added two new events, FAILOVER_PLUG_PRIMARY and FAILOVER_UNPLUG_PRIMARY.
> > > The first is emitted when the guest negotiates the F_STANDBY feature
> > > bit. The second is emitted when the virtio_net driver is removed, either
> > > manually or as a result of guest reboot.
> > So the names mean "should plug" and "should unplug" right?
> Sounds about right, but management stack may ignore the message if not going
> to plug in the primary device.
> 
> > It seems inelegant to tell upper layers what they should do.
> > After all they are upper layers because presumably
> > there is a policy question as opposed to a mechanism.
> > Can we expose information without telling managmenent what to do?
> Is it just the name itself that is offensive? The information exposed to
> upper layer is just that time is ready to plug/unplug the paired device.
> Would the names FAILOVER_STANDY_SET and FAILOVER_STANDY_CLEAR fit your
> expectation?

So really the same thing applies as with the other events:
event should just signal a change, status should be
reported with a query command. Avoids event floods
and handles management restarts gracefully.


> > Alternatively, is there a real need to unplug the device completely?
> > Would it work to just hide the device from guest instead?  QEMU can do
> > it itself and this is the direction that Sameeh has been taking.
> See below in the cover letter:
> 
> 3. While the hidden device model (viz. coupled device model) is still
>    being explored for automatic hot plugging (QEMU) and automatic datapath
>    switching (host-kernel), this series provides a supplemental set
>    of interfaces if management software wants to drive the SR-IOV live
>    migration on its own. It should not conflict with the hidden device
>    model but just offers simplicity of implementation.
> 
> As said this is a supplemental implementation that involves and leverages
> management software before we can converge to the ideal hidden/coupled
> device model.

So I don't think there's a problem with sending an event when
the feature bit gets set/cleared, as long as there's
no implication that management is required to react
in a certain way. So yes, it's a naming issue at that level.

> As I understand it, we're still exploring the best way to handle the
> datapath (MAC filter) switching problem for the hidden (coupled) device
> model, right. Even if we can hide the device there's still need to inform
> upper layer for datapath switching that is currently being handled in
> userspace management stack. I think the best fit for the hidden (coupled)
> model is that all datapath switching can be handled automatically and
> transparently in the kernel without needing to notify userspace and depend
> on userspace reaction.
> 
> > 
> > > Management stack can use these
> > > events to determine when to plug/unplug the VF device to/from the guest.
> > > 
> > > Also, the Virtual Functions will be automatically removed from the guest
> > > if the guest is rebooted. To properly identify the VFIO devices that
> > > must be removed, a new property named "x-failover-primary" is added to
> > > the vfio-pci devices. Only the vfio-pci devices that have this property
> > > enabled are removed from the guest upon reboot.
> > If this property needs to be set by management then
> > it must not start with "x-" - that prefix means
> > "unstable do not use from external tools".
> 
> Sure, will remove "x-" (sorry, missed to correct it before sending out
> publicly).
> 
> Thanks,
> -Siwei
> 
> > 
> > > Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
> > 
> > 
> > > ---
> > >   hw/acpi/pcihp.c                | 27 ++++++++++++++++++++++++++
> > >   hw/net/virtio-net.c            | 23 ++++++++++++++++++++++
> > >   hw/vfio/pci.c                  |  3 +++
> > >   hw/vfio/pci.h                  |  1 +
> > >   include/hw/pci/pci.h           |  1 +
> > >   include/hw/virtio/virtio-net.h |  4 ++++
> > >   qapi/net.json                  | 44 ++++++++++++++++++++++++++++++++++++++++++
> > >   7 files changed, 103 insertions(+)
> > > 
> > > diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
> > > index 80d42e1..2a3ffd3 100644
> > > --- a/hw/acpi/pcihp.c
> > > +++ b/hw/acpi/pcihp.c
> > > @@ -176,6 +176,25 @@ static void acpi_pcihp_eject_slot(AcpiPciHpState *s, unsigned bsel, unsigned slo
> > >       }
> > >   }
> > > +static void acpi_pcihp_cleanup_failover_primary(AcpiPciHpState *s, int bsel)
> > > +{
> > > +    BusChild *kid, *next;
> > > +    PCIBus *bus = acpi_pcihp_find_hotplug_bus(s, bsel);
> > > +
> > > +    if (!bus) {
> > > +        return;
> > > +    }
> > > +    QTAILQ_FOREACH_SAFE(kid, &bus->qbus.children, sibling, next) {
> > > +        DeviceState *qdev = kid->child;
> > > +        PCIDevice *pdev = PCI_DEVICE(qdev);
> > > +        int slot = PCI_SLOT(pdev->devfn);
> > > +
> > > +        if (pdev->failover_primary) {
> > > +            s->acpi_pcihp_pci_status[bsel].down |= (1U << slot);
> > > +        }
> > > +    }
> > > +}
> > > +
> > >   static void acpi_pcihp_update_hotplug_bus(AcpiPciHpState *s, int bsel)
> > >   {
> > >       BusChild *kid, *next;
> > > @@ -207,6 +226,14 @@ static void acpi_pcihp_update(AcpiPciHpState *s)
> > >       int i;
> > >       for (i = 0; i < ACPI_PCIHP_MAX_HOTPLUG_BUS; ++i) {
> > > +        /*
> > > +         * Set the acpi_pcihp_pci_status[].down bits of all the
> > > +         * failover_primary devices so that the devices are ejected
> > > +         * from the guest. We can't use the qdev_unplug() as well as the
> > > +         * hotplug_handler to unplug the devices, because the guest may
> > > +         * not be in a state to cooperate.
> > > +         */
> > > +        acpi_pcihp_cleanup_failover_primary(s, i);
> > >           acpi_pcihp_update_hotplug_bus(s, i);
> > >       }
> > >   }
> > > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > > index 411f8fb..d37f33c 100644
> > > --- a/hw/net/virtio-net.c
> > > +++ b/hw/net/virtio-net.c
> > > @@ -248,6 +248,28 @@ static void virtio_net_drop_tx_queue_data(VirtIODevice *vdev, VirtQueue *vq)
> > >       }
> > >   }
> > > +static void virtio_net_failover_notify_event(VirtIONet *n, uint8_t status)
> > > +{
> > > +    VirtIODevice *vdev = VIRTIO_DEVICE(n);
> > > +
> > > +    if (virtio_has_feature(vdev->guest_features, VIRTIO_NET_F_STANDBY)) {
> > > +        gchar *path = object_get_canonical_path(OBJECT(n->qdev));
> > > +        /*
> > > +         * Emit the FAILOVER_PLUG_PRIMARY event
> > > +         *   when the status transitions from 0 to VIRTIO_CONFIG_S_DRIVER_OK
> > > +         * Emit the FAILOVER_UNPLUG_PRIMARY event
> > > +         *   when the status transitions from VIRTIO_CONFIG_S_DRIVER_OK to 0
> > > +         */
> > > +        if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
> > > +                (!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK))) {
> > > +            FAILOVER_NOTIFY_EVENT(plug, n, path);
> > > +        } else if ((!(status & VIRTIO_CONFIG_S_DRIVER_OK)) &&
> > > +                (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
> > > +            FAILOVER_NOTIFY_EVENT(unplug, n, path);
> > > +        }
> > > +    }
> > > +}
> > > +
> > >   static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
> > >   {
> > >       VirtIONet *n = VIRTIO_NET(vdev);
> > > @@ -256,6 +278,7 @@ static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
> > >       uint8_t queue_status;
> > >       virtio_net_vnet_endian_status(n, status);
> > > +    virtio_net_failover_notify_event(n, status);
> > >       virtio_net_vhost_status(n, status);
> > >       for (i = 0; i < n->max_queues; i++) {
> > > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > > index 5c7bd96..ce1f33c 100644
> > > --- a/hw/vfio/pci.c
> > > +++ b/hw/vfio/pci.c
> > > @@ -3077,6 +3077,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
> > >       vfio_register_err_notifier(vdev);
> > >       vfio_register_req_notifier(vdev);
> > >       vfio_setup_resetfn_quirk(vdev);
> > > +    pdev->failover_primary = vdev->failover_primary;
> > >       return;
> > > @@ -3219,6 +3220,8 @@ static Property vfio_pci_dev_properties[] = {
> > >                                      qdev_prop_nv_gpudirect_clique, uint8_t),
> > >       DEFINE_PROP_OFF_AUTO_PCIBAR("x-msix-relocation", VFIOPCIDevice, msix_relo,
> > >                                   OFF_AUTOPCIBAR_OFF),
> > > +    DEFINE_PROP_BOOL("x-failover-primary", VFIOPCIDevice,
> > > +                     failover_primary, false),
> > >       /*
> > >        * TODO - support passed fds... is this necessary?
> > >        * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
> > > diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> > > index b1ae4c0..06ca661 100644
> > > --- a/hw/vfio/pci.h
> > > +++ b/hw/vfio/pci.h
> > > @@ -167,6 +167,7 @@ typedef struct VFIOPCIDevice {
> > >       bool no_vfio_ioeventfd;
> > >       bool enable_ramfb;
> > >       VFIODisplay *dpy;
> > > +    bool failover_primary;
> > >   } VFIOPCIDevice;
> > >   uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
> > > diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> > > index e6514bb..b0111d1 100644
> > > --- a/include/hw/pci/pci.h
> > > +++ b/include/hw/pci/pci.h
> > > @@ -351,6 +351,7 @@ struct PCIDevice {
> > >       MSIVectorUseNotifier msix_vector_use_notifier;
> > >       MSIVectorReleaseNotifier msix_vector_release_notifier;
> > >       MSIVectorPollNotifier msix_vector_poll_notifier;
> > > +    bool failover_primary;
> > >   };
> > >   void pci_register_bar(PCIDevice *pci_dev, int region_num,
> > > diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> > > index 4d7f3c8..a697903 100644
> > > --- a/include/hw/virtio/virtio-net.h
> > > +++ b/include/hw/virtio/virtio-net.h
> > > @@ -22,6 +22,10 @@
> > >   #define VIRTIO_NET(obj) \
> > >           OBJECT_CHECK(VirtIONet, (obj), TYPE_VIRTIO_NET)
> > > +#define FAILOVER_NOTIFY_EVENT(type, n, path) \
> > > +        qapi_event_send_failover_##type##_primary \
> > > +                (!!n->netclient_name, n->netclient_name, path)
> > > +
> > >   #define TX_TIMER_INTERVAL 150000 /* 150 us */
> > >   /* Limit the number of packets that can be sent via a single flush
> > > diff --git a/qapi/net.json b/qapi/net.json
> > > index 8f99fd9..04a9de9 100644
> > > --- a/qapi/net.json
> > > +++ b/qapi/net.json
> > > @@ -683,3 +683,47 @@
> > >   ##
> > >   { 'event': 'NIC_RX_FILTER_CHANGED',
> > >     'data': { '*name': 'str', 'path': 'str' } }
> > > +
> > > +##
> > > +# @FAILOVER_PLUG_PRIMARY:
> > > +#
> > > +# Emitted when the guest successfully loads the driver after the STANDBY
> > > +# feature bit is negotiated.
> > > +#
> > > +# @device: Indicates the virtio_net device.
> > > +#
> > > +# @path: Indicates the device path.
> > > +#
> > > +# Since: 3.0
> > > +#
> > > +# Example:
> > > +#
> > > +# <- {"timestamp": {"seconds": 1432121972, "microseconds": 744001},
> > > +#     "event": "FAILOVER_PLUG_PRIMARY",
> > > +#     "data": {"device": "net0", "path": "/machine/peripheral/net0/virtio-backend"} }
> > > +#
> > > +##
> > > +{ 'event': 'FAILOVER_PLUG_PRIMARY',
> > > +  'data': {'*device': 'str', 'path': 'str'} }
> > > +
> > > +##
> > > +# @FAILOVER_UNPLUG_PRIMARY:
> > > +#
> > > +# Emitted when the guest resets the virtio_net driver.
> > > +# The reset can be the result of either unloading the driver or a reboot.
> > > +#
> > > +# @device: Indicates the virtio_net device.
> > > +#
> > > +# @path: Indicates the device path.
> > > +#
> > > +# Since: 3.0
> > > +#
> > > +# Example:
> > > +#
> > > +# <- {"timestamp": {"seconds": 1432121972, "microseconds": 744001},
> > > +#     "event": "FAILOVER_UNPLUG_PRIMARY",
> > > +#     "data": {"device": "net0", "path": "/machine/peripheral/net0/virtio-backend"} }
> > > +#
> > > +##
> > > +{ 'event': 'FAILOVER_UNPLUG_PRIMARY',
> > > +  'data': {'*device': 'str', 'path': 'str'} }

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] virtio_net: Add support for "Data Path Switching" during Live Migration.
  2018-12-10 19:28   ` Eric Blake
@ 2019-01-07 17:58     ` Venu Busireddy
  0 siblings, 0 replies; 12+ messages in thread
From: Venu Busireddy @ 2019-01-07 17:58 UTC (permalink / raw)
  To: Eric Blake; +Cc: Michael S. Tsirkin, Marcel Apfelbaum, virtio-dev, qemu-devel

On 2018-12-10 13:28:42 -0600, Eric Blake wrote:
> On 12/10/18 10:15 AM, Venu Busireddy wrote:
> > Added two new events, FAILOVER_PLUG_PRIMARY and FAILOVER_UNPLUG_PRIMARY.
> > The first is emitted when the guest negotiates the F_STANDBY feature
> > bit. The second is emitted when the virtio_net driver is removed, either
> > manually or as a result of guest reboot. Management stack can use these
> > events to determine when to plug/unplug the VF device to/from the guest.
> > 
> > Also, the Virtual Functions will be automatically removed from the guest
> > if the guest is rebooted. To properly identify the VFIO devices that
> > must be removed, a new property named "x-failover-primary" is added to
> > the vfio-pci devices. Only the vfio-pci devices that have this property
> > enabled are removed from the guest upon reboot.
> > 
> > Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
> > ---
> 
> > +++ b/qapi/net.json
> > @@ -683,3 +683,47 @@
> >   ##
> >   { 'event': 'NIC_RX_FILTER_CHANGED',
> >     'data': { '*name': 'str', 'path': 'str' } }
> > +
> > +##
> > +# @FAILOVER_PLUG_PRIMARY:
> > +#
> > +# Emitted when the guest successfully loads the driver after the STANDBY
> > +# feature bit is negotiated.
> > +#
> > +# @device: Indicates the virtio_net device.
> > +#
> > +# @path: Indicates the device path.
> > +#
> > +# Since: 3.0
> 
> You've missed 3.0, and even 3.1.  This should be 4.0.

Sorry for the delay in responding...

Posted v2 including your suggestion at
https://lists.oasis-open.org/archives/virtio-dev/201901/msg00046.html

Thanks!

> 
> > +#
> > +# Example:
> > +#
> > +# <- {"timestamp": {"seconds": 1432121972, "microseconds": 744001},
> > +#     "event": "FAILOVER_PLUG_PRIMARY",
> > +#     "data": {"device": "net0", "path": "/machine/peripheral/net0/virtio-backend"} }
> > +#
> > +##
> > +{ 'event': 'FAILOVER_PLUG_PRIMARY',
> > +  'data': {'*device': 'str', 'path': 'str'} }
> > +
> > +##
> > +# @FAILOVER_UNPLUG_PRIMARY:
> > +#
> > +# Emitted when the guest resets the virtio_net driver.
> > +# The reset can be the result of either unloading the driver or a reboot.
> > +#
> > +# @device: Indicates the virtio_net device.
> > +#
> > +# @path: Indicates the device path.
> > +#
> > +# Since: 3.0
> 
> and again
> 
> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [virtio-dev] Re: [PATCH 3/3] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover
  2018-12-10 17:31   ` Michael S. Tsirkin
  2018-12-10 20:23     ` si-wei liu
@ 2019-01-07 18:01     ` Venu Busireddy
  1 sibling, 0 replies; 12+ messages in thread
From: Venu Busireddy @ 2019-01-07 18:01 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Marcel Apfelbaum, Si-Wei Liu, virtio-dev, qemu-devel

On 2018-12-10 12:31:43 -0500, Michael S. Tsirkin wrote:
> On Mon, Dec 10, 2018 at 11:15:48AM -0500, Venu Busireddy wrote:
> > From: Si-Wei Liu <si-wei.liu@oracle.com>
> > 
> > When a VF is hotplugged into the guest, datapath switching will be
> > performed immediately, which is sub-optimal in terms of timing, and
> > could end up with substantial network downtime. One of ways to shorten
> > this downtime is to switch the datapath only after the VF is seen to get
> > enabled by guest, indicated by the bus master bit in VF's PCI config
> > space getting enabled. The FAILOVER_PRIMARY_CHANGED event is emitted
> > at that time to indicate this condition. Then management stack can kick
> > off datapath switching upon receiving the event.
> > 
> > Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
> > Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
> 
> As management stacks can lose events, it's necessary
> to also have a query command to check device status.

Thanks for the feedback. Implemented the changes, and posted v2:

    https://lists.oasis-open.org/archives/virtio-dev/201901/msg00046.html


> > ---
> >  hw/vfio/pci.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  qapi/net.json | 26 ++++++++++++++++++++++++++
> >  2 files changed, 83 insertions(+)
> > 
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index ce1f33c..ea24ca2 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -34,6 +34,7 @@
> >  #include "pci.h"
> >  #include "trace.h"
> >  #include "qapi/error.h"
> > +#include "qapi/qapi-events-net.h"
> >  
> >  #define MSIX_CAP_LENGTH 12
> >  
> > @@ -42,6 +43,7 @@
> >  
> >  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
> >  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
> > +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status);
> >  
> >  /*
> >   * Disabling BAR mmaping can be slow, but toggling it around INTx can
> > @@ -1170,6 +1172,8 @@ void vfio_pci_write_config(PCIDevice *pdev,
> >  {
> >      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> >      uint32_t val_le = cpu_to_le32(val);
> > +    bool may_notify = false;
> > +    bool master_was = false;
> >  
> >      trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
> >  
> > @@ -1180,6 +1184,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
> >                       __func__, vdev->vbasedev.name, addr, val, len);
> >      }
> >  
> > +    /* Bus Master Enabling/Disabling */
> > +    if (pdev->failover_primary && current_cpu &&
> > +        range_covers_byte(addr, len, PCI_COMMAND)) {
> > +        master_was = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> > +                        PCI_COMMAND_MASTER);
> > +        may_notify = true;
> > +    }
> > +
> >      /* MSI/MSI-X Enabling/Disabling */
> >      if (pdev->cap_present & QEMU_PCI_CAP_MSI &&
> >          ranges_overlap(addr, len, pdev->msi_cap, vdev->msi_cap_size)) {
> > @@ -1235,6 +1247,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
> >          /* Write everything to QEMU to keep emulated bits correct */
> >          pci_default_write_config(pdev, addr, val, len);
> >      }
> > +
> > +    if (may_notify) {
> > +        bool master_now = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> > +                             PCI_COMMAND_MASTER);
> > +        if (master_was != master_now) {
> > +            vfio_failover_notify(vdev, master_now);
> > +        }
> > +    }
> >  }
> >  
> >  /*
> 
> It's very easy to have guest trigger a high load of events by playing
> with the bus master enable bits.  How about instead sending an event
> that just says "something changed" without the current status and have
> management issue a query command to check the status. QEMU then does not
> need to re-send an event until management issues a query command.
> 
> 
> > @@ -2801,6 +2821,17 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
> >      vdev->req_enabled = false;
> >  }
> >  
> > +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status)
> > +{
> > +    PCIDevice *pdev = &vdev->pdev;
> > +    const char *n;
> > +    gchar *path;
> > +
> > +    n = pdev->qdev.id ? pdev->qdev.id : vdev->vbasedev.name;
> > +    path = object_get_canonical_path(OBJECT(vdev));
> > +    qapi_event_send_failover_primary_changed(!!n, n, path, status);
> > +}
> > +
> >  static void vfio_realize(PCIDevice *pdev, Error **errp)
> >  {
> >      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> > @@ -3109,10 +3140,36 @@ static void vfio_instance_finalize(Object *obj)
> >      vfio_put_group(group);
> >  }
> >  
> > +static void vfio_exit_failover_notify(VFIOPCIDevice *vdev)
> > +{
> > +    PCIDevice *pdev = &vdev->pdev;
> > +
> > +    /*
> > +     * Guest driver may not get the chance to disable bus mastering
> > +     * before the device object gets to be unrealized. In that event,
> > +     * send out a "disabled" notification on behalf of guest driver.
> > +     */
> > +    if (pdev->failover_primary &&
> > +        pdev->bus_master_enable_region.enabled) {
> > +        vfio_failover_notify(vdev, false);
> > +    }
> > +}
> > +
> 
> With the idea above this won't be necessary anymore.
> 
> 
> >  static void vfio_exitfn(PCIDevice *pdev)
> >  {
> >      VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> >  
> > +    /*
> > +     * During the guest reboot sequence, it is sometimes possible that
> > +     * the guest may not get sufficient time to complete the entire driver
> > +     * removal sequence, near the end of which a PCI config space write to
> > +     * disable bus mastering can be intercepted by device. In such cases,
> > +     * the FAILOVER_PRIMARY_CHANGED "disable" event will not be emitted. It
> > +     * is imperative to generate the event on the guest's behalf if the
> > +     * guest fails to make it.
> > +     */
> > +    vfio_exit_failover_notify(vdev);
> > +
> >      vfio_unregister_req_notifier(vdev);
> >      vfio_unregister_err_notifier(vdev);
> >      pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
> > diff --git a/qapi/net.json b/qapi/net.json
> > index 04a9de9..e5992c8 100644
> > --- a/qapi/net.json
> > +++ b/qapi/net.json
> > @@ -727,3 +727,29 @@
> >  ##
> >  { 'event': 'FAILOVER_UNPLUG_PRIMARY',
> >    'data': {'*device': 'str', 'path': 'str'} }
> > +
> > +##
> > +# @FAILOVER_PRIMARY_CHANGED:
> > +#
> > +# Emitted whenever the driver of failover primary is loaded or unloaded
> > +# by the guest.
> > +#
> > +# @device: device name
> > +#
> > +# @path: device path
> > +#
> > +# @enabled: true if driver is loaded thus device is enabled in guest
> > +#
> > +# Since: 3.0
> > +#
> > +# Example:
> > +#
> > +# <- { "event": "FAILOVER_PRIMARY_CHANGED",
> 
> 
> This event doesn't seem to be failover specific.
> How about a more generic name?
> 
> 
> 
> > +#      "data": { "device": "vfio-0",
> > +#                "path": "/machine/peripheral/vfio-0" },
> > +#                "enabled": "true" },
> > +#      "timestamp": { "seconds": 1539935213, "microseconds": 753529 } }
> > +#
> > +##
> > +{ 'event': 'FAILOVER_PRIMARY_CHANGED',
> > +  'data': { '*device': 'str', 'path': 'str', 'enabled': 'bool' } }
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: virtio-dev-unsubscribe@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-help@lists.oasis-open.org
> 

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2019-01-07 18:02 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-12-10 16:15 [Qemu-devel] [PATCH 0/3] Support for datapath switching during live migration Venu Busireddy
2018-12-10 16:15 ` [Qemu-devel] [PATCH 1/3] virtio_net: Add VIRTIO_NET_F_STANDBY feature bit Venu Busireddy
2018-12-10 16:15 ` [Qemu-devel] [PATCH 2/3] virtio_net: Add support for "Data Path Switching" during Live Migration Venu Busireddy
2018-12-10 17:41   ` Michael S. Tsirkin
2018-12-10 20:22     ` si-wei liu
2018-12-10 23:07       ` Michael S. Tsirkin
2018-12-10 19:28   ` Eric Blake
2019-01-07 17:58     ` Venu Busireddy
2018-12-10 16:15 ` [Qemu-devel] [PATCH 3/3] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover Venu Busireddy
2018-12-10 17:31   ` Michael S. Tsirkin
2018-12-10 20:23     ` si-wei liu
2019-01-07 18:01     ` [Qemu-devel] [virtio-dev] " Venu Busireddy

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).