qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: si-wei liu <si-wei.liu@oracle.com>
To: Alex Williamson <alex.williamson@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>
Cc: Marcel Apfelbaum <marcel@redhat.com>,
	Venu Busireddy <venu.busireddy@oracle.com>,
	qemu-devel@nongnu.org, virtio-dev@lists.oasis-open.org
Subject: Re: [Qemu-devel] [PATCH v3 4/5] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover
Date: Mon, 7 Jan 2019 17:13:53 -0800	[thread overview]
Message-ID: <6155b136-fa44-445b-4a08-b9bdbc5cd372@oracle.com> (raw)
In-Reply-To: <20190107164115.6daffe19@x1.home>



On 01/07/2019 03:41 PM, Alex Williamson wrote:
> On Mon, 7 Jan 2019 18:22:20 -0500
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
>> On Mon, Jan 07, 2019 at 04:17:17PM -0700, Alex Williamson wrote:
>>> On Mon,  7 Jan 2019 17:29:43 -0500
>>> Venu Busireddy <venu.busireddy@oracle.com> wrote:
>>>    
>>>> From: Si-Wei Liu <si-wei.liu@oracle.com>
>>>>
>>>> When a VF is hotplugged into the guest, datapath switching will be
>>>> performed immediately, which is sub-optimal in terms of timing, and
>>>> could end up with substantial network downtime. One of ways to shorten
>>>> this downtime is to switch the datapath only after the VF is seen to get
>>>> enabled by guest, indicated by the bus master bit in VF's PCI config
>>>> space getting enabled. The FAILOVER_PRIMARY_CHANGED event is emitted
>>>> at that time to indicate this condition. Then management stack can kick
>>>> off datapath switching upon receiving the event.
>>>>
>>>> Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
>>>> Signed-off-by: Venu Busireddy <venu.busireddy@oracle.com>
>>>> ---
>>>>   hw/vfio/pci.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>   qapi/net.json | 26 ++++++++++++++++++++++++++
>>>>   2 files changed, 83 insertions(+)
>>> Why is this done at the vfio driver layer rather than the PCI core
>>> layer?  We write everything through using pci_default_write_config(), I
>>> don't see that anything here is particularly vfio specific.  Please copy
>>> me on any changes in hw/vfio.  Thanks,
>>>
>>> Alex
>> Hmm so you are saying let's send events for each device?
>> I don't have a problem with this but in this case
>> I think I would like to see a per-device option "send events".
>> We don't want a ton of events in the simple default config.
> In the below we're only sending events for PCIDevice.failover_primary,
> seems like that would filter out the rest of the non-NIC PCI devices as
> well as it does for non-NIC VFIO PCI devices.  The only thing remotely
> vfio specific below is that it might notify based on the vfio device
> name, but it's a fallback to PCIDevice.qdev.id.
Not exactly. It will first try to use the qdev ID to notify. If qdev id 
is missing (vfio-pci device could live without it),  then sysfsdev name 
will be used instead (in the form of host device 
"<bus>:<device>.<function>" location rather than ID). The intent was 
indeed to make this notification applicable to every possible vfio-pci 
device, even those without a qdev ID.

>   A real ID could just
> be a requirement to make use of this.
I'm fine to make qdev-id required for failover_primary PCI device. But 
please be noted, this is a shrinkage rather than generalization that has 
to apply to all other non-VFIO PCI devices that don't have to specify a 
qdev ID today.  I'm not sure if it's a good idea to make it restricted 
this early.

Thanks,
-Siwei

>   Thanks,
>
> Alex
>
>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>>> index bd83b58..adcc95a 100644
>>>> --- a/hw/vfio/pci.c
>>>> +++ b/hw/vfio/pci.c
>>>> @@ -34,6 +34,7 @@
>>>>   #include "pci.h"
>>>>   #include "trace.h"
>>>>   #include "qapi/error.h"
>>>> +#include "qapi/qapi-events-net.h"
>>>>   
>>>>   #define MSIX_CAP_LENGTH 12
>>>>   
>>>> @@ -42,6 +43,7 @@
>>>>   
>>>>   static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
>>>>   static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
>>>> +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status);
>>>>   
>>>>   /*
>>>>    * Disabling BAR mmaping can be slow, but toggling it around INTx can
>>>> @@ -1170,6 +1172,8 @@ void vfio_pci_write_config(PCIDevice *pdev,
>>>>   {
>>>>       VFIOPCIDevice *vdev = PCI_VFIO(pdev);
>>>>       uint32_t val_le = cpu_to_le32(val);
>>>> +    bool may_notify = false;
>>>> +    bool master_was = false;
>>>>   
>>>>       trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
>>>>   
>>>> @@ -1180,6 +1184,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
>>>>                        __func__, vdev->vbasedev.name, addr, val, len);
>>>>       }
>>>>   
>>>> +    /* Bus Master Enabling/Disabling */
>>>> +    if (pdev->failover_primary && current_cpu &&
>>>> +        range_covers_byte(addr, len, PCI_COMMAND)) {
>>>> +        master_was = !!(pci_get_word(pdev->config + PCI_COMMAND) &
>>>> +                        PCI_COMMAND_MASTER);
>>>> +        may_notify = true;
>>>> +    }
>>>> +
>>>>       /* MSI/MSI-X Enabling/Disabling */
>>>>       if (pdev->cap_present & QEMU_PCI_CAP_MSI &&
>>>>           ranges_overlap(addr, len, pdev->msi_cap, vdev->msi_cap_size)) {
>>>> @@ -1235,6 +1247,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
>>>>           /* Write everything to QEMU to keep emulated bits correct */
>>>>           pci_default_write_config(pdev, addr, val, len);
>>>>       }
>>>> +
>>>> +    if (may_notify) {
>>>> +        bool master_now = !!(pci_get_word(pdev->config + PCI_COMMAND) &
>>>> +                             PCI_COMMAND_MASTER);
>>>> +        if (master_was != master_now) {
>>>> +            vfio_failover_notify(vdev, master_now);
>>>> +        }
>>>> +    }
>>>>   }
>>>>   
>>>>   /*
>>>> @@ -2801,6 +2821,17 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
>>>>       vdev->req_enabled = false;
>>>>   }
>>>>   
>>>> +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status)
>>>> +{
>>>> +    PCIDevice *pdev = &vdev->pdev;
>>>> +    const char *n;
>>>> +    gchar *path;
>>>> +
>>>> +    n = pdev->qdev.id ? pdev->qdev.id : vdev->vbasedev.name;
>>>> +    path = object_get_canonical_path(OBJECT(vdev));
>>>> +    qapi_event_send_failover_primary_changed(!!n, n, path, status);
>>>> +}
>>>> +
>>>>   static void vfio_realize(PCIDevice *pdev, Error **errp)
>>>>   {
>>>>       VFIOPCIDevice *vdev = PCI_VFIO(pdev);
>>>> @@ -3109,10 +3140,36 @@ static void vfio_instance_finalize(Object *obj)
>>>>       vfio_put_group(group);
>>>>   }
>>>>   
>>>> +static void vfio_exit_failover_notify(VFIOPCIDevice *vdev)
>>>> +{
>>>> +    PCIDevice *pdev = &vdev->pdev;
>>>> +
>>>> +    /*
>>>> +     * Guest driver may not get the chance to disable bus mastering
>>>> +     * before the device object gets to be unrealized. In that event,
>>>> +     * send out a "disabled" notification on behalf of guest driver.
>>>> +     */
>>>> +    if (pdev->failover_primary &&
>>>> +        pdev->bus_master_enable_region.enabled) {
>>>> +        vfio_failover_notify(vdev, false);
>>>> +    }
>>>> +}
>>>> +
>>>>   static void vfio_exitfn(PCIDevice *pdev)
>>>>   {
>>>>       VFIOPCIDevice *vdev = PCI_VFIO(pdev);
>>>>   
>>>> +    /*
>>>> +     * During the guest reboot sequence, it is sometimes possible that
>>>> +     * the guest may not get sufficient time to complete the entire driver
>>>> +     * removal sequence, near the end of which a PCI config space write to
>>>> +     * disable bus mastering can be intercepted by device. In such cases,
>>>> +     * the FAILOVER_PRIMARY_CHANGED "disable" event will not be emitted. It
>>>> +     * is imperative to generate the event on the guest's behalf if the
>>>> +     * guest fails to make it.
>>>> +     */
>>>> +    vfio_exit_failover_notify(vdev);
>>>> +
>>>>       vfio_unregister_req_notifier(vdev);
>>>>       vfio_unregister_err_notifier(vdev);
>>>>       pci_device_set_intx_routing_notifier(&vdev->pdev, NULL);
>>>> diff --git a/qapi/net.json b/qapi/net.json
>>>> index 633ac87..a5b8d70 100644
>>>> --- a/qapi/net.json
>>>> +++ b/qapi/net.json
>>>> @@ -757,3 +757,29 @@
>>>>   ##
>>>>   { 'command': 'query-standby-status', 'data': { '*device': 'str' },
>>>>     'returns': ['StandbyStatusInfo'] }
>>>> +
>>>> +##
>>>> +# @FAILOVER_PRIMARY_CHANGED:
>>>> +#
>>>> +# Emitted whenever the driver of failover primary is loaded or unloaded
>>>> +# by the guest.
>>>> +#
>>>> +# @device: device name
>>>> +#
>>>> +# @path: device path
>>>> +#
>>>> +# @enabled: true if driver is loaded thus device is enabled in guest
>>>> +#
>>>> +# Since: 3.0
>>>> +#
>>>> +# Example:
>>>> +#
>>>> +# <- { "event": "FAILOVER_PRIMARY_CHANGED",
>>>> +#      "data": { "device": "vfio-0",
>>>> +#                "path": "/machine/peripheral/vfio-0" },
>>>> +#                "enabled": "true" },
>>>> +#      "timestamp": { "seconds": 1539935213, "microseconds": 753529 } }
>>>> +#
>>>> +##
>>>> +{ 'event': 'FAILOVER_PRIMARY_CHANGED',
>>>> +  'data': { '*device': 'str', 'path': 'str', 'enabled': 'bool' } }
>>>>    
>

  parent reply	other threads:[~2019-01-08  1:14 UTC|newest]

Thread overview: 32+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-01-07 22:29 [Qemu-devel] [PATCH v3 0/5] Support for datapath switching during live migration Venu Busireddy
2019-01-07 22:29 ` [Qemu-devel] [PATCH v3 1/5] virtio_net: Add VIRTIO_NET_F_STANDBY feature bit Venu Busireddy
2019-01-08 16:56   ` Dongli Zhang
2019-01-08 17:25     ` Venu Busireddy
2019-01-09  0:14       ` Dongli Zhang
2019-01-09  0:18         ` Samudrala, Sridhar
2019-01-09  0:39           ` Dongli Zhang
2019-01-09  4:17             ` Michael S. Tsirkin
2019-01-07 22:29 ` [Qemu-devel] [PATCH v3 2/5] virtio_net: Add support for "Data Path Switching" during Live Migration Venu Busireddy
2019-01-09 13:39   ` [Qemu-devel] [virtio-dev] " Cornelia Huck
2019-01-09 15:56   ` [Qemu-devel] " Michael S. Tsirkin
2019-01-11  2:09     ` si-wei liu
2019-01-11  3:20       ` Michael S. Tsirkin
2019-01-11  7:09         ` [Qemu-devel] [virtio-dev] " si-wei liu
2019-01-14 11:10           ` Roman Kagan
2019-01-07 22:29 ` [Qemu-devel] [PATCH v3 3/5] virtio_net: Add a query command for FAILOVER_STANDBY_CHANGED event Venu Busireddy
2019-01-08  0:10   ` Michael S. Tsirkin
2019-01-07 22:29 ` [Qemu-devel] [PATCH v3 4/5] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover Venu Busireddy
2019-01-07 23:17   ` Alex Williamson
2019-01-07 23:22     ` Michael S. Tsirkin
2019-01-07 23:41       ` Alex Williamson
2019-01-08  0:12         ` Michael S. Tsirkin
2019-01-08  0:24           ` Alex Williamson
2019-01-08  0:43             ` Michael S. Tsirkin
2019-01-08  1:13         ` si-wei liu [this message]
2019-01-07 22:29 ` [Qemu-devel] [PATCH v3 5/5] pci: query command extension to check the bus master enabling status of the failover-primary device Venu Busireddy
2019-01-07 23:32 ` [Qemu-devel] [PATCH v3 0/5] Support for datapath switching during live migration Michael S. Tsirkin
2019-01-08  1:45   ` [Qemu-devel] [virtio-dev] " si-wei liu
2019-01-08  2:25     ` Michael S. Tsirkin
2019-01-09  4:55       ` si-wei liu
2019-01-09 13:39         ` Michael S. Tsirkin
2019-01-11  6:57           ` si-wei liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6155b136-fa44-445b-4a08-b9bdbc5cd372@oracle.com \
    --to=si-wei.liu@oracle.com \
    --cc=alex.williamson@redhat.com \
    --cc=marcel@redhat.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=venu.busireddy@oracle.com \
    --cc=virtio-dev@lists.oasis-open.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).