From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:58820)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <chen.fan.fnst@cn.fujitsu.com>) id 1aCH6t-0006JR-TR
	for qemu-devel@nongnu.org; Thu, 24 Dec 2015 20:24:01 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <chen.fan.fnst@cn.fujitsu.com>) id 1aCH6p-0000RU-8P
	for qemu-devel@nongnu.org; Thu, 24 Dec 2015 20:23:59 -0500
Received: from [59.151.112.132] (port=22035 helo=heian.cn.fujitsu.com)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <chen.fan.fnst@cn.fujitsu.com>) id 1aCH6o-0000QO-At
	for qemu-devel@nongnu.org; Thu, 24 Dec 2015 20:23:55 -0500
Message-ID: <567C995B.5070908@cn.fujitsu.com>
Date: Fri, 25 Dec 2015 09:18:19 +0800
From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
MIME-Version: 1.0
References: <cover.1447748073.git.chen.fan.fnst@cn.fujitsu.com>
	<35d75210aef922bd049d93e0c8dbe13a2b5b1dd3.1447748073.git.chen.fan.fnst@cn.fujitsu.com>
	<1450384318.2674.110.camel@redhat.com>
	<56737D85.3030305@cn.fujitsu.com>
	<1450732026.3781.89.camel@redhat.com>
	<567B7E41.9030008@cn.fujitsu.com>
	<20151224162247-mutt-send-email-mst@redhat.com>
In-Reply-To: <20151224162247-mutt-send-email-mst@redhat.com>
Content-Type: text/plain; charset="windows-1252"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v14 Resend 10/13] pci: add pci device
 pre-post reset callbacks for host bus reset
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>, Cao jin <caoj.fnst@cn.fujitsu.com>, qemu-devel@nongnu.org


On 12/24/2015 10:34 PM, Michael S. Tsirkin wrote:
> On Thu, Dec 24, 2015 at 01:10:25PM +0800, Chen Fan wrote:
>> On 12/22/2015 05:07 AM, Alex Williamson wrote:
>>> On Fri, 2015-12-18 at 11:29 +0800, Chen Fan wrote:
>>>> On 12/18/2015 04:31 AM, Alex Williamson wrote:
>>>>> On Thu, 2015-12-17 at 09:41 +0800, Cao jin wrote:
>>>>>> From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
>>>>>>
>>>>>> Particularly, For vfio devices, Once need to recovery devices
>>>>>> by bus reset such as AER, we always need to reset the host bus
>>>>>> to recovery the devices under the bus, so we need to add pci
>>>>>> device
>>>>>> callbacks to specify to do host bus reset.
>>>>>>
>>>>>> Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
>>>>>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>>>>>> ---
>>>>>>    hw/pci/pci.c         | 18 ++++++++++++++++++
>>>>>>    hw/pci/pci_bridge.c  |  9 +++++++++
>>>>>>    hw/vfio/pci.c        | 26 ++++++++++++++++++++++++++
>>>>>>    hw/vfio/pci.h        |  2 ++
>>>>>>    include/hw/pci/pci.h |  7 +++++++
>>>>>>    5 files changed, 62 insertions(+)
>>>>>>
>>>>>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>>>>>> index f6ca6ef..64fa2cc 100644
>>>>>> --- a/hw/pci/pci.c
>>>>>> +++ b/hw/pci/pci.c
>>>>>> @@ -247,6 +247,24 @@ static void pci_do_device_reset(PCIDevice
>>>>>> *dev)
>>>>>>        msix_reset(dev);
>>>>>>    }
>>>>>> +void pci_device_pre_reset(PCIBus *bus, PCIDevice *dev, void
>>>>>> *unused)
>>>>>> +{
>>>>>> +    PCIDeviceClass *dc = PCI_DEVICE_GET_CLASS(dev);
>>>>>> +
>>>>>> +    if (dc->pre_reset) {
>>>>>> +        dc->pre_reset(dev);
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +void pci_device_post_reset(PCIBus *bus, PCIDevice *dev, void
>>>>>> *unused)
>>>>>> +{
>>>>>> +    PCIDeviceClass *dc = PCI_DEVICE_GET_CLASS(dev);
>>>>>> +
>>>>>> +    if (dc->post_reset) {
>>>>>> +        dc->post_reset(dev);
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>>    /*
>>>>>>     * This function is called on #RST and FLR.
>>>>>>     * FLR if PCI_EXP_DEVCTL_BCR_FLR is set
>>>>>> diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
>>>>>> index 40c97b1..ddb76ab 100644
>>>>>> --- a/hw/pci/pci_bridge.c
>>>>>> +++ b/hw/pci/pci_bridge.c
>>>>>> @@ -267,8 +267,17 @@ void pci_bridge_write_config(PCIDevice *d,
>>>>>>        newctl = pci_get_word(d->config + PCI_BRIDGE_CONTROL);
>>>>>>        if (~oldctl & newctl & PCI_BRIDGE_CTL_BUS_RESET) {
>>>>>> +        /*
>>>>>> +         * Notify all vfio-pci devices under the bus
>>>>>> +         * should do physical bus reset.
>>>>>> +         */
>>>>>> +        PCIBus *sec_bus = pci_bridge_get_sec_bus(s);
>>>>>> +        pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
>>>>>> +                            pci_device_pre_reset, NULL);
>>>>>>            /* Trigger hot reset on 0->1 transition. */
>>>>>>            qbus_reset_all(&s->sec_bus.qbus);
>>>>>> +        pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
>>>>>> +                            pci_device_post_reset, NULL);
>>>>>>        }
>>>>>>    }
>>>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>>>>> index e17dc89..df32618 100644
>>>>>> --- a/hw/vfio/pci.c
>>>>>> +++ b/hw/vfio/pci.c
>>>>>> @@ -39,6 +39,7 @@
>>>>>>    static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
>>>>>>    static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool
>>>>>> enabled);
>>>>>> +static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single);
>>>>>>    /*
>>>>>>     * Disabling BAR mmaping can be slow, but toggling it around
>>>>>> INTx
>>>>>> can
>>>>>> @@ -1888,6 +1889,8 @@ static int
>>>>>> vfio_check_host_bus_reset(VFIOPCIDevice *vdev)
>>>>>>        /* List all affected devices by bus reset */
>>>>>>        devices = &info->devices[0];
>>>>>> +    vdev->single_depend_dev = (info->count == 1);
>>>>>> +
>>>>>>        /* Verify that we have all the groups required */
>>>>>>        for (i = 0; i < info->count; i++) {
>>>>>>            PCIHostDeviceAddress host;
>>>>>> @@ -2029,10 +2032,26 @@ static int
>>>>>> vfio_check_bus_reset(NotifierWithReturn *n, void *opaque)
>>>>>>        return vfio_check_host_bus_reset(vdev);
>>>>>>    }
>>>>>> +static void vfio_aer_pre_reset(PCIDevice *pdev)
>>>>>> +{
>>>>>> +    VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>>>>>> +
>>>>>> +    vdev->aer_reset = true;
>>>>>> +    vfio_pci_hot_reset(vdev, vdev->single_depend_dev);
>>>>>> +}
>>>>> Doesn't this lead to multiple host bus resets per guest bus reset
>>>>> in
>>>>> many cases?  It looks like we'll do it once per vfio-pci device,
>>>>> even
>>>>> if those devices are on the same host bus.  That's a 1 second
>>>>> operation
>>>>> per device.  Can we avoid that?  Maybe some sort of sequence ID
>>>>> could
>>>>> help a device figure out whether it's already been reset as part of
>>>>> a
>>>>> dependent device for this particular guest bus reset.  Thanks,
>>>> That's right, I missed this case, but I don't understand the scenario
>>>> how to
>>>> use a sequence ID to mark the device if been reset. can you detail it
>>>> ?
>>> I don't really have a concrete idea for a sequence ID, it was just a
>>> thought that maybe if each bus reset had a sequence ID then devices
>>> could know whether they've already been reset for that sequence ID.
>>>   The basic problem we have is that reset callbacks are per device and
>>> it's difficult to infer which individual resets are part of that bus
>>> reset.  In fact, do we propagate resets correctly down secondary
>>> bridges?  We're triggering off a VM write of the bridge control bus
>>> reset bit triggering from 0->1 and we then call qbus_reset_all() on
>>> that qbus, which I think is just going to call pci_bridge_reset() for
>>> any other bridges, which doesn't do anything about resetting deeper
>>> subordinate buses.  I think that means that if we had a root port with
>>> a switch below it and endpoints below that, if the VM triggered a
>>> secondary bus reset at the root port, the endpoints would never see it,
>>> which is not how real hardware works.
>> Hi Alex,
>>
>>      After think over the subordinate case, I think the qbus_reset_all()
>> should also reset the deeper endpoints. which recursive the qbus and qdev
>> to reset all devices blow sec bus and bridge device independently.
>>
>> MST's suggestion in another mail to add a hot reset callback for sec bus
>> reset
>> should be good to avoid the couple of pre_reset/post_reset callbacks.
>> thanks:)
>>
>> but for the multiple resets triggering on the same bus, I think the virtual
>> bus-level
>> callback do not guarantee the host bus reset only once for all dependent
>> devices.
>> because the host reset dependent devices may not on the same virtual bus.
>> we allowed that the dependent devices address below the bus. so the devices
>> still don't know the host bus is in reset. here I think we should have a
>> host bus
>> model bind to all the dependent devices. when devices do hot reset, we mark
>> the bus in reset. then other dependent devices know the bus is reset. how
>> about
>> this?
>>
>> Thanks,
>> Chen
>
> What does "below bus" mean here?
> Below what?
I just said that assume the interacted devices on the same host bus. we 
allowed
one device with aer was assigned on a virtual bus and the others were 
assigned
on the secondary bus.

Thanks,
Chen

>
>
>>>> additional, there was a mechanism to compute device whether need to
>>>> be reset
>>>> by hot reset. so I simply modify the code as the following:
>>>>
>>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>>> index a9bc67e..42774ca 100644
>>>> --- a/hw/vfio/pci.c
>>>> +++ b/hw/vfio/pci.c
>>>> @@ -2063,13 +2063,19 @@ static void vfio_aer_pre_reset(PCIDevice
>>>> *pdev)
>>>>        VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>>>>
>>>>        vdev->aer_reset = true;
>>>> -    vfio_pci_hot_reset(vdev, vdev->single_depend_dev);
>>>> +    vdev->vbasedev.needs_reset = true;
>>>>    }
>>>>
>>>>    static void vfio_aer_post_reset(PCIDevice *pdev)
>>>>    {
>>>>        VFIOPCIDevice *vdev = DO_UPCAST(VFIOPCIDevice, pdev, pdev);
>>>>
>>>> +    if (!vdev->single_depend_dev && vdev->vbasedev.needs_reset) {
>>>> +        vfio_pci_hot_reset(vdev, false);
>>>> +    } else {
>>>> +        vfio_pci_hot_reset(vdev, true);
>>>> +    }
>>>> +
>>>>        vdev->aer_reset = false;
>>>>    }
>>>>
>>>> what do you think of this ?
>>> I think it might be a bigger problem than that subtle change.  I wonder
>>> if we really need a better model of the reset line through the
>>> subordinate buses.  When reset is asserted, we'd set a bus_in_reset
>>> flag on the bus and trigger downstream bridges to do the same.  Then
>>> when the user de-asserts reset, we'd call qbus_reset_all() and
>>> propagate it through to downstream buses.  That way the per device
>>> reset callback could check to see if the bus is in reset and aer
>>> devices can then know to do a bus reset.  Finally, the bus_in_reset
>>> flag would be cleared on all the affected buses.  I'm sure there are
>>> numerous details missing there, but it seems like it might be a
>>> reasonable approach.  Thanks,
>>>
>>> Alex
>>>
>>>
>>> .
>>>
>>
>
> .
>