From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:50443)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <chen.fan.fnst@cn.fujitsu.com>) id 1YabmD-0001In-Vu
	for qemu-devel@nongnu.org; Tue, 24 Mar 2015 23:14:43 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <chen.fan.fnst@cn.fujitsu.com>) id 1Yabm9-0007Hh-Tn
	for qemu-devel@nongnu.org; Tue, 24 Mar 2015 23:14:41 -0400
Received: from [59.151.112.132] (port=1863 helo=heian.cn.fujitsu.com)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <chen.fan.fnst@cn.fujitsu.com>) id 1Yabm9-0007Gy-7K
	for qemu-devel@nongnu.org; Tue, 24 Mar 2015 23:14:37 -0400
Message-ID: <55122678.3010808@cn.fujitsu.com>
Date: Wed, 25 Mar 2015 11:07:36 +0800
From: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
MIME-Version: 1.0
References: <cover.1426155431.git.chen.fan.fnst@cn.fujitsu.com>				
	<3c81eaae84d6b1fa6e229e765a534fdf180e1ce4.1426155432.git.chen.fan.fnst@cn.fujitsu.com>			
	<1426286084.3643.144.camel@redhat.com>
	<55064870.6040209@cn.fujitsu.com>		
	<1426477927.3643.160.camel@redhat.com>
	<550687B1.7020504@cn.fujitsu.com>	
	<1426514950.3643.169.camel@redhat.com>
	<55121525.2040408@cn.fujitsu.com>
	<1427251289.3643.829.camel@redhat.com>
In-Reply-To: <1427251289.3643.829.camel@redhat.com>
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v5 5/7] vfio-pci: pass the aer error to
	guest
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: izumi.taku@jp.fujitsu.com, qemu-devel@nongnu.org


On 03/25/2015 10:41 AM, Alex Williamson wrote:
> On Wed, 2015-03-25 at 09:53 +0800, Chen Fan wrote:
>> On 03/16/2015 10:09 PM, Alex Williamson wrote:
>>> On Mon, 2015-03-16 at 15:35 +0800, Chen Fan wrote:
>>>> On 03/16/2015 11:52 AM, Alex Williamson wrote:
>>>>> On Mon, 2015-03-16 at 11:05 +0800, Chen Fan wrote:
>>>>>> On 03/14/2015 06:34 AM, Alex Williamson wrote:
>>>>>>> On Thu, 2015-03-12 at 18:23 +0800, Chen Fan wrote:
>>>>>>>> when the vfio device encounters an uncorrectable error in host,
>>>>>>>> the vfio_pci driver will signal the eventfd registered by this
>>>>>>>> vfio device, the results in the qemu eventfd handler getting
>>>>>>>> invoked.
>>>>>>>>
>>>>>>>> this patch is to pass the error to guest and have the guest driver
>>>>>>>> recover from the error.
>>>>>>> What is going to be the typical recovery mechanism for the guest?  I'm
>>>>>>> concerned that the topology of the device in the guest doesn't
>>>>>>> necessarily match the topology of the device in the host, so if the
>>>>>>> guest were to attempt a bus reset to recover a device, for instance,
>>>>>>> what happens?
>>>>>> the recovery mechanism is that when guest got an aer error from a device,
>>>>>> guest will clean the corresponding status bit in device register. and for
>>>>>> need reset device, the guest aer driver would reset all devices under bus.
>>>>> Sorry, I'm still confused, how does the guest aer driver reset all
>>>>> devices under a bus?  Are we talking about function-level, device
>>>>> specific reset mechanisms or secondary bus resets?  If the guest is
>>>>> performing secondary bus resets, what guarantee do they have that it
>>>>> will translate to a physical secondary bus reset?  vfio may only do an
>>>>> FLR when the bus is reset or it may not be able to do anything depending
>>>>> on the available function-level resets and physical and virtual topology
>>>>> of the device.  Thanks,
>>>> in general, functions depends on the corresponding device driver behaviors
>>>> to do the recovery. e.g: implemented the error_detect, slot_reset callbacks.
>>>> and for link reset, it usually do secondary bus reset.
>>>>
>>>> and do we must require to the physical secondary bus reset for vfio device
>>>> as bus reset?
>>> That depends on how the guest driver attempts recovery, doesn't it?
>>> There are only a very limited number of cases where a secondary bus
>>> reset initiated by the guest will translate to a secondary bus reset of
>>> the physical device (iirc, single function device without FLR).  In most
>>> cases, it will at best be translated to an FLR.  VFIO really only does
>>> bus resets on VM reset because that's the only time we know that it's ok
>>> to reset multiple devices.  If the guest driver is depending on a
>>> secondary bus reset to put the device into a recoverable state and we're
>>> not able to provide that, then we're actually reducing containment of
>>> the error by exposing AER to the guest and allowing it to attempt
>>> recovery.  So in practice, I'm afraid we're risking the integrity of the
>>> VM by exposing AER to the guest and making it think that it can perform
>>> recovery operations that are not effective.  Thanks,
>> I also have seen that if device without FLR, it seems can do hot reset
>> by ioctl VFIO_DEVICE_PCI_HOT_RESET to reset the physical slot or bus
>> in vfio_pci_reset. does it satisfy the recovery issues that you said?
> The hot reset interface can only be used when a) the user (QEMU) owns
> all of the devices on the bus and b) we know we're resetting all of the
> devices.  That mostly limits its use to VM reset.  I think that on a
> secondary bus reset, we don't know the scope of the reset at the QEMU
> vfio driver, so we only make use of reset methods with a function-level
> scope.  That would only result in a secondary bus reset if that's the
> reset mechanism used by the host kernel's PCI code (pci_reset_function),
> which is limited to single function devices on a secondary bus, with no
> other reset mechanisms.  The host reset is also only available in some
> configurations, for instance if we have a dual-port NIC where each
> function is a separate IOMMU group, then we clearly cannot do a hot
> reset unless both functions are assigned to the same VM _and_ appear to
> the guest on the same virtual bus.  So even if we could know the scope
> of the reset in the QEMU vfio driver, we can only make use of it under
> very strict guest configurations.  Thanks,

It seems difficult to let the guest participate in recovery.
But I think we might be able to capture the result of vfio_pci_reset:
if the vfio device reset fails, then we stop the VM.
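
As a rough sketch of that policy only: every name below (AerAction,
SketchDevice, sketch_handle_aer and the two fake reset callbacks) is a
hypothetical stand-in, not existing QEMU or vfio code. It also assumes
the reset path can report success or failure to its caller, which would
still have to be arranged for vfio_pci_reset.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome of the policy; not a QEMU type. */
typedef enum {
    AER_ACTION_CONTINUE,   /* reset succeeded, guest keeps running */
    AER_ACTION_STOP_VM     /* reset failed, the VM must be stopped */
} AerAction;

/* Hypothetical stand-in for an assigned device. */
typedef struct {
    const char *name;
    /* Assumed contract: returns 0 on success, negative on failure. */
    int (*reset)(void *opaque);
    void *opaque;
} SketchDevice;

/*
 * On an uncorrectable error, try to reset the device ourselves and
 * decide what to do with the VM based on the reset result, instead
 * of letting the guest drive recovery.
 */
static AerAction sketch_handle_aer(const SketchDevice *dev)
{
    assert(dev != NULL && dev->reset != NULL);

    int ret = dev->reset(dev->opaque);

    return ret == 0 ? AER_ACTION_CONTINUE : AER_ACTION_STOP_VM;
}

/* Fake reset callbacks used only to exercise the sketch. */
static int sketch_reset_ok(void *opaque)
{
    (void)opaque;
    return 0;
}

static int sketch_reset_fail(void *opaque)
{
    (void)opaque;
    return -1;
}
```

With a device whose reset callback is sketch_reset_ok the decision is
AER_ACTION_CONTINUE, and with sketch_reset_fail it is
AER_ACTION_STOP_VM. In the real handler the second case is where the VM
would be stopped.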

Thanks,
Chen

>
> Alex