From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:43658)
 by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YaaJE-0004vX-1O
 for qemu-devel@nongnu.org; Tue, 24 Mar 2015 21:40:41 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 (envelope-from ) id 1YaaJA-0004k5-VZ
 for qemu-devel@nongnu.org; Tue, 24 Mar 2015 21:40:39 -0400
Received: from [59.151.112.132] (port=53464 helo=heian.cn.fujitsu.com)
 by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YaaJA-0004iy-8j
 for qemu-devel@nongnu.org; Tue, 24 Mar 2015 21:40:36 -0400
Message-ID: <5512106A.9090101@cn.fujitsu.com>
Date: Wed, 25 Mar 2015 09:33:30 +0800
From: Chen Fan
MIME-Version: 1.0
References: <3c81eaae84d6b1fa6e229e765a534fdf180e1ce4.1426155432.git.chen.fan.fnst@cn.fujitsu.com>
 <1426286084.3643.144.camel@redhat.com> <55064870.6040209@cn.fujitsu.com>
 <1426477927.3643.160.camel@redhat.com> <550687B1.7020504@cn.fujitsu.com>
 <1426514950.3643.169.camel@redhat.com>
In-Reply-To: <1426514950.3643.169.camel@redhat.com>
Content-Type: text/plain; charset="utf-8"; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH v5 5/7] vfio-pci: pass the aer error to guest
To: Alex Williamson
Cc: izumi.taku@jp.fujitsu.com, qemu-devel@nongnu.org

On 03/16/2015 10:09 PM, Alex Williamson wrote:
> On Mon, 2015-03-16 at 15:35 +0800, Chen Fan wrote:
>> On 03/16/2015 11:52 AM, Alex Williamson wrote:
>>> On Mon, 2015-03-16 at 11:05 +0800, Chen Fan wrote:
>>>> On 03/14/2015 06:34 AM, Alex Williamson wrote:
>>>>> On Thu, 2015-03-12 at 18:23 +0800, Chen Fan wrote:
>>>>>> when the vfio device encounters an uncorrectable error in host,
>>>>>> the vfio_pci driver will signal the eventfd registered by this
>>>>>> vfio device, the results in the qemu eventfd handler getting
>>>>>> invoked.
>>>>>>
>>>>>> this patch is to pass the error to guest and have the guest driver
>>>>>> recover from the error.
>>>>> What is going to be the typical recovery mechanism for the guest? I'm
>>>>> concerned that the topology of the device in the guest doesn't
>>>>> necessarily match the topology of the device in the host, so if the
>>>>> guest were to attempt a bus reset to recover a device, for instance,
>>>>> what happens?
>>>> the recovery mechanism is that when guest got an aer error from a device,
>>>> guest will clean the corresponding status bit in device register. and for
>>>> need reset device, the guest aer driver would reset all devices under bus.
>>> Sorry, I'm still confused, how does the guest aer driver reset all
>>> devices under a bus? Are we talking about function-level, device
>>> specific reset mechanisms or secondary bus resets? If the guest is
>>> performing secondary bus resets, what guarantee do they have that it
>>> will translate to a physical secondary bus reset? vfio may only do an
>>> FLR when the bus is reset or it may not be able to do anything depending
>>> on the available function-level resets and physical and virtual topology
>>> of the device. Thanks,
>> in general, functions depends on the corresponding device driver behaviors
>> to do the recovery. e.g: implemented the error_detect, slot_reset callbacks.
>> and for link reset, it usually do secondary bus reset.
>>
>> and do we must require to the physical secondary bus reset for vfio device
>> as bus reset?
> That depends on how the guest driver attempts recovery, doesn't it?
> There are only a very limited number of cases where a secondary bus
> reset initiated by the guest will translate to a secondary bus reset of
> the physical device (iirc, single function device without FLR). In most
> cases, it will at best be translated to an FLR. VFIO really only does
> bus resets on VM reset because that's the only time we know that it's ok
> to reset multiple devices.
> If the guest driver is depending on a
> secondary bus reset to put the device into a recoverable state and we're
> not able to provide that, then we're actually reducing containment of
> the error by exposing AER to the guest and allowing it to attempt
> recovery. So in practice, I'm afraid we're risking the integrity of the
> VM by exposing AER to the guest and making it think that it can perform
> recovery operations that are not effective. Thanks,

Hi Alex,

if the guest driver needs to reset a vfio device by secondary bus reset
when an AER error occurs, how about keeping the existing behavior:
stopping the VM and reporting a fatal error to the user?

Thanks,
Chen

>
> Alex
>
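
To make the two positions in this thread concrete, here is a minimal Python sketch. It is only a model of the discussion, not QEMU or vfio code: every name in it (`Reset`, `Action`, `translate_guest_bus_reset`, `on_aer_error`) is hypothetical. The translation rule follows Alex's description above, which he himself marks as "iirc", and the case of a multi-function device without FLR is assumed here to be the "may not be able to do anything" case.

```python
from enum import Enum


class Reset(Enum):
    PHYSICAL_BUS_RESET = "physical secondary bus reset"
    FLR = "function-level reset"
    NONE = "no effective reset"


class Action(Enum):
    FORWARD_TO_GUEST = "pass the AER error to the guest"
    STOP_VM = "stop the VM and report a fatal error"


def translate_guest_bus_reset(single_function: bool, has_flr: bool) -> Reset:
    """Model what a guest-initiated secondary bus reset becomes on the host.

    Assumption: simplified from the description in the thread. A device
    with FLR gets at best an FLR; only a single-function device without
    FLR gets a real bus reset; anything else gets nothing effective.
    """
    if has_flr:
        return Reset.FLR
    if single_function:
        return Reset.PHYSICAL_BUS_RESET
    return Reset.NONE


def on_aer_error(guest_needs_bus_reset: bool) -> Action:
    """Model the proposal: fall back to stopping the VM whenever guest
    recovery would depend on a secondary bus reset."""
    if guest_needs_bus_reset:
        return Action.STOP_VM
    return Action.FORWARD_TO_GUEST
```

For example, `translate_guest_bus_reset(single_function=False, has_flr=True)` gives `Reset.FLR`, which is the common case where the guest believes it reset the bus but the device only saw a function-level reset.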