From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:43112)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mst@redhat.com>) id 1b5URK-00065p-5T
	for qemu-devel@nongnu.org; Wed, 25 May 2016 04:45:23 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <mst@redhat.com>) id 1b5URG-0002yr-VY
	for qemu-devel@nongnu.org; Wed, 25 May 2016 04:45:18 -0400
Received: from mx1.redhat.com ([209.132.183.28]:37064)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mst@redhat.com>) id 1b5URG-0002yk-Mj
	for qemu-devel@nongnu.org; Wed, 25 May 2016 04:45:14 -0400
Date: Wed, 25 May 2016 11:45:11 +0300
From: "Michael S. Tsirkin" <mst@redhat.com>
Message-ID: <20160525113623-mutt-send-email-mst@redhat.com>
References: <1459856523-17085-1-git-send-email-caoj.fnst@cn.fujitsu.com>
	<1459856523-17085-12-git-send-email-caoj.fnst@cn.fujitsu.com>
	<20160411153827.3884ded1@t450s.home>
	<570EEC42.3040300@cn.fujitsu.com> <571EE2D6.4000100@cn.fujitsu.com>
	<20160426084815.24ec5200@t450s.home>
	<20160524134742-mutt-send-email-mst@redhat.com>
	<20160524205406.6dabaf71@ul30vt.home>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20160524205406.6dabaf71@ul30vt.home>
Subject: Re: [Qemu-devel] [patch v6 11/12] vfio: register aer resume
 notification handler for aer resume
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Chen Fan <chen.fan.fnst@cn.fujitsu.com>, Cao jin <caoj.fnst@cn.fujitsu.com>, izumi.taku@jp.fujitsu.com, qemu-devel@nongnu.org

On Tue, May 24, 2016 at 08:54:06PM -0600, Alex Williamson wrote:
> On Tue, 24 May 2016 13:49:12 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Tue, Apr 26, 2016 at 08:48:15AM -0600, Alex Williamson wrote:
> > > I think that means that if we want to switch from a
> > > simple halt-on-error to a mechanism for the guest to handle recovery,
> > > we need to disable access to the device between being notified that the
> > > error occurred and being notified to resume.  
> > 
> > But this isn't what happens on bare metal.
> > Errors are reported asynchronously and host might access the device
> > meanwhile.  These accesses might or might not trigger more errors, but
> > fundamentally this should not matter too much as device is going to be
> > reset.
> 
> Bare metal also doesn't have a hypervisor underneath performing a PCI
> bus reset,

This is where I get lost. I assumed we do reset when guest
requests it. Isn't that the case? Why not?

> there's only one OS trying to control the device at a time,
> so we have some clear differences from bare metal that I don't know we
> can avoid.  The thought here was that we need to notify the guest at the
> earliest point we can, but let the host recovery run to completion
> before allowing the user to interact with the device.  Perhaps there is
> no need to block region access to the device (ie. config space & BAR
> resources), but I think we do need to somehow synchronize the bus resets
> or else we get situations like that observed previously where the bus is
> still in reset while userspace tries to proceed with using it.
>

Why do we have to trigger reset upon an error?
Why not wait for guest to request reset?

> The next question then would be whether that's QEMU's job or something
> that should be done in the host kernel.  It's been proposed to add yet
> another eventfd for the kernel vfio-pci to signal QEMU when a resume
> notification has occurred, but perhaps the better approach would be for
> the hot reset ioctl (and base reset ioctl) to handle this situation more
> transparently.  We could immediately return -EAGAIN and allow QEMU to
> delay itself for any reset ioctl received after the AER error detected
> event, but before the resume event.  We could also allow some sort of
> timeout, that the ioctl might enter an interruptible sleep, woken on
> the resume notification or timeout.  That sounds a bit better to me as
> the specification of what's allowed between the error detected
> notification and the resume notification is otherwise pretty poorly
> defined.

So if guest started reset, it might take a while for
device to come out of that state, and access during this
time might trigger errors. But that's already possible
for guest to trigger, right?  How is this different?


>  Do you think we can run completely asynchronous, letting the
> host and guest bus resets race?  Thanks,
> 
> Alex

I have a feeling we need to put some code out,
disabled by default, and see how it behaves in the field.
For example, the ability to trigger UR errors seems benign, but
I think we are trying to prevent them now because of
something we saw in the field.

-- 
MST