From: "Michael S. Tsirkin" <mst@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Chen Fan <chen.fan.fnst@cn.fujitsu.com>,
Cao jin <caoj.fnst@cn.fujitsu.com>,
izumi.taku@jp.fujitsu.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [patch v6 11/12] vfio: register aer resume notification handler for aer resume
Date: Wed, 25 May 2016 11:45:11 +0300 [thread overview]
Message-ID: <20160525113623-mutt-send-email-mst@redhat.com> (raw)
In-Reply-To: <20160524205406.6dabaf71@ul30vt.home>
On Tue, May 24, 2016 at 08:54:06PM -0600, Alex Williamson wrote:
> On Tue, 24 May 2016 13:49:12 +0300
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> > On Tue, Apr 26, 2016 at 08:48:15AM -0600, Alex Williamson wrote:
> > > I think that means that if we want to switch from a
> > > simple halt-on-error to a mechanism for the guest to handle recovery,
> > > we need to disable access to the device between being notified that the
> > > error occurred and being notified to resume.
> >
> > But this isn't what happens on bare metal.
> > Errors are reported asynchronously and host might access the device
> > meanwhile. These accesses might or might not trigger more errors, but
> > fundamentally this should not matter too much as device is going to be
> > reset.
>
> Bare metal also doesn't have a hypervisor underneath performing a PCI
> bus reset,
This is where I get lost. I assumed we do reset when guest
requests it. Isn't that the case? Why not?
> there's only one OS trying to control the device at a time,
> so we have some clear differences from bare metal that I don't know we
> can avoid. The thought here was that we need to notify the guest at the
> earliest point we can, but let the host recovery run to completion
> before allowing the user to interact with the device. Perhaps there is
> no need to block region access to the device (ie. config space & BAR
> resources), but I think we do need to somehow synchronize the bus resets
> or else we get situations like that observed previously where the bus is
> still in reset while userspace trys to proceed with using it.
>
Why do we have to trigger reset upon an error?
Why not wait for guest to request reset?
> The next question then would be whether that's QEMU's job or something
> that should be done in the host kernel. It's been proposed to add yet
> another eventfd for the kernel vfio-pci to signal QEMU when a resume
> notification has occured, but perhaps the better approach would be for
> the hot reset ioctl (and base reset ioctl) to handle this situation more
> transparently. We could immediately return -EAGAIN and allow QEMU to
> delay itself for any reset ioctl received after the AER error detected
> event, but before the resume event. We could also allow some sort of
> timeout, that the ioctl might enter an interruptible sleep, woken on
> the resume notification or timeout. That sounds a bit better to me as
> the specification of what's allowed between the error detected
> notification and the resume notification is otherwise pretty poorly
> defined.
So if guest started reset, it might take a while for
device to come out of that state, and access during this
time might trigger errors. But that's already possible
for guest to trigger, right? How is this different?
> Do you think we can run completely asynchronous, letting the
> host and guest bus resets race? Thanks,
>
> Alex
I have a feeling we need to put some code out,
disabled by default, and see how it behaves in the field.
For example ability to trigger UR errors seems benign but
I think we are trying to prevent them now because of
something we saw in the field.
--
MST
next prev parent reply other threads:[~2016-05-25 8:45 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-04-05 11:41 [Qemu-devel] [patch v6 00/12] vfio-pci: pass the aer error to guest, part2 Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 01/12] vfio: extract vfio_get_hot_reset_info as a single function Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 02/12] vfio: squeeze out vfio_pci_do_hot_reset for support bus reset Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 03/12] vfio: add pcie extended capability support Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 04/12] vfio: add aer support for vfio device Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 05/12] vfio: refine function vfio_pci_host_match Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 06/12] vfio: add check host bus reset is support or not Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 07/12] pci: add a pci_function_is_valid callback to check function if valid Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 08/12] vfio: add check aer functionality for hotplug device Cao jin
2016-04-05 11:42 ` [Qemu-devel] [patch v6 09/12] vfio: vote the function 0 to do host bus reset when aer occurred Cao jin
2016-04-05 11:42 ` [Qemu-devel] [patch v6 10/12] vfio-pci: pass the aer error to guest Cao jin
2016-04-05 11:42 ` [Qemu-devel] [patch v6 11/12] vfio: register aer resume notification handler for aer resume Cao jin
2016-04-11 21:38 ` Alex Williamson
2016-04-14 1:02 ` Chen Fan
2016-04-26 3:39 ` Chen Fan
2016-04-26 14:48 ` Alex Williamson
2016-05-06 1:38 ` Chen Fan
2016-05-06 16:39 ` Alex Williamson
2016-05-11 3:11 ` Zhou Jie
2016-05-11 20:20 ` Alex Williamson
2016-05-24 10:49 ` Michael S. Tsirkin
2016-05-25 1:08 ` Zhou Jie
2016-05-25 2:54 ` Alex Williamson
2016-05-25 8:45 ` Michael S. Tsirkin [this message]
2016-05-25 14:22 ` Alex Williamson
2016-04-05 11:42 ` [Qemu-devel] [patch v6 12/12] vfio: add 'aer' property to expose aercap Cao jin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160525113623-mutt-send-email-mst@redhat.com \
--to=mst@redhat.com \
--cc=alex.williamson@redhat.com \
--cc=caoj.fnst@cn.fujitsu.com \
--cc=chen.fan.fnst@cn.fujitsu.com \
--cc=izumi.taku@jp.fujitsu.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).