Re: [Qemu-devel] [patch v6 11/12] vfio: register aer resume notification handler for aer resume

From: Alex Williamson <alex.williamson@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Chen Fan <chen.fan.fnst@cn.fujitsu.com>,
	Cao jin <caoj.fnst@cn.fujitsu.com>,
	izumi.taku@jp.fujitsu.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [patch v6 11/12] vfio: register aer resume notification handler for aer resume
Date: Tue, 24 May 2016 20:54:06 -0600
Message-ID: <20160524205406.6dabaf71@ul30vt.home>
In-Reply-To: <20160524134742-mutt-send-email-mst@redhat.com>

On Tue, 24 May 2016 13:49:12 +0300
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Tue, Apr 26, 2016 at 08:48:15AM -0600, Alex Williamson wrote:
> > I think that means that if we want to switch from a
> > simple halt-on-error to a mechanism for the guest to handle recovery,
> > we need to disable access to the device between being notified that the
> > error occurred and being notified to resume.  
> 
> But this isn't what happens on bare metal.
> Errors are reported asynchronously and host might access the device
> meanwhile.  These accesses might or might not trigger more errors, but
> fundamentally this should not matter too much as device is going to be
> reset.

Bare metal also doesn't have a hypervisor underneath performing a PCI
bus reset; there is only one OS trying to control the device at a time.
So we have some clear differences from bare metal that I don't know we
can avoid.  The thought here was that we need to notify the guest at the
earliest point we can, but let the host recovery run to completion
before allowing the user to interact with the device.  Perhaps there is
no need to block region access to the device (i.e. config space and BAR
resources), but I think we do need to somehow synchronize the bus resets,
or else we get situations like the one observed previously, where the bus
is still in reset while userspace tries to proceed with using it.

The next question then would be whether that's QEMU's job or something
that should be done in the host kernel.  It's been proposed to add yet
another eventfd for the kernel vfio-pci driver to signal QEMU when a
resume notification has occurred, but perhaps the better approach would
be for the hot reset ioctl (and base reset ioctl) to handle this
situation more transparently.  We could immediately return -EAGAIN for
any reset ioctl received after the AER error detected event but before
the resume event, and let QEMU delay and retry.  We could also allow
some sort of timeout, where the ioctl enters an interruptible sleep and
is woken by the resume notification or the timeout.  That sounds a bit
better to me, as what is allowed between the error detected notification
and the resume notification is otherwise pretty poorly defined.  Do you
think we can run completely asynchronously, letting the host and guest
bus resets race?  Thanks,

Alex
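
For illustration only, the -EAGAIN variant described above could look
roughly like this on the userspace side.  This is a minimal sketch, not
QEMU or kernel code: reset_with_retry(), reset_fn, struct mock_host and
mock_reset() are hypothetical names, and the mock merely stands in for a
reset ioctl that reports -EAGAIN while host recovery is still running.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for a reset ioctl: 0 on success, negative errno on failure. */
typedef int (*reset_fn)(void *opaque);

/*
 * Call do_reset() until it stops reporting -EAGAIN, sleeping delay_ms
 * between attempts.  Returns the first result other than -EAGAIN, or
 * -ETIMEDOUT once max_attempts calls have all reported -EAGAIN.
 */
static int reset_with_retry(reset_fn do_reset, void *opaque,
                            int max_attempts, long delay_ms)
{
    assert(do_reset != NULL && max_attempts > 0 && delay_ms >= 0);

    struct timespec delay = {
        .tv_sec = delay_ms / 1000,
        .tv_nsec = (delay_ms % 1000) * 1000000L,
    };

    for (int i = 0; i < max_attempts; i++) {
        int ret = do_reset(opaque);
        if (ret != -EAGAIN) {
            return ret;             /* success, or an error that is not "try again" */
        }
        if (i + 1 < max_attempts) {
            /* A signal may cut this sleep short; the loop then just retries early. */
            nanosleep(&delay, NULL);
        }
    }
    return -ETIMEDOUT;
}

/* Mock host: reports -EAGAIN for the first busy_calls attempts, then succeeds. */
struct mock_host {
    int busy_calls;
    int calls;
};

static int mock_reset(void *opaque)
{
    struct mock_host *h = opaque;

    h->calls++;
    if (h->busy_calls > 0) {
        h->busy_calls--;
        return -EAGAIN;
    }
    return 0;
}
```

With a mock that is busy for two calls, reset_with_retry(mock_reset, &h,
5, 1) returns 0 on the third attempt; a mock that never leaves the busy
state yields -ETIMEDOUT once the attempts are used up.  The sleeping
variant would move the same wait into the ioctl itself, so the caller
would need no loop at all.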

