From: Alex Williamson <alex.williamson@redhat.com>
To: Zhou Jie <zhoujie2011@cn.fujitsu.com>
Cc: Chen Fan <chen.fan.fnst@cn.fujitsu.com>,
izumi.taku@jp.fujitsu.com, Cao jin <caoj.fnst@cn.fujitsu.com>,
qemu-devel@nongnu.org, mst@redhat.com
Subject: Re: [Qemu-devel] [patch v6 11/12] vfio: register aer resume notification handler for aer resume
Date: Wed, 11 May 2016 14:20:28 -0600 [thread overview]
Message-ID: <20160511142028.2f6f511d@t450s.home> (raw)
In-Reply-To: <f006881e-70f7-7745-040a-ba3bbdfa6663@cn.fujitsu.com>
On Wed, 11 May 2016 11:11:39 +0800
Zhou Jie <zhoujie2011@cn.fujitsu.com> wrote:
> Hi, Alex
> What do you think about the following solution?
> 1. Detect support for resume notification.
> If host vfio driver does not have resume notifier flags,
> Directly fail to boot up VM as with aer enabled.
It's not a flag used to detect the resume notifier, but simply probing
IRQ_INFO for the index allocated for this notification.
> 2. Immediately notify the VM on error detected.
> 3. Stall any access to the device until resume is signaled.
> Disable mmaps, drop writes, return -1 for reads.
> 4. Delay the guest directed bus reset.
> Don't reset bus in vfio_pci_reset function.
> 5. Wait for resume notification.
> If we don't get the resume notification from the host after
> some timeout, we would abort the guest directed bus reset
> altogether and make the device disappear,
> Initiating an unplug of the device to prevent it from further
> interacting with the VM.
> 6. After get the resume notification.
> Reset bus.
> It the second bus reset. Because the host did bus reset already.
> But as you said we shouldn't necessarily design the API that
> strictly around the current behavior of the Linux AER handler.
Otherwise it sounds like what I had proposed. Thanks,
Alex
> On 2016/5/7 0:39, Alex Williamson wrote:
> > On Fri, 6 May 2016 09:38:41 +0800
> > Chen Fan <chen.fan.fnst@cn.fujitsu.com> wrote:
> >
> >> On 04/26/2016 10:48 PM, Alex Williamson wrote:
> >>> On Tue, 26 Apr 2016 11:39:02 +0800
> >>> Chen Fan<chen.fan.fnst@cn.fujitsu.com> wrote:
> >>>
> >>>> On 04/14/2016 09:02 AM, Chen Fan wrote:
> >>>>> On 04/12/2016 05:38 AM, Alex Williamson wrote:
> >>>>>> On Tue, 5 Apr 2016 19:42:02 +0800
> >>>>>> Cao jin<caoj.fnst@cn.fujitsu.com> wrote:
> >>>>>>
> >>>>>>> From: Chen Fan<chen.fan.fnst@cn.fujitsu.com>
> >>>>>>>
> >>>>>>> for supporting aer recovery, host and guest would run the same aer
> >>>>>>> recovery code, that would do the secondary bus reset if the error
> >>>>>>> is fatal, the aer recovery process:
> >>>>>>> 1. error_detected
> >>>>>>> 2. reset_link (if fatal)
> >>>>>>> 3. slot_reset/mmio_enabled
> >>>>>>> 4. resume
> >>>>>>>
> >>>>>>> it indicates that host will do secondary bus reset to reset
> >>>>>>> the physical devices under bus in step 2, that would cause
> >>>>>>> devices in D3 status in a short time. but in qemu, we register
> >>>>>>> an error detected handler, that would be invoked as host broadcasts
> >>>>>>> the error-detected event in step 1, in order to avoid guest do
> >>>>>>> reset_link when host do reset_link simultaneously. it may cause
> >>>>>>> fatal error. we introduce a resmue notifier to assure host reset
> >>>>>>> completely. then do guest aer injection.
> >>>>>> Why is it safe to continue running the VM between the error detected
> >>>>>> notification and the resume notification? We're just pushing back the
> >>>>>> point at which we inject the AER into the guest, potentially negating
> >>>>>> any benefit by allowing the VM to consume bad data. Shouldn't we
> >>>>>> instead be immediately notifying the VM on error detected, but stalling
> >>>>>> any access to the device until resume is signaled? How do we know that
> >>>>>> resume will ever be signaled? We have both the problem that we may be
> >>>>>> running on an older kernel that won't support a resume notification and
> >>>>>> the problem that seeing a resume notification depends on the host being
> >>>>>> able to successfully complete a link reset after fatal error. We can
> >>>>>> detect support for resume notification, but we still need a strategy
> >>>>>> for never receiving it. Thanks,
> >>>>> That's make sense, but I haven't came up with a good idea. do you have
> >>>>> any idea, Alex?
> >>> I don't know that there are any good solutions here. We need to
> >>> respond to the current error notifier interrupt and not regress from
> >>> our support there. I think that means that if we want to switch from a
> >>> simple halt-on-error to a mechanism for the guest to handle recovery,
> >>> we need to disable access to the device between being notified that the
> >>> error occurred and being notified to resume. We can do that by
> >>> disabling mmaps to the device and preventing access via the slow path
> >>> handlers. I don't know what the best solution is for preventing access,
> >>> do we block and pause the VM or do we drop writes and return -1 for
> >>> reads, that's something that needs to be determined. We also need to
> >>> inject the AER into the VM at the point we're notified of an error
> >>> because the VM needs to know as soon as possible to stop using the
> >>> device or trusting any data from it. The next coordination point would
> >>> be something like the resume notifier that you've added and there are
> >>> numerous questions around the interaction of that with the guest
> >>> handling. Clearly we can't do a guest directed bus reset until we get
> >>> the resume notifier, so do we block that execution path in QEMU until
> >>> the resume notification is received? What happens if we don't get that
> >>> notification? Is there any way that we can rely on the host having
> >>> done a bus reset to the point where we don't need to act on the guest
> >>> directed reset? These are all things that need to be figured out.
> >>> Thanks,
> >> Maybe we can simply pause the vcpu running and avoid the VM to
> >> access the device. and add two flags in VFIO_DEVICE_GET_INFO to query
> >> whether the vfio pci driver has a resume notifier,
> >> if it does not have resume notifier flags, we can directly fail to boot
> >> up VM
> >> as with aer enabled.
> >
> > We can already tell if a resume interrupt is supported between the IRQ
> > count in vfio_device_info and a probe with vfio_irq_info, what would
> > additional flags in vfio_device_info tell us beyond a resume interrupt
> > being supported? Is pausing the VM acceptable from a service guarantee
> > perspective to users? A bus reset can take a full second and I imagine
> > deeper PCI hierarchies can push that out depending on what level the
> > error occurs. A second of downtime may be enough to trigger failovers
> > to other systems. If we were to disable mmaps when a fault occurs, we
> > could trap any further device access, drop writes, return -1 for
> > reads. This seems reasonable since we've already notified the VM that
> > the device had a fault. The synchronization point seems like when the
> > guest tries to do a bus reset, we need to block that until we get the
> > resume notification from the host. Perhaps if that doesn't occur after
> > some timeout, we would abort the guest directed bus reset altogether
> > and make the device disappear, perhaps even initiating an unplug of the
> > device to prevent it from further interacting with the VM.
> >
> >> otherwise, we should wait for resume notifier coming to
> >> restart the cpu. about the problem of the reduplicated bus reset by host
> >> and guest,
> >> I think qemu can according to the error is fatal or non-fatal to decide
> >> whether need
> >> to do a bus reset on guest, I think it's not critical and could be
> >> resolved later.
> >
> > The vfio error interrupt doesn't signal non-fatal errors afaik. I'm
> > also not sure we have an guarantee that the host has performed a bus
> > reset, we shouldn't necessarily design the API that strictly around the
> > current behavior of the Linux AER handler. So I don't know that
> > there's any practical way to avoid duplicate bus resets between host
> > and guest recovery. Thanks,
> >
> > Alex
> >
> >
> > .
> >
>
>
>
next prev parent reply other threads:[~2016-05-11 20:20 UTC|newest]
Thread overview: 26+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-04-05 11:41 [Qemu-devel] [patch v6 00/12] vfio-pci: pass the aer error to guest, part2 Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 01/12] vfio: extract vfio_get_hot_reset_info as a single function Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 02/12] vfio: squeeze out vfio_pci_do_hot_reset for support bus reset Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 03/12] vfio: add pcie extended capability support Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 04/12] vfio: add aer support for vfio device Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 05/12] vfio: refine function vfio_pci_host_match Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 06/12] vfio: add check host bus reset is support or not Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 07/12] pci: add a pci_function_is_valid callback to check function if valid Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 08/12] vfio: add check aer functionality for hotplug device Cao jin
2016-04-05 11:42 ` [Qemu-devel] [patch v6 09/12] vfio: vote the function 0 to do host bus reset when aer occurred Cao jin
2016-04-05 11:42 ` [Qemu-devel] [patch v6 10/12] vfio-pci: pass the aer error to guest Cao jin
2016-04-05 11:42 ` [Qemu-devel] [patch v6 11/12] vfio: register aer resume notification handler for aer resume Cao jin
2016-04-11 21:38 ` Alex Williamson
2016-04-14 1:02 ` Chen Fan
2016-04-26 3:39 ` Chen Fan
2016-04-26 14:48 ` Alex Williamson
2016-05-06 1:38 ` Chen Fan
2016-05-06 16:39 ` Alex Williamson
2016-05-11 3:11 ` Zhou Jie
2016-05-11 20:20 ` Alex Williamson [this message]
2016-05-24 10:49 ` Michael S. Tsirkin
2016-05-25 1:08 ` Zhou Jie
2016-05-25 2:54 ` Alex Williamson
2016-05-25 8:45 ` Michael S. Tsirkin
2016-05-25 14:22 ` Alex Williamson
2016-04-05 11:42 ` [Qemu-devel] [patch v6 12/12] vfio: add 'aer' property to expose aercap Cao jin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20160511142028.2f6f511d@t450s.home \
--to=alex.williamson@redhat.com \
--cc=caoj.fnst@cn.fujitsu.com \
--cc=chen.fan.fnst@cn.fujitsu.com \
--cc=izumi.taku@jp.fujitsu.com \
--cc=mst@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=zhoujie2011@cn.fujitsu.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).