qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Alex Williamson <alex.williamson@redhat.com>
To: Zhou Jie <zhoujie2011@cn.fujitsu.com>
Cc: Chen Fan <chen.fan.fnst@cn.fujitsu.com>,
	izumi.taku@jp.fujitsu.com, Cao jin <caoj.fnst@cn.fujitsu.com>,
	qemu-devel@nongnu.org, mst@redhat.com
Subject: Re: [Qemu-devel] [patch v6 11/12] vfio: register aer resume notification handler for aer resume
Date: Wed, 11 May 2016 14:20:28 -0600	[thread overview]
Message-ID: <20160511142028.2f6f511d@t450s.home> (raw)
In-Reply-To: <f006881e-70f7-7745-040a-ba3bbdfa6663@cn.fujitsu.com>

On Wed, 11 May 2016 11:11:39 +0800
Zhou Jie <zhoujie2011@cn.fujitsu.com> wrote:

> Hi, Alex
>      What do you think about the following solution?
>      1. Detect support for resume notification.
>         If host vfio driver does not have resume notifier flags,
>         Directly fail to boot up VM as with aer enabled.

It's not a flag used to detect the resume notifier, but simply probing
IRQ_INFO for the index allocated for this notification.

>      2. Immediately notify the VM on error detected.
>      3. Stall any access to the device until resume is signaled.
>         Disable mmaps, drop writes, return -1 for reads.
>      4. Delay the guest directed bus reset.
>         Don't reset bus in vfio_pci_reset function.
>      5. Wait for resume notification.
>         If we don't get the resume notification from the host after
>         some timeout, we would abort the guest directed bus reset
>         altogether and make the device disappear,
>         Initiating an unplug of the device to prevent it from further
>         interacting with the VM.
>      6. After get the resume notification.
>         Reset bus.
>         It the second bus reset. Because the host did bus reset already.
>         But as you said we shouldn't necessarily design the API that
>         strictly around the current behavior of the Linux AER handler.

Otherwise it sounds like what I had proposed.  Thanks,

Alex

> On 2016/5/7 0:39, Alex Williamson wrote:
> > On Fri, 6 May 2016 09:38:41 +0800
> > Chen Fan <chen.fan.fnst@cn.fujitsu.com> wrote:
> >  
> >> On 04/26/2016 10:48 PM, Alex Williamson wrote:  
> >>> On Tue, 26 Apr 2016 11:39:02 +0800
> >>> Chen Fan<chen.fan.fnst@cn.fujitsu.com>  wrote:
> >>>  
> >>>> On 04/14/2016 09:02 AM, Chen Fan wrote:  
> >>>>> On 04/12/2016 05:38 AM, Alex Williamson wrote:  
> >>>>>> On Tue, 5 Apr 2016 19:42:02 +0800
> >>>>>> Cao jin<caoj.fnst@cn.fujitsu.com>  wrote:
> >>>>>>  
> >>>>>>> From: Chen Fan<chen.fan.fnst@cn.fujitsu.com>
> >>>>>>>
> >>>>>>> for supporting aer recovery, host and guest would run the same aer
> >>>>>>> recovery code, that would do the secondary bus reset if the error
> >>>>>>> is fatal, the aer recovery process:
> >>>>>>>     1. error_detected
> >>>>>>>     2. reset_link (if fatal)
> >>>>>>>     3. slot_reset/mmio_enabled
> >>>>>>>     4. resume
> >>>>>>>
> >>>>>>> it indicates that host will do secondary bus reset to reset
> >>>>>>> the physical devices under bus in step 2, that would cause
> >>>>>>> devices in D3 status in a short time. but in qemu, we register
> >>>>>>> an error detected handler, that would be invoked as host broadcasts
> >>>>>>> the error-detected event in step 1, in order to avoid guest do
> >>>>>>> reset_link when host do reset_link simultaneously. it may cause
> >>>>>>> fatal error. we introduce a resmue notifier to assure host reset
> >>>>>>> completely. then do guest aer injection.  
> >>>>>> Why is it safe to continue running the VM between the error detected
> >>>>>> notification and the resume notification?  We're just pushing back the
> >>>>>> point at which we inject the AER into the guest, potentially negating
> >>>>>> any benefit by allowing the VM to consume bad data.  Shouldn't we
> >>>>>> instead be immediately notifying the VM on error detected, but stalling
> >>>>>> any access to the device until resume is signaled?  How do we know that
> >>>>>> resume will ever be signaled?  We have both the problem that we may be
> >>>>>> running on an older kernel that won't support a resume notification and
> >>>>>> the problem that seeing a resume notification depends on the host being
> >>>>>> able to successfully complete a link reset after fatal error. We can
> >>>>>> detect support for resume notification, but we still need a strategy
> >>>>>> for never receiving it.  Thanks,  
> >>>>> That's make sense, but I haven't came up with a good idea. do you have
> >>>>> any idea, Alex?  
> >>> I don't know that there are any good solutions here.  We need to
> >>> respond to the current error notifier interrupt and not regress from
> >>> our support there.  I think that means that if we want to switch from a
> >>> simple halt-on-error to a mechanism for the guest to handle recovery,
> >>> we need to disable access to the device between being notified that the
> >>> error occurred and being notified to resume.  We can do that by
> >>> disabling mmaps to the device and preventing access via the slow path
> >>> handlers.  I don't know what the best solution is for preventing access,
> >>> do we block and pause the VM or do we drop writes and return -1 for
> >>> reads, that's something that needs to be determined.  We also need to
> >>> inject the AER into the VM at the point we're notified of an error
> >>> because the VM needs to know as soon as possible to stop using the
> >>> device or trusting any data from it.  The next coordination point would
> >>> be something like the resume notifier that you've added and there are
> >>> numerous questions around the interaction of that with the guest
> >>> handling.  Clearly we can't do a guest directed bus reset until we get
> >>> the resume notifier, so do we block that execution path in QEMU until
> >>> the resume notification is received?  What happens if we don't get that
> >>> notification?  Is there any way that we can rely on the host having
> >>> done a bus reset to the point where we don't need to act on the guest
> >>> directed reset?  These are all things that need to be figured out.
> >>> Thanks,  
> >> Maybe we can simply pause the vcpu running and avoid the VM to
> >> access the device. and add two flags in VFIO_DEVICE_GET_INFO to query
> >> whether the vfio pci driver has a resume notifier,
> >> if it does not have resume notifier flags, we can directly fail to boot
> >> up VM
> >> as with aer enabled.  
> >
> > We can already tell if a resume interrupt is supported between the IRQ
> > count in vfio_device_info and a probe with vfio_irq_info, what would
> > additional flags in vfio_device_info tell us beyond a resume interrupt
> > being supported?  Is pausing the VM acceptable from a service guarantee
> > perspective to users?  A bus reset can take a full second and I imagine
> > deeper PCI hierarchies can push that out depending on what level the
> > error occurs.  A second of downtime may be enough to trigger failovers
> > to other systems.  If we were to disable mmaps when a fault occurs, we
> > could trap any further device access, drop writes, return -1 for
> > reads.  This seems reasonable since we've already notified the VM that
> > the device had a fault.  The synchronization point seems like when the
> > guest tries to do a bus reset, we need to block that until we get the
> > resume notification from the host.  Perhaps if that doesn't occur after
> > some timeout, we would abort the guest directed bus reset altogether
> > and make the device disappear, perhaps even initiating an unplug of the
> > device to prevent it from further interacting with the VM.
> >  
> >> otherwise, we should wait for resume notifier coming to
> >> restart the cpu. about the problem of the reduplicated bus reset by host
> >> and guest,
> >> I think qemu can according to the error is fatal or non-fatal to decide
> >> whether need
> >> to do a bus reset on guest, I think it's not critical and could be
> >> resolved later.  
> >
> > The vfio error interrupt doesn't signal non-fatal errors afaik.  I'm
> > also not sure we have an guarantee that the host has performed a bus
> > reset, we shouldn't necessarily design the API that strictly around the
> > current behavior of the Linux AER handler.  So I don't know that
> > there's any practical way to avoid duplicate bus resets between host
> > and guest recovery.  Thanks,
> >
> > Alex
> >
> >
> > .
> >  
> 
> 
> 

  reply	other threads:[~2016-05-11 20:20 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-04-05 11:41 [Qemu-devel] [patch v6 00/12] vfio-pci: pass the aer error to guest, part2 Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 01/12] vfio: extract vfio_get_hot_reset_info as a single function Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 02/12] vfio: squeeze out vfio_pci_do_hot_reset for support bus reset Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 03/12] vfio: add pcie extended capability support Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 04/12] vfio: add aer support for vfio device Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 05/12] vfio: refine function vfio_pci_host_match Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 06/12] vfio: add check host bus reset is support or not Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 07/12] pci: add a pci_function_is_valid callback to check function if valid Cao jin
2016-04-05 11:41 ` [Qemu-devel] [patch v6 08/12] vfio: add check aer functionality for hotplug device Cao jin
2016-04-05 11:42 ` [Qemu-devel] [patch v6 09/12] vfio: vote the function 0 to do host bus reset when aer occurred Cao jin
2016-04-05 11:42 ` [Qemu-devel] [patch v6 10/12] vfio-pci: pass the aer error to guest Cao jin
2016-04-05 11:42 ` [Qemu-devel] [patch v6 11/12] vfio: register aer resume notification handler for aer resume Cao jin
2016-04-11 21:38   ` Alex Williamson
2016-04-14  1:02     ` Chen Fan
2016-04-26  3:39       ` Chen Fan
2016-04-26 14:48         ` Alex Williamson
2016-05-06  1:38           ` Chen Fan
2016-05-06 16:39             ` Alex Williamson
2016-05-11  3:11               ` Zhou Jie
2016-05-11 20:20                 ` Alex Williamson [this message]
2016-05-24 10:49           ` Michael S. Tsirkin
2016-05-25  1:08             ` Zhou Jie
2016-05-25  2:54             ` Alex Williamson
2016-05-25  8:45               ` Michael S. Tsirkin
2016-05-25 14:22                 ` Alex Williamson
2016-04-05 11:42 ` [Qemu-devel] [patch v6 12/12] vfio: add 'aer' property to expose aercap Cao jin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160511142028.2f6f511d@t450s.home \
    --to=alex.williamson@redhat.com \
    --cc=caoj.fnst@cn.fujitsu.com \
    --cc=chen.fan.fnst@cn.fujitsu.com \
    --cc=izumi.taku@jp.fujitsu.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=zhoujie2011@cn.fujitsu.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).