qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: Chen Fan <chen.fan.fnst@cn.fujitsu.com>,
	izumi.taku@jp.fujitsu.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v5 7/7] pc: add PC_I440FX_COMPAT to disable aercap for vifo device
Date: Wed, 18 Mar 2015 19:56:37 +0100	[thread overview]
Message-ID: <20150318195506-mutt-send-email-mst@redhat.com> (raw)
In-Reply-To: <1426702095.3643.414.camel@redhat.com>

On Wed, Mar 18, 2015 at 12:08:15PM -0600, Alex Williamson wrote:
> On Wed, 2015-03-18 at 18:45 +0100, Michael S. Tsirkin wrote:
> > On Wed, Mar 18, 2015 at 11:11:28AM -0600, Alex Williamson wrote:
> > > On Wed, 2015-03-18 at 17:44 +0100, Michael S. Tsirkin wrote:
> > > > On Wed, Mar 18, 2015 at 09:45:29AM -0600, Alex Williamson wrote:
> > > > > On Wed, 2015-03-18 at 16:02 +0100, Michael S. Tsirkin wrote:
> > > > > > On Wed, Mar 18, 2015 at 08:50:54AM -0600, Alex Williamson wrote:
> > > > > > > On Wed, 2015-03-18 at 15:36 +0100, Michael S. Tsirkin wrote:
> > > > > > > > On Wed, Mar 18, 2015 at 08:15:01AM -0600, Alex Williamson wrote:
> > > > > > > > > On Wed, 2015-03-18 at 15:05 +0100, Michael S. Tsirkin wrote:
> > > > > > > > > > On Wed, Mar 18, 2015 at 08:02:26AM -0600, Alex Williamson wrote:
> > > > > > > > > > > On Wed, 2015-03-18 at 14:23 +0100, Michael S. Tsirkin wrote:
> > > > > > > > > > > > typo in subject: vfio, not vifo.
> > > > > > > > > > > > 
> > > > > > > > > > > > On Thu, Mar 12, 2015 at 06:23:59PM +0800, Chen Fan wrote:
> > > > > > > > > > > > > for piix4 chipset, we don't need to expose aer, so introduce
> > > > > > > > > > > > > PC_I440FX_COMPAT for all piix4 machines to disable aercap,
> > > > > > > > > > > > > and add HW_COMPAT_2_2 to disable aercap for all lower
> > > > > > > > > > > > > than 2.3.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
> > > > > > > > > > > > 
> > > > > > > > > > > > Well vfio is never migrated ATM.
> > > > > > > > > > > > So why is compat code needed at all?
> > > > > > > > > > > 
> > > > > > > > > > > It's not for migration, it's to maintain current behavior on existing
> > > > > > > > > > > platforms.  If someone gets an uncorrected AER error on q35 machine type
> > > > > > > > > > > today, the VM stops.  With this change, AER would be exposed to the
> > > > > > > > > > > guest and the guest could handle it.  The compat change therefore
> > > > > > > > > > > maintains the stop VM behavior on existing q35 machine types.
> > > > > > > > > > 
> > > > > > > > > > If stop VM behaviour is useful, expose it to users.
> > > > > > > > > > If not, then don't.
> > > > > > > > > > I don't see why does it have to be tied to machine types.
> > > > > > > > > 
> > > > > > > > > Because q35-2.2 machine type will currently do a stop VM on uncorrected
> > > > > > > > > AER error.  If we don't tie that to a machine option then q35-2.2 would
> > > > > > > > > suddenly start exposing the error to the guest.  That's a fairly
> > > > > > > > > significant change in behavior for a static machine type.
> > > > > > > > 
> > > > > > > > I don't think you can classify it as a behaviour change. VM stop is not
> > > > > > > > guest visible behaviour.
> > > > > > > 
> > > > > > > In one case, an uncorrected AER occurs and the VM is stopped by QEMU.
> > > > > > > In the other case, the guest is notified and may attempt corrective
> > > > > > > action... or maybe the guest doesn't understand AER and the user is
> > > > > > > depending on the previous behavior.  That is absolutely a behavior
> > > > > > > change.
> > > > > > > 
> > > > > > > > Are you worrying about guests misbehaving when they see these errors?
> > > > > > > > Then you want this as user-controlled, supported option.
> > > > > > > 
> > > > > > > Whether the option is user visible is tangential to whether the behavior
> > > > > > > of existing machine types should be maintained.  Existing machine types
> > > > > > > can impose a different default than current machine types.
> > > > > > >
> > > > > > > > In other words: we only tie things to machine types when we
> > > > > > > > have to. This code gets almost no testing, and is a lot of
> > > > > > > > work to test. This one sounds like "just in case" is not a good
> > > > > > > > motivation.
> > > > > > > 
> > > > > > > It seems like an obvious use case for using machine types to maintain
> > > > > > > compatibility with previous behavior, which is exactly why we have
> > > > > > > machine types.  If we're not going to use it, why do we have it?
> > > > > > 
> > > > > > We have machine types because of the following issues:
> > > > > > - some silent changes confuse guests. For example guest installed with
> > > > > >   one machine type might not boot if you try to use it after
> > > > > >   changing something, or - in case of windows - throw up warnings.
> > > > > > - some changes break migration
> > > > > > 
> > > > > > Looks like none of these cases.
> > > > > > If AER is unsafe, turn it off by default for everyone.
> > > > > 
> > > > > This is silly, we have the tools, let's use them.
> > > > 
> > > > It's a very expensive tool, maintainance-wise. We often don't
> > > > have the choice but I'm not going to use this tool by choice
> > > > unless we know why we are doing this.
> > > > 
> > > > > If a user is running
> > > > > a VM that gets a VM stop on AER error one day and they upgrade QEMU and
> > > > > restart it, they should get the same behavior, whether a migration is
> > > > > involved or not.
> > > > 
> > > > You keep saying this, but why should it? Answer that question, the rest
> > > > will follow.
> > > 
> > > My answer is that it's a user visible change in behavior that they may
> > > rely on and would not expect to change within an existing machine type.
> > > Silently modifying the default may expose them to error conditions that
> > > were previously handled by QEMU and adds AER handling requirements to
> > > the guest.
> > > 
> > > Ignoring the question about whether an AER can be reliably recovered in
> > > the guest for a moment, a typical PC hardware platform will signal the
> > > running OS with AER errors, so it seems that *if* we can achieve parity
> > > in a virtual machine with that signaling, and more importantly the
> > > recovery process, then the default going forward should be to mimic bare
> > > metal behavior, thus changing the default from what we do today.
> > > 
> > > As it is now, I think we have too many outstanding questions about how
> > > recover occurs to change the default.
> > > 
> > > So let me return the question, if we were to resolve those questions and
> > > change the default handling from what we do today, creating a user
> > > visible behavior change and imposing new requirements for AER handling
> > > in the guest, why is that not worthy of machine type stability?  I'd
> > > certainly expect it from a distro specific machine type.
> > 
> > This really depends on what guests do with this.
> > We try to avoid guest visible changes during migration
> > (even that's not 100%).
> > There aren't such requirements for devices where migration
> > is disabled, e.g. we did many guest visible changes in q35.
> > 
> > Latest machine types are better tested. Deviating from latest machine
> > just for a theoretical "this is guest visible" isn't going to give
> > you better stability, it will give you worse stability.
> 
> You keep calling this theoretical.  As it exists today, an uncorrected
> AER error results in a VM stop with no guest intervention or support
> required.  If we were to change the default, the guest would instead be
> notified of the AER and given the opportunity to recover.  That's not
> just guest visible, that's a complete change in error handling paradigm.

OK, so maybe it's a feature that users should have control over.
But tying it to machine types makes no sense.

> > If you are worried about guest bugs, they are not going
> > away just because we release new QEMU.So if guests are
> > buggy with this feature, this needs a solution for all
> > machine types.
> 
> Guest bugs are not my issue.
> 
> > Anyway, if we keep the default, that's even easier.
> 
> Agreed, and that may be where we end if we can't come up with a
> reasonable expectation that the guest can recover from an AER.  However,
> I think machine compatibility should be fair game if we do want to flip
> that switch.
> 
> > > > > Maybe the default should be disabled, this patch
> > > > > series hasn't yet even convinced me that there's a worthwhile general
> > > > > case where the guest can recover, but using the existing machine
> > > > > compatibility infrastructure should be at our disposal if we do think
> > > > > the default going forward should be different than the behavior today.
> > > > 
> > > > I'm sorry, I don't think it's the right tool for the job.
> > > > 
> > > 
> > > 
> 
> 

  reply	other threads:[~2015-03-18 18:56 UTC|newest]

Thread overview: 53+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-03-12 10:23 [Qemu-devel] [PATCH v5 0/7] pass aer error to guest for vfio device Chen Fan
2015-03-12 10:23 ` [Qemu-devel] [PATCH v5 1/7] vfio: add pcie extanded capability support Chen Fan
2015-03-12 10:23 ` [Qemu-devel] [PATCH v5 2/7] aer: impove pcie_aer_init to support vfio device Chen Fan
2015-03-13 22:25   ` Alex Williamson
2015-03-16  2:30     ` Chen Fan
2015-03-12 10:23 ` [Qemu-devel] [PATCH v5 3/7] vfio: add aer support for " Chen Fan
2015-03-13 22:28   ` Alex Williamson
2015-03-12 10:23 ` [Qemu-devel] [PATCH v5 4/7] pcie_aer: expose pcie_aer_msg() interface Chen Fan
2015-03-13 22:30   ` Alex Williamson
2015-03-18 13:29   ` Michael S. Tsirkin
2015-03-19  1:33     ` Chen Fan
2015-03-12 10:23 ` [Qemu-devel] [PATCH v5 5/7] vfio-pci: pass the aer error to guest Chen Fan
2015-03-13 22:34   ` Alex Williamson
2015-03-16  3:05     ` Chen Fan
2015-03-16  3:52       ` Alex Williamson
2015-03-16  7:35         ` Chen Fan
2015-03-16 14:09           ` Alex Williamson
2015-03-25  1:33             ` Chen Fan
2015-03-25  2:31               ` Alex Williamson
2015-03-25  1:53             ` Chen Fan
2015-03-25  2:41               ` Alex Williamson
2015-03-25  3:07                 ` Chen Fan
2015-04-01  4:12                 ` Chen Fan
2015-04-01 15:46                   ` Alex Williamson
2015-04-08  8:59                     ` Chen Fan
2015-04-08 15:36                       ` Alex Williamson
2015-04-15 10:30                         ` Chen Fan
2015-04-15 14:18                           ` Alex Williamson
2015-03-12 10:23 ` [Qemu-devel] [PATCH v5 6/7] vfio: add 'x-aer' property to expose aercap Chen Fan
2015-03-18 13:23   ` Michael S. Tsirkin
2015-03-18 14:09     ` Alex Williamson
2015-03-12 10:23 ` [Qemu-devel] [PATCH v5 7/7] pc: add PC_I440FX_COMPAT to disable aercap for vifo device Chen Fan
2015-03-13 22:38   ` Alex Williamson
2015-03-16  2:48     ` Chen Fan
2015-03-16  2:49     ` Chen Fan
2015-03-18 13:23   ` Michael S. Tsirkin
2015-03-18 14:02     ` Alex Williamson
2015-03-18 14:05       ` Michael S. Tsirkin
2015-03-18 14:15         ` Alex Williamson
2015-03-18 14:36           ` Michael S. Tsirkin
2015-03-18 14:50             ` Alex Williamson
2015-03-18 15:02               ` Michael S. Tsirkin
2015-03-18 15:45                 ` Alex Williamson
2015-03-18 16:44                   ` Michael S. Tsirkin
2015-03-18 17:11                     ` Alex Williamson
2015-03-18 17:45                       ` Michael S. Tsirkin
2015-03-18 18:08                         ` Alex Williamson
2015-03-18 18:56                           ` Michael S. Tsirkin [this message]
2015-03-18 19:05                             ` Alex Williamson
2015-03-19 21:26                               ` Paolo Bonzini
2015-03-16  2:52 ` [Qemu-devel] [PATCH v5 0/7] pass aer error to guest for vfio device Chen Fan
2015-03-16  4:57   ` Michael S. Tsirkin
2015-03-19 21:44     ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150318195506-mutt-send-email-mst@redhat.com \
    --to=mst@redhat.com \
    --cc=alex.williamson@redhat.com \
    --cc=chen.fan.fnst@cn.fujitsu.com \
    --cc=izumi.taku@jp.fujitsu.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).