From: Alex Williamson <alex.williamson@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Chen Fan <chen.fan.fnst@cn.fujitsu.com>,
izumi.taku@jp.fujitsu.com, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH v5 7/7] pc: add PC_I440FX_COMPAT to disable aercap for vifo device
Date: Wed, 18 Mar 2015 13:05:38 -0600 [thread overview]
Message-ID: <1426705538.3643.425.camel@redhat.com> (raw)
In-Reply-To: <20150318195506-mutt-send-email-mst@redhat.com>
On Wed, 2015-03-18 at 19:56 +0100, Michael S. Tsirkin wrote:
> On Wed, Mar 18, 2015 at 12:08:15PM -0600, Alex Williamson wrote:
> > On Wed, 2015-03-18 at 18:45 +0100, Michael S. Tsirkin wrote:
> > > On Wed, Mar 18, 2015 at 11:11:28AM -0600, Alex Williamson wrote:
> > > > On Wed, 2015-03-18 at 17:44 +0100, Michael S. Tsirkin wrote:
> > > > > On Wed, Mar 18, 2015 at 09:45:29AM -0600, Alex Williamson wrote:
> > > > > > On Wed, 2015-03-18 at 16:02 +0100, Michael S. Tsirkin wrote:
> > > > > > > On Wed, Mar 18, 2015 at 08:50:54AM -0600, Alex Williamson wrote:
> > > > > > > > On Wed, 2015-03-18 at 15:36 +0100, Michael S. Tsirkin wrote:
> > > > > > > > > On Wed, Mar 18, 2015 at 08:15:01AM -0600, Alex Williamson wrote:
> > > > > > > > > > On Wed, 2015-03-18 at 15:05 +0100, Michael S. Tsirkin wrote:
> > > > > > > > > > > On Wed, Mar 18, 2015 at 08:02:26AM -0600, Alex Williamson wrote:
> > > > > > > > > > > > On Wed, 2015-03-18 at 14:23 +0100, Michael S. Tsirkin wrote:
> > > > > > > > > > > > > typo in subject: vfio, not vifo.
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Thu, Mar 12, 2015 at 06:23:59PM +0800, Chen Fan wrote:
> > > > > > > > > > > > > > for piix4 chipset, we don't need to expose aer, so introduce
> > > > > > > > > > > > > > PC_I440FX_COMPAT for all piix4 machines to disable aercap,
> > > > > > > > > > > > > > and add HW_COMPAT_2_2 to disable aercap for all lower
> > > > > > > > > > > > > > than 2.3.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
> > > > > > > > > > > > >
> > > > > > > > > > > > > Well vfio is never migrated ATM.
> > > > > > > > > > > > > So why is compat code needed at all?
> > > > > > > > > > > >
> > > > > > > > > > > > It's not for migration, it's to maintain current behavior on existing
> > > > > > > > > > > > platforms. If someone gets an uncorrected AER error on q35 machine type
> > > > > > > > > > > > today, the VM stops. With this change, AER would be exposed to the
> > > > > > > > > > > > guest and the guest could handle it. The compat change therefore
> > > > > > > > > > > > maintains the stop VM behavior on existing q35 machine types.
> > > > > > > > > > >
> > > > > > > > > > > If stop VM behaviour is useful, expose it to users.
> > > > > > > > > > > If not, then don't.
> > > > > > > > > > > I don't see why does it have to be tied to machine types.
> > > > > > > > > >
> > > > > > > > > > Because q35-2.2 machine type will currently do a stop VM on uncorrected
> > > > > > > > > > AER error. If we don't tie that to a machine option then q35-2.2 would
> > > > > > > > > > suddenly start exposing the error to the guest. That's a fairly
> > > > > > > > > > significant change in behavior for a static machine type.
> > > > > > > > >
> > > > > > > > > I don't think you can classify it as a behaviour change. VM stop is not
> > > > > > > > > guest visible behaviour.
> > > > > > > >
> > > > > > > > In one case, an uncorrected AER occurs and the VM is stopped by QEMU.
> > > > > > > > In the other case, the guest is notified and may attempt corrective
> > > > > > > > action... or maybe the guest doesn't understand AER and the user is
> > > > > > > > depending on the previous behavior. That is absolutely a behavior
> > > > > > > > change.
> > > > > > > >
> > > > > > > > > Are you worrying about guests misbehaving when they see these errors?
> > > > > > > > > Then you want this as user-controlled, supported option.
> > > > > > > >
> > > > > > > > Whether the option is user visible is tangential to whether the behavior
> > > > > > > > of existing machine types should be maintained. Existing machine types
> > > > > > > > can impose a different default than current machine types.
> > > > > > > >
> > > > > > > > > In other words: we only tie things to machine types when we
> > > > > > > > > have to. This code gets almost no testing, and is a lot of
> > > > > > > > > work to test. This one sounds like "just in case" is not a good
> > > > > > > > > motivation.
> > > > > > > >
> > > > > > > > It seems like an obvious use case for using machine types to maintain
> > > > > > > > compatibility with previous behavior, which is exactly why we have
> > > > > > > > machine types. If we're not going to use it, why do we have it?
> > > > > > >
> > > > > > > We have machine types because of the following issues:
> > > > > > > - some silent changes confuse guests. For example guest installed with
> > > > > > > one machine type might not boot if you try to use it after
> > > > > > > changing something, or - in case of windows - throw up warnings.
> > > > > > > - some changes break migration
> > > > > > >
> > > > > > > Looks like none of these cases.
> > > > > > > If AER is unsafe, turn it off by default for everyone.
> > > > > >
> > > > > > This is silly, we have the tools, let's use them.
> > > > >
> > > > > It's a very expensive tool, maintainance-wise. We often don't
> > > > > have the choice but I'm not going to use this tool by choice
> > > > > unless we know why we are doing this.
> > > > >
> > > > > > If a user is running
> > > > > > a VM that gets a VM stop on AER error one day and they upgrade QEMU and
> > > > > > restart it, they should get the same behavior, whether a migration is
> > > > > > involved or not.
> > > > >
> > > > > You keep saying this, but why should it? Answer that question, the rest
> > > > > will follow.
> > > >
> > > > My answer is that it's a user visible change in behavior that they may
> > > > rely on and would not expect to change within an existing machine type.
> > > > Silently modifying the default may expose them to error conditions that
> > > > were previously handled by QEMU and adds AER handling requirements to
> > > > the guest.
> > > >
> > > > Ignoring the question about whether an AER can be reliably recovered in
> > > > the guest for a moment, a typical PC hardware platform will signal the
> > > > running OS with AER errors, so it seems that *if* we can achieve parity
> > > > in a virtual machine with that signaling, and more importantly the
> > > > recovery process, then the default going forward should be to mimic bare
> > > > metal behavior, thus changing the default from what we do today.
> > > >
> > > > As it is now, I think we have too many outstanding questions about how
> > > > recover occurs to change the default.
> > > >
> > > > So let me return the question, if we were to resolve those questions and
> > > > change the default handling from what we do today, creating a user
> > > > visible behavior change and imposing new requirements for AER handling
> > > > in the guest, why is that not worthy of machine type stability? I'd
> > > > certainly expect it from a distro specific machine type.
> > >
> > > This really depends on what guests do with this.
> > > We try to avoid guest visible changes during migration
> > > (even that's not 100%).
> > > There aren't such requirements for devices where migration
> > > is disabled, e.g. we did many guest visible changes in q35.
> > >
> > > Latest machine types are better tested. Deviating from latest machine
> > > just for a theoretical "this is guest visible" isn't going to give
> > > you better stability, it will give you worse stability.
> >
> > You keep calling this theoretical. As it exists today, an uncorrected
> > AER error results in a VM stop with no guest intervention or support
> > required. If we were to change the default, the guest would instead be
> > notified of the AER and given the opportunity to recover. That's not
> > just guest visible, that's a complete change in error handling paradigm.
>
> OK, so maybe it's a feature that users should have control over.
> But tying it to machine types makes no sense.
If we were to change the default, where else would you tie it? Machine
types are the only finger hold we have to maintain VM stability afaik.
next prev parent reply other threads:[~2015-03-18 19:06 UTC|newest]
Thread overview: 53+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-03-12 10:23 [Qemu-devel] [PATCH v5 0/7] pass aer error to guest for vfio device Chen Fan
2015-03-12 10:23 ` [Qemu-devel] [PATCH v5 1/7] vfio: add pcie extanded capability support Chen Fan
2015-03-12 10:23 ` [Qemu-devel] [PATCH v5 2/7] aer: impove pcie_aer_init to support vfio device Chen Fan
2015-03-13 22:25 ` Alex Williamson
2015-03-16 2:30 ` Chen Fan
2015-03-12 10:23 ` [Qemu-devel] [PATCH v5 3/7] vfio: add aer support for " Chen Fan
2015-03-13 22:28 ` Alex Williamson
2015-03-12 10:23 ` [Qemu-devel] [PATCH v5 4/7] pcie_aer: expose pcie_aer_msg() interface Chen Fan
2015-03-13 22:30 ` Alex Williamson
2015-03-18 13:29 ` Michael S. Tsirkin
2015-03-19 1:33 ` Chen Fan
2015-03-12 10:23 ` [Qemu-devel] [PATCH v5 5/7] vfio-pci: pass the aer error to guest Chen Fan
2015-03-13 22:34 ` Alex Williamson
2015-03-16 3:05 ` Chen Fan
2015-03-16 3:52 ` Alex Williamson
2015-03-16 7:35 ` Chen Fan
2015-03-16 14:09 ` Alex Williamson
2015-03-25 1:33 ` Chen Fan
2015-03-25 2:31 ` Alex Williamson
2015-03-25 1:53 ` Chen Fan
2015-03-25 2:41 ` Alex Williamson
2015-03-25 3:07 ` Chen Fan
2015-04-01 4:12 ` Chen Fan
2015-04-01 15:46 ` Alex Williamson
2015-04-08 8:59 ` Chen Fan
2015-04-08 15:36 ` Alex Williamson
2015-04-15 10:30 ` Chen Fan
2015-04-15 14:18 ` Alex Williamson
2015-03-12 10:23 ` [Qemu-devel] [PATCH v5 6/7] vfio: add 'x-aer' property to expose aercap Chen Fan
2015-03-18 13:23 ` Michael S. Tsirkin
2015-03-18 14:09 ` Alex Williamson
2015-03-12 10:23 ` [Qemu-devel] [PATCH v5 7/7] pc: add PC_I440FX_COMPAT to disable aercap for vifo device Chen Fan
2015-03-13 22:38 ` Alex Williamson
2015-03-16 2:48 ` Chen Fan
2015-03-16 2:49 ` Chen Fan
2015-03-18 13:23 ` Michael S. Tsirkin
2015-03-18 14:02 ` Alex Williamson
2015-03-18 14:05 ` Michael S. Tsirkin
2015-03-18 14:15 ` Alex Williamson
2015-03-18 14:36 ` Michael S. Tsirkin
2015-03-18 14:50 ` Alex Williamson
2015-03-18 15:02 ` Michael S. Tsirkin
2015-03-18 15:45 ` Alex Williamson
2015-03-18 16:44 ` Michael S. Tsirkin
2015-03-18 17:11 ` Alex Williamson
2015-03-18 17:45 ` Michael S. Tsirkin
2015-03-18 18:08 ` Alex Williamson
2015-03-18 18:56 ` Michael S. Tsirkin
2015-03-18 19:05 ` Alex Williamson [this message]
2015-03-19 21:26 ` Paolo Bonzini
2015-03-16 2:52 ` [Qemu-devel] [PATCH v5 0/7] pass aer error to guest for vfio device Chen Fan
2015-03-16 4:57 ` Michael S. Tsirkin
2015-03-19 21:44 ` Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1426705538.3643.425.camel@redhat.com \
--to=alex.williamson@redhat.com \
--cc=chen.fan.fnst@cn.fujitsu.com \
--cc=izumi.taku@jp.fujitsu.com \
--cc=mst@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).