From: "Michael S. Tsirkin" <mst@redhat.com>
To: Juan Quintela <quintela@redhat.com>
Cc: "Jiri Denemark" <jdenemar@redhat.com>,
"Fiona Ebner" <f.ebner@proxmox.com>,
"Leonardo Bras" <leobras@redhat.com>,
"Eduardo Habkost" <eduardo@habkost.net>,
"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Yanan Wang" <wangyanan55@huawei.com>,
"Peter Xu" <peterx@redhat.com>,
qemu-devel@nongnu.org, "Daniel Berrange" <berrange@redhat.com>
Subject: Re: [PATCH v1 1/1] hw/pci: Disable PCI_ERR_UNCOR_MASK register for machine type < 8.0
Date: Sun, 28 May 2023 02:39:22 -0400
Message-ID: <20230528023313-mutt-send-email-mst@kernel.org>
In-Reply-To: <87ilcf4jdh.fsf@secure.mitica>
On Fri, May 26, 2023 at 09:55:22AM +0200, Juan Quintela wrote:
> Jiri Denemark <jdenemar@redhat.com> wrote:
> > On Thu, May 11, 2023 at 13:43:47 +0200, Juan Quintela wrote:
> >> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >>
> >> [Added libvirt people to the party, see the end of the message ]
> >
> > Sorry, I'm not that much into parties :-)
> >
> >> That would fix the:
> >>
> >> qemu-8.0 -M pc-7.2 -> qemu-8.0.1 -M pc-7.2
> >>
> >> Is it worth it? Dunno. That is my question.
> >>
> >> And knowing which qemu it has migrated from would not help. We
> >> would need to add a new tweak that means:
> >>
> >> This is a pc-7.2 machine that has been instantiated in qemu-8.0 and
> >> has the pciaerr bug. But wait, we have _that_.
> >>
> >> And it is called
> >>
> >> + { TYPE_PCI_DEVICE, "x-pcie-err-unc-mask", "off" },
> >>
> >> from the patch.
> >>
> >> We can teach libvirt about this glitch: if it is handling a pc-7.2
> >> machine that was started in qemu-8.0, and the user wants to migrate it
> >> to a newer qemu (call it qemu-8.1), the destination needs to be started as:
> >>
> >> qemu-8.1 -M pc-7.2 <whatever pci devices need to do>,x-pcie-err-unc-mask="true"
> >>
> >> Until the user reboots it and then that property can be reset to default
> >> value.
> >
> > Hmm and what would happen if eventually this machine gets migrated back
> > to qemu-8.0?
>
> It works.
> Migrating to qemu-7.2 is what is not going to work.
> To migrate to qemu-8.0, you just need to drop the
> "x-pcie-err-unc-mask=true" bit. And it would work.
>
> So, to be clear, this machine can migrate to:
>
> - qemu-8.0, you just need to drop the "x-pcie-err-unc-mask=true" bit
>
> - qemu-8.0.1 or newer, you just need to maintain the
> "x-pcie-err-unc-mask=true" bit.
>
> Let's just assume that qemu-7.2.1 doesn't get the
> "x-pcie-err-unc-mask=true" bit, so it will not be able to migrate there.
>
>
> > Or even when the machine is stopped, started again, and
> > then migrated to qemu-8.0?
>
> If you do what I call a hard reset (i.e. poweroff + poweron so qemu
> dies), you should drop the "x-pcie-err-unc-mask=true" bit. And then you
> can migrate to qemu-7.2 and all qemu-8.0.1 and newer.
>
> Basically what we need is a "mark" inside libvirt that means something
> like:
>
> - this is a weird machine that looks like pc-7.2
> - but has "x-pcie-err-unc-mask=true"
> - so it can only migrate to qemu-8.0 and newer.
> - but if it ever reboots in qemu-8.0.1 or newer, we want it to go back to
> being a "normal" pc-7.2 machine (i.e. drop the
> x-pcie-err-unc-mask=true).
>
> That would be the perfect world. But as we are in an imperfect world,
> something like:
>
> - this machine started in qemu-8.0 -M pc-7.2, we know this is broken and
> it can't migrate outside of qemu-8.0 because it would fail to go to
> either qemu-7.2 or qemu-8.0.1.
>
> I would argue that if you implement the second option, doing the "right"
> one (i.e. the first) is not much more complicated, but that is a question
> that you are better placed to answer.
>
> And then we have Michael's other question: how can we export that
> information so libvirt can use it?
>
> In this case we can communicate to libvirt:
> - In qemu-8.0 we broke pc-7.2.
> - The problem is fixed in qemu-8.0.1 using property
> "x-pcie-err-unc-mask=false".
> - You can migrate from qemu-8.0 to newer versions if you set that property
> to true.
> - Guests started in qemu-8.0 -M pc-7.2 should reboot in qemu-8.0.1 or
> newer to become "normal pc-7.2".
> - If we publish this in qemu, we can only publish it in qemu-8.0.1 and
> newer.
> - Or we can publish it somewhere else and any libvirt can take this
> information.
> - Or we can communicate this to libvirt, and they incorporate it into their
> source wherever you see fit.
And this is not an isolated instance. There are things like this in
almost every release.
My suggestion is a package with known bugs like this.
It would list these workarounds in some machine-readable
format and would be essentially append-only, making it
relatively safe even for very old RHEL distros to
pick up the latest version once in a while.
E.g. the fact that we add a bug workaround for 10.0 will not affect
7.2, so you do not need to fork with each release.
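To make the idea concrete, here is a minimal sketch of what such an
append-only list and a lookup over it could look like. Everything in it is
hypothetical: the KnownBug fields, the KNOWN_BUGS list and the workarounds()
helper are illustrative names, not an existing QEMU or libvirt interface, and
Python is used only for brevity. The single entry encodes the case discussed
in this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownBug:
    """One entry in a hypothetical append-only known-bugs list."""
    machine_type: str   # machine type the guest was started with
    broken_in: tuple    # QEMU versions that instantiate it incorrectly
    fixed_in: tuple     # first QEMU version that carries the fix
    device_type: str    # device the workaround property applies to
    prop: str           # property name
    value: str          # value needed on a fixed destination


# Entries are only ever appended, never edited or removed, so an old
# consumer can pick up a newer copy of the list without surprises.
KNOWN_BUGS = [
    KnownBug(
        machine_type="pc-7.2",
        broken_in=((8, 0, 0),),
        fixed_in=(8, 0, 1),
        device_type="TYPE_PCI_DEVICE",
        prop="x-pcie-err-unc-mask",
        value="true",
    ),
]


def workarounds(machine_type, started_on, target):
    """Return (device_type, prop, value) tuples to set on the destination.

    started_on: QEMU version that instantiated the guest, e.g. (8, 0, 0)
    target:     QEMU version of the migration destination
    """
    return [
        (bug.device_type, bug.prop, bug.value)
        for bug in KNOWN_BUGS
        if bug.machine_type == machine_type
        and started_on in bug.broken_in
        and target >= bug.fixed_in
    ]


if __name__ == "__main__":
    # Guest started on qemu-8.0 with -M pc-7.2, migrating to qemu-8.1.
    print(workarounds("pc-7.2", (8, 0, 0), (8, 1, 0)))
```

A guest started on a fixed QEMU, or one migrating back to qemu-8.0, matches
no entry, so no extra property is set. That mirrors the "drop the bit" cases
described in the quoted text.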
> The point here is that when we use a property on a machine type, it can
> be for two reasons:
>
> - We detected at the right time that we changed the value of something,
> and we did the right thing on hw_compat_X_Y, so libvirt needs to do
> nothing.
>
> - We *DID NOT* detect that we broke compatibility before release, and we
> need to make a property to identify that problem. This is where we
> need to do this dance.
>
> Notice that normally we detect lots of problems during development and
> this *should* not happen. But when it happens, we need to be able to do
> something.
>
> And also notice that normally we break just some device, not a whole
> machine type. But as you can see we have broken it this time. We are
> trying to automate the detection of this kind of failure, but we are
> still at the design stage, so we need to plan how to handle this.
>
> Any comments?
>
> Later, Juan.
>
>
>
>