From: "Michael S. Tsirkin" <mst@redhat.com>
To: Igor Mammedov <imammedo@redhat.com>
Cc: qemu-devel@nongnu.org, kraxel@redhat.com
Subject: Re: [PATCH 2/4] pcie: update slot power status only is power control is enabled
Date: Fri, 25 Feb 2022 04:51:21 -0500
Message-ID: <20220225044907-mutt-send-email-mst@kernel.org>
In-Reply-To: <20220225091830.2f684997@redhat.com>

On Fri, Feb 25, 2022 at 09:18:30AM +0100, Igor Mammedov wrote:
> On Thu, 24 Feb 2022 13:05:07 -0500
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> > On Thu, Feb 24, 2022 at 12:44:09PM -0500, Igor Mammedov wrote:
> > > On creation a PCIDevice has power turned on at the end of pci_qdev_realize();
> > > however, later on, if a PCIe slot isn't populated with any children,
> > > its power is turned off. That's fine if native hotplug is used,
> > > as the plug callback will power the slot on, among other things.
> > > However, when ACPI hotplug is enabled, it replaces the native PCIe plug
> > > callbacks with ACPI-specific ones (acpi_pcihp_device_*plug_cb) and
> > > as a result the slot stays powered off. This works fine, as ACPI hotplug
> > > on the guest side takes care of enumerating/initializing the hotplugged
> > > device. But when the guest is later migrated, the call chain introduced by [1]
> > >
> > > pcie_cap_slot_post_load()
> > > -> pcie_cap_update_power()
> > > -> pcie_set_power_device()
> > > -> pci_set_power()
> > > -> pci_update_mappings()
> > >
> > > will disable the earlier-initialized BARs of the hotplugged device
> > > in powered off slot due to commit [2] which disables BARs if
> > > power is off. As a result, the guest OS will be very
> > > confused after migration [3], still thinking that it has a working device,
> > > which is no longer true because the BARs are disabled.
> > >
> > > Fix it by honoring PCI_EXP_SLTCAP_PCP and updating power status
> > > only if the capability is enabled. A follow-up patch will disable
> > > PCI_EXP_SLTCAP_PCP, overriding the COMPAT_PROP_PCP property, when
> > > the PCIe slot is under ACPI PCI hotplug control.
> > >
> > > See [3] for reproducer.
> > >
> > > 1)
> > > Fixes: commit d5daff7d312 (pcie: implement slot power control for pcie root ports)
> > > 2)
> > > commit 23786d13441 (pci: implement power state)
> > > 3)
> > > Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2053584
> > >
> >
> >
> > Correct format for the last paragraph:
> >
> >
> > Fixes: d5daff7d312 ("pcie: implement slot power control for pcie root ports")
> > Fixes: 23786d13441 ("pci: implement power state")
> > Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2053584
>
> ok, will fix it up on respin like this to have references:
>
> 1)
> Fixes: d5daff7d312 ("pcie: implement slot power control for pcie root ports")
> 2)
> Fixes: 23786d13441 ("pci: implement power state")
> Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=2053584

Just drop the references; a bit of duplication is not a problem. E.g.

    in powered off slot due to commit 23786d13441 ("pci: implement power state") which disables BARs if

Trailer tags belong in a single group at the end with no interruptions;
not all tools handle them otherwise.

> >
> > > Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> > > ---
> > > hw/pci/pcie.c | 5 ++---
> > > 1 file changed, 2 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > index d7d73a31e4..2339729a7c 100644
> > > --- a/hw/pci/pcie.c
> > > +++ b/hw/pci/pcie.c
> > > @@ -383,10 +383,9 @@ static void pcie_cap_update_power(PCIDevice *hotplug_dev)
> > >
> > >      if (sltcap & PCI_EXP_SLTCAP_PCP) {
> > >          power = (sltctl & PCI_EXP_SLTCTL_PCC) == PCI_EXP_SLTCTL_PWR_ON;
> > > +        pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
> > > +                            pcie_set_power_device, &power);
> > >      }
> > > -
> > > -    pci_for_each_device(sec_bus, pci_bus_num(sec_bus),
> > > -                        pcie_set_power_device, &power);
> >
> > I think this is correct. However, I wonder whether, for 6.2 compatibility,
> > we should as a hack sometimes skip the power update even when
> > PCI_EXP_SLTCAP_PCP exists. Would that not work around the issue for
> > these machine types?
>
> pc-q35-6.2 is broken utterly.
> With pc-q35-6.1, it's a mess. Here is a ping-pong migration matrix for it
>
>          v6.1 | v6.2   | Fix
>   v6.1   ok   | broken | ok (#1)
>   v6.2        | broken | broken (#2)
>
> #1: has PCI_EXP_SLTCAP_PCP due to x-pcihp-enable-pcie-pcp-cap=on,
>     i.e. pci_config is exactly the same as in qemu-v6.1
> #2: PCI_EXP_SLTCAP_PCP is enabled + empty slot is powered off
>     (+ state is migrated)
>
> There are some invariants that might work in one direction,
> but they won't survive ping-pong migration. And, more importantly,
> for upstream we mostly care about old -> new working,
> and that is the direction that is broken in v6.2.
>
> > And assuming we want bug for bug compat anyway, why not just put
> > it here? It seems easier to reason about frankly ...
>
> It should be possible to hack the PCI core to fix up the broken power state
> on incoming migration (at post-load time), but that would just
> create more confusion, where in some cases migration would work
> and in some it would not (depending on the QEMU versions used).
>
> Let's just declare v6.2 QEMU broken, with upgrade/downgrade
> (to 7.0/6.1) as the suggested solution.
>
> PS:
> I'd very much prefer to avoid adding hacks to the PCI core for the
> sake of ACPI pcihp, and to let the PCI code behave as it's supposed to per spec.
> It's already bad enough with pcihp layered on top of PCI;
> making PCI code depend on pcihp will just make it more fragile.
>
> > > }
> > >
> > > /*
> > > --
> > > 2.31.1
> >
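
For readers following along, the behaviour after the hunk quoted above can be sketched in isolation. This is a toy model, not QEMU code: `struct toy_slot`, `toy_update_power()` and the `dev_has_power` flag are hypothetical names made up for illustration, and the single flag stands in for what pci_for_each_device()/pcie_set_power_device() do to the devices on the secondary bus. The register bit values are assumed to match the standard PCI Express definitions in pci_regs.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match the standard pci_regs.h values. */
#define PCI_EXP_SLTCAP_PCP     0x00000002 /* Power Controller Present */
#define PCI_EXP_SLTCTL_PCC     0x0400     /* Power Controller Control */
#define PCI_EXP_SLTCTL_PWR_ON  0x0000
#define PCI_EXP_SLTCTL_PWR_OFF 0x0400

/* Hypothetical stand-in for a slot and the one device behind it. */
struct toy_slot {
    uint32_t sltcap;        /* Slot Capabilities register */
    uint16_t sltctl;        /* Slot Control register */
    bool     dev_has_power; /* false would mean the device's BARs get disabled */
};

/*
 * Mirrors the patched logic: the device's power state is only touched
 * when the slot advertises a power controller. Without PCP the device
 * keeps whatever power state it already had.
 */
void toy_update_power(struct toy_slot *slot)
{
    assert(slot != NULL);

    if (slot->sltcap & PCI_EXP_SLTCAP_PCP) {
        bool power = (slot->sltctl & PCI_EXP_SLTCTL_PCC) == PCI_EXP_SLTCTL_PWR_ON;
        slot->dev_has_power = power;
    }
}
```

With PCP cleared (which is what the follow-up patch is described as doing for slots under ACPI hotplug control), calling toy_update_power() on a slot whose PCC field says "off" leaves an already-powered device alone, so its BARs would stay mapped across a post-load. With PCP set, the same call powers the device off, which is the case the cover letter describes as confusing the guest.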