From: Bjorn Helgaas <helgaas@kernel.org>
To: "Limonciello, Mario" <Mario.Limonciello@amd.com>
Cc: "Natikar, Basavaraj" <Basavaraj.Natikar@amd.com>,
"bhelgaas@google.com" <bhelgaas@google.com>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
"thomas@glanzmann.de" <thomas@glanzmann.de>,
"Rafael J. Wysocki" <rjw@rjwysocki.net>
Subject: Re: [PATCH] PCI: Add quirk to clear MSI-X
Date: Mon, 20 Mar 2023 12:14:47 -0500 [thread overview]
Message-ID: <20230320171447.GA2293285@bhelgaas> (raw)
In-Reply-To: <MN0PR12MB61017F7AD76AC3E3C296E6D1E2809@MN0PR12MB6101.namprd12.prod.outlook.com>
[+cc Rafael for RESUME_EARLY quirk question]
On Mon, Mar 20, 2023 at 01:32:16AM +0000, Limonciello, Mario wrote:
> > -----Original Message-----
> > From: Bjorn Helgaas <helgaas@kernel.org>
> > Sent: Friday, March 10, 2023 16:14
> > To: Limonciello, Mario <Mario.Limonciello@amd.com>
> > Cc: Natikar, Basavaraj <Basavaraj.Natikar@amd.com>; Natikar, Basavaraj
> > <Basavaraj.Natikar@amd.com>; bhelgaas@google.com; linux-
> > pci@vger.kernel.org; thomas@glanzmann.de
> > Subject: Re: [PATCH] PCI: Add quirk to clear MSI-X
> >
> > On Thu, Mar 09, 2023 at 06:57:38PM -0600, Mario Limonciello wrote:
> > > On 3/9/23 16:30, Bjorn Helgaas wrote:
> > > > On Thu, Mar 09, 2023 at 12:32:41PM -0600, Limonciello, Mario wrote:
> > > > > On 3/9/2023 12:25, Bjorn Helgaas wrote:
> > > > > ...
> > > >
> > > > > > > > https://gitlab.freedesktop.org/agd5f/linux/-
> > /commit/07494a25fc8881e122c242a46b5c53e0e4403139
> > > > > >
> > > > > > That nbio_v7.2.c patch and this patch don't look anything
> > > > > > alike. It looks like the nbio_v7.2.c patch might run
> > > > > > once? Could *this* be done once at enumeration-time, too?
> > > > >
> > > > > They don't look anything alike because they're attacking the
> > > > > problem from different angles.
> > > >
> > > > Why do we need different angles?
> > >
> > > The GPU driver approach only works if the GPU is enabled. If
> > > the GPU could never be disabled then it alone would be
> > > sufficient.
> > >
> > > > > The NBIO patch fixes the initialization value for the
> > > > > internal registers. This is what the BIOS "should" have
> > > > > done. When the internal registers are configured properly
> > > > > then the behavior the kernel expects works as well.
> > > > >
> > > > > The NBIO patch will run both at amdgpu startup as well as
> > > > > when resuming from suspend.
> > > >
> > > > If initializing something as BIOS should have done makes the
> > > > hardware work correctly, isn't once enough? Why does the NBIO
> > > > patch need to run at resume-time?
> > >
> > > During suspend some internal registers are in a power domain
> > > that the state will be lost. These are typically restored by
> > > the BIOS to the values defined in initialization tables before
> > > handing control back to the OS.
> >
> > I don't quite get this. I thought I read that if BIOS had
> > initialized the hardware correctly, a D0->D3hot->D0 transition
> > would work without any issues. Linux can do this with PMCSR
> > writes and BIOS isn't involved at all.
>
> During a suspend transition not all registers are powered. Firmware
> will capture some during the suspend transition and restore some of
> them for the resume transition, but it's up to the firmware whether
> this one is included.
>
> Furthermore most IP blocks in amdgpu typically initialize the same
> during both startup and resume to ensure that firmware couldn't have
> mucked with the expected golden state.
We're spending way more time on this than makes sense, but I do think
it's important that the commit log is accurate and makes sense even to
people who don't know the internals of the device.
It *sounds* like what's happening is:
- OS writes PMCSR to put device in D3hot
- BIOS traps D0->D3hot transition via something like SMI and
captures MSI-X state
- Device enters D3hot
- Device internal MSI-X state is lost
- BIOS traps D3hot->D0 transition via SMI
- Device enters D0
- BIOS restores MSI-X state
- OS resumes use of device
If that's what's happening, the fact that the device loses the
internal state in D3hot sounds like a *hardware* defect -- if you put
the device in a system without a BIOS, the D0->D3hot->D0 transitions
would not work as required by the PCIe spec.
We can call the fact that BIOS lacks the MSI-X save/restore a BIOS
defect, but the only reason BIOS would *need* that save/restore is
because of the underlying *hardware* defect.
If that's the case, I would expect a commit log something like this:
The AMD [1022:15b8] USB controller loses some internal functional
MSI-X context when transitioning from D0 to D3hot. BIOS normally
traps D0->D3hot and D3hot->D0 transitions so it can save and restore
that internal context, but some firmware in the field lacks this
workaround.
If MSI-X is enabled, toggle the PCI_MSIX_FLAGS_ENABLE bit when
resuming to D0, which resynchronizes the internal state that was
lost in D3hot.
Rafael, do we run the DECLARE_PCI_FIXUP_RESUME_EARLY quirks for *all*
D3hot->D0 transitions?
I'm concerned about places like pci_pm_reset(), where we do
D0->D3hot->D0 to do the reset. Or vfio_pm_config_write(), where it
looks like a guest could do that without running the quirk.
Current proposed patch is:
https://lore.kernel.org/r/ddbbfb50-24b6-202f-7452-c8959901c739@amd.com
Bjorn
next prev parent reply other threads:[~2023-03-20 17:20 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-03-06 7:23 [PATCH] PCI: Add quirk to clear MSI-X Basavaraj Natikar
2023-03-06 8:14 ` Thomas Glanzmann
2023-03-08 22:44 ` Bjorn Helgaas
2023-03-08 23:04 ` Limonciello, Mario
2023-03-09 7:34 ` Basavaraj Natikar
2023-03-09 18:25 ` Bjorn Helgaas
2023-03-09 18:32 ` Limonciello, Mario
2023-03-09 22:30 ` Bjorn Helgaas
2023-03-10 0:57 ` Mario Limonciello
2023-03-10 7:41 ` Basavaraj Natikar
2023-03-10 22:13 ` Bjorn Helgaas
2023-03-20 1:32 ` Limonciello, Mario
2023-03-20 17:14 ` Bjorn Helgaas [this message]
2023-03-20 17:20 ` Limonciello, Mario
2023-03-20 19:36 ` Bjorn Helgaas
2023-03-20 19:47 ` Limonciello, Mario
2023-03-20 21:30 ` Bjorn Helgaas
2023-03-20 21:37 ` Limonciello, Mario
2023-03-20 22:08 ` Bjorn Helgaas
2023-03-20 22:52 ` Mario Limonciello
2023-03-21 11:07 ` Bjorn Helgaas
2023-03-28 13:15 ` Basavaraj Natikar
2023-03-28 13:25 ` Limonciello, Mario
2023-03-28 17:42 ` Bjorn Helgaas
2023-03-10 7:22 ` Basavaraj Natikar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230320171447.GA2293285@bhelgaas \
--to=helgaas@kernel.org \
--cc=Basavaraj.Natikar@amd.com \
--cc=Mario.Limonciello@amd.com \
--cc=bhelgaas@google.com \
--cc=linux-pci@vger.kernel.org \
--cc=rjw@rjwysocki.net \
--cc=thomas@glanzmann.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.