Linux PCI subsystem development
 help / color / mirror / Atom feed
From: Bjorn Helgaas <helgaas@kernel.org>
To: "Limonciello, Mario" <Mario.Limonciello@amd.com>
Cc: "Natikar, Basavaraj" <Basavaraj.Natikar@amd.com>,
	"bhelgaas@google.com" <bhelgaas@google.com>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"thomas@glanzmann.de" <thomas@glanzmann.de>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>
Subject: Re: [PATCH] PCI: Add quirk to clear MSI-X
Date: Mon, 20 Mar 2023 12:14:47 -0500	[thread overview]
Message-ID: <20230320171447.GA2293285@bhelgaas> (raw)
In-Reply-To: <MN0PR12MB61017F7AD76AC3E3C296E6D1E2809@MN0PR12MB6101.namprd12.prod.outlook.com>

[+cc Rafael for RESUME_EARLY quirk question]

On Mon, Mar 20, 2023 at 01:32:16AM +0000, Limonciello, Mario wrote:
> > -----Original Message-----
> > From: Bjorn Helgaas <helgaas@kernel.org>
> > Sent: Friday, March 10, 2023 16:14
> > To: Limonciello, Mario <Mario.Limonciello@amd.com>
> > Cc: Natikar, Basavaraj <Basavaraj.Natikar@amd.com>; Natikar, Basavaraj
> > <Basavaraj.Natikar@amd.com>; bhelgaas@google.com; linux-
> > pci@vger.kernel.org; thomas@glanzmann.de
> > Subject: Re: [PATCH] PCI: Add quirk to clear MSI-X
> > 
> > On Thu, Mar 09, 2023 at 06:57:38PM -0600, Mario Limonciello wrote:
> > > On 3/9/23 16:30, Bjorn Helgaas wrote:
> > > > On Thu, Mar 09, 2023 at 12:32:41PM -0600, Limonciello, Mario wrote:
> > > > > On 3/9/2023 12:25, Bjorn Helgaas wrote:
> > > > > ...
> > > >
> > > > > > > > https://gitlab.freedesktop.org/agd5f/linux/-
> > /commit/07494a25fc8881e122c242a46b5c53e0e4403139
> > > > > >
> > > > > > That nbio_v7.2.c patch and this patch don't look anything
> > > > > > alike.  It looks like the nbio_v7.2.c patch might run
> > > > > > once?  Could *this* be done once at enumeration-time, too?
> > > > >
> > > > > They don't look anything alike because they're attacking the
> > > > > problem from different angles.
> > > >
> > > > Why do we need different angles?
> > >
> > > The GPU driver approach only works if the GPU is enabled.  If
> > > the GPU could never be disabled then it alone would be
> > > sufficient.
> > >
> > > > > The NBIO patch fixes the initialization value for the
> > > > > internal registers.  This is what the BIOS "should" have
> > > > > done.  When the internal registers are configured properly
> > > > > then the behavior the kernel expects works as well.
> > > > >
> > > > > The NBIO patch will run both at amdgpu startup as well as
> > > > > when resuming from suspend.
> > > >
> > > > If initializing something as BIOS should have done makes the
> > > > hardware work correctly, isn't once enough?  Why does the NBIO
> > > > patch need to run at resume-time?
> > >
> > > During suspend some internal registers are in a power domain
> > > that the state will be lost.  These are typically restored by
> > > the BIOS to the values defined in initialization tables before
> > > handing control back to the OS.
> > 
> > I don't quite get this.  I thought I read that if BIOS had
> > initialized the hardware correctly, a D0->D3hot->D0 transition
> > would work without any issues.  Linux can do this with PMCSR
> > writes and BIOS isn't involved at all.
> 
> During a suspend transition not all registers are powered.  Firmware
> will capture some during the suspend transition and restore some of
> them for the resume transition, but it's up to the firmware whether
> this one is included.
> 
> Furthermore most IP blocks in amdgpu typically initialize the same
> during both startup and resume to ensure that firmware couldn't have
> mucked with the expected golden state.

We're spending way more time on this than makes sense, but I do think
it's important that the commit log is accurate and makes sense even to
people who don't know the internals of the device.

It *sounds* like what's happening is:

  - OS writes PMCSR to put device in D3hot
  - BIOS traps D0->D3hot transition via something like SMI and
    captures MSI-X state
  - Device enters D3hot
  - Device internal MSI-X state is lost
  - BIOS traps D3hot->D0 transition via SMI
  - Device enters D0
  - BIOS restores MSI-X state
  - OS resumes use of device

If that's what's happening, the fact that the device loses the
internal state in D3hot sounds like a *hardware* defect -- if you put
the device in a system without a BIOS, the D0->D3hot->D0 transitions
would not work as required by the PCIe spec.

We can call the fact that BIOS lacks the MSI-X save/restore a BIOS
defect, but the only reason BIOS would *need* that save/restore is
because of the underlying *hardware* defect.

If that's the case, I would expect a commit log something like this:

  The AMD [1022:15b8] USB controller loses some internal functional
  MSI-X context when transitioning from D0 to D3hot.  BIOS normally
  traps D0->D3hot and D3hot->D0 transitions so it can save and restore
  that internal context, but some firmware in the field lacks this
  workaround.

  If MSI-X is enabled, toggle the PCI_MSIX_FLAGS_ENABLE bit when
  resuming to D0, which resynchronizes the internal state that was
  lost in D3hot.

Rafael, do we run the DECLARE_PCI_FIXUP_RESUME_EARLY quirks for *all*
D3hot->D0 transitions?

I'm concerned about places like pci_pm_reset(), where we do
D0->D3hot->D0 to do the reset.  Or vfio_pm_config_write(), where it
looks like a guest could do that without running the quirk.

Current proposed patch is:
https://lore.kernel.org/r/ddbbfb50-24b6-202f-7452-c8959901c739@amd.com

Bjorn

  reply	other threads:[~2023-03-20 17:20 UTC|newest]

Thread overview: 25+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-06  7:23 [PATCH] PCI: Add quirk to clear MSI-X Basavaraj Natikar
2023-03-06  8:14 ` Thomas Glanzmann
2023-03-08 22:44 ` Bjorn Helgaas
2023-03-08 23:04   ` Limonciello, Mario
2023-03-09  7:34     ` Basavaraj Natikar
2023-03-09 18:25       ` Bjorn Helgaas
2023-03-09 18:32         ` Limonciello, Mario
2023-03-09 22:30           ` Bjorn Helgaas
2023-03-10  0:57             ` Mario Limonciello
2023-03-10  7:41               ` Basavaraj Natikar
2023-03-10 22:13               ` Bjorn Helgaas
2023-03-20  1:32                 ` Limonciello, Mario
2023-03-20 17:14                   ` Bjorn Helgaas [this message]
2023-03-20 17:20                     ` Limonciello, Mario
2023-03-20 19:36                       ` Bjorn Helgaas
2023-03-20 19:47                         ` Limonciello, Mario
2023-03-20 21:30                           ` Bjorn Helgaas
2023-03-20 21:37                             ` Limonciello, Mario
2023-03-20 22:08                               ` Bjorn Helgaas
2023-03-20 22:52                                 ` Mario Limonciello
2023-03-21 11:07                                   ` Bjorn Helgaas
2023-03-28 13:15                                     ` Basavaraj Natikar
2023-03-28 13:25                                       ` Limonciello, Mario
2023-03-28 17:42                                       ` Bjorn Helgaas
2023-03-10  7:22         ` Basavaraj Natikar

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230320171447.GA2293285@bhelgaas \
    --to=helgaas@kernel.org \
    --cc=Basavaraj.Natikar@amd.com \
    --cc=Mario.Limonciello@amd.com \
    --cc=bhelgaas@google.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=rjw@rjwysocki.net \
    --cc=thomas@glanzmann.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox