From: Lukas Wunner <lukas@wunner.de>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: linux-pci@vger.kernel.org, Keith Busch <kbusch@kernel.org>,
Ashok Raj <ashok.raj@intel.com>,
Sathyanarayanan Kuppuswamy
<sathyanarayanan.kuppuswamy@linux.intel.com>,
Ravi Kishore Koppuravuri <ravi.kishore.koppuravuri@intel.com>,
Mika Westerberg <mika.westerberg@linux.intel.com>,
Sheng Bi <windy.bi.enflame@gmail.com>,
Stanislav Spassov <stanspas@amazon.de>,
Yang Su <yang.su@linux.alibaba.com>
Subject: Re: [PATCH 3/3] PCI/DPC: Await readiness of secondary bus after reset
Date: Fri, 13 Jan 2023 10:10:23 +0100
Message-ID: <20230113091023.GA29495@wunner.de>
In-Reply-To: <20230112223533.GA1798809@bhelgaas>

On Thu, Jan 12, 2023 at 04:35:33PM -0600, Bjorn Helgaas wrote:
> On Sat, Dec 31, 2022 at 07:33:39PM +0100, Lukas Wunner wrote:
> > We're calling pci_bridge_wait_for_secondary_bus() after performing a
> > Secondary Bus Reset, but neglect to do the same after coming out of a
> > DPC-induced Hot Reset. As a result, we're not observing the delays
> > prescribed by PCIe r6.0 sec 6.6.1 and may access devices on the
> > secondary bus before they're ready. Fix it.
> >
> > Tested-by: Ravi Kishore Koppuravuri <ravi.kishore.koppuravuri@intel.com>
>
> I assume this patch is the one that makes the difference for the
> Intel Ponte Vecchio HPC GPU?

Right.

> Is there a URL to a problem report, or
> at least a sentence or two we can include here to connect the patch
> with the problem users may see?

There's no public problem report. My understanding is that Ponte Vecchio
was formally launched this Tuesday and mass distribution is only starting
now:

https://www.tomshardware.com/news/intel-launches-sapphire-rapids-fourth-gen-xeon-cpus-and-ponte-vecchio-max-gpu-series

The idea is to get the issue fixed in the kernel early so that users
never see it.

> Most people won't know how to
> recognize accesses to devices on the secondary bus before they're
> ready.

With Ponte Vecchio, the GPU is located below a PCIe switch and Downstream
Port Containment is triggered at the Root Port. So after the DPC-induced
Hot Reset, the Root Port needs to wait for the Switch Upstream Port to
re-appear.
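
For illustration, a minimal C sketch of what such a wait involves. It is
not the kernel's pci_bridge_wait_for_secondary_bus(); the simulated
device and clock and every name below (sim_dev, sim_msleep(),
sim_read_vendor_id(), wait_for_dev_ready()) are hypothetical. It assumes
a Port at or below 5 GT/s, where PCIe r6.0 sec 6.6.1 requires at least
100 ms after reset before the first config request, and then polls the
Vendor ID with exponential backoff while the device still answers with
Configuration Request Retry Status. The 60 s timeout is an arbitrary
value chosen for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simulated clock, so the sketch runs without sleeping. */
static unsigned int now_ms;

static void sim_msleep(unsigned int ms)
{
	now_ms += ms;
}

/* Hypothetical device below the Root Port: config reads only succeed
 * once ready_after_ms of simulated time has elapsed. */
struct sim_dev {
	unsigned int ready_after_ms;
};

/*
 * Simulated Vendor ID read. While the device is initializing it returns
 * 0x0001, which is what software sees for a Configuration Request Retry
 * Status completion when CRS Software Visibility is enabled.
 */
static uint16_t sim_read_vendor_id(const struct sim_dev *dev)
{
	return now_ms >= dev->ready_after_ms ? 0x8086 : 0x0001;
}

/* Assumed values: 100 ms minimum delay (Port at <= 5 GT/s), then an
 * arbitrary polling timeout for this sketch. */
#define SETTLE_DELAY_MS  100u
#define READY_TIMEOUT_MS 60000u

/* Returns true once the device answers config reads, false on timeout. */
static bool wait_for_dev_ready(const struct sim_dev *dev)
{
	unsigned int delay = 1, start;

	/* Mandatory delay before the first config request. */
	sim_msleep(SETTLE_DELAY_MS);
	start = now_ms;

	for (;;) {
		uint16_t id = sim_read_vendor_id(dev);

		/* 0x0001: still initializing; 0xffff: no response at all. */
		if (id != 0x0001 && id != 0xffff)
			return true;
		if (now_ms - start >= READY_TIMEOUT_MS)
			return false;
		/* Exponential backoff between polls. */
		sim_msleep(delay);
		delay *= 2;
	}
}
```

In the scenario above, a wait of this kind has to complete before the
Switch Upstream Port's config space is restored.
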

Because config space is currently restored on the Switch Upstream Port
too early, i.e. before the port is ready, it remains in D0uninitialized
once it comes out of reset: all its registers, in particular the bridge
windows, are in their power-on reset state. As a result, anything
downstream of it (including the GPU) remains inaccessible, and the
user-visible error messages look like this:

  i915 0000:8c:00.0: can't change power state from D3cold to D0 (config space inaccessible)
  intel_vsec 0000:8e:00.1: can't change power state from D3cold to D0 (config space inaccessible)

Here, intel_vsec is a sibling of the GPU and, I believe, is used for
telemetry.

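
For config requests specifically, it's the bridge's bus number registers
(rather than its memory windows) that decide whether a request reaches
the devices below it. A simplified C sketch of that routing check
follows; the struct and function names are hypothetical, and the bus
numbers used with it are made up so that bus 0x8c (the GPU) and 0x8e
(intel_vsec) from the log lines fall inside the configured range. Both
registers reset to 0, so after a reset without a successful restore the
request is not forwarded and software reads all ones.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the Secondary and Subordinate Bus Number
 * registers in a PCI-to-PCI bridge's Type 1 header. Both reset to 0.
 */
struct bridge_bus_regs {
	uint8_t secondary_bus;
	uint8_t subordinate_bus;
};

/* A bridge passes a Type 1 config request downstream only if the
 * target bus lies within [secondary, subordinate]. */
static bool bridge_forwards(const struct bridge_bus_regs *br, uint8_t bus)
{
	return bus >= br->secondary_bus && bus <= br->subordinate_bus;
}

/*
 * Simplified Vendor ID read through the bridge. If the request is not
 * forwarded it fails and software sees all ones, the value the kernel
 * treats as an inaccessible device.
 */
static uint16_t read_vendor_id_via_bridge(const struct bridge_bus_regs *br,
					  uint8_t bus, uint16_t vendor_id)
{
	return bridge_forwards(br, bus) ? vendor_id : 0xffff;
}
```
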
I'll be sure to include that additional information in the commit
message when respinning.
Thanks,
Lukas