From: Lukas Wunner <lukas@wunner.de>
To: Bjorn Helgaas <bjorn.helgaas@gmail.com>
Cc: Yang Su <yang.su@linux.alibaba.com>,
Bjorn Helgaas <helgaas@kernel.org>,
linux-pci@vger.kernel.org, Keith Busch <kbusch@kernel.org>,
Ashok Raj <ashok.raj@intel.com>,
Sathyanarayanan Kuppuswamy
<sathyanarayanan.kuppuswamy@linux.intel.com>,
Ravi Kishore Koppuravuri <ravi.kishore.koppuravuri@intel.com>,
Mika Westerberg <mika.westerberg@linux.intel.com>,
Sheng Bi <windy.bi.enflame@gmail.com>,
Stanislav Spassov <stanspas@amazon.de>
Subject: Re: [PATCH 3/3] PCI/DPC: Await readiness of secondary bus after reset
Date: Fri, 13 Jan 2023 11:18:57 +0100 [thread overview]
Message-ID: <20230113101857.GB29495@wunner.de> (raw)
In-Reply-To: <CABhMZUWYxN0iuCJumGVH123E52L17Ow-De5FuX=bfDF3o6A_-A@mail.gmail.com>
On Thu, Jan 12, 2023 at 09:06:15PM -0600, Bjorn Helgaas wrote:
> On Thu, Jan 12, 2023 at 8:39 PM Yang Su <yang.su@linux.alibaba.com> wrote:
> > I also test pci_bridge_wait_for_secondary_bus in NVIDIA GPU T4
> > which bind vfio passthrough in VMM, I found the
> > pci_bridge_wait_for_secondary_bus can not wait the enough time
> > as pci spec requires, the reasons are described as below.
[...]
> I'll wait for you and Lukas to continue this conversation on the
> mailing list.
Yang Su reports that pci_bridge_secondary_bus_reset() is called for
an Nvidia T4 GPU. That's an endpoint device but the function should
only ever be called for bridges. It's unclear to me how that can happen.
The call stack looks like this:
vfio_pci_open()
pci_set_power_state()
pci_clear_master()
pci_enable_device()
pci_try_reset_function()
pci_dev_trylock()
pci_dev_save_and_disable()
pci_dev_specific_reset()
pcie_has_flr()
pcie_af_flr()
pci_read_config_word()
pci_dev_reset_slot_function()
pci_parent_bus_reset()
pci_bridge_secondary_bus_reset()
pcibios_reset_secondary_bus()
pci_reset_secondary_bus()
pci_read_config_word()
pci_write_config_word()
pci_write_config_word()
pcie_wait_for_link_delay()
pci_dev_wait()
pci_dev_restore()
pci_cfg_access_unlock()
pci_save_state()
pci_store_saved_state()
vfio_config_init()
Note that vfio_pci_open() was renamed to vfio_pci_open_device() with
commit 2cd8b14aaa66, which went into v5.15. So apparently this call
stack is from an earlier kernel.
The GPU is located below a PEX9797 switch, which is connected to a
SkyLake-E Root Port. So pci_bridge_secondary_bus_reset() should be
called for the Switch Downstream Port but Yang Su says it's called
for the GPU endpoint device.
pci_parent_bus_reset() finds the parent by following dev->bus->self.
I've suggested to Yang Su to replace that with pci_upstream_bridge(dev)
and see if it fixes the issue. It does make a difference in SR-IOV
scenarios (see the comment in include/linux/pci.h above pci_is_root_bus())
though Yang Su reports that SR-IOV is not used on this machine,
only vfio passthrough.
I believe that endpoint devices don't have a Bridge Control Register
(Type 0 Config Space has Max_Lat / Min_Gnt instead), so setting the
Secondary Bus Reset bit should have no effect. But apparently it
does have an effect because Yang Su is witnessing issues with the delay
after reset.
Specifically, Yang Su says that pci_bridge_wait_for_secondary_bus()
bails out because the GPU endpoint device fails the !pci_is_bridge()
check at the top of the function. Also, the calculation of
pci_bus_max_d3cold_delay() fails because the GPU lacks a subordinate bus.
Apparently the unconditional 1 second delay in pci_reset_secondary_bus()
papers over the issue because it waits long enough for the GPU endpoint
device to come out of reset.
Maybe the information that pci_bridge_secondary_bus_reset() is executed
for an endpoint device is not correct and it's in fact executed for
the Downstream Port of the PEX9797 switch, but perhaps config space
of the port is broken and contains an incorrect Header Type field,
which would cause pci_is_bridge() to return false for it. Though we
wouldn't scan the secondary bus below the switch in that case,
so the GPU wouldn't be enumerated. I've requested "lspci -vvv" output
for the GPU and the Downstream Port off-list but haven't received it yet.
Thanks,
Lukas
next prev parent reply other threads:[~2023-01-13 10:28 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-12-31 18:33 [PATCH 0/3] PCI reset delay fixes Lukas Wunner
2022-12-31 18:33 ` [PATCH 1/3] PCI/PM: Observe reset delay irrespective of bridge_d3 Lukas Wunner
2023-01-03 19:50 ` Sathyanarayanan Kuppuswamy
2022-12-31 18:33 ` [PATCH 2/3] PCI: Unify delay handling for reset and resume Lukas Wunner
2023-01-03 19:50 ` Sathyanarayanan Kuppuswamy
2023-01-12 22:31 ` Bjorn Helgaas
2022-12-31 18:33 ` [PATCH 3/3] PCI/DPC: Await readiness of secondary bus after reset Lukas Wunner
2023-01-03 19:49 ` Sathyanarayanan Kuppuswamy
2023-01-12 22:35 ` Bjorn Helgaas
[not found] ` <15135d89-0515-d965-567b-79b3eca236e6@linux.alibaba.com>
2023-01-13 3:06 ` Bjorn Helgaas
2023-01-13 10:18 ` Lukas Wunner [this message]
2023-01-13 9:10 ` Lukas Wunner
2023-01-03 11:09 ` [PATCH 0/3] PCI reset delay fixes Mika Westerberg
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20230113101857.GB29495@wunner.de \
--to=lukas@wunner.de \
--cc=ashok.raj@intel.com \
--cc=bjorn.helgaas@gmail.com \
--cc=helgaas@kernel.org \
--cc=kbusch@kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=mika.westerberg@linux.intel.com \
--cc=ravi.kishore.koppuravuri@intel.com \
--cc=sathyanarayanan.kuppuswamy@linux.intel.com \
--cc=stanspas@amazon.de \
--cc=windy.bi.enflame@gmail.com \
--cc=yang.su@linux.alibaba.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox