From: Yang Su <yang.su@linux.alibaba.com>
To: Lukas Wunner <lukas@wunner.de>,
Bjorn Helgaas <helgaas@kernel.org>,
linux-pci@vger.kernel.org
Cc: Keith Busch <kbusch@kernel.org>, Ashok Raj <ashok.raj@intel.com>,
Sathyanarayanan Kuppuswamy
<sathyanarayanan.kuppuswamy@linux.intel.com>,
Ravi Kishore Koppuravuri <ravi.kishore.koppuravuri@intel.com>,
Mika Westerberg <mika.westerberg@linux.intel.com>,
Sheng Bi <windy.bi.enflame@gmail.com>,
Stanislav Spassov <stanspas@amazon.de>,
shuo.tan@linux.alibaba.com
Subject: Re: [PATCH v2 3/3] PCI/DPC: Await readiness of secondary bus after reset
Date: Sat, 18 Feb 2023 21:23:47 +0800 [thread overview]
Message-ID: <5d5ee171-18e5-f1b8-d08a-0d88f8eb3a3f@linux.alibaba.com> (raw)
In-Reply-To: <9f5ff00e1593d8d9a4b452398b98aa14d23fca11.1673769517.git.lukas@wunner.de>
Hi Lucas,
I do not understand why pci_bridge_wait_for_secondary_bus() can fix
Intel's Ponte Vecchio HPC GPU
after a DPC-induced Hot Reset.
The func pci_bridge_wait_for_secondary_bus() also use
pcie_wait_for_link_delay() which time depends on
the max device delay time of one bus, for the GPU which bus only one
device, I think the time is 100ms as
the input parater in pcie_wait_for_link_delay().
pcie_wait_for_link() also wait fixed 100ms and then wait the device data
link is ready. So another wait time
is pci_dev_wait() in your patch? pci_dev_wait() to receive the CRS from
the device to check the device
whether is ready.
Please help me understand which difference work.
On 2023/1/15 16:20, Lukas Wunner wrote:
> pci_bridge_wait_for_secondary_bus() is called after a Secondary Bus
> Reset, but not after a DPC-induced Hot Reset.
>
> As a result, the delays prescribed by PCIe r6.0 sec 6.6.1 are not
> observed and devices on the secondary bus may be accessed before
> they're ready.
>
> One affected device is Intel's Ponte Vecchio HPC GPU. It comprises a
> PCIe switch whose upstream port is not immediately ready after reset.
> Because its config space is restored too early, it remains in
> D0uninitialized, its subordinate devices remain inaccessible and DPC
> recovery fails with messages such as:
>
> i915 0000:8c:00.0: can't change power state from D3cold to D0 (config space inaccessible)
> intel_vsec 0000:8e:00.1: can't change power state from D3cold to D0 (config space inaccessible)
> pcieport 0000:89:02.0: AER: device recovery failed
>
> Fix it.
>
> Tested-by: Ravi Kishore Koppuravuri<ravi.kishore.koppuravuri@intel.com>
> Signed-off-by: Lukas Wunner<lukas@wunner.de>
> Reviewed-by: Mika Westerberg<mika.westerberg@linux.intel.com>
> Cc:stable@vger.kernel.org
> ---
> Changes v1 -> v2:
> * Move PCIE_RESET_READY_POLL_MS macro below the newly introduced
> PCI_RESET_WAIT from patch [2/3] and extend its code comment
> * Mention errors seen on Ponte Vecchio in commit message (Bjorn)
> * Avoid first person plural in commit message (Sathyanarayanan)
> * Add Reviewed-by tag (Mika)
>
> drivers/pci/pci.c | 3 ---
> drivers/pci/pci.h | 6 ++++++
> drivers/pci/pcie/dpc.c | 4 ++--
> 3 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 509f6b5c9e14..d31c21ea9688 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -167,9 +167,6 @@ static int __init pcie_port_pm_setup(char *str)
> }
> __setup("pcie_port_pm=", pcie_port_pm_setup);
>
> -/* Time to wait after a reset for device to become responsive */
> -#define PCIE_RESET_READY_POLL_MS 60000
> -
> /**
> * pci_bus_max_busnr - returns maximum PCI bus number of given bus' children
> * @bus: pointer to PCI bus structure to search
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index ce1fc3a90b3f..8f5d4bd5b410 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -70,6 +70,12 @@ struct pci_cap_saved_state *pci_find_saved_ext_cap(struct pci_dev *dev,
> * Reset (PCIe r6.0 sec 5.8).
> */
> #define PCI_RESET_WAIT 1000 /* msec */
> +/*
> + * Devices may extend the 1 sec period through Request Retry Status completions
> + * (PCIe r6.0 sec 2.3.1). The spec does not provide an upper limit, but 60 sec
> + * ought to be enough for any device to become responsive.
> + */
> +#define PCIE_RESET_READY_POLL_MS 60000 /* msec */
>
> void pci_update_current_state(struct pci_dev *dev, pci_power_t state);
> void pci_refresh_power_state(struct pci_dev *dev);
> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> index f5ffea17c7f8..a5d7c69b764e 100644
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -170,8 +170,8 @@ pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
> pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
> PCI_EXP_DPC_STATUS_TRIGGER);
>
> - if (!pcie_wait_for_link(pdev, true)) {
> - pci_info(pdev, "Data Link Layer Link Active not set in 1000 msec\n");
> + if (pci_bridge_wait_for_secondary_bus(pdev, "DPC",
> + PCIE_RESET_READY_POLL_MS)) {
> clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
> ret = PCI_ERS_RESULT_DISCONNECT;
> } else {
Thanks,
Yang
next prev parent reply other threads:[~2023-02-18 13:23 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-15 8:20 [PATCH v2 0/3] PCI reset delay fixes Lukas Wunner
2023-01-15 8:20 ` [PATCH v2 1/3] PCI/PM: Observe reset delay irrespective of bridge_d3 Lukas Wunner
2023-02-18 13:22 ` Yang Su
2023-02-19 5:07 ` Lukas Wunner
2023-01-15 8:20 ` [PATCH v2 2/3] PCI: Unify delay handling for reset and resume Lukas Wunner
2023-02-23 11:01 ` Yang Su
2023-03-01 6:31 ` Lukas Wunner
2023-01-15 8:20 ` [PATCH v2 3/3] PCI/DPC: Await readiness of secondary bus after reset Lukas Wunner
2023-02-18 13:23 ` Yang Su [this message]
2023-02-19 5:12 ` Lukas Wunner
2023-02-07 19:03 ` [PATCH v2 0/3] PCI reset delay fixes Bjorn Helgaas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5d5ee171-18e5-f1b8-d08a-0d88f8eb3a3f@linux.alibaba.com \
--to=yang.su@linux.alibaba.com \
--cc=ashok.raj@intel.com \
--cc=helgaas@kernel.org \
--cc=kbusch@kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=lukas@wunner.de \
--cc=mika.westerberg@linux.intel.com \
--cc=ravi.kishore.koppuravuri@intel.com \
--cc=sathyanarayanan.kuppuswamy@linux.intel.com \
--cc=shuo.tan@linux.alibaba.com \
--cc=stanspas@amazon.de \
--cc=windy.bi.enflame@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).