linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Yang Su <yang.su@linux.alibaba.com>
To: Lukas Wunner <lukas@wunner.de>,
	Bjorn Helgaas <helgaas@kernel.org>,
	linux-pci@vger.kernel.org
Cc: Keith Busch <kbusch@kernel.org>, Ashok Raj <ashok.raj@intel.com>,
	Sathyanarayanan Kuppuswamy 
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	Ravi Kishore Koppuravuri <ravi.kishore.koppuravuri@intel.com>,
	Mika Westerberg <mika.westerberg@linux.intel.com>,
	Sheng Bi <windy.bi.enflame@gmail.com>,
	Stanislav Spassov <stanspas@amazon.de>,
	shuo.tan@linux.alibaba.com
Subject: Re: [PATCH v2 3/3] PCI/DPC: Await readiness of secondary bus after reset
Date: Sat, 18 Feb 2023 21:23:47 +0800	[thread overview]
Message-ID: <5d5ee171-18e5-f1b8-d08a-0d88f8eb3a3f@linux.alibaba.com> (raw)
In-Reply-To: <9f5ff00e1593d8d9a4b452398b98aa14d23fca11.1673769517.git.lukas@wunner.de>

Hi Lucas,

I do not understand why pci_bridge_wait_for_secondary_bus() can fix 
Intel's Ponte Vecchio HPC GPU

after a DPC-induced Hot Reset.


The func pci_bridge_wait_for_secondary_bus() also use 
pcie_wait_for_link_delay() which time depends on

the max device delay time of one bus, for the GPU which bus only one 
device, I think the time is 100ms as

the input parater in pcie_wait_for_link_delay().


pcie_wait_for_link() also wait fixed 100ms and then wait the device data 
link is ready. So another wait time

is pci_dev_wait() in your patch? pci_dev_wait() to receive the CRS from 
the device to check the device

whether is ready.


Please help me understand which difference work.


On 2023/1/15 16:20, Lukas Wunner wrote:
> pci_bridge_wait_for_secondary_bus() is called after a Secondary Bus
> Reset, but not after a DPC-induced Hot Reset.
>
> As a result, the delays prescribed by PCIe r6.0 sec 6.6.1 are not
> observed and devices on the secondary bus may be accessed before
> they're ready.
>
> One affected device is Intel's Ponte Vecchio HPC GPU.  It comprises a
> PCIe switch whose upstream port is not immediately ready after reset.
> Because its config space is restored too early, it remains in
> D0uninitialized, its subordinate devices remain inaccessible and DPC
> recovery fails with messages such as:
>
> i915 0000:8c:00.0: can't change power state from D3cold to D0 (config space inaccessible)
> intel_vsec 0000:8e:00.1: can't change power state from D3cold to D0 (config space inaccessible)
> pcieport 0000:89:02.0: AER: device recovery failed
>
> Fix it.
>
> Tested-by: Ravi Kishore Koppuravuri<ravi.kishore.koppuravuri@intel.com>
> Signed-off-by: Lukas Wunner<lukas@wunner.de>
> Reviewed-by: Mika Westerberg<mika.westerberg@linux.intel.com>
> Cc:stable@vger.kernel.org
> ---
> Changes v1 -> v2:
>   * Move PCIE_RESET_READY_POLL_MS macro below the newly introduced
>     PCI_RESET_WAIT from patch [2/3] and extend its code comment
>   * Mention errors seen on Ponte Vecchio in commit message (Bjorn)
>   * Avoid first person plural in commit message (Sathyanarayanan)
>   * Add Reviewed-by tag (Mika)
>
>   drivers/pci/pci.c      | 3 ---
>   drivers/pci/pci.h      | 6 ++++++
>   drivers/pci/pcie/dpc.c | 4 ++--
>   3 files changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 509f6b5c9e14..d31c21ea9688 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -167,9 +167,6 @@ static int __init pcie_port_pm_setup(char *str)
>   }
>   __setup("pcie_port_pm=", pcie_port_pm_setup);
>   
> -/* Time to wait after a reset for device to become responsive */
> -#define PCIE_RESET_READY_POLL_MS 60000
> -
>   /**
>    * pci_bus_max_busnr - returns maximum PCI bus number of given bus' children
>    * @bus: pointer to PCI bus structure to search
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index ce1fc3a90b3f..8f5d4bd5b410 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -70,6 +70,12 @@ struct pci_cap_saved_state *pci_find_saved_ext_cap(struct pci_dev *dev,
>    * Reset (PCIe r6.0 sec 5.8).
>    */
>   #define PCI_RESET_WAIT		1000	/* msec */
> +/*
> + * Devices may extend the 1 sec period through Request Retry Status completions
> + * (PCIe r6.0 sec 2.3.1).  The spec does not provide an upper limit, but 60 sec
> + * ought to be enough for any device to become responsive.
> + */
> +#define PCIE_RESET_READY_POLL_MS 60000	/* msec */
>   
>   void pci_update_current_state(struct pci_dev *dev, pci_power_t state);
>   void pci_refresh_power_state(struct pci_dev *dev);
> diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c
> index f5ffea17c7f8..a5d7c69b764e 100644
> --- a/drivers/pci/pcie/dpc.c
> +++ b/drivers/pci/pcie/dpc.c
> @@ -170,8 +170,8 @@ pci_ers_result_t dpc_reset_link(struct pci_dev *pdev)
>   	pci_write_config_word(pdev, cap + PCI_EXP_DPC_STATUS,
>   			      PCI_EXP_DPC_STATUS_TRIGGER);
>   
> -	if (!pcie_wait_for_link(pdev, true)) {
> -		pci_info(pdev, "Data Link Layer Link Active not set in 1000 msec\n");
> +	if (pci_bridge_wait_for_secondary_bus(pdev, "DPC",
> +					      PCIE_RESET_READY_POLL_MS)) {
>   		clear_bit(PCI_DPC_RECOVERED, &pdev->priv_flags);
>   		ret = PCI_ERS_RESULT_DISCONNECT;
>   	} else {


Thanks,


Yang


  reply	other threads:[~2023-02-18 13:23 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-15  8:20 [PATCH v2 0/3] PCI reset delay fixes Lukas Wunner
2023-01-15  8:20 ` [PATCH v2 1/3] PCI/PM: Observe reset delay irrespective of bridge_d3 Lukas Wunner
2023-02-18 13:22   ` Yang Su
2023-02-19  5:07     ` Lukas Wunner
2023-01-15  8:20 ` [PATCH v2 2/3] PCI: Unify delay handling for reset and resume Lukas Wunner
2023-02-23 11:01   ` Yang Su
2023-03-01  6:31     ` Lukas Wunner
2023-01-15  8:20 ` [PATCH v2 3/3] PCI/DPC: Await readiness of secondary bus after reset Lukas Wunner
2023-02-18 13:23   ` Yang Su [this message]
2023-02-19  5:12     ` Lukas Wunner
2023-02-07 19:03 ` [PATCH v2 0/3] PCI reset delay fixes Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5d5ee171-18e5-f1b8-d08a-0d88f8eb3a3f@linux.alibaba.com \
    --to=yang.su@linux.alibaba.com \
    --cc=ashok.raj@intel.com \
    --cc=helgaas@kernel.org \
    --cc=kbusch@kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=mika.westerberg@linux.intel.com \
    --cc=ravi.kishore.koppuravuri@intel.com \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=shuo.tan@linux.alibaba.com \
    --cc=stanspas@amazon.de \
    --cc=windy.bi.enflame@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).