From: Brian Norris <briannorris@chromium.org>
To: manivannan.sadhasivam@oss.qualcomm.com
Cc: "Bjorn Helgaas" <bhelgaas@google.com>,
"Mahesh J Salgaonkar" <mahesh@linux.ibm.com>,
"Oliver O'Halloran" <oohall@gmail.com>,
"Will Deacon" <will@kernel.org>,
"Lorenzo Pieralisi" <lpieralisi@kernel.org>,
"Krzysztof Wilczyński" <kwilczynski@kernel.org>,
"Manivannan Sadhasivam" <mani@kernel.org>,
"Rob Herring" <robh@kernel.org>,
"Heiko Stuebner" <heiko@sntech.de>,
"Philipp Zabel" <p.zabel@pengutronix.de>,
linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org,
linux-arm-kernel@lists.infradead.org,
linux-arm-msm@vger.kernel.org,
linux-rockchip@lists.infradead.org,
"Niklas Cassel" <cassel@kernel.org>,
"Wilfred Mallawa" <wilfred.mallawa@wdc.com>,
"Krishna Chaitanya Chundru" <krishna.chundru@oss.qualcomm.com>,
"Lukas Wunner" <lukas@wunner.de>
Subject: Re: [PATCH v6 2/4] PCI: host-common: Add link down handling for Root Ports
Date: Thu, 28 Aug 2025 13:25:12 -0700 [thread overview]
Message-ID: <aLC7KIoi-LoH2en4@google.com> (raw)
In-Reply-To: <20250715-pci-port-reset-v6-2-6f9cce94e7bb@oss.qualcomm.com>
Hi,
I've been testing this out with various endpoints (both upstream and
not...), and I have a question that intersects with this area:
On Tue, Jul 15, 2025 at 07:51:05PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> From: Manivannan Sadhasivam <mani@kernel.org>
>
> The PCI link, when down, needs to be recovered to bring it back. But on
> some platforms, that cannot be done in a generic way as link recovery
> procedure is platform specific. So add a new API
> pci_host_handle_link_down() that could be called by the host bridge drivers
> for a specific Root Port when the link goes down.
>
> The API accepts the 'pci_dev' corresponding to the Root Port which observed
> the link down event. If CONFIG_PCIEAER is enabled, the API calls
> pcie_do_recovery() function with 'pci_channel_io_frozen' as the state. This
> will result in the execution of the AER Fatal error handling code. Since
> the link down recovery is pretty much the same as AER Fatal error handling,
> pcie_do_recovery() helper is reused here. First, the AER error_detected()
> callback will be triggered for the bridge and then for the downstream
> devices.
I've been trying to understand what exactly the .error_detected()
involvement should be here (and what it actually does, despite the
docs), and especially around its return codes.
Specifically, I'm trying to see what's supposed to happen with
PCI_ERS_RESULT_CAN_RECOVER. I see that for pci_channel_io_frozen, almost
all endpoint drivers return PCI_ERS_RESULT_NEED_RESET, but if drivers
actually return PCI_ERS_RESULT_CAN_RECOVER, it's unclear what should
happen.
Today, we don't actually respect it; pcie_do_recovery() just calls
reset_subordinates() (pci_host_reset_root_port()) unconditionally. The
only thing that return code affects is whether we call
report_mmio_enabled() vs report_slot_reset() afterward. This seems odd.
It also doesn't totally match the docs:
https://docs.kernel.org/PCI/pcieaer-howto.html#non-correctable-non-fatal-and-fatal-errors
https://docs.kernel.org/PCI/pci-error-recovery.html
e.g., "PCI_ERS_RESULT_CAN_RECOVER
Driver returns this if it thinks it might be able to recover the HW by
just banging IOs or if it wants to be given a chance to extract some
diagnostic information (see mmio_enable, below)."
I've seen drivers that think they want to handle stuff on their own --
for example, if they have a handle to an external PMIC, they may try to
reset things that way -- and so they return PCI_ERS_RESULT_CAN_RECOVER
even for io_frozen. I'm not convinced that's a great idea, but I'm also
not sure what to say about the docs.
On the flip side: it's not clear
PCI_ERS_RESULT_NEED_RESET+pci_channel_io_normal works as documented
either. An endpoint might think it's requesting a slot reset, but
pcie_do_recovery() will ignore that and skip reset_subordinates()
(pci_host_reset_root_port()).
All in all, the docs sound like endpoints _should_ have control over
whether we exercise a full port/slot reset for all types of errors. But
in practice, we do not actually give it that control. i.e., your commit
message is correct, and the docs are not.
I have half a mind to suggest the appended change, so the behavior
matches (some of) the docs a little better [1].
Brian
> Finally, pci_host_reset_root_port() will be called for the Root
> Port, which will reset the Root Port using 'reset_root_port' callback to
> recover the link. Once that's done, resume message will be broadcasted to
> the bridge and the downstream devices, indicating successful link recovery.
>
> But if CONFIG_PCIEAER is not enabled in the kernel, only
> pci_host_reset_root_port() API will be called, which will in turn call
> pci_bus_error_reset() to just reset the Root Port as there is no way we
> could inform the drivers about link recovery.
>
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
[1]
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -219,13 +219,10 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
pci_dbg(bridge, "broadcast error_detected message\n");
if (state == pci_channel_io_frozen) {
pci_walk_bridge(bridge, report_frozen_detected, &status);
- if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
- pci_warn(bridge, "subordinate device reset failed\n");
- goto failed;
- }
} else {
pci_walk_bridge(bridge, report_normal_detected, &status);
}
+ pci_dbg(bridge, "error_detected result: %d\n", status);
if (status == PCI_ERS_RESULT_CAN_RECOVER) {
status = PCI_ERS_RESULT_RECOVERED;
@@ -234,6 +231,11 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
}
if (status == PCI_ERS_RESULT_NEED_RESET) {
+ if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
+ pci_warn(bridge, "subordinate device reset failed\n");
+ goto failed;
+ }
+
status = PCI_ERS_RESULT_RECOVERED;
pci_dbg(bridge, "broadcast slot_reset message\n");
pci_walk_bridge(bridge, report_slot_reset, &status);
next prev parent reply other threads:[~2025-08-28 20:25 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-15 14:21 [PATCH v6 0/4] PCI: Add support for resetting the Root Ports in a platform specific way Manivannan Sadhasivam via B4 Relay
2025-07-15 14:21 ` [PATCH v6 1/4] PCI/ERR: " Manivannan Sadhasivam via B4 Relay
2025-07-17 18:28 ` [PATCH v6 1/4] PCI/ERR: Add support for resetting the Root Ports in a platform specific wayy Frank Li
2025-07-15 14:21 ` [PATCH v6 2/4] PCI: host-common: Add link down handling for Root Ports Manivannan Sadhasivam via B4 Relay
2025-07-17 18:31 ` [PATCH v6 2/4] PCI: host-common: Add link down handling for Root Portsy Frank Li
2025-08-28 20:25 ` Brian Norris [this message]
2025-08-29 8:35 ` [PATCH v6 2/4] PCI: host-common: Add link down handling for Root Ports Lukas Wunner
2025-08-29 23:58 ` Brian Norris
2025-07-15 14:21 ` [PATCH v6 3/4] PCI: qcom: Add support for resetting the Root Port due to link down event Manivannan Sadhasivam via B4 Relay
2025-07-15 14:21 ` [PATCH v6 4/4] PCI: dw-rockchip: Add support to reset Root Port upon " Manivannan Sadhasivam via B4 Relay
2025-07-18 3:58 ` [PATCH v6 0/4] PCI: Add support for resetting the Root Ports in a platform specific way Krishna Chaitanya Chundru
2025-07-18 10:28 ` Niklas Cassel
2025-07-18 10:39 ` Niklas Cassel
2025-07-24 5:30 ` Manivannan Sadhasivam
2025-08-15 9:07 ` Niklas Cassel
2025-08-29 16:14 ` Manivannan Sadhasivam
2025-09-04 14:03 ` Niklas Cassel
2025-07-24 9:28 ` Hongxing Zhu
2025-08-28 20:01 ` Brian Norris
2025-08-29 13:56 ` Manivannan Sadhasivam
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aLC7KIoi-LoH2en4@google.com \
--to=briannorris@chromium.org \
--cc=bhelgaas@google.com \
--cc=cassel@kernel.org \
--cc=heiko@sntech.de \
--cc=krishna.chundru@oss.qualcomm.com \
--cc=kwilczynski@kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-arm-msm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=linux-rockchip@lists.infradead.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=lpieralisi@kernel.org \
--cc=lukas@wunner.de \
--cc=mahesh@linux.ibm.com \
--cc=mani@kernel.org \
--cc=manivannan.sadhasivam@oss.qualcomm.com \
--cc=oohall@gmail.com \
--cc=p.zabel@pengutronix.de \
--cc=robh@kernel.org \
--cc=wilfred.mallawa@wdc.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).