From: Brian Norris <briannorris@chromium.org>
To: manivannan.sadhasivam@oss.qualcomm.com
Cc: "Bjorn Helgaas" <bhelgaas@google.com>,
"Mahesh J Salgaonkar" <mahesh@linux.ibm.com>,
"Oliver O'Halloran" <oohall@gmail.com>,
"Will Deacon" <will@kernel.org>,
"Lorenzo Pieralisi" <lpieralisi@kernel.org>,
"Krzysztof Wilczyński" <kwilczynski@kernel.org>,
"Manivannan Sadhasivam" <mani@kernel.org>,
"Rob Herring" <robh@kernel.org>,
"Heiko Stuebner" <heiko@sntech.de>,
"Philipp Zabel" <p.zabel@pengutronix.de>,
linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org,
linux-arm-kernel@lists.infradead.org,
linux-arm-msm@vger.kernel.org,
linux-rockchip@lists.infradead.org,
"Niklas Cassel" <cassel@kernel.org>,
"Wilfred Mallawa" <wilfred.mallawa@wdc.com>,
"Krishna Chaitanya Chundru" <krishna.chundru@oss.qualcomm.com>,
"Lukas Wunner" <lukas@wunner.de>
Subject: Re: [PATCH v6 2/4] PCI: host-common: Add link down handling for Root Ports
Date: Thu, 28 Aug 2025 13:25:12 -0700 [thread overview]
Message-ID: <aLC7KIoi-LoH2en4@google.com> (raw)
In-Reply-To: <20250715-pci-port-reset-v6-2-6f9cce94e7bb@oss.qualcomm.com>
Hi,
I've been testing this out with various endpoints (both upstream and
not...), and I have a question that intersects with this area:
On Tue, Jul 15, 2025 at 07:51:05PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> From: Manivannan Sadhasivam <mani@kernel.org>
>
> The PCI link, when down, needs to be recovered to bring it back. But on
> some platforms, that cannot be done in a generic way as link recovery
> procedure is platform specific. So add a new API
> pci_host_handle_link_down() that could be called by the host bridge drivers
> for a specific Root Port when the link goes down.
>
> The API accepts the 'pci_dev' corresponding to the Root Port which observed
> the link down event. If CONFIG_PCIEAER is enabled, the API calls
> pcie_do_recovery() function with 'pci_channel_io_frozen' as the state. This
> will result in the execution of the AER Fatal error handling code. Since
> the link down recovery is pretty much the same as AER Fatal error handling,
> pcie_do_recovery() helper is reused here. First, the AER error_detected()
> callback will be triggered for the bridge and then for the downstream
> devices.
I've been trying to understand what exactly the .error_detected()
involvement should be here (and what it actually does, despite the
docs), and especially around its return codes.
Specifically, I'm trying to see what's supposed to happen with
PCI_ERS_RESULT_CAN_RECOVER. I see that for pci_channel_io_frozen, almost
all endpoint drivers return PCI_ERS_RESULT_NEED_RESET, but if drivers
actually return PCI_ERS_RESULT_CAN_RECOVER, it's unclear what should
happen.
Today, we don't actually respect it; pcie_do_recovery() just calls
reset_subordinates() (pci_host_reset_root_port()) unconditionally. The
only thing that return code affects is whether we call
report_mmio_enabled() vs report_slot_reset() afterward. This seems odd.
It also doesn't totally match the docs:
https://docs.kernel.org/PCI/pcieaer-howto.html#non-correctable-non-fatal-and-fatal-errors
https://docs.kernel.org/PCI/pci-error-recovery.html
e.g., "PCI_ERS_RESULT_CAN_RECOVER
Driver returns this if it thinks it might be able to recover the HW by
just banging IOs or if it wants to be given a chance to extract some
diagnostic information (see mmio_enable, below)."
I've seen drivers that think they want to handle stuff on their own --
for example, if they have a handle to an external PMIC, they may try to
reset things that way -- and so they return PCI_ERS_RESULT_CAN_RECOVER
even for io_frozen. I'm not convinced that's a great idea, but I'm also
not sure what to say about the docs.
On the flip side: it's not clear
PCI_ERS_RESULT_NEED_RESET+pci_channel_io_normal works as documented
either. An endpoint might think it's requesting a slot reset, but
pcie_do_recovery() will ignore that and skip reset_subordinates()
(pci_host_reset_root_port()).
All in all, the docs sound like endpoints _should_ have control over
whether we exercise a full port/slot reset for all types of errors. But
in practice, we do not actually give it that control. i.e., your commit
message is correct, and the docs are not.
I have half a mind to suggest the appended change, so the behavior
matches (some of) the docs a little better [1].
Brian
> Finally, pci_host_reset_root_port() will be called for the Root
> Port, which will reset the Root Port using 'reset_root_port' callback to
> recover the link. Once that's done, resume message will be broadcasted to
> the bridge and the downstream devices, indicating successful link recovery.
>
> But if CONFIG_PCIEAER is not enabled in the kernel, only
> pci_host_reset_root_port() API will be called, which will in turn call
> pci_bus_error_reset() to just reset the Root Port as there is no way we
> could inform the drivers about link recovery.
>
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
[1]
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -219,13 +219,10 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
pci_dbg(bridge, "broadcast error_detected message\n");
if (state == pci_channel_io_frozen) {
pci_walk_bridge(bridge, report_frozen_detected, &status);
- if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
- pci_warn(bridge, "subordinate device reset failed\n");
- goto failed;
- }
} else {
pci_walk_bridge(bridge, report_normal_detected, &status);
}
+ pci_dbg(bridge, "error_detected result: %d\n", status);
if (status == PCI_ERS_RESULT_CAN_RECOVER) {
status = PCI_ERS_RESULT_RECOVERED;
@@ -234,6 +231,11 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
}
if (status == PCI_ERS_RESULT_NEED_RESET) {
+ if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
+ pci_warn(bridge, "subordinate device reset failed\n");
+ goto failed;
+ }
+
status = PCI_ERS_RESULT_RECOVERED;
pci_dbg(bridge, "broadcast slot_reset message\n");
pci_walk_bridge(bridge, report_slot_reset, &status);
WARNING: multiple messages have this Message-ID (diff)
From: Brian Norris <briannorris@chromium.org>
To: manivannan.sadhasivam@oss.qualcomm.com
Cc: "Bjorn Helgaas" <bhelgaas@google.com>,
"Mahesh J Salgaonkar" <mahesh@linux.ibm.com>,
"Oliver O'Halloran" <oohall@gmail.com>,
"Will Deacon" <will@kernel.org>,
"Lorenzo Pieralisi" <lpieralisi@kernel.org>,
"Krzysztof Wilczyński" <kwilczynski@kernel.org>,
"Manivannan Sadhasivam" <mani@kernel.org>,
"Rob Herring" <robh@kernel.org>,
"Heiko Stuebner" <heiko@sntech.de>,
"Philipp Zabel" <p.zabel@pengutronix.de>,
linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org,
linux-arm-kernel@lists.infradead.org,
linux-arm-msm@vger.kernel.org,
linux-rockchip@lists.infradead.org,
"Niklas Cassel" <cassel@kernel.org>,
"Wilfred Mallawa" <wilfred.mallawa@wdc.com>,
"Krishna Chaitanya Chundru" <krishna.chundru@oss.qualcomm.com>,
"Lukas Wunner" <lukas@wunner.de>
Subject: Re: [PATCH v6 2/4] PCI: host-common: Add link down handling for Root Ports
Date: Thu, 28 Aug 2025 13:25:12 -0700 [thread overview]
Message-ID: <aLC7KIoi-LoH2en4@google.com> (raw)
In-Reply-To: <20250715-pci-port-reset-v6-2-6f9cce94e7bb@oss.qualcomm.com>
Hi,
I've been testing this out with various endpoints (both upstream and
not...), and I have a question that intersects with this area:
On Tue, Jul 15, 2025 at 07:51:05PM +0530, Manivannan Sadhasivam via B4 Relay wrote:
> From: Manivannan Sadhasivam <mani@kernel.org>
>
> The PCI link, when down, needs to be recovered to bring it back. But on
> some platforms, that cannot be done in a generic way as link recovery
> procedure is platform specific. So add a new API
> pci_host_handle_link_down() that could be called by the host bridge drivers
> for a specific Root Port when the link goes down.
>
> The API accepts the 'pci_dev' corresponding to the Root Port which observed
> the link down event. If CONFIG_PCIEAER is enabled, the API calls
> pcie_do_recovery() function with 'pci_channel_io_frozen' as the state. This
> will result in the execution of the AER Fatal error handling code. Since
> the link down recovery is pretty much the same as AER Fatal error handling,
> pcie_do_recovery() helper is reused here. First, the AER error_detected()
> callback will be triggered for the bridge and then for the downstream
> devices.
I've been trying to understand what exactly the .error_detected()
involvement should be here (and what it actually does, despite the
docs), and especially around its return codes.
Specifically, I'm trying to see what's supposed to happen with
PCI_ERS_RESULT_CAN_RECOVER. I see that for pci_channel_io_frozen, almost
all endpoint drivers return PCI_ERS_RESULT_NEED_RESET, but if drivers
actually return PCI_ERS_RESULT_CAN_RECOVER, it's unclear what should
happen.
Today, we don't actually respect it; pcie_do_recovery() just calls
reset_subordinates() (pci_host_reset_root_port()) unconditionally. The
only thing that return code affects is whether we call
report_mmio_enabled() vs report_slot_reset() afterward. This seems odd.
It also doesn't totally match the docs:
https://docs.kernel.org/PCI/pcieaer-howto.html#non-correctable-non-fatal-and-fatal-errors
https://docs.kernel.org/PCI/pci-error-recovery.html
e.g., "PCI_ERS_RESULT_CAN_RECOVER
Driver returns this if it thinks it might be able to recover the HW by
just banging IOs or if it wants to be given a chance to extract some
diagnostic information (see mmio_enable, below)."
I've seen drivers that think they want to handle stuff on their own --
for example, if they have a handle to an external PMIC, they may try to
reset things that way -- and so they return PCI_ERS_RESULT_CAN_RECOVER
even for io_frozen. I'm not convinced that's a great idea, but I'm also
not sure what to say about the docs.
On the flip side: it's not clear
PCI_ERS_RESULT_NEED_RESET+pci_channel_io_normal works as documented
either. An endpoint might think it's requesting a slot reset, but
pcie_do_recovery() will ignore that and skip reset_subordinates()
(pci_host_reset_root_port()).
All in all, the docs sound like endpoints _should_ have control over
whether we exercise a full port/slot reset for all types of errors. But
in practice, we do not actually give it that control. i.e., your commit
message is correct, and the docs are not.
I have half a mind to suggest the appended change, so the behavior
matches (some of) the docs a little better [1].
Brian
> Finally, pci_host_reset_root_port() will be called for the Root
> Port, which will reset the Root Port using 'reset_root_port' callback to
> recover the link. Once that's done, resume message will be broadcasted to
> the bridge and the downstream devices, indicating successful link recovery.
>
> But if CONFIG_PCIEAER is not enabled in the kernel, only
> pci_host_reset_root_port() API will be called, which will in turn call
> pci_bus_error_reset() to just reset the Root Port as there is no way we
> could inform the drivers about link recovery.
>
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com>
[1]
--- a/drivers/pci/pcie/err.c
+++ b/drivers/pci/pcie/err.c
@@ -219,13 +219,10 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
pci_dbg(bridge, "broadcast error_detected message\n");
if (state == pci_channel_io_frozen) {
pci_walk_bridge(bridge, report_frozen_detected, &status);
- if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
- pci_warn(bridge, "subordinate device reset failed\n");
- goto failed;
- }
} else {
pci_walk_bridge(bridge, report_normal_detected, &status);
}
+ pci_dbg(bridge, "error_detected result: %d\n", status);
if (status == PCI_ERS_RESULT_CAN_RECOVER) {
status = PCI_ERS_RESULT_RECOVERED;
@@ -234,6 +231,11 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
}
if (status == PCI_ERS_RESULT_NEED_RESET) {
+ if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
+ pci_warn(bridge, "subordinate device reset failed\n");
+ goto failed;
+ }
+
status = PCI_ERS_RESULT_RECOVERED;
pci_dbg(bridge, "broadcast slot_reset message\n");
pci_walk_bridge(bridge, report_slot_reset, &status);
_______________________________________________
Linux-rockchip mailing list
Linux-rockchip@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-rockchip
next prev parent reply other threads:[~2025-08-28 23:26 UTC|newest]
Thread overview: 54+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-07-15 14:21 [PATCH v6 0/4] PCI: Add support for resetting the Root Ports in a platform specific way Manivannan Sadhasivam
2025-07-15 14:21 ` Manivannan Sadhasivam via B4 Relay
2025-07-15 14:21 ` Manivannan Sadhasivam via B4 Relay
2025-07-15 14:21 ` [PATCH v6 1/4] PCI/ERR: " Manivannan Sadhasivam
2025-07-15 14:21 ` Manivannan Sadhasivam via B4 Relay
2025-07-15 14:21 ` Manivannan Sadhasivam via B4 Relay
2025-07-17 18:28 ` [PATCH v6 1/4] PCI/ERR: Add support for resetting the Root Ports in a platform specific wayy Frank Li
2025-07-17 18:28 ` Frank Li
2025-07-15 14:21 ` [PATCH v6 2/4] PCI: host-common: Add link down handling for Root Ports Manivannan Sadhasivam
2025-07-15 14:21 ` Manivannan Sadhasivam via B4 Relay
2025-07-15 14:21 ` Manivannan Sadhasivam via B4 Relay
2025-07-17 18:31 ` [PATCH v6 2/4] PCI: host-common: Add link down handling for Root Portsy Frank Li
2025-07-17 18:31 ` Frank Li
2025-08-28 20:25 ` Brian Norris [this message]
2025-08-28 20:25 ` [PATCH v6 2/4] PCI: host-common: Add link down handling for Root Ports Brian Norris
2025-08-29 8:35 ` Lukas Wunner
2025-08-29 23:58 ` Brian Norris
2025-08-29 23:58 ` Brian Norris
2025-07-15 14:21 ` [PATCH v6 3/4] PCI: qcom: Add support for resetting the Root Port due to link down event Manivannan Sadhasivam
2025-07-15 14:21 ` Manivannan Sadhasivam via B4 Relay
2025-07-15 14:21 ` Manivannan Sadhasivam via B4 Relay
2025-07-15 14:21 ` [PATCH v6 4/4] PCI: dw-rockchip: Add support to reset Root Port upon " Manivannan Sadhasivam
2025-07-15 14:21 ` Manivannan Sadhasivam via B4 Relay
2025-07-15 14:21 ` Manivannan Sadhasivam via B4 Relay
2025-07-18 3:58 ` [PATCH v6 0/4] PCI: Add support for resetting the Root Ports in a platform specific way Krishna Chaitanya Chundru
2025-07-18 3:58 ` Krishna Chaitanya Chundru
2025-07-18 10:28 ` Niklas Cassel
2025-07-18 10:28 ` Niklas Cassel
2025-07-18 10:39 ` Niklas Cassel
2025-07-18 10:39 ` Niklas Cassel
2025-07-24 5:30 ` Manivannan Sadhasivam
2025-07-24 5:30 ` Manivannan Sadhasivam
2025-08-15 9:07 ` Niklas Cassel
2025-08-15 9:07 ` Niklas Cassel
2025-08-29 16:14 ` Manivannan Sadhasivam
2025-08-29 16:14 ` Manivannan Sadhasivam
2025-09-04 14:03 ` Niklas Cassel
2025-09-04 14:03 ` Niklas Cassel
2026-03-10 13:37 ` Manivannan Sadhasivam
2026-03-10 13:37 ` Manivannan Sadhasivam
2026-03-11 10:59 ` Niklas Cassel
2026-03-11 10:59 ` Niklas Cassel
2026-03-11 14:19 ` Manivannan Sadhasivam
2026-03-11 14:19 ` Manivannan Sadhasivam
2025-07-24 9:28 ` Hongxing Zhu
2025-07-24 9:28 ` Hongxing Zhu
2025-08-28 20:01 ` Brian Norris
2025-08-28 20:01 ` Brian Norris
2025-08-29 13:56 ` Manivannan Sadhasivam
2025-08-29 13:56 ` Manivannan Sadhasivam
2025-09-23 13:06 ` David Bremner
2025-09-23 13:06 ` David Bremner
2025-09-23 13:46 ` Niklas Cassel
2025-09-23 13:46 ` Niklas Cassel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aLC7KIoi-LoH2en4@google.com \
--to=briannorris@chromium.org \
--cc=bhelgaas@google.com \
--cc=cassel@kernel.org \
--cc=heiko@sntech.de \
--cc=krishna.chundru@oss.qualcomm.com \
--cc=kwilczynski@kernel.org \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-arm-msm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=linux-rockchip@lists.infradead.org \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=lpieralisi@kernel.org \
--cc=lukas@wunner.de \
--cc=mahesh@linux.ibm.com \
--cc=mani@kernel.org \
--cc=manivannan.sadhasivam@oss.qualcomm.com \
--cc=oohall@gmail.com \
--cc=p.zabel@pengutronix.de \
--cc=robh@kernel.org \
--cc=wilfred.mallawa@wdc.com \
--cc=will@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.