From: Jay Vosburgh <jay.vosburgh@canonical.com>
To: sathyanarayanan.kuppuswamy@linux.intel.com
Cc: bhelgaas@google.com, linux-pci@vger.kernel.org,
linux-kernel@vger.kernel.org, ashok.raj@intel.com,
Yicong Yang <yangyicong@hisilicon.com>,
liudongdong 00290354 <liudongdong3@huawei.com>,
Linuxarm <linuxarm@huawei.com>
Subject: Re: [PATCH v1 1/1] PCI/ERR: Handle fatal error recovery for non-hotplug capable devices
Date: Tue, 12 May 2020 12:20:30 -0700
Message-ID: <9908.1589311230@famine>
In-Reply-To: <f4bbacd3af453285271c8fc733652969e11b84f8.1588821160.git.sathyanarayanan.kuppuswamy@linux.intel.com>

sathyanarayanan.kuppuswamy@linux.intel.com wrote:
>From: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>
>If there are non-hotplug-capable devices connected to a given
>port, then during fatal error recovery (triggered by DPC or
>AER), after calling the reset_link() function, we cannot rely on
>the hotplug handler to detach and re-enumerate the device drivers
>on the affected bus. Instead, the error recovery handler has to
>call report_slot_reset() for all devices on the bus to notify
>them of the reset operation. Although this is only required for
>non-hotplug-capable devices, doing it for hotplug-capable devices
>should not affect functionality.
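A rough userspace model of "call report_slot_reset() for all devices on the bus" is below. It is only a sketch under stated assumptions: struct model_dev, walk_bus() and report_slot_reset_model() are hypothetical stand-ins for struct pci_dev, pci_walk_bus() and report_slot_reset(), and the result merge is deliberately simplified (any DISCONNECT vote wins). What it shows is that the walk visits every device whether or not it is hotplug capable, and that, as with pci_walk_bus(), a callback returning nonzero would stop the walk early.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified result values; NOT the kernel's pci_ers_result_t. */
enum ers_result { ERS_RECOVERED, ERS_DISCONNECT };

/* Hypothetical stand-in for a struct pci_dev on the affected bus. */
struct model_dev {
	const char *name;
	int hotplug_capable;   /* recorded only; the walk never checks it */
	int has_driver;        /* 0: no driver slot_reset() handler is bound */
	enum ers_result vote;  /* what the driver's slot_reset() would return */
	int notified;          /* set once the walk has visited this device */
};

typedef int (*walk_cb)(struct model_dev *dev, void *userdata);

/* Visit every device; a nonzero callback return stops the walk early. */
static void walk_bus(struct model_dev *devs, size_t n, walk_cb cb, void *userdata)
{
	assert(devs != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (cb(&devs[i], userdata))
			break;
}

/*
 * report_slot_reset()-style callback with a simplified merge:
 * any DISCONNECT vote wins (the kernel's merge_result() is more detailed).
 */
static int report_slot_reset_model(struct model_dev *dev, void *data)
{
	enum ers_result *result = data;

	dev->notified = 1;
	if (dev->has_driver && dev->vote == ERS_DISCONNECT)
		*result = ERS_DISCONNECT;
	return 0;  /* keep walking so every device is notified */
}
```

Calling walk_bus(devs, n, report_slot_reset_model, &status) notifies hotplug-capable and non-hotplug-capable devices alike, which is why doing it unconditionally should be harmless for the hotplug case.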

Yicong,

Does the patch below also resolve the issue for you, as with
your changed version of my original patch?

-J

>Along with the above issue, this fix also applies to the
>following issue.
>
>Commit 6d2c89441571 ("PCI/ERR: Update error status after
>reset_link()") added support to store the status of the reset_link()
>call. Although this fixed the error recovery issue observed when
>the initial value of the error status is PCI_ERS_RESULT_DISCONNECT
>or PCI_ERS_RESULT_NO_AER_DRIVER, it also discarded the status
>result from report_frozen_detected. This can cause a failure to
>recover if _NEED_RESET is returned by report_frozen_detected,
>because report_slot_reset is then never invoked.
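The lost vote can be illustrated with a small userspace model of the frozen path. This is a sketch, not kernel code: the enum and both functions are hypothetical, and the first function only mirrors the sequence described above (the frozen-detected vote is overwritten by the reset_link() result, and the slot_reset broadcast is gated on status == NEED_RESET). The second function is a hypothetical variant that keeps the vote, included only for contrast.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified result values; NOT the kernel's pci_ers_result_t. */
enum ers_result { ERS_NONE, ERS_CAN_RECOVER, ERS_NEED_RESET, ERS_DISCONNECT, ERS_RECOVERED };

/* Frozen path as described for 6d2c89441571: returns true if the
 * report_slot_reset broadcast would run. */
static bool old_flow_broadcasts_slot_reset(enum ers_result frozen_vote,
					   enum ers_result link_result)
{
	enum ers_result status = frozen_vote;  /* from report_frozen_detected */

	status = link_result;                  /* status = reset_link(dev); vote lost */
	if (status != ERS_RECOVERED)
		return false;                      /* "link reset failed" */
	return status == ERS_NEED_RESET;       /* always false at this point */
}

/* Hypothetical variant: reset_link() only decides success or failure,
 * and the driver's vote still selects the slot_reset broadcast. */
static bool vote_kept_broadcasts_slot_reset(enum ers_result frozen_vote,
					    enum ers_result link_result)
{
	if (link_result != ERS_RECOVERED)
		return false;
	return frozen_vote == ERS_NEED_RESET;
}
```

With a NEED_RESET vote and a successful link reset, the first function returns false and the second returns true, which is the difference the commit message describes.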
>
>Such an event can be induced for testing purposes by reducing the
>Max_Payload_Size of a PCIe bridge to less than that of a device
>downstream from the bridge, and then initiating I/O through the
>device, resulting in oversize transactions. In the presence of DPC,
>this results in a containment event and attempted reset and recovery
>via pcie_do_recovery. After 6d2c89441571, report_slot_reset is not
>invoked, and the device does not recover.
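For the reproduction above, the Max_Payload_Size setting lives in bits 7:5 of the PCIe Device Control register (the mask Linux names PCI_EXP_DEVCTL_PAYLOAD, 0x00e0), encoded as 128 << n bytes for n = 0..5. The helpers below are a hypothetical sketch that only computes register values; they do not access hardware, and reading or writing the actual register of a bridge is left to whatever tool is used for testing. The oversize check reflects the PCIe rule that a receiver treats a TLP whose payload exceeds its Max_Payload_Size as a Malformed TLP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEVCTL_PAYLOAD_MASK  0x00e0u  /* Device Control bits 7:5 */
#define DEVCTL_PAYLOAD_SHIFT 5

/* Decode Max_Payload_Size in bytes from a Device Control value. */
static unsigned int mps_bytes(uint16_t devctl)
{
	unsigned int enc = (devctl & DEVCTL_PAYLOAD_MASK) >> DEVCTL_PAYLOAD_SHIFT;

	assert(enc <= 5);  /* 110b and 111b are reserved encodings */
	return 128u << enc;
}

/* Return devctl with only the MPS field changed to 'bytes'. */
static uint16_t set_mps(uint16_t devctl, unsigned int bytes)
{
	unsigned int enc = 0;

	assert(bytes >= 128 && bytes <= 4096 && (bytes & (bytes - 1)) == 0);
	while ((128u << enc) < bytes)
		enc++;
	return (uint16_t)((devctl & ~DEVCTL_PAYLOAD_MASK) | (enc << DEVCTL_PAYLOAD_SHIFT));
}

/* True if a TLP with this payload would be malformed at the receiver. */
static bool tlp_oversize(unsigned int payload_bytes, uint16_t receiver_devctl)
{
	return payload_bytes > mps_bytes(receiver_devctl);
}
```

For example, with the bridge's field set by set_mps(devctl, 128) and a downstream device still issuing 256-byte writes, tlp_oversize(256, bridge_devctl) is true, which is the oversize condition used to trigger the containment event.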
>
>[original patch is from jay.vosburgh@canonical.com]
>[original patch link https://lore.kernel.org/linux-pci/18609.1588812972@famine/]
>Fixes: 6d2c89441571 ("PCI/ERR: Update error status after reset_link()")
>Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
>Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
>---
> drivers/pci/pcie/err.c | 19 +++++++++++++++----
> 1 file changed, 15 insertions(+), 4 deletions(-)
>
>diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
>index 14bb8f54723e..db80e1ecb2dc 100644
>--- a/drivers/pci/pcie/err.c
>+++ b/drivers/pci/pcie/err.c
>@@ -165,13 +165,24 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> pci_dbg(dev, "broadcast error_detected message\n");
> if (state == pci_channel_io_frozen) {
> pci_walk_bus(bus, report_frozen_detected, &status);
>- status = reset_link(dev);
>- if (status != PCI_ERS_RESULT_RECOVERED) {
>+ status = PCI_ERS_RESULT_NEED_RESET;
>+ } else {
>+ pci_walk_bus(bus, report_normal_detected, &status);
>+ }
>+
>+ if (status == PCI_ERS_RESULT_NEED_RESET) {
>+ if (reset_link) {
>+ if (reset_link(dev) != PCI_ERS_RESULT_RECOVERED)
>+ status = PCI_ERS_RESULT_DISCONNECT;
>+ } else {
>+ if (pci_bus_error_reset(dev))
>+ status = PCI_ERS_RESULT_DISCONNECT;
>+ }
>+
>+ if (status == PCI_ERS_RESULT_DISCONNECT) {
> pci_warn(dev, "link reset failed\n");
> goto failed;
> }
>- } else {
>- pci_walk_bus(bus, report_normal_detected, &status);
> }
>
> if (status == PCI_ERS_RESULT_CAN_RECOVER) {
>--
>2.17.1
>
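The decision logic of the patched hunk can be traced with a small userspace model. This is a sketch under stated assumptions, not the kernel function: the enums, patched_flow(), link_ok() and link_bad() are hypothetical, the reset_link callback takes no argument here, pci_bus_error_reset() is replaced by an integer return code, and the model stops at the first broadcast step, whereas the real pcie_do_recovery() continues past it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified result values; NOT the kernel's pci_ers_result_t. */
enum ers_result { ERS_NONE, ERS_CAN_RECOVER, ERS_NEED_RESET, ERS_DISCONNECT, ERS_RECOVERED };

/* Which step the patched code would reach next (simplified). */
enum next_step { STEP_FAILED, STEP_MMIO_ENABLED, STEP_SLOT_RESET, STEP_OTHER };

/* Stand-in for the reset_link callback argument; NULL = not provided. */
typedef enum ers_result (*reset_link_fn)(void);

/*
 * frozen:       state == pci_channel_io_frozen
 * normal_vote:  result of the report_normal_detected walk
 * bus_reset_rc: stand-in for pci_bus_error_reset()'s return (0 = success)
 */
static enum next_step patched_flow(bool frozen, enum ers_result normal_vote,
				   reset_link_fn reset_link, int bus_reset_rc)
{
	/* Frozen: whatever report_frozen_detected voted, force NEED_RESET. */
	enum ers_result status = frozen ? ERS_NEED_RESET : normal_vote;

	if (status == ERS_NEED_RESET) {
		if (reset_link) {
			if (reset_link() != ERS_RECOVERED)
				status = ERS_DISCONNECT;
		} else if (bus_reset_rc) {
			status = ERS_DISCONNECT;   /* fallback bus reset failed */
		}
		if (status == ERS_DISCONNECT)
			return STEP_FAILED;        /* "link reset failed" */
	}

	if (status == ERS_CAN_RECOVER)
		return STEP_MMIO_ENABLED;
	if (status == ERS_NEED_RESET)
		return STEP_SLOT_RESET;        /* report_slot_reset broadcast */
	return STEP_OTHER;
}

/* Hypothetical reset_link implementations for exercising the model. */
static enum ers_result link_ok(void)  { return ERS_RECOVERED; }
static enum ers_result link_bad(void) { return ERS_DISCONNECT; }
```

In this model a frozen error with a successful reset (via reset_link or the bus-reset fallback) now reaches the slot_reset broadcast, and a non-frozen NEED_RESET vote also gets a reset first, which the pre-patch code did not do.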
---
-Jay Vosburgh, jay.vosburgh@canonical.com