From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Terry Bowman <terry.bowman@amd.com>
Cc: <dan.j.williams@intel.com>, <ira.weiny@intel.com>,
<dave@stgolabs.net>, <dave.jiang@intel.com>,
<alison.schofield@intel.com>, <ming4.li@intel.com>,
<vishal.l.verma@intel.com>, <jim.harris@samsung.com>,
<ilpo.jarvinen@linux.intel.com>, <ardb@kernel.org>,
<sathyanarayanan.kuppuswamy@linux.intel.com>,
<linux-cxl@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<Yazen.Ghannam@amd.com>, <Robert.Richter@amd.com>,
Bjorn Helgaas <bhelgaas@google.com>, <linux-pci@vger.kernel.org>
Subject: Re: [RFC PATCH 3/9] PCI/portdrv: Update portdrv with an atomic notifier for reporting AER internal errors
Date: Thu, 20 Jun 2024 13:30:27 +0100 [thread overview]
Message-ID: <20240620133027.000047e1@Huawei.com> (raw)
In-Reply-To: <20240617200411.1426554-4-terry.bowman@amd.com>
On Mon, 17 Jun 2024 15:04:05 -0500
Terry Bowman <terry.bowman@amd.com> wrote:
> PCIe port devices are bound to portdrv, the PCIe port bus driver. portdrv
> does not implement an AER correctable handler (CE) but does implement the
> AER uncorrectable error (UCE). The UCE handler is fairly straightforward
> in that it only checks for frozen error state and returns the next step
> for recovery accordingly.
>
> As a result, port devices relying on AER correctable internal errors (CIE)
> and AER uncorrectable internal errors (UIE) will not be handled. Note,
> the PCIe spec indicates AER CIE/UIE can be used to report implementation
> specific errors.[1]
>
> CXL root ports, CXL downstream switch ports, and CXL upstream switch ports
> are examples of devices using the AER CIE/UIE for implementation specific
> purposes. These CXL ports use the AER interrupt and AER CIE/UIE status to
> report CXL RAS errors.[2]
>
> Add an atomic notifier to portdrv's CE/UCE handlers. Use the atomic
> notifier to report CIE/UIE errors to the registered functions. This will
> require adding a CE handler and updating the existing UCE handler.
>
> For the UCE handler, the CXL spec states UIE errors should return need
> reset: "The only method of recovering from an Uncorrectable Internal Error
> is reset or hardware replacement."[1]
>
> [1] PCI6.0 - 6.2.10 Internal Errors
> [2] CXL3.1 - 12.2.2 CXL Root Ports, Downstream Switch Ports, and
> Upstream Switch Ports
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Cc: Bjorn Helgaas <bhelgaas@google.com>
> Cc: linux-pci@vger.kernel.org
> ---
> drivers/pci/pcie/portdrv.c | 32 ++++++++++++++++++++++++++++++++
> drivers/pci/pcie/portdrv.h | 2 ++
> 2 files changed, 34 insertions(+)
>
> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> index 14a4b89a3b83..86d80e0e9606 100644
> --- a/drivers/pci/pcie/portdrv.c
> +++ b/drivers/pci/pcie/portdrv.c
> @@ -37,6 +37,9 @@ struct portdrv_service_data {
> u32 service;
> };
>
> +ATOMIC_NOTIFIER_HEAD(portdrv_aer_internal_err_chain);
> +EXPORT_SYMBOL_GPL(portdrv_aer_internal_err_chain);
Perhaps these should be per instance of the portdrv?
I'd imagine we only want to register CXL ones on CXL ports etc
and it's annoying to have to check at runtime for relevance
of a particular notifier.
> +
> /**
> * release_pcie_device - free PCI Express port service device structure
> * @dev: Port service device to release
> @@ -745,11 +748,39 @@ static void pcie_portdrv_shutdown(struct pci_dev *dev)
> static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
> pci_channel_state_t error)
> {
> + if (dev->aer_cap) {
> + u32 status;
> +
> + pci_read_config_dword(dev, dev->aer_cap + PCI_ERR_UNCOR_STATUS,
> + &status);
> +
> + if (status & PCI_ERR_UNC_INTN) {
> + atomic_notifier_call_chain(&portdrv_aer_internal_err_chain,
> + AER_FATAL, (void *)dev);
Don't think the cast is needed as always fine to implicitly cast to and from
void * in C.
> + return PCI_ERS_RESULT_NEED_RESET;
> + }
> + }
> +
> if (error == pci_channel_io_frozen)
> return PCI_ERS_RESULT_NEED_RESET;
> return PCI_ERS_RESULT_CAN_RECOVER;
> }
>
> +static void pcie_portdrv_cor_error_detected(struct pci_dev *dev)
> +{
> + u32 status;
> +
> + if (!dev->aer_cap)
> + return;
> +
> + pci_read_config_dword(dev, dev->aer_cap + PCI_ERR_COR_STATUS,
> + &status);
> +
> + if (status & PCI_ERR_COR_INTERNAL)
> + atomic_notifier_call_chain(&portdrv_aer_internal_err_chain,
> + AER_CORRECTABLE, (void *)dev);
No need for the cast.
> +}
> +
> static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
> {
> size_t off = offsetof(struct pcie_port_service_driver, slot_reset);
> @@ -780,6 +811,7 @@ static const struct pci_device_id port_pci_ids[] = {
>
> static const struct pci_error_handlers pcie_portdrv_err_handler = {
> .error_detected = pcie_portdrv_error_detected,
> + .cor_error_detected = pcie_portdrv_cor_error_detected,
> .slot_reset = pcie_portdrv_slot_reset,
> .mmio_enabled = pcie_portdrv_mmio_enabled,
> };
> diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
> index 12c89ea0313b..8a39197f0203 100644
> --- a/drivers/pci/pcie/portdrv.h
> +++ b/drivers/pci/pcie/portdrv.h
> @@ -121,4 +121,6 @@ static inline void pcie_pme_interrupt_enable(struct pci_dev *dev, bool en) {}
> #endif /* !CONFIG_PCIE_PME */
>
> struct device *pcie_port_find_device(struct pci_dev *dev, u32 service);
> +
> +extern struct atomic_notifier_head portdrv_aer_internal_err_chain;
> #endif /* _PORTDRV_H_ */
next prev parent reply other threads:[~2024-06-20 12:30 UTC|newest]
Thread overview: 59+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-06-17 20:04 [RFC PATCH 0/9] Add RAS support for CXL root ports, CXL downstream switch ports, and CXL upstream switch ports Terry Bowman
2024-06-17 20:04 ` [RFC PATCH 1/9] PCI/AER: Update AER driver to call root port and downstream port UCE handlers Terry Bowman
2024-06-20 11:21 ` Jonathan Cameron
2024-06-24 14:58 ` Terry Bowman
2024-06-21 19:17 ` Dan Williams
2024-06-24 17:56 ` Terry Bowman
2024-07-10 20:48 ` nifan.cxl
2024-07-10 21:48 ` Terry Bowman
2024-07-11 1:14 ` fan
2024-08-19 18:35 ` Fan Ni
2024-06-17 20:04 ` [RFC PATCH 2/9] PCI/AER: Call AER CE handler before clearing AER CE status register Terry Bowman
2024-06-20 11:31 ` Jonathan Cameron
2024-06-24 15:08 ` Terry Bowman
2024-06-21 19:23 ` Dan Williams
2024-06-24 18:00 ` Terry Bowman
2024-06-17 20:04 ` [RFC PATCH 3/9] PCI/portdrv: Update portdrv with an atomic notifier for reporting AER internal errors Terry Bowman
2024-06-20 12:30 ` Jonathan Cameron [this message]
2024-06-24 15:22 ` Terry Bowman
2024-06-21 19:36 ` Dan Williams
2024-06-24 18:21 ` Terry Bowman
2024-06-24 21:46 ` Dan Williams
2024-06-25 14:41 ` Terry Bowman
2024-06-26 2:54 ` Li, Ming4
2024-06-26 13:39 ` Terry Bowman
2024-06-17 20:04 ` [RFC PATCH 4/9] cxl/pci: Map CXL PCIe ports' RAS registers Terry Bowman
2024-06-20 12:46 ` Jonathan Cameron
2024-06-24 15:51 ` Terry Bowman
2024-07-02 15:18 ` Jonathan Cameron
2024-06-26 3:39 ` Li, Ming4
2024-06-17 20:04 ` [RFC PATCH 5/9] cxl/pci: Update RAS handler interfaces to support CXL PCIe ports Terry Bowman
2024-06-20 12:49 ` Jonathan Cameron
2024-07-15 17:50 ` nifan.cxl
2024-06-17 20:04 ` [RFC PATCH 6/9] cxl/pci: Add trace logging for CXL PCIe port RAS errors Terry Bowman
2024-06-20 12:53 ` Jonathan Cameron
2024-06-24 15:53 ` Terry Bowman
2024-07-02 15:53 ` Jonathan Cameron
2024-06-17 20:04 ` [RFC PATCH 7/9] cxl/pci: Add atomic notifier callback for CXL PCIe port AER internal errors Terry Bowman
2024-06-20 13:09 ` Jonathan Cameron
2024-06-24 16:09 ` Terry Bowman
2024-07-02 15:58 ` Jonathan Cameron
2024-06-26 6:22 ` Li, Ming4
2024-06-26 13:51 ` Terry Bowman
2024-06-17 20:04 ` [RFC PATCH 8/9] PCI/AER: Export pci_aer_unmask_internal_errors() Terry Bowman
2024-06-19 7:09 ` Christoph Hellwig
2024-06-19 15:40 ` Terry Bowman
2024-06-20 13:11 ` Jonathan Cameron
2024-06-24 16:22 ` Terry Bowman
2024-07-10 21:47 ` Bjorn Helgaas
2024-06-17 20:04 ` [RFC PATCH 9/9] cxl/pci: Enable interrupts for CXL PCIe ports' AER internal errors Terry Bowman
2024-06-20 13:15 ` Jonathan Cameron
2024-06-24 16:46 ` Terry Bowman
2024-07-02 16:00 ` Jonathan Cameron
2024-06-21 19:04 ` [RFC PATCH 0/9] Add RAS support for CXL root ports, CXL downstream switch ports, and CXL upstream switch ports Dan Williams
2024-06-24 17:47 ` Terry Bowman
2024-06-24 20:51 ` Dan Williams
2024-06-25 14:29 ` Terry Bowman
2024-07-25 18:49 ` fan
2024-08-19 16:21 ` Terry Bowman
2024-08-19 18:17 ` Fan Ni
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240620133027.000047e1@Huawei.com \
--to=jonathan.cameron@huawei.com \
--cc=Robert.Richter@amd.com \
--cc=Yazen.Ghannam@amd.com \
--cc=alison.schofield@intel.com \
--cc=ardb@kernel.org \
--cc=bhelgaas@google.com \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=ilpo.jarvinen@linux.intel.com \
--cc=ira.weiny@intel.com \
--cc=jim.harris@samsung.com \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=ming4.li@intel.com \
--cc=sathyanarayanan.kuppuswamy@linux.intel.com \
--cc=terry.bowman@amd.com \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox