From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Terry Bowman <terry.bowman@amd.com>
Cc: <ming4.li@intel.com>, <linux-cxl@vger.kernel.org>,
<linux-kernel@vger.kernel.org>, <linux-pci@vger.kernel.org>,
<dave@stgolabs.net>, <dave.jiang@intel.com>,
<alison.schofield@intel.com>, <vishal.l.verma@intel.com>,
<dan.j.williams@intel.com>, <bhelgaas@google.com>,
<mahesh@linux.ibm.com>, <ira.weiny@intel.com>, <oohall@gmail.com>,
<Benjamin.Cheatham@amd.com>, <rrichter@amd.com>,
<nathan.fontenot@amd.com>,
<Smita.KoralahalliChannabasappa@amd.com>
Subject: Re: [PATCH v2 05/14] PCI/AER: Add CXL PCIe port correctable error support in AER service driver
Date: Wed, 30 Oct 2024 15:13:08 +0000 [thread overview]
Message-ID: <20241030151308.000005d5@Huawei.com> (raw)
In-Reply-To: <20241025210305.27499-6-terry.bowman@amd.com>
On Fri, 25 Oct 2024 16:02:56 -0500
Terry Bowman <terry.bowman@amd.com> wrote:
> The AER service driver doesn't currently handle CXL protocol errors
> reported by CXL root ports, CXL upstream switch ports, and CXL downstream
> switch ports. Consequently, RAS protocol errors from CXL PCIe port devices
> are not properly logged or handled.
>
> These errors are reported to the OS via the root port's AER correctable
> and uncorrectable internal error fields. While the AER driver supports
> handling downstream port protocol errors in restricted CXL host (RCH) mode
> also known as CXL1.1, it lacks the same functionality for CXL PCIe ports
> operating in virtual hierarchy (VH) mode.
>
> To address this gap, update the AER driver to handle CXL PCIe port device
> protocol correctable errors (CE).
>
> Make this update alongside the existing downstream port RCH error handling
> logic, extending support to CXL PCIe ports in VH mode.
>
> is_internal_error() is currently limited by CONFIG_PCIEAER_CXL kernel
> config. Update is_internal_error()'s function declaration such that it is
> always available regardless if CONFIG_PCIEAER_CXL kernel config is enabled
> or disabled.
>
> The uncorrectable error (UCE) handling will be added in a future patch.
>
> [1] CXL 3.1 Spec, 12.2.2 CXL Root Ports, Downstream Switch Ports, and
> Upstream Switch Ports
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
This is a fiddly patch to read, but that's at least partly diff going crazy
in a few places.
Anyhow, I think it is fine but I would call out that this changes
things so that the PCI error handlers are no longer called for CXL ports
if it's an internal error.
With a sentence on that:
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
I'm not 100% convinced the path of separate handlers is the way to go
but we can always change things again if that doesn't work out.
Jonathan
> ---
> drivers/pci/pcie/aer.c | 59 ++++++++++++++++++++++++++++--------------
> 1 file changed, 39 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index 53e9a11f6c0f..1d3e5b929661 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -941,8 +941,15 @@ static bool find_source_device(struct pci_dev *parent,
> return true;
> }
>
> -#ifdef CONFIG_PCIEAER_CXL
> +static bool is_internal_error(struct aer_err_info *info)
> +{
> + if (info->severity == AER_CORRECTABLE)
> + return info->status & PCI_ERR_COR_INTERNAL;
>
> + return info->status & PCI_ERR_UNC_INTN;
> +}
> +
> +#ifdef CONFIG_PCIEAER_CXL
Diff was having fun. Maybe put a blank line here? I think that's
what has tripped it up.
> /**
> * pci_aer_unmask_internal_errors - unmask internal errors
> * @dev: pointer to the pcie_dev data structure
> @@ -994,14 +1001,6 @@ static bool cxl_error_is_native(struct pci_dev *dev)
> return (pcie_ports_native || host->native_aer);
> }
> -
> /**
> @@ -1115,8 +1131,11 @@ static void pci_aer_handle_error(struct pci_dev *dev, struct aer_err_info *info)
>
> static void handle_error_source(struct pci_dev *dev, struct aer_err_info *info)
> {
> - cxl_handle_error(dev, info);
> - pci_aer_handle_error(dev, info);
> + if (is_internal_error(info) && handles_cxl_errors(dev))
> + cxl_handle_error(dev, info);
> + else
> + pci_aer_handle_error(dev, info);
Whilst not calling this for the CXL cases probably makes sense and
given new code needs to be the case to avoid a double clear I think,
I would call that change out more explicitly in the patch description.
> +
> pci_dev_put(dev);
> }
>
next prev parent reply other threads:[~2024-10-30 15:13 UTC|newest]
Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-25 21:02 [PATCH v2 0/14] Enable CXL PCIe port protocol error handling and logging Terry Bowman
2024-10-25 21:02 ` [PATCH v2 01/14] PCI/AER: Introduce 'struct cxl_err_handlers' and add to 'struct pci_driver' Terry Bowman
2024-10-30 15:14 ` Jonathan Cameron
2024-10-30 15:15 ` Bowman, Terry
2024-10-31 16:20 ` Dave Jiang
2024-10-31 20:24 ` Fan Ni
2024-10-25 21:02 ` [PATCH v2 02/14] PCI/AER: Rename AER driver's interfaces to also indicate CXL PCIe port support Terry Bowman
2024-10-30 15:13 ` Jonathan Cameron
2024-10-31 16:21 ` Dave Jiang
2024-10-31 20:25 ` Fan Ni
2024-10-25 21:02 ` [PATCH v2 03/14] cxl/pci: Introduce helper functions pcie_is_cxl() and pcie_is_cxl_port() Terry Bowman
2024-10-30 14:57 ` Jonathan Cameron
2024-10-31 16:25 ` Dave Jiang
2024-10-31 21:22 ` Fan Ni
2024-10-25 21:02 ` [PATCH v2 04/14] PCI/AER: Modify AER driver logging to report CXL or PCIe bus error type Terry Bowman
2024-10-30 14:56 ` Jonathan Cameron
2024-10-31 16:27 ` Dave Jiang
2024-10-31 21:27 ` Fan Ni
2024-10-25 21:02 ` [PATCH v2 05/14] PCI/AER: Add CXL PCIe port correctable error support in AER service driver Terry Bowman
2024-10-30 15:13 ` Jonathan Cameron [this message]
2024-10-30 15:51 ` Bowman, Terry
2024-11-04 21:50 ` Dan Williams
2024-11-04 22:05 ` Bowman, Terry
2024-10-31 16:37 ` Dave Jiang
2024-10-25 21:02 ` [PATCH v2 06/14] PCI/AER: Change AER driver to read UCE fatal status for all CXL PCIe port devices Terry Bowman
2024-10-30 15:37 ` Jonathan Cameron
2024-10-31 16:58 ` Dave Jiang
2024-11-01 13:30 ` Bowman, Terry
2024-10-25 21:02 ` [PATCH v2 07/14] PCI/AER: Add CXL PCIe port uncorrectable error recovery in AER service driver Terry Bowman
2024-10-30 15:42 ` Jonathan Cameron
2024-10-25 21:02 ` [PATCH v2 08/14] cxl/pci: Change find_cxl_ports() to non-static Terry Bowman
2024-10-30 15:45 ` Jonathan Cameron
2024-10-30 15:54 ` Bowman, Terry
2024-10-25 21:03 ` [PATCH v2 09/14] cxl/pci: Map CXL PCIe root port and downstream switch port RAS registers Terry Bowman
2024-10-30 15:55 ` Jonathan Cameron
2024-10-25 21:03 ` [PATCH v2 10/14] cxl/pci: Map CXL PCIe upstream " Terry Bowman
2024-10-30 15:56 ` Jonathan Cameron
2024-10-25 21:03 ` [PATCH v2 11/14] cxl/pci: Rename RAS handler interfaces to also indicate CXL PCIe port support Terry Bowman
2024-10-30 15:59 ` Jonathan Cameron
2024-10-25 21:03 ` [PATCH v2 12/14] cxl/pci: Add error handler for CXL PCIe port RAS errors Terry Bowman
2024-10-30 16:03 ` Jonathan Cameron
2024-10-25 21:03 ` [PATCH v2 13/14] cxl/pci: Add trace logging " Terry Bowman
2024-10-30 16:07 ` Jonathan Cameron
2024-10-30 21:30 ` Bowman, Terry
2024-10-25 21:03 ` [PATCH v2 14/14] cxl/pci: Add support to assign and clear pci_driver::cxl_err_handlers Terry Bowman
2024-10-30 16:11 ` Jonathan Cameron
2024-10-30 21:34 ` Bowman, Terry
2024-10-27 16:59 ` [PATCH v2 0/14] Applies to Base commit: 8cf0b93919e1 (tag: v6.12-rc2) Linux 6.12-rc2 Bowman, Terry
2024-10-28 1:05 ` [PATCH v2 0/14] Enable CXL PCIe port protocol error handling and logging Bowman, Terry
2024-11-01 18:00 ` Fan Ni
2024-11-01 18:28 ` Bowman, Terry
2024-11-01 19:11 ` Fan Ni
2024-11-01 22:11 ` Fan Ni
2024-11-04 21:25 ` Bowman, Terry
2024-11-04 21:48 ` Fan Ni
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241030151308.000005d5@Huawei.com \
--to=jonathan.cameron@huawei.com \
--cc=Benjamin.Cheatham@amd.com \
--cc=Smita.KoralahalliChannabasappa@amd.com \
--cc=alison.schofield@intel.com \
--cc=bhelgaas@google.com \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=ira.weiny@intel.com \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=mahesh@linux.ibm.com \
--cc=ming4.li@intel.com \
--cc=nathan.fontenot@amd.com \
--cc=oohall@gmail.com \
--cc=rrichter@amd.com \
--cc=terry.bowman@amd.com \
--cc=vishal.l.verma@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox