From: Jonathan Cameron <Jonathan.Cameron@huawei.com>
To: Terry Bowman <terry.bowman@amd.com>
Cc: <dave@stgolabs.net>, <dave.jiang@intel.com>,
<alison.schofield@intel.com>, <dan.j.williams@intel.com>,
<bhelgaas@google.com>, <shiju.jose@huawei.com>,
<ming.li@zohomail.com>, <Smita.KoralahalliChannabasappa@amd.com>,
<rrichter@amd.com>, <dan.carpenter@linaro.org>,
<PradeepVineshReddy.Kodamati@amd.com>, <lukas@wunner.de>,
<Benjamin.Cheatham@amd.com>,
<sathyanarayanan.kuppuswamy@linux.intel.com>,
<linux-cxl@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<linux-pci@vger.kernel.org>
Subject: Re: [PATCH v10 05/17] CXL/AER: Introduce kfifo for forwarding CXL errors
Date: Fri, 27 Jun 2025 11:24:29 +0100 [thread overview]
Message-ID: <20250627112429.00007155@huawei.com> (raw)
In-Reply-To: <20250626224252.1415009-6-terry.bowman@amd.com>
On Thu, 26 Jun 2025 17:42:40 -0500
Terry Bowman <terry.bowman@amd.com> wrote:
> CXL error handling will soon be moved from the AER driver into the CXL
> driver. This requires a notification mechanism for the AER driver to share
> the AER interrupt with the CXL driver. The notification will be used
> as an indication for the CXL drivers to handle and log the CXL RAS errors.
>
> First, introduce cxl/core/native_ras.c to contain changes for the CXL
> driver's RAS native handling. This as an alternative to dropping the
> changes into existing cxl/core/ras.c file with purpose to avoid #ifdefs.
> Introduce CXL Kconfig CXL_NATIVE_RAS, dependent on PCIEAER_CXL, to
> conditionally compile the new file.
>
> Add a kfifo work queue to be used by the AER driver and CXL driver. The AER
> driver will be the sole kfifo producer adding work and the cxl_core will be
> the sole kfifo consumer removing work. Add the boilerplate kfifo support.
>
> Add CXL work queue handler registration functions in the AER driver. Export
> the functions allowing CXL driver to access. Implement registration
> functions for the CXL driver to assign or clear the work handler function.
>
> Introduce 'struct cxl_proto_err_info' to serve as the kfifo work data. This
> will contain the erring device's PCI SBDF details used to rediscover the
> device after the CXL driver dequeues the kfifo work. The device rediscovery
> will be introduced along with the CXL handling in future patches.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
Hi Terry,
Whilst it obviously makes patch preparation a bit more time consuming
for series like this with many patches it can be useful to add a brief
change log to the individual patches as well as the cover letter.
That helps reviewers figure out where they need to look again.
A few trivial things inline.
With those fixed up
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Jonathan
> ---
> drivers/cxl/Kconfig | 14 ++++++++
> drivers/cxl/core/Makefile | 1 +
> drivers/cxl/core/core.h | 8 +++++
> drivers/cxl/core/native_ras.c | 26 +++++++++++++++
> drivers/cxl/core/port.c | 2 ++
> drivers/cxl/core/ras.c | 1 +
> drivers/cxl/cxlpci.h | 1 +
> drivers/pci/pci.h | 4 +++
> drivers/pci/pcie/aer.c | 7 ++--
> drivers/pci/pcie/cxl_aer.c | 60 +++++++++++++++++++++++++++++++++++
> include/linux/aer.h | 31 ++++++++++++++++++
> 11 files changed, 153 insertions(+), 2 deletions(-)
> create mode 100644 drivers/cxl/core/native_ras.c
> static void cxl_cper_trace_corr_port_prot_err(struct pci_dev *pdev,
> diff --git a/drivers/cxl/cxlpci.h b/drivers/cxl/cxlpci.h
> index 54e219b0049e..6f1396ef7b77 100644
> --- a/drivers/cxl/cxlpci.h
> +++ b/drivers/cxl/cxlpci.h
> @@ -4,6 +4,7 @@
> #define __CXL_PCI_H__
> #include <linux/pci.h>
> #include "cxl.h"
> +#include "linux/aer.h"
Why? There are no changes in this header other than the include and the changes
to linux/aer.h are new stuff so I can't see how it becomes necessary if it
wasn't before.
Might well have always been missing and should have been here. If so separate
patch to tidy that up.
>
> #define CXL_MEMORY_PROGIF 0x10
>
> diff --git a/drivers/pci/pcie/cxl_aer.c b/drivers/pci/pcie/cxl_aer.c
> index b2ea14f70055..846ab55d747c 100644
> --- a/drivers/pci/pcie/cxl_aer.c
> +++ b/drivers/pci/pcie/cxl_aer.c
> static int cxl_rch_handle_error_iter(struct pci_dev *dev, void *data)
> {
> struct aer_err_info *info = (struct aer_err_info *)data;
> @@ -136,3 +152,47 @@ void cxl_rch_enable_rcec(struct pci_dev *rcec)
> pci_info(rcec, "CXL: Internal errors unmasked");
> }
>
> +static DEFINE_KFIFO(cxl_proto_err_fifo, struct cxl_proto_err_work_data,
> + CXL_ERROR_SOURCES_MAX);
> +static DEFINE_SPINLOCK(cxl_proto_err_fifo_lock);
> +struct work_struct *cxl_proto_err_work;
I'm not seeing a declaration for this in the headers, so can it be static?
This is made a little more confusing as in this patch we have both
a structure called cxl_proto_err_work and a pointer to it with exactly the
same name. Maybe rename this so it's subtly different. cxl_protocol_err_work
or something silly like that just to make reviewers life a tiny bit easier!
> +
> diff --git a/include/linux/aer.h b/include/linux/aer.h
> index 02940be66324..24c3d9e18ad5 100644
> --- a/include/linux/aer.h
> +++ b/include/linux/aer.h
> @@ -10,6 +10,7 @@
>
> #include <linux/errno.h>
> #include <linux/types.h>
> +#include <linux/workqueue_types.h>
>
> #define AER_NONFATAL 0
> #define AER_FATAL 1
> @@ -53,6 +54,26 @@ struct aer_capability_regs {
> u16 uncor_err_source;
> };
>
> +/**
> + * struct cxl_proto_err_info - Error information used in CXL error handling
> + * @severity: AER severity
> + * @function: Device's PCI function
Run kernel-doc over the files and fix errors / warning.
Missed updating this to devfn which it would have shouted about.
> + * @device: Device's PCI device
> + * @bus: Device's PCI bus
> + * @segment: Device's PCI segment
> + */
> +struct cxl_proto_error_info {
> + int severity;
> +
> + u8 devfn;
> + u8 bus;
> + u16 segment;
> +};
next prev parent reply other threads:[~2025-06-27 10:24 UTC|newest]
Thread overview: 87+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-06-26 22:42 [PATCH v10 00/17] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
2025-06-26 22:42 ` [PATCH v10 01/17] cxl/pci: Remove unnecessary CXL Endpoint handling helper functions Terry Bowman
2025-07-18 17:55 ` Dave Jiang
2025-07-23 21:58 ` dan.j.williams
2025-07-23 22:15 ` Dave Jiang
2025-06-26 22:42 ` [PATCH v10 02/17] PCI/CXL: Add pcie_is_cxl() Terry Bowman
2025-07-23 22:30 ` dan.j.williams
2025-07-23 23:21 ` Bowman, Terry
2025-07-24 18:00 ` dan.j.williams
2025-08-09 10:56 ` Alejandro Lucero Palau
2025-08-11 19:14 ` Bowman, Terry
2025-08-11 23:14 ` dan.j.williams
2025-06-26 22:42 ` [PATCH v10 03/17] PCI/AER: Report CXL or PCIe bus error type in trace logging Terry Bowman
2025-06-26 23:25 ` Sathyanarayanan Kuppuswamy
2025-06-27 14:14 ` Bowman, Terry
2025-06-27 9:53 ` Jonathan Cameron
2025-07-02 16:00 ` Bowman, Terry
2025-06-27 11:32 ` Shiju Jose
2025-06-27 14:24 ` Bowman, Terry
2025-07-01 21:27 ` Dave Jiang
2025-07-23 22:56 ` dan.j.williams
2025-06-26 22:42 ` [PATCH v10 04/17] CXL/AER: Introduce CXL specific AER driver file Terry Bowman
2025-06-26 23:42 ` Sathyanarayanan Kuppuswamy
2025-06-27 10:12 ` Jonathan Cameron
2025-06-27 14:29 ` Bowman, Terry
2025-07-24 0:01 ` dan.j.williams
2025-07-24 17:06 ` Bowman, Terry
2025-07-24 20:32 ` dan.j.williams
2025-07-24 1:16 ` dan.j.williams
2025-07-24 17:02 ` Bowman, Terry
2025-07-24 20:23 ` dan.j.williams
2025-06-26 22:42 ` [PATCH v10 05/17] CXL/AER: Introduce kfifo for forwarding CXL errors Terry Bowman
2025-06-27 10:24 ` Jonathan Cameron [this message]
2025-07-02 16:21 ` Bowman, Terry
2025-07-02 19:54 ` Dan Carpenter
2025-07-02 19:57 ` Bowman, Terry
2025-07-03 10:06 ` Jonathan Cameron
2025-07-01 21:53 ` Dave Jiang
2025-07-02 17:10 ` Bowman, Terry
2025-07-24 2:01 ` dan.j.williams
2025-07-24 17:21 ` Bowman, Terry
2025-07-24 20:55 ` dan.j.williams
2025-06-26 22:42 ` [PATCH v10 06/17] PCI/AER: Dequeue forwarded CXL error Terry Bowman
2025-06-27 11:00 ` Jonathan Cameron
2025-07-02 17:51 ` Bowman, Terry
2025-07-01 23:04 ` Dave Jiang
2025-07-02 17:56 ` Bowman, Terry
2025-07-03 10:11 ` Jonathan Cameron
2025-07-25 0:38 ` dan.j.williams
2025-06-26 22:42 ` [PATCH v10 07/17] CXL/PCI: Introduce CXL uncorrectable protocol error recovery Terry Bowman
2025-06-27 11:05 ` Jonathan Cameron
2025-07-02 21:06 ` Bowman, Terry
2025-06-27 12:27 ` Shiju Jose
2025-07-02 21:34 ` Bowman, Terry
2025-06-26 22:42 ` [PATCH v10 08/17] cxl/pci: Move RAS initialization to cxl_port driver Terry Bowman
2025-06-27 11:12 ` Jonathan Cameron
2025-07-18 18:01 ` Dave Jiang
2025-06-26 22:42 ` [PATCH v10 09/17] cxl/pci: Map CXL Endpoint Port and CXL Switch Port RAS registers Terry Bowman
2025-06-27 11:17 ` Jonathan Cameron
2025-07-02 21:41 ` Bowman, Terry
2025-07-18 21:28 ` Dave Jiang
2025-07-18 21:55 ` Bowman, Terry
2025-07-18 22:01 ` Dave Jiang
2025-07-18 22:40 ` Bowman, Terry
2025-07-18 22:45 ` Dave Jiang
2025-06-26 22:42 ` [PATCH v10 10/17] cxl/pci: Update RAS handler interfaces to also support CXL Ports Terry Bowman
2025-06-26 22:42 ` [PATCH v10 11/17] cxl/pci: Log message if RAS registers are unmapped Terry Bowman
2025-07-21 21:56 ` Dave Jiang
2025-06-26 22:42 ` [PATCH v10 12/17] cxl/pci: Unify CXL trace logging for CXL Endpoints and CXL Ports Terry Bowman
2025-06-27 12:22 ` Shiju Jose
2025-07-02 1:18 ` Alison Schofield
2025-07-02 22:07 ` Bowman, Terry
2025-07-02 21:56 ` Bowman, Terry
2025-06-26 22:42 ` [PATCH v10 13/17] cxl/pci: Update cxl_handle_cor_ras() to return early if no RAS errors Terry Bowman
2025-06-27 11:48 ` Jonathan Cameron
2025-07-21 22:17 ` Dave Jiang
2025-06-26 22:42 ` [PATCH v10 14/17] cxl/pci: Introduce CXL Endpoint protocol error handlers Terry Bowman
2025-06-27 11:52 ` Jonathan Cameron
2025-06-27 12:27 ` Shiju Jose
2025-07-21 22:35 ` Dave Jiang
2025-07-22 18:23 ` Bowman, Terry
2025-06-26 22:42 ` [PATCH v10 15/17] CXL/PCI: Introduce CXL Port " Terry Bowman
2025-06-26 22:42 ` [PATCH v10 16/17] CXL/PCI: Enable CXL protocol errors during CXL Port probe Terry Bowman
2025-06-26 22:42 ` [PATCH v10 17/17] CXL/PCI: Disable CXL protocol error interrupts during CXL Port cleanup Terry Bowman
2025-07-23 21:55 ` [PATCH v10 00/17] Enable CXL PCIe Port Protocol Error handling and logging dan.j.williams
2025-07-24 15:58 ` Bowman, Terry
2025-08-18 15:18 ` Joshua Hahn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20250627112429.00007155@huawei.com \
--to=jonathan.cameron@huawei.com \
--cc=Benjamin.Cheatham@amd.com \
--cc=PradeepVineshReddy.Kodamati@amd.com \
--cc=Smita.KoralahalliChannabasappa@amd.com \
--cc=alison.schofield@intel.com \
--cc=bhelgaas@google.com \
--cc=dan.carpenter@linaro.org \
--cc=dan.j.williams@intel.com \
--cc=dave.jiang@intel.com \
--cc=dave@stgolabs.net \
--cc=linux-cxl@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=lukas@wunner.de \
--cc=ming.li@zohomail.com \
--cc=rrichter@amd.com \
--cc=sathyanarayanan.kuppuswamy@linux.intel.com \
--cc=shiju.jose@huawei.com \
--cc=terry.bowman@amd.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.