From: <dan.j.williams@intel.com>
To: Terry Bowman <terry.bowman@amd.com>, <dave@stgolabs.net>,
<jonathan.cameron@huawei.com>, <dave.jiang@intel.com>,
<alison.schofield@intel.com>, <dan.j.williams@intel.com>,
<bhelgaas@google.com>, <shiju.jose@huawei.com>,
<ming.li@zohomail.com>, <Smita.KoralahalliChannabasappa@amd.com>,
<rrichter@amd.com>, <dan.carpenter@linaro.org>,
<PradeepVineshReddy.Kodamati@amd.com>, <lukas@wunner.de>,
<Benjamin.Cheatham@amd.com>,
<sathyanarayanan.kuppuswamy@linux.intel.com>,
<linux-cxl@vger.kernel.org>, <alucerop@amd.com>,
<ira.weiny@intel.com>
Cc: <linux-kernel@vger.kernel.org>, <linux-pci@vger.kernel.org>,
<terry.bowman@amd.com>
Subject: Re: [RESEND v13 16/25] CXL/AER: Introduce pcie/aer_cxl_vh.c in AER driver for forwarding CXL errors
Date: Wed, 19 Nov 2025 16:44:19 -0800
Message-ID: <691e646357da5_1a37510026@dwillia2-mobl4.notmuch>
In-Reply-To: <20251104170305.4163840-17-terry.bowman@amd.com>
Terry Bowman wrote:
> CXL virtual hierarchy (VH) RAS handling for CXL Port devices will be added
> soon. This requires a notification mechanism for the AER driver to share
> the AER interrupt with the CXL driver. The notification will be used as an
> indication for the CXL drivers to handle and log the CXL RAS errors.
>
> Note, 'CXL protocol error' terminology will refer to CXL VH and not
> CXL RCH errors unless specifically noted going forward.
>
> Introduce a new file in the AER driver to handle the CXL protocol errors
> named pci/pcie/aer_cxl_vh.c.
>
> Add a kfifo work queue to be used by the AER and CXL drivers. The AER
> driver will be the sole kfifo producer adding work and the cxl_core will be
> the sole kfifo consumer removing work. Add the boilerplate kfifo support.
> Encapsulate the kfifo, RW semaphore, and work pointer in a single structure.
>
> Add CXL work queue handler registration functions in the AER driver. Export
> the functions allowing CXL driver to access. Implement registration
> functions for the CXL driver to assign or clear the work handler function.
> Synchronize accesses using the RW semaphore.
>
> Introduce 'struct cxl_proto_err_work_data' to serve as the kfifo work data.
> This will contain a reference to the erring PCI device and the error
> severity. This will be used when the work is dequeued by the cxl_core driver.
>
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Some small things to fix up.
> diff --git a/drivers/pci/pcie/aer_cxl_vh.c b/drivers/pci/pcie/aer_cxl_vh.c
> new file mode 100644
> index 000000000000..5dbc81341dc4
> --- /dev/null
> +++ b/drivers/pci/pcie/aer_cxl_vh.c
> @@ -0,0 +1,95 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/* Copyright(c) 2025 AMD Corporation. All rights reserved. */
> +
> +#include <linux/pci.h>
> +#include <linux/aer.h>
> +#include <linux/pci.h>
> +#include <linux/bitfield.h>
> +#include <linux/kfifo.h>
> +#include "../pci.h"
> +
> +#define CXL_ERROR_SOURCES_MAX 128
> +
> +struct cxl_proto_err_kfifo {
> + struct work_struct *work;
> + struct rw_semaphore rw_sema;
> + DECLARE_KFIFO(fifo, struct cxl_proto_err_work_data,
> + CXL_ERROR_SOURCES_MAX);
> +};
> +
> +static struct cxl_proto_err_kfifo cxl_proto_err_kfifo = {
> + .rw_sema = __RWSEM_INITIALIZER(cxl_proto_err_kfifo.rw_sema)
> +};
> +
> +bool cxl_error_is_native(struct pci_dev *dev)
> +{
> + struct pci_host_bridge *host = pci_find_host_bridge(dev->bus);
> +
> + return (pcie_ports_native || host->native_aer);
This function always confuses me because there is zero "cxl" inside this
function. Something to comment on later so I am not scratching my head
the next time this function is touched.
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_error_is_native, "CXL");
Why is this exported? All of the consumers are local to
drivers/pci/pcie/built-in.a.
> +
> +bool is_internal_error(struct aer_err_info *info)
> +{
> + if (info->severity == AER_CORRECTABLE)
> + return info->status & PCI_ERR_COR_INTERNAL;
> +
> + return info->status & PCI_ERR_UNC_INTN;
> +}
> +EXPORT_SYMBOL_NS_GPL(is_internal_error, "CXL");
Ditto on the export, and I do not see it getting used anywhere later in
the series.
Also, this is so tiny that if anything else wants to use it, it can just
be a static inline.
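A minimal sketch of the static inline form, written as a userspace stand-in so it compiles on its own. The constants and the two-field `struct aer_err_info` below are simplified assumptions that mirror the kernel definitions as I understand them; in the kernel they would come from the existing headers, and the choice of header to hold the helper is left open.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for kernel definitions (assumed values, userspace only). */
#define AER_CORRECTABLE      2
#define PCI_ERR_COR_INTERNAL 0x00004000 /* Corrected Internal Error */
#define PCI_ERR_UNC_INTN     0x00400000 /* Uncorrectable Internal Error */

/* Simplified: the real structure carries many more fields. */
struct aer_err_info {
	unsigned int severity;
	uint32_t status;
};

/*
 * Header-local helper: no EXPORT_SYMBOL needed, each includer gets its
 * own inlined copy.
 */
static inline bool is_internal_error(const struct aer_err_info *info)
{
	if (info->severity == AER_CORRECTABLE)
		return info->status & PCI_ERR_COR_INTERNAL;

	return info->status & PCI_ERR_UNC_INTN;
}
```

The body is unchanged from the patch apart from the `const` qualifier, which is an optional addition here.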
> +
> +bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info)
> +{
> + if (!info || !info->is_cxl)
> + return false;
> +
> + if (pci_pcie_type(pdev) != PCI_EXP_TYPE_ENDPOINT)
> + return false;
> +
> + return is_internal_error(info);
> +}
> +EXPORT_SYMBOL_NS_GPL(is_cxl_error, "CXL");
No consumers for this exported symbol.
> +
> +void cxl_forward_error(struct pci_dev *pdev, struct aer_err_info *info)
> +{
> + struct cxl_proto_err_work_data wd = (struct cxl_proto_err_work_data) {
> + .severity = info->severity,
> + .pdev = pdev
> + };
> +
> + guard(rwsem_write)(&cxl_proto_err_kfifo.rw_sema);
This guard can be downgraded to rwsem_read. It only needs to make sure
that the kfifo remains registered for the duration of the function.
> +
> + if (!cxl_proto_err_kfifo.work) {
> + dev_warn_once(&pdev->dev, "CXL driver is unregistered. Unable to forward error.");
I would combine this with the following ratelimited message because they
are effectively the same thing. "Hey admin, I see some errors but the
driver to handle them is gone, or out to lunch." The reason to combine
them is that you probably want this message to report every dropped
error without fail, and this dev_warn_once() goes silent after the first
invocation.
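The merged drop path described here can be modeled in userspace as follows. This is only a sketch under assumptions: `struct err_fifo`, `forward_error()`, and `FIFO_DEPTH` are hypothetical stand-ins for the patch's kfifo structure, and `fprintf()` takes the place of a single `dev_err_ratelimited()` call. The point it illustrates is that both failure cases funnel into one report, so no dropped error goes unreported after the first.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define FIFO_DEPTH 4 /* hypothetical; the patch sizes its kfifo at 128 */

/* Hypothetical model of the producer-side state. */
struct err_fifo {
	bool handler_registered; /* models cxl_proto_err_kfifo.work != NULL */
	unsigned int count;      /* entries currently queued */
	unsigned int dropped;    /* errors that could not be forwarded */
};

/* Returns true if the error was queued, false if it was dropped. */
static bool forward_error(struct err_fifo *f)
{
	const char *reason = NULL;

	if (!f->handler_registered)
		reason = "no handler registered";
	else if (f->count == FIFO_DEPTH)
		reason = "fifo overflow";

	if (reason) {
		/* Single report for every drop; ratelimited in the kernel. */
		f->dropped++;
		fprintf(stderr, "CXL error dropped: %s\n", reason);
		return false;
	}

	f->count++;
	return true;
}
```

Carrying the reason in the message keeps the two cases distinguishable in the log even though they share one code path.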
> + return;
> + }
> +
> + if (!kfifo_put(&cxl_proto_err_kfifo.fifo, wd)) {
> + dev_err_ratelimited(&pdev->dev, "AER-CXL kfifo overflow\n");
> + return;
> + }
> +
> + schedule_work(cxl_proto_err_kfifo.work);
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_forward_error, "CXL");
No consumer for this export.
> +
> +void cxl_register_proto_err_work(struct work_struct *work)
> +{
> + guard(rwsem_write)(&cxl_proto_err_kfifo.rw_sema);
> + cxl_proto_err_kfifo.work = work;
> +}
> +EXPORT_SYMBOL_NS_GPL(cxl_register_proto_err_work, "CXL");
Oh hey, the rest of these exports make sense.
...but I do think you can go back and remove

    bool is_internal_error(struct aer_err_info *info);
    bool is_cxl_error(struct pci_dev *pdev, struct aer_err_info *info);
    void cxl_forward_error(struct pci_dev *pdev, struct aer_err_info *info);

...from pci.h, and move them to an aer internal header like
drivers/pci/pcie/portdrv.h.