From: Bjorn Helgaas <helgaas@kernel.org>
To: Terry Bowman <terry.bowman@amd.com>
Cc: dave@stgolabs.net, jonathan.cameron@huawei.com,
dave.jiang@intel.com, alison.schofield@intel.com,
dan.j.williams@intel.com, bhelgaas@google.com,
shiju.jose@huawei.com, ming.li@zohomail.com,
Smita.KoralahalliChannabasappa@amd.com, rrichter@amd.com,
dan.carpenter@linaro.org, PradeepVineshReddy.Kodamati@amd.com,
lukas@wunner.de, Benjamin.Cheatham@amd.com,
sathyanarayanan.kuppuswamy@linux.intel.com,
linux-cxl@vger.kernel.org, alucerop@amd.com, ira.weiny@intel.com,
linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org
Subject: Re: [RESEND v13 21/25] PCI/AER: Dequeue forwarded CXL error
Date: Tue, 4 Nov 2025 12:45:22 -0600
Message-ID: <20251104184522.GA1864503@bhelgaas>
In-Reply-To: <20251104170305.4163840-22-terry.bowman@amd.com>

On Tue, Nov 04, 2025 at 11:03:01AM -0600, Terry Bowman wrote:
> The AER driver now forwards CXL protocol errors to the CXL driver via a
> kfifo. The CXL driver must consume these work items, initiate protocol
> error handling, and ensure RAS mappings remain valid throughout processing.
>
> Implement cxl_proto_err_work_fn() to dequeue work items forwarded by the
> AER service driver and begin protocol error processing by calling
> cxl_handle_proto_error().
>
> Add a PCI device lock on &pdev->dev within cxl_proto_err_work_fn() to
> keep the PCI device structure valid during handling. Locking an Endpoint
> will also defer RAS unmapping until the device is unlocked.
>
> For Endpoints, add a lock on CXL memory device cxlds->dev. The CXL memory
> device structure holds the RAS register reference needed during error
> handling.
>
> Add lock for the parent CXL Port for Root Ports, Downstream Ports, and
> Upstream Ports to prevent destruction of structures holding mapped RAS
> addresses while they are in use.
>
> Invoke cxl_do_recovery() for uncorrectable errors. Treat this as a stub for
> now; implement its functionality in a future patch.
>
> Export pci_clean_device_status() to enable cleanup of AER status following
> error handling.

s/pci_clean_device_status/pcie_clear_device_status/

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

> drivers/cxl/core/ras.c | 153 ++++++++++++++++++++++++++++++++++++++---
> drivers/pci/pci.c | 1 +
> drivers/pci/pci.h | 1 -
> include/linux/pci.h | 2 +

Looks like this is primarily a CXL change, and the PCI part is
minimal, so I question the "PCI/AER:" prefix in the subject.

> +static struct cxl_port *get_cxl_port(struct pci_dev *pdev)
> +{
> +	switch (pci_pcie_type(pdev)) {
> +	case PCI_EXP_TYPE_ROOT_PORT:
> +	case PCI_EXP_TYPE_DOWNSTREAM:
> +	{
> +		struct cxl_dport *dport;
> +		struct cxl_port *port = find_cxl_port(&pdev->dev, &dport);
> +
> +		if (!port) {
> +			pci_err(pdev, "Failed to find the CXL device");
> +			return NULL;
> +		}
> +		return port;
> +	}
> +	case PCI_EXP_TYPE_UPSTREAM:
> +	{
> +		struct cxl_port *port = find_cxl_port_by_uport(&pdev->dev);
> +
> +		if (!port) {
> +			pci_err(pdev, "Failed to find the CXL device");
> +			return NULL;
> +		}
> +		return port;
> +	}
> +	case PCI_EXP_TYPE_ENDPOINT:
> +	{
> +		struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> +		struct cxl_port *port = cxlds->cxlmd->endpoint;
> +
> +		get_device(&port->dev);
> +		return port;
> +	}
> +	}
> +	pci_warn_once(pdev, "Error: Unsupported device type (%X)", pci_pcie_type(pdev));

Maybe use "%#x" so it's clear that this is hex? PCI typically uses
lower-case hex; maybe the CXL convention is different.
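
For illustration, here is a minimal userspace sketch of the difference
between the two format specifiers. The helper names are hypothetical and
snprintf() stands in for pci_warn_once(); the value 0xa is
PCI_EXP_TYPE_RC_EC from pci_regs.h, used here only as an example of an
unsupported type.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Example value only: PCI_EXP_TYPE_RC_EC (Root Complex Event Collector). */
#define EXAMPLE_PCIE_TYPE 0xa

/* Hypothetical helper: format as the patch does, bare upper-case hex. */
static int format_type_bare(char *buf, size_t len, unsigned int type)
{
	return snprintf(buf, len, "Unsupported device type (%X)", type);
}

/* Hypothetical helper: format with "%#x", lower-case hex with "0x" prefix. */
static int format_type_prefixed(char *buf, size_t len, unsigned int type)
{
	return snprintf(buf, len, "Unsupported device type (%#x)", type);
}

/* Self-check showing the two resulting strings for the example value. */
static void check_type_formats(void)
{
	char buf[64];

	format_type_bare(buf, sizeof(buf), EXAMPLE_PCIE_TYPE);
	assert(strcmp(buf, "Unsupported device type (A)") == 0);

	format_type_prefixed(buf, sizeof(buf), EXAMPLE_PCIE_TYPE);
	assert(strcmp(buf, "Unsupported device type (0xa)") == 0);
}
```

With "%X", a type such as 4 prints as a bare "4", which a reader cannot
tell apart from decimal; "%#x" prints "0x4".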
> +static void cxl_handle_proto_error(struct cxl_proto_err_work_data *err_info)
> +{
> +	struct pci_dev *pdev = err_info->pdev;
> +	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> +
> +	if (err_info->severity == AER_CORRECTABLE) {
> +
> +		if (pdev->aer_cap)
> +			pci_clear_and_set_config_dword(pdev,
> +						       pdev->aer_cap + PCI_ERR_COR_STATUS,
> +						       0, PCI_ERR_COR_INTERNAL);
> +
> +		if (is_pcie_endpoint(pdev))
> +			cxl_cor_error_detected(&cxlds->cxlmd->dev);
> +		else
> +			cxl_port_cor_error_detected(&pdev->dev);
> +
> +		pcie_clear_device_status(pdev);

The AER clear above and pcie_clear_device_status() require
ownership of the PCIe Capability and the AER Capability, typically
granted by _OSC.

I suppose it's obvious that the OS does own these Capabilities if we
get here, but I'm not familiar with this code.
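
If the ownership assumption ever needs to be made explicit, it could
take the form of a guard before the status registers are touched. The
sketch below is a userspace model only: the mock_* types and functions
are hypothetical stand-ins, not kernel structures. In the kernel,
pcie_aer_is_native() plays a similar role for AER; whether this path
needs such a check is the open question above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for host bridge state; not the kernel structure. */
struct mock_host_bridge {
	bool native_aer;	/* true if _OSC granted AER control to the OS */
};

/* Hypothetical stand-in for the parts of a PCI device used here. */
struct mock_pci_dev {
	unsigned int aer_cap;	/* AER Capability offset, 0 if absent */
	const struct mock_host_bridge *host;
};

/*
 * Model of the guard: the handler may clear AER and device status only
 * when the device has an AER Capability and the OS owns it.
 */
static bool mock_may_clear_status(const struct mock_pci_dev *pdev)
{
	return pdev->aer_cap != 0 && pdev->host->native_aer;
}
```

A handler built on this model would skip the clears and return early
when mock_may_clear_status() is false, leaving the registers to
firmware.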