All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Cameron <jonathan.cameron@huawei.com>
To: Terry Bowman <terry.bowman@amd.com>
Cc: <dave@stgolabs.net>, <dave.jiang@intel.com>,
	<alison.schofield@intel.com>, <dan.j.williams@intel.com>,
	<bhelgaas@google.com>, <shiju.jose@huawei.com>,
	<ming.li@zohomail.com>, <Smita.KoralahalliChannabasappa@amd.com>,
	<rrichter@amd.com>, <dan.carpenter@linaro.org>,
	<PradeepVineshReddy.Kodamati@amd.com>, <lukas@wunner.de>,
	<Benjamin.Cheatham@amd.com>,
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	<linux-cxl@vger.kernel.org>, <alucerop@amd.com>,
	<ira.weiny@intel.com>, <linux-kernel@vger.kernel.org>,
	<linux-pci@vger.kernel.org>
Subject: Re: [PATCH v11 16/23] cxl/pci: Introduce CXL Endpoint protocol error handlers
Date: Wed, 10 Sep 2025 14:23:35 +0100	[thread overview]
Message-ID: <20250910142335.00001f53@huawei.com> (raw)
In-Reply-To: <20250827013539.903682-17-terry.bowman@amd.com>

On Tue, 26 Aug 2025 20:35:31 -0500
Terry Bowman <terry.bowman@amd.com> wrote:

> CXL Endpoint protocol errors are currently handled using PCI error
> handlers. The CXL Endpoint requires CXL specific handling in the case of
> uncorrectable error (UCE) handling not provided by the PCI handlers.
> 
> Add CXL specific handlers for CXL Endpoints. Rename the existing
> cxl_error_handlers to be pci_error_handlers to more correctly indicate
> the error type and follow naming consistency.
> 
> The PCI handlers will be called if the CXL device is not trained for
> alternate protocol (CXL). Update the CXL Endpoint PCI handlers to call the
> CXL UCE handlers.
> 
> The existing EP UCE handler includes checks for various results. These are
> no longer needed because CXL UCE recovery will not be attempted. Implement
> cxl_handle_ras() to return PCI_ERS_RESULT_NONE or PCI_ERS_RESULT_PANIC. The
> CXL UCE handler is called by cxl_do_recovery() that acts on the return
> value. In the case of the PCI handler path, call panic() if the result is
> PCI_ERS_RESULT_PANIC.
> 
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> 
> ---
> Changes in v10->v11:
> - cxl_error_detected() - Change handlers' scoped_guard() to guard() (Jonathan)
> - cxl_error_detected() - Remove extra line (Shiju)
> - Changes moved to core/ras.c (Terry)
> - cxl_error_detected(), remove 'ue' and return with function call. (Jonathan)
> - Remove extra space in documentation for PCI_ERS_RESULT_PANIC definition
> - Move #include "pci.h from cxl.h to core.h (Terry)
> - Remove unnecessary includes of cxl.h and core.h in mem.c (Terry)
Hi Terry,

A few suggested renames inline just because things like pci_cor_error_detected()
sounds like it should have nothing CXL specific in it. 

Whilst it is clunky to use cxl_pci_cor_error_detected() I think that is
worth doing to avoid this potential confusion.

> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 42b6e0b092d5..b285448c2d9c 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c

>  
> -void cxl_cor_error_detected(struct pci_dev *pdev)
> +void cxl_cor_error_detected(struct device *dev)
>  {
> +	struct pci_dev *pdev = to_pci_dev(dev);
>  	struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> -	struct device *dev = &cxlds->cxlmd->dev;
> +	struct device *cxlmd_dev = &cxlds->cxlmd->dev;
>  
> -	scoped_guard(device, dev) {

Dropping the scoped_guard() to guard() here makes it harder
to tell what is going on.  I guess it isn't quite worth a precursor
patch unless there are several similar refactors to do.

> -		if (!dev->driver) {
> -			dev_warn(&pdev->dev,
> -				 "%s: memdev disabled, abort error handling\n",
> -				 dev_name(dev));
> -			return;
> -		}
> +	guard(device)(cxlmd_dev);
>  
> -		if (cxlds->rcd)
> -			cxl_handle_rdport_errors(cxlds);
> -
> -		cxl_handle_cor_ras(&cxlds->cxlmd->dev, cxlds->serial, cxlds->regs.ras);
> +	if (!cxlmd_dev->driver) {
> +		dev_warn(&pdev->dev, "%s: memdev disabled, abort error handling", dev_name(dev));
> +		return;
>  	}
> +
> +	if (cxlds->rcd)
> +		cxl_handle_rdport_errors(cxlds);
> +
> +	cxl_handle_cor_ras(&cxlds->cxlmd->dev, cxlds->serial, cxlds->regs.ras);
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_cor_error_detected, "CXL");
>  
> -pci_ers_result_t cxl_error_detected(struct pci_dev *pdev,
> -				    pci_channel_state_t state)
> +void pci_cor_error_detected(struct pci_dev *pdev)
>  {
> +	cxl_cor_error_detected(&pdev->dev);
> +}
> +EXPORT_SYMBOL_NS_GPL(pci_cor_error_detected, "CXL");

I'd go with a cxl_pci_ prefix for this to avoid impression it is is more
generic than that.

>
> +
> +pci_ers_result_t pci_error_detected(struct pci_dev *pdev,

Given this is CXL specific, probably makes sense to prefix to give
cxl_pci_error_detected()


Then when we see the call it is more obvious it isn't a 'generic'
PCI thing.


> +				    pci_channel_state_t error)
> +{
> +	pci_ers_result_t rc;
> +
> +	rc = cxl_error_detected(&pdev->dev);
> +	if (rc == PCI_ERS_RESULT_PANIC)
> +		panic("CXL cachemem error.");
> +
> +	return rc;
> +}
> +EXPORT_SYMBOL_NS_GPL(pci_error_detected, "CXL");



  parent reply	other threads:[~2025-09-10 13:23 UTC|newest]

Thread overview: 89+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-08-27  1:35 [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
2025-08-27  1:35 ` [PATCH v11 01/23] cxl: Remove ifdef blocks of CONFIG_PCIEAER_CXL from core/pci.c Terry Bowman
2025-08-27  1:35 ` [PATCH v11 02/23] CXL/AER: Remove CONFIG_PCIEAER_CXL and replace with CONFIG_CXL_RAS Terry Bowman
2025-08-29 15:24   ` Jonathan Cameron
2025-08-29 18:16   ` Sathyanarayanan Kuppuswamy
2025-08-27  1:35 ` [PATCH v11 03/23] cxl/pci: Remove unnecessary CXL Endpoint handling helper functions Terry Bowman
2025-08-28 15:28   ` Dave Jiang
2025-08-27  1:35 ` [PATCH v11 04/23] cxl/pci: Remove unnecessary CXL RCH " Terry Bowman
2025-08-28  8:35   ` Alejandro Lucero Palau
2025-08-28 17:32   ` Dave Jiang
2025-08-27  1:35 ` [PATCH v11 05/23] cxl: Move CXL driver RCH error handling into CONFIG_CXL_RCH_RAS conditional block Terry Bowman
2025-08-28  8:57   ` Alejandro Lucero Palau
2025-09-10 16:43     ` Bowman, Terry
2025-08-29 15:33   ` Jonathan Cameron
2025-09-11 17:48     ` Bowman, Terry
2025-09-11 19:41       ` Dave Jiang
2025-09-15 13:32         ` Bowman, Terry
2025-08-27  1:35 ` [PATCH v11 06/23] CXL/AER: Introduce rch_aer.c into AER driver for handling CXL RCH errors Terry Bowman
2025-08-28 20:53   ` Dave Jiang
2025-08-29  8:39     ` Lukas Wunner
2025-09-10 17:01       ` Bowman, Terry
2025-09-10 17:26         ` Dave Jiang
2025-09-12 13:59       ` Bowman, Terry
2025-09-12 19:09         ` Lukas Wunner
2025-09-10 16:56     ` Bowman, Terry
2025-09-10 12:43   ` Jonathan Cameron
2025-08-27  1:35 ` [PATCH v11 07/23] CXL/PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h Terry Bowman
2025-08-27 14:51   ` Lukas Wunner
2025-08-29 15:42     ` Jonathan Cameron
2025-08-29 15:47     ` Jonathan Cameron
2025-08-28 21:07   ` Dave Jiang
2025-09-10 18:11     ` Bowman, Terry
2025-09-10 20:06       ` Dave Jiang
2025-08-27  1:35 ` [PATCH v11 08/23] PCI/CXL: Introduce pcie_is_cxl() Terry Bowman
2025-08-28  8:18   ` Alejandro Lucero Palau
2025-09-10 16:24     ` Bowman, Terry
2025-09-11  3:48       ` Lukas Wunner
2025-09-10 13:10   ` Jonathan Cameron
2025-08-27  1:35 ` [PATCH v11 09/23] PCI/AER: Report CXL or PCIe bus error type in trace logging Terry Bowman
2025-08-27  7:37   ` Lukas Wunner
2025-09-10 15:26     ` Bowman, Terry
2025-09-10 15:33       ` Lukas Wunner
2025-09-11  5:07   ` Lukas Wunner
2025-08-27  1:35 ` [PATCH v11 10/23] CXL/AER: Update PCI class code check to use FIELD_GET() Terry Bowman
2025-08-29 16:03   ` Jonathan Cameron
2025-09-11 18:18     ` Bowman, Terry
2025-08-27  1:35 ` [PATCH v11 11/23] cxl/pci: Update RAS handler interfaces to also support CXL Ports Terry Bowman
2025-08-27  1:35 ` [PATCH v11 12/23] cxl/pci: Log message if RAS registers are unmapped Terry Bowman
2025-08-27  1:35 ` [PATCH v11 13/23] cxl/pci: Unify CXL trace logging for CXL Endpoints and CXL Ports Terry Bowman
2025-08-27 11:55   ` Shiju Jose
2025-08-29 16:06     ` Jonathan Cameron
2025-08-27  1:35 ` [PATCH v11 14/23] cxl/pci: Update cxl_handle_cor_ras() to return early if no RAS errors Terry Bowman
2025-08-27  1:35 ` [PATCH v11 15/23] cxl/pci: Map CXL Endpoint Port and CXL Switch Port RAS registers Terry Bowman
2025-08-28 23:05   ` Dave Jiang
2025-09-10 18:40     ` Bowman, Terry
2025-08-27  1:35 ` [PATCH v11 16/23] cxl/pci: Introduce CXL Endpoint protocol error handlers Terry Bowman
2025-08-27  7:48   ` Lukas Wunner
2025-09-10 13:23   ` Jonathan Cameron [this message]
2025-08-27  1:35 ` [PATCH v11 17/23] CXL/AER: Introduce cxl_aer.c into AER driver for forwarding CXL errors Terry Bowman
2025-08-27  7:56   ` Lukas Wunner
2025-08-27  1:35 ` [PATCH v11 18/23] PCI/AER: Dequeue forwarded CXL error Terry Bowman
2025-08-29  0:43   ` Dave Jiang
2025-08-29  7:10     ` Lukas Wunner
2025-09-16 15:18       ` Bowman, Terry
2025-09-11 14:33     ` Bowman, Terry
2025-09-11 15:41       ` Dave Jiang
2025-09-11 16:47         ` Bowman, Terry
2025-09-11 19:45           ` Dave Jiang
2025-09-10 13:29   ` Jonathan Cameron
2025-08-27  1:35 ` [PATCH v11 19/23] CXL/PCI: Introduce CXL Port protocol error handlers Terry Bowman
2025-08-30  0:17   ` Dave Jiang
2025-08-27  1:35 ` [PATCH v11 20/23] CXL/PCI: Export and rename merge_result() to pci_ers_merge_result() Terry Bowman
2025-08-27  8:04   ` Lukas Wunner
2025-09-10 15:57     ` Bowman, Terry
2025-09-11  3:44       ` Lukas Wunner
2025-08-27 12:19   ` kernel test robot
2025-08-27  1:35 ` [PATCH v11 21/23] CXL/PCI: Introduce CXL uncorrectable protocol error recovery Terry Bowman
2025-09-03 22:30   ` Dave Jiang
2025-09-11 19:19     ` Bowman, Terry
2025-09-11 19:48       ` Dave Jiang
2025-09-10 13:33   ` Jonathan Cameron
2025-08-27  1:35 ` [PATCH v11 22/23] CXL/PCI: Enable CXL protocol errors during CXL Port probe Terry Bowman
2025-09-03 23:23   ` Dave Jiang
2025-08-27  1:35 ` [PATCH v11 23/23] CXL/PCI: Disable CXL protocol error interrupts during CXL Port cleanup Terry Bowman
2025-09-10 15:07   ` Gregory Price
2025-09-11 20:19     ` Bowman, Terry
2025-08-29  0:07 ` [PATCH v11 00/23] Enable CXL PCIe Port Protocol Error handling and logging Dave Jiang
2025-09-23  3:29 ` Gregory Price
2025-09-23  9:21   ` Srinivasulu Thanneeru

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250910142335.00001f53@huawei.com \
    --to=jonathan.cameron@huawei.com \
    --cc=Benjamin.Cheatham@amd.com \
    --cc=PradeepVineshReddy.Kodamati@amd.com \
    --cc=Smita.KoralahalliChannabasappa@amd.com \
    --cc=alison.schofield@intel.com \
    --cc=alucerop@amd.com \
    --cc=bhelgaas@google.com \
    --cc=dan.carpenter@linaro.org \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@stgolabs.net \
    --cc=ira.weiny@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=ming.li@zohomail.com \
    --cc=rrichter@amd.com \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=shiju.jose@huawei.com \
    --cc=terry.bowman@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.