All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Cameron <jonathan.cameron@huawei.com>
To: Terry Bowman <terry.bowman@amd.com>
Cc: <dave@stgolabs.net>, <dave.jiang@intel.com>,
	<alison.schofield@intel.com>, <dan.j.williams@intel.com>,
	<bhelgaas@google.com>, <shiju.jose@huawei.com>,
	<ming.li@zohomail.com>, <Smita.KoralahalliChannabasappa@amd.com>,
	<rrichter@amd.com>, <dan.carpenter@linaro.org>,
	<PradeepVineshReddy.Kodamati@amd.com>, <lukas@wunner.de>,
	<Benjamin.Cheatham@amd.com>,
	<sathyanarayanan.kuppuswamy@linux.intel.com>,
	<linux-cxl@vger.kernel.org>, <alucerop@amd.com>,
	<ira.weiny@intel.com>, <linux-kernel@vger.kernel.org>,
	<linux-pci@vger.kernel.org>
Subject: Re: [RESEND v13 21/25] PCI/AER: Dequeue forwarded CXL error
Date: Tue, 4 Nov 2025 18:40:09 +0000	[thread overview]
Message-ID: <20251104184009.00005d27@huawei.com> (raw)
In-Reply-To: <20251104170305.4163840-22-terry.bowman@amd.com>

On Tue, 4 Nov 2025 11:03:01 -0600
Terry Bowman <terry.bowman@amd.com> wrote:

> The AER driver now forwards CXL protocol errors to the CXL driver via a
> kfifo. The CXL driver must consume these work items, initiate protocol
> error handling, and ensure RAS mappings remain valid throughout processing.
> 
> Implement cxl_proto_err_work_fn() to dequeue work items forwarded by the
> AER service driver and begin protocol error processing by calling
> cxl_handle_proto_error().
> 
> Add a PCI device lock on &pdev->dev within cxl_proto_err_work_fn() to
> keep the PCI device structure valid during handling. Locking an Endpoint
> will also defer RAS unmapping until the device is unlocked.
> 
> For Endpoints, add a lock on CXL memory device cxlds->dev. The CXL memory
> device structure holds the RAS register reference needed during error
> handling.
> 
> Add lock for the parent CXL Port for Root Ports, Downstream Ports, and
> Upstream Ports to prevent destruction of structures holding mapped RAS
> addresses while they are in use.
> 
> Invoke cxl_do_recovery() for uncorrectable errors. Treat this as a stub for
> now; implement its functionality in a future patch.
> 
> Export pci_clean_device_status() to enable cleanup of AER status following
> error handling.
> 
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
> 
Various comments inline.
> 
> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
> index 142ca8794107..5bc144cde0ee 100644
> --- a/drivers/cxl/core/ras.c
> +++ b/drivers/cxl/core/ras.c
> @@ -117,17 +117,6 @@ static void cxl_cper_prot_err_work_fn(struct work_struct *work)
>  }
>  static DECLARE_WORK(cxl_cper_prot_err_work, cxl_cper_prot_err_work_fn);
>  
> -int cxl_ras_init(void)
> -{
> -	return cxl_cper_register_prot_err_work(&cxl_cper_prot_err_work);
> -}
> -
> -void cxl_ras_exit(void)
> -{
> -	cxl_cper_unregister_prot_err_work(&cxl_cper_prot_err_work);
> -	cancel_work_sync(&cxl_cper_prot_err_work);
> -}
> -
>  static bool is_pcie_endpoint(struct pci_dev *pdev)
>  {
>  	return pci_pcie_type(pdev) == PCI_EXP_TYPE_ENDPOINT;
> @@ -178,6 +167,51 @@ static void __iomem *cxl_get_ras_base(struct device *dev)
>  	return NULL;
>  }
>  
> +/*
> + * Return 'struct cxl_port *' parent CXL port of dev's
> + *
> + * Reference count increments on success
> + *
> + * dev: Find the parent port of this dev

pdev. 

Generally I'd prefer kernel-doc style even for non exported
/ exposed functions.  Makes it easy to check for stuff like
this as the script will moan at you.

> + */
> +static struct cxl_port *get_cxl_port(struct pci_dev *pdev)
> +{
> +	switch (pci_pcie_type(pdev)) {
> +	case PCI_EXP_TYPE_ROOT_PORT:
> +	case PCI_EXP_TYPE_DOWNSTREAM:
> +	{
> +		struct cxl_dport *dport;
> +		struct cxl_port *port = find_cxl_port(&pdev->dev, &dport);
> +
> +		if (!port) {
> +			pci_err(pdev, "Failed to find the CXL device");
> +			return NULL;
> +		}
> +		return port;
> +	}
> +	case PCI_EXP_TYPE_UPSTREAM:
> +	{
> +		struct cxl_port *port = find_cxl_port_by_uport(&pdev->dev);
> +
> +		if (!port) {
> +			pci_err(pdev, "Failed to find the CXL device");
> +			return NULL;
> +		}
> +		return port;
> +	}
> +	case PCI_EXP_TYPE_ENDPOINT:
> +	{
> +		struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> +		struct cxl_port *port = cxlds->cxlmd->endpoint;
> +
> +		get_device(&port->dev);
> +		return port;
> +	}
> +	}
> +	pci_warn_once(pdev, "Error: Unsupported device type (%X)", pci_pcie_type(pdev));
> +	return NULL;
> +}
> +
>  /**
>   * cxl_dport_init_ras_reporting - Setup CXL RAS report on this dport
>   * @dport: the cxl_dport that needs to be initialized
> @@ -212,6 +246,23 @@ void cxl_uport_init_ras_reporting(struct cxl_port *port,
>  }
>  EXPORT_SYMBOL_NS_GPL(cxl_uport_init_ras_reporting, "CXL");
>  
> +static bool device_lock_if(struct device *dev, bool cond)
> +{
> +	if (cond)
> +		device_lock(dev);
> +	return cond;
> +}
> +
> +static void device_unlock_if(struct device *dev, bool take)
> +{
> +	if (take)
> +		device_unlock(dev);
> +}

See below. To me these are too weird to wrap up.  Open code them inline
where we can see what they are doing.

> +static void cxl_proto_err_work_fn(struct work_struct *work)
> +{
> +	struct cxl_proto_err_work_data wd;
> +
> +	while (cxl_proto_err_kfifo_get(&wd)) {
> +		struct pci_dev *pdev __free(pci_dev_put) = pci_dev_get(wd.pdev);
> +		struct device *cxlmd_dev;
> +
> +		if (!pdev) {
> +			pr_err_ratelimited("NULL PCI device passed in AER-CXL KFIFO\n");
> +			continue;
> +		}
> +
> +		guard(device)(&pdev->dev);
> +		if (is_pcie_endpoint(pdev)) {
> +			struct cxl_dev_state *cxlds = pci_get_drvdata(pdev);
> +
> +			if (!cxl_pci_drv_bound(pdev))
> +				return;
> +			cxlmd_dev = &cxlds->cxlmd->dev;
> +			device_lock_if(cxlmd_dev, cxlmd_dev);

As below. Too odd.  Also needs comments to explain why conditionally locking it
would be useful.

> +		} else {
> +			cxlmd_dev = NULL;

Set it to NULL at declaration and drop this else leg.

> +		}
> +
> +		struct cxl_port *port __free(put_cxl_port) = get_cxl_port(pdev);
> +		if (!port)
> +			return;
> +		guard(device)(&port->dev);
> +
> +		cxl_handle_proto_error(&wd);
> +		device_unlock_if(cxlmd_dev, cxlmd_dev);
This is too odd to wrap up like that.  Particularly given the
very generic sounding device_unlock_if() naming.

> +	}
> +}


  reply	other threads:[~2025-11-04 18:40 UTC|newest]

Thread overview: 104+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-04 17:02 [RESEND v13 00/25] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
2025-11-04 17:02 ` [RESEND v13 01/25] CXL/PCI: Move CXL DVSEC definitions into uapi/linux/pci_regs.h Terry Bowman
2025-11-04 17:50   ` Jonathan Cameron
2025-11-19  3:19   ` dan.j.williams
2025-12-08 18:04   ` Bjorn Helgaas
2025-12-08 22:13     ` Bowman, Terry
2025-11-04 17:02 ` [RESEND v13 02/25] PCI/CXL: Introduce pcie_is_cxl() Terry Bowman
2025-11-04 17:52   ` Jonathan Cameron
2025-11-19  3:19   ` dan.j.williams
2025-11-19 15:55     ` Bowman, Terry
2025-11-19 23:34       ` dan.j.williams
2025-11-21 20:31   ` Gregory Price
2025-11-04 17:02 ` [RESEND v13 03/25] cxl/pci: Remove unnecessary CXL Endpoint handling helper functions Terry Bowman
2025-11-04 17:53   ` Jonathan Cameron
2025-11-19  3:20   ` dan.j.williams
2025-11-04 17:02 ` [RESEND v13 04/25] cxl/pci: Remove unnecessary CXL RCH " Terry Bowman
2025-11-19  3:20   ` dan.j.williams
2025-11-04 17:02 ` [RESEND v13 05/25] cxl: Remove CXL VH handling in CONFIG_PCIEAER_CXL conditional blocks from core/pci.c Terry Bowman
2025-11-19  3:20   ` dan.j.williams
2025-11-04 17:02 ` [RESEND v13 06/25] cxl: Move CXL driver's RCH error handling into core/ras_rch.c Terry Bowman
2025-11-04 18:03   ` Jonathan Cameron
2025-11-19  3:20   ` dan.j.williams
2025-11-19 16:07     ` Bowman, Terry
2025-11-04 17:02 ` [RESEND v13 07/25] CXL/AER: Replace device_lock() in cxl_rch_handle_error_iter() with guard() lock Terry Bowman
2025-11-04 18:05   ` Jonathan Cameron
2025-11-04 19:53   ` Dave Jiang
2025-11-19  3:20   ` dan.j.williams
2025-11-04 17:02 ` [RESEND v13 08/25] CXL/AER: Move AER drivers RCH error handling into pcie/aer_cxl_rch.c Terry Bowman
2025-11-19  3:20   ` dan.j.williams
2025-11-19  8:26     ` Lukas Wunner
2025-11-19 23:36       ` dan.j.williams
2025-11-04 17:02 ` [RESEND v13 09/25] PCI/AER: Report CXL or PCIe bus error type in trace logging Terry Bowman
2025-11-04 18:08   ` Jonathan Cameron
2025-11-04 18:26   ` Bjorn Helgaas
2025-11-04 17:02 ` [RESEND v13 10/25] cxl/pci: Update RAS handler interfaces to also support CXL Ports Terry Bowman
2025-11-04 18:10   ` Jonathan Cameron
2025-11-11  8:17   ` Alison Schofield
2025-11-19  3:19   ` dan.j.williams
2025-11-04 17:02 ` [RESEND v13 11/25] cxl/pci: Log message if RAS registers are unmapped Terry Bowman
2025-11-19  3:27   ` dan.j.williams
2025-11-04 17:02 ` [RESEND v13 12/25] cxl/pci: Unify CXL trace logging for CXL Endpoints and CXL Ports Terry Bowman
2025-11-19 21:23   ` dan.j.williams
2025-11-19 22:02     ` Bowman, Terry
2025-11-19 23:40       ` dan.j.williams
2025-11-21 14:56         ` Bowman, Terry
2025-11-04 17:02 ` [RESEND v13 13/25] cxl/pci: Update cxl_handle_cor_ras() to return early if no RAS errors Terry Bowman
2025-11-05  8:30   ` Alejandro Lucero Palau
2025-11-19 22:00   ` dan.j.williams
2025-11-04 17:02 ` [RESEND v13 14/25] cxl/pci: Map CXL Endpoint Port and CXL Switch Port RAS registers Terry Bowman
2025-11-04 18:15   ` Jonathan Cameron
2025-11-04 20:03   ` Dave Jiang
2025-11-11  8:23   ` Alison Schofield
2025-11-04 17:02 ` [RESEND v13 15/25] CXL/PCI: Introduce PCI_ERS_RESULT_PANIC Terry Bowman
2025-11-04 19:03   ` Bjorn Helgaas
2025-11-20  0:17   ` dan.j.williams
2025-11-04 17:02 ` [RESEND v13 16/25] CXL/AER: Introduce pcie/aer_cxl_vh.c in AER driver for forwarding CXL errors Terry Bowman
2025-11-20  0:44   ` dan.j.williams
2025-11-20  0:53   ` dan.j.williams
2025-11-04 17:02 ` [RESEND v13 17/25] cxl: Introduce cxl_pci_drv_bound() to check for bound driver Terry Bowman
2025-11-05 17:51   ` Gregory Price
2025-11-05 19:03     ` Gregory Price
2025-11-05 22:26       ` Gregory Price
2025-11-06 17:11         ` Gregory Price
2025-11-06 23:32         ` Bowman, Terry
2025-11-11  8:33   ` Alison Schofield
2025-11-13 21:42     ` Alison Schofield
2025-11-13 22:39       ` Bowman, Terry
2025-11-20  1:24   ` dan.j.williams
2025-11-04 17:02 ` [RESEND v13 18/25] cxl: Change CXL handlers to use guard() instead of scoped_guard() Terry Bowman
2025-11-04 18:18   ` Jonathan Cameron
2025-11-04 20:15   ` Dave Jiang
2025-11-04 17:02 ` [RESEND v13 19/25] cxl/pci: Introduce CXL protocol error handlers for Endpoints Terry Bowman
2025-11-04 18:29   ` Jonathan Cameron
2025-11-04 19:09   ` Bjorn Helgaas
2025-11-04 17:03 ` [RESEND v13 20/25] CXL/PCI: Introduce CXL Port protocol error handlers Terry Bowman
2025-11-04 18:32   ` Jonathan Cameron
2025-11-04 21:20   ` Dave Jiang
2025-11-04 21:27     ` Bowman, Terry
2025-11-04 23:39       ` Dave Jiang
2025-11-04 17:03 ` [RESEND v13 21/25] PCI/AER: Dequeue forwarded CXL error Terry Bowman
2025-11-04 18:40   ` Jonathan Cameron [this message]
2025-11-04 18:45   ` Bjorn Helgaas
2025-11-20  3:33   ` dan.j.williams
2025-11-04 17:03 ` [RESEND v13 22/25] CXL/PCI: Export and rename merge_result() to pci_ers_merge_result() Terry Bowman
2025-11-04 18:41   ` Jonathan Cameron
2025-11-04 19:03   ` Bjorn Helgaas
2025-11-14 15:20     ` Bowman, Terry
2025-11-14 16:09       ` Jonathan Cameron
2025-11-04 17:03 ` [RESEND v13 23/25] CXL/PCI: Introduce CXL uncorrectable protocol error recovery Terry Bowman
2025-11-04 18:47   ` Jonathan Cameron
2025-11-04 23:43     ` Dave Jiang
2025-11-05 14:59       ` Bowman, Terry
2025-11-05 16:10         ` Dave Jiang
2025-11-11  8:37   ` Alison Schofield
2025-12-08 18:40   ` Bjorn Helgaas
2025-11-04 17:03 ` [RESEND v13 24/25] CXL/PCI: Enable CXL protocol errors during CXL Port probe Terry Bowman
2025-11-04 17:03 ` [RESEND v13 25/25] CXL/PCI: Disable CXL protocol error interrupts during CXL Port cleanup Terry Bowman
2025-11-20  3:10   ` dan.j.williams
2025-12-04 17:08     ` Bowman, Terry
2025-11-04 19:11 ` [RESEND v13 00/25] Enable CXL PCIe Port Protocol Error handling and logging Bjorn Helgaas
2025-11-04 21:54   ` Bowman, Terry
2025-11-04 22:12     ` Bjorn Helgaas
2025-12-04 17:30       ` Bowman, Terry
2025-12-08 18:42         ` Bjorn Helgaas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251104184009.00005d27@huawei.com \
    --to=jonathan.cameron@huawei.com \
    --cc=Benjamin.Cheatham@amd.com \
    --cc=PradeepVineshReddy.Kodamati@amd.com \
    --cc=Smita.KoralahalliChannabasappa@amd.com \
    --cc=alison.schofield@intel.com \
    --cc=alucerop@amd.com \
    --cc=bhelgaas@google.com \
    --cc=dan.carpenter@linaro.org \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@stgolabs.net \
    --cc=ira.weiny@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=ming.li@zohomail.com \
    --cc=rrichter@amd.com \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=shiju.jose@huawei.com \
    --cc=terry.bowman@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.