Re: [PATCH v15 4/9] PCI/AER: Dequeue forwarded CXL error

public inbox for linux-pci@vger.kernel.org
 help / color / mirror / Atom feed

From: "Bowman, Terry" <terry.bowman@amd.com>
To: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: dave@stgolabs.net, dave.jiang@intel.com,
	alison.schofield@intel.com, dan.j.williams@intel.com,
	bhelgaas@google.com, shiju.jose@huawei.com, ming.li@zohomail.com,
	Smita.KoralahalliChannabasappa@amd.com, rrichter@amd.com,
	dan.carpenter@linaro.org, PradeepVineshReddy.Kodamati@amd.com,
	lukas@wunner.de, Benjamin.Cheatham@amd.com,
	sathyanarayanan.kuppuswamy@linux.intel.com,
	linux-cxl@vger.kernel.org, vishal.l.verma@intel.com,
	alucerop@amd.com, ira.weiny@intel.com,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org
Subject: Re: [PATCH v15 4/9] PCI/AER: Dequeue forwarded CXL error
Date: Tue, 3 Feb 2026 11:00:40 -0600	[thread overview]
Message-ID: <59fb1a00-a9cb-482a-b8be-3982c515cd85@amd.com> (raw)
In-Reply-To: <20260203152609.00004b41@huawei.com>

On 2/3/2026 9:26 AM, Jonathan Cameron wrote:
> On Mon, 2 Feb 2026 20:52:39 -0600
> Terry Bowman <terry.bowman@amd.com> wrote:
> 
>> The AER driver now forwards CXL protocol errors to the CXL driver via a
>> kfifo. The CXL driver must consume these work items and initiate protocol
>> error handling while ensuring the device's RAS mappings remain valid
>> throughout processing.
>>
>> Implement cxl_proto_err_work_fn() to dequeue work items forwarded by the
>> AER service driver. Lock the parent CXL Port device to ensure the CXL
>> device's RAS registers are accessible during handling. Add pdev reference-put
>> to match reference-get in AER driver. This will ensure pdev access after
>> kfifo dequeue. These changes apply to CXL Ports and CXL Endpoints.
>>
>> Update is_cxl_error() to recognize CXL Port devices with errors.
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> 
> There are some small functional changes to existing paths (maybe)
> that I think need explanations in this commit message.
> 
> Otherwise, one suggests small simplification.
> 
>> diff --git a/drivers/cxl/core/ras.c b/drivers/cxl/core/ras.c
>> index 74df561ed32e..a6c0bc6d7203 100644
>> --- a/drivers/cxl/core/ras.c
>> +++ b/drivers/cxl/core/ras.c
>> @@ -118,17 +118,6 @@ static void cxl_cper_prot_err_work_fn(struct work_struct *work)
>>  }
>>  static DECLARE_WORK(cxl_cper_prot_err_work, cxl_cper_prot_err_work_fn);
>>  
>> -int cxl_ras_init(void)
>> -{
>> -	return cxl_cper_register_prot_err_work(&cxl_cper_prot_err_work);
>> -}
>> -
>> -void cxl_ras_exit(void)
>> -{
>> -	cxl_cper_unregister_prot_err_work(&cxl_cper_prot_err_work);
>> -	cancel_work_sync(&cxl_cper_prot_err_work);
>> -}
>> -
>>  static void cxl_dport_map_ras(struct cxl_dport *dport)
>>  {
>>  	struct cxl_register_map *map = &dport->reg_map;
>> @@ -185,6 +174,50 @@ void devm_cxl_port_ras_setup(struct cxl_port *port)
>>  }
>>  EXPORT_SYMBOL_NS_GPL(devm_cxl_port_ras_setup, "CXL");
>>  
>> +/*
>> + * get_cxl_port - Return the parent CXL Port of a PCI device
>> + * @pdev: PCI device whose parent CXL Port is being queried
>> + *
>> + * Looks up and returns the parent CXL Port associated with @pdev. On
>> + * success, the returned port has its reference count incremented and must
>> + * be released by the caller. Returns NULL if no associated CXL port is
>> + * found.
>> + *
>> + * Return: Pointer to the parent &struct cxl_port or NULL on failure
>> + */
>> +static struct cxl_port *get_cxl_port(struct pci_dev *pdev)
>> +{
>> +	switch (pci_pcie_type(pdev)) {
>> +	case PCI_EXP_TYPE_ROOT_PORT:
>> +	case PCI_EXP_TYPE_DOWNSTREAM:
>> +	{
>> +		struct cxl_dport *dport;
>> +		struct cxl_port *port = find_cxl_port(&pdev->dev, &dport);
> 
> Can you pass NULL for dport?  Looks like it to me as that ultimately ends
> up in match_port_by_dport() and 
> if (ctx->dport)
> 	*ctx->dport = dport;
> 
> where with this as null means ctx->dport == NULL.
> 

Yes.


>> +
>> +		if (!port) {
>> +			pci_err(pdev, "Failed to find the CXL device");
>> +			return NULL;
>> +		}
>> +		return port;
>> +	}
>> +	case PCI_EXP_TYPE_UPSTREAM:
>> +	case PCI_EXP_TYPE_ENDPOINT:
>> +	{
>> +		struct cxl_port *port = find_cxl_port_by_uport(&pdev->dev);
>> +
>> +		if (!port) {
>> +			pci_err(pdev, "Failed to find the CXL device");
>> +			return NULL;
>> +		}
>> +		return port;
>> +	}
>> +	}
>> +
>> +	pr_err_ratelimited("%s: Error - Unsupported device type (%#x)",
>> +			   pci_name(pdev), pci_pcie_type(pdev));
>> +	return NULL;
>> +}
> 
> 
>> +int cxl_ras_init(void)
>> +{
>> +	if (cxl_cper_register_prot_err_work(&cxl_cper_prot_err_work))
>> +		pr_err("Failed to initialize CXL RAS CPER\n");
> 
> Why introduce a new error print?  I don't particularly mind
> but wasn't obvious to me why one has become appropriate and why only
> for the first call here.
> 

This was introduced before v10. 

RAS initialization failure should not fail cxl_core probe.

OSfirst AER support was added in this series in this file next to CPER. 
CPER initialization can fail and OSFirst can not is the reason for only 
one log. 

When I look at this block of code I'm drawn to the return value. It looks 
like it should be a void function. Thoughts?

- Terry


> More importantly - if this failed it would previously have resulted
> in cxl_core_init() failing and things getting torn down.
> 
>> +
>> +	cxl_register_proto_err_work(&cxl_proto_err_work);
>> +
>> +	return 0;
>> +}
>> +
>> +void cxl_ras_exit(void)
>> +{
>> +	cxl_cper_unregister_prot_err_work(&cxl_cper_prot_err_work);
>> +	cancel_work_sync(&cxl_cper_prot_err_work);
>> +
>> +	cxl_unregister_proto_err_work();
>> +	cancel_work_sync(&cxl_proto_err_work);
>> +}
> 
>

next prev parent reply	other threads:[~2026-02-03 17:00 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-03  2:52 [PATCH v15 0/9] Enable CXL PCIe Port Protocol Error handling and logging Terry Bowman
2026-02-03  2:52 ` [PATCH v15 1/9] PCI/AER: Introduce AER-CXL Kfifo in new file, pcie/aer_cxl_vh.c Terry Bowman
2026-02-04  4:25   ` dan.j.williams
2026-02-03  2:52 ` [PATCH v15 2/9] cxl: Update CXL Endpoint tracing Terry Bowman
2026-02-04  4:29   ` dan.j.williams
2026-02-03  2:52 ` [PATCH v15 3/9] PCI/ERR: Introduce PCI_ERS_RESULT_PANIC Terry Bowman
2026-02-03  2:52 ` [PATCH v15 4/9] PCI/AER: Dequeue forwarded CXL error Terry Bowman
2026-02-03 15:26   ` Jonathan Cameron
2026-02-03 17:00     ` Bowman, Terry [this message]
2026-02-05 17:13       ` Jonathan Cameron
2026-02-04  4:46   ` dan.j.williams
2026-02-03  2:52 ` [PATCH v15 5/9] PCI: Establish common CXL Port protocol error flow Terry Bowman
2026-02-03 15:40   ` Jonathan Cameron
2026-02-03 18:21     ` Bowman, Terry
2026-02-05 17:16       ` Jonathan Cameron
2026-02-04  5:08   ` dan.j.williams
2026-02-04 17:11     ` Bowman, Terry
2026-02-04 21:22       ` dan.j.williams
2026-02-05 16:07         ` Bowman, Terry
2026-02-05 21:17           ` dan.j.williams
2026-02-03  2:52 ` [PATCH v15 6/9] cxl: Update error handlers to support CXL Port protocol errors Terry Bowman
2026-02-03 15:54   ` Jonathan Cameron
2026-02-03  2:52 ` [PATCH v15 7/9] cxl: Update Endpoint AER uncorrectable handler Terry Bowman
2026-02-03 16:18   ` Jonathan Cameron
2026-02-03 17:31   ` Dave Jiang
2026-02-03 18:35     ` Bowman, Terry
2026-02-03 18:49       ` Dave Jiang
2026-02-03 20:21         ` Dave Jiang
2026-02-03  2:52 ` [PATCH v15 8/9] cxl: Remove Endpoint AER correctable handler Terry Bowman
2026-02-03 16:27   ` Jonathan Cameron
2026-02-03  2:52 ` [PATCH v15 9/9] cxl: Enable CXL protocol error reporting Terry Bowman

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=59fb1a00-a9cb-482a-b8be-3982c515cd85@amd.com \
    --to=terry.bowman@amd.com \
    --cc=Benjamin.Cheatham@amd.com \
    --cc=PradeepVineshReddy.Kodamati@amd.com \
    --cc=Smita.KoralahalliChannabasappa@amd.com \
    --cc=alison.schofield@intel.com \
    --cc=alucerop@amd.com \
    --cc=bhelgaas@google.com \
    --cc=dan.carpenter@linaro.org \
    --cc=dan.j.williams@intel.com \
    --cc=dave.jiang@intel.com \
    --cc=dave@stgolabs.net \
    --cc=ira.weiny@intel.com \
    --cc=jonathan.cameron@huawei.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=lukas@wunner.de \
    --cc=ming.li@zohomail.com \
    --cc=rrichter@amd.com \
    --cc=sathyanarayanan.kuppuswamy@linux.intel.com \
    --cc=shiju.jose@huawei.com \
    --cc=vishal.l.verma@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox