public inbox for linux-cxl@vger.kernel.org
From: Terry Bowman <Terry.Bowman@amd.com>
To: Dan Williams <dan.j.williams@intel.com>,
	ming4.li@intel.com, linux-cxl@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-pci@vger.kernel.org,
	dave@stgolabs.net, jonathan.cameron@huawei.com,
	dave.jiang@intel.com, alison.schofield@intel.com,
	vishal.l.verma@intel.com, bhelgaas@google.com,
	mahesh@linux.ibm.com, oohall@gmail.com,
	Benjamin.Cheatham@amd.com, rrichter@amd.com,
	nathan.fontenot@amd.com, smita.koralahallichannabasappa@amd.com
Subject: Re: [PATCH 01/15] cxl/aer/pci: Add CXL PCIe port error handler callbacks in AER service driver
Date: Tue, 22 Oct 2024 08:50:19 -0500
Message-ID: <0cceca3d-f69e-4277-bc9f-2556fd212ebb@amd.com>
In-Reply-To: <671705b5bb95b_231229468@dwillia2-xfh.jf.intel.com.notmuch>

Hi Dan,

On 10/21/24 20:53, Dan Williams wrote:
> Terry Bowman wrote:
>> CXL protocol errors are reported to the OS through PCIe correctable and
>> uncorrectable internal errors. However, since CXL PCIe port devices
>> are currently bound to the portdrv driver, there is no mechanism to
>> notify the CXL driver, which is necessary for proper logging and
>> handling.
>>
>> To address this, introduce CXL PCIe port error callbacks along with
>> register/unregister and accessor functions. The callbacks will be
>> invoked by the AER driver in the case protocol errors are reported by
>> a CXL port device.
>>
>> The AER driver callbacks will be used in future patches implementing
>> CXL PCIe port error handling.
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>> ---
>>  drivers/pci/pcie/aer.c | 22 ++++++++++++++++++++++
>>  include/linux/aer.h    | 14 ++++++++++++++
>>  2 files changed, 36 insertions(+)
>>
>> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
>> index 13b8586924ea..a9792b9576b4 100644
>> --- a/drivers/pci/pcie/aer.c
>> +++ b/drivers/pci/pcie/aer.c
>> @@ -50,6 +50,8 @@ struct aer_rpc {
>>  	DECLARE_KFIFO(aer_fifo, struct aer_err_source, AER_ERROR_SOURCES_MAX);
>>  };
>>  
>> +static struct cxl_port_err_hndlrs cxl_port_hndlrs;
> 
> I think this can afford to splurge on a few more letters and make this
> 
> static struct cxl_port_error_handlers cxl_port_error_handlers;
> 
> 

Ok.

>> +
>>  /* AER stats for the device */
>>  struct aer_stats {
>>  
>> @@ -1078,6 +1080,26 @@ static inline void cxl_rch_handle_error(struct pci_dev *dev,
>>  					struct aer_err_info *info) { }
>>  #endif
>>  
>> +void register_cxl_port_hndlrs(struct cxl_port_err_hndlrs *_cxl_port_hndlrs)
>> +{
>> +	cxl_port_hndlrs.error_detected = _cxl_port_hndlrs->error_detected;
>> +	cxl_port_hndlrs.cor_error_detected = _cxl_port_hndlrs->cor_error_detected;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(register_cxl_port_hndlrs, CXL);
>> +
>> +void unregister_cxl_port_hndlrs(void)
>> +{
>> +	cxl_port_hndlrs.error_detected = NULL;
>> +	cxl_port_hndlrs.cor_error_detected = NULL;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(unregister_cxl_port_hndlrs, CXL);
>> +
>> +struct cxl_port_err_hndlrs *find_cxl_port_hndlrs(void)
>> +{
>> +	return &cxl_port_hndlrs;
>> +}
>> +EXPORT_SYMBOL_NS_GPL(find_cxl_port_hndlrs, CXL);
> 
> I guess I will need to go deeper into the code, but I would not have
> expected that new registration interfaces are needed. Each 'struct
> pci_driver' could optionally include CXL error handlers alongside their
> PCIe error handlers and when CXL AER errors are broadcast only the CXL
> handlers are invoked. I.e. the registration is something like:
> 
> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> index 6af5e0425872..42db26195bda 100644
> --- a/drivers/pci/pcie/portdrv.c
> +++ b/drivers/pci/pcie/portdrv.c
> @@ -793,6 +793,7 @@ static struct pci_driver pcie_portdriver = {
>         .shutdown       = pcie_portdrv_shutdown,
>  
>         .err_handler    = &pcie_portdrv_err_handler,
> +       .cxl_err_handler = &cxl_portdrv_err_handler,
>  
>         .driver_managed_dma = true,

Ok. I'm thinking of adding a 'pci_driver::cxl_err_handler' member of type
'struct pci_error_handlers'.

'struct pci_error_handlers' also contains slot_reset(), resume(), and
mmio_enabled() function pointers that PCIe recovery uses when they are
provided. The plan is for CXL devices to panic on both fatal and non-fatal
UCEs, but using the 'struct pci_error_handlers' type leaves room for the
other handlers if they are needed in the future. It also makes the logic
that looks up and calls the error handlers common, which requires less code.
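
A rough userspace sketch of how a shared handler lookup could work. This
is not kernel code: 'struct error_handlers', 'struct driver', 'struct dev',
the 'is_cxl_error' flag, and ERS_RESULT_PANIC are simplified, hypothetical
stand-ins, and the 'cxl_err_handler' member is only the proposal discussed
above.

```c
/*
 * Userspace model only. Every type and name here is a simplified,
 * hypothetical stand-in, not the kernel's definition.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum ers_result {
	ERS_RESULT_NONE,
	ERS_RESULT_RECOVERED,
	ERS_RESULT_PANIC,	/* models "panic on UCE" for CXL devices */
};

struct dev;

/* Trimmed-down model of an error handler ops table. */
struct error_handlers {
	enum ers_result (*error_detected)(struct dev *dev);
	void (*cor_error_detected)(struct dev *dev);
};

struct driver {
	const struct error_handlers *err_handler;	/* PCIe handlers */
	const struct error_handlers *cxl_err_handler;	/* proposed CXL handlers */
};

struct dev {
	const struct driver *driver;
	int cor_count;		/* correctable errors seen, for the example */
};

/* One lookup shared by the PCIe and CXL paths. */
static const struct error_handlers *get_handlers(const struct dev *dev,
						 bool is_cxl_error)
{
	assert(dev != NULL);
	if (!dev->driver)
		return NULL;
	return is_cxl_error ? dev->driver->cxl_err_handler
			    : dev->driver->err_handler;
}

static enum ers_result report_uncorrectable(struct dev *dev, bool is_cxl_error)
{
	const struct error_handlers *h = get_handlers(dev, is_cxl_error);

	/* Missing table or missing callback: nothing to invoke. */
	if (!h || !h->error_detected)
		return ERS_RESULT_NONE;
	return h->error_detected(dev);
}

static void report_correctable(struct dev *dev, bool is_cxl_error)
{
	const struct error_handlers *h = get_handlers(dev, is_cxl_error);

	if (h && h->cor_error_detected)
		h->cor_error_detected(dev);
}

/* Example callbacks for a port driver that provides both tables. */
static enum ers_result pcie_error_detected(struct dev *dev)
{
	(void)dev;
	return ERS_RESULT_RECOVERED;
}

static enum ers_result cxl_error_detected(struct dev *dev)
{
	(void)dev;
	return ERS_RESULT_PANIC;
}

static void cxl_cor_error_detected(struct dev *dev)
{
	dev->cor_count++;
}

static const struct error_handlers pcie_handlers = {
	.error_detected = pcie_error_detected,
};

static const struct error_handlers cxl_handlers = {
	.error_detected = cxl_error_detected,
	.cor_error_detected = cxl_cor_error_detected,
};

static const struct driver port_driver = {
	.err_handler = &pcie_handlers,
	.cxl_err_handler = &cxl_handlers,
};
```

Because both tables have the same type, report_uncorrectable() and
report_correctable() need only one code path; the only CXL-specific step
is which pointer get_handlers() returns.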

Regards,
Terry
