Linux CXL
From: Karolina Stolarek <karolina.stolarek@oracle.com>
To: Bjorn Helgaas <bhelgaas@google.com>
Cc: linux-pci@vger.kernel.org, Jon Pan-Doh <pandoh@google.com>,
	Terry Bowman <terry.bowman@amd.com>, Len Brown <lenb@kernel.org>,
	James Morse <james.morse@arm.com>,
	Tony Luck <tony.luck@intel.com>, Borislav Petkov <bp@alien8.de>,
	Ben Cheatham <Benjamin.Cheatham@amd.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Shuai Xue <xueshuai@linux.alibaba.com>,
	Liu Xinpeng <liuxp11@chinatelecom.cn>,
	Darren Hart <darren@os.amperecomputing.com>,
	Dan Williams <dan.j.williams@intel.com>,
	linux-cxl@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] PCI/AER: Consolidate CXL, ACPI GHES and native AER reporting paths
Date: Wed, 23 Apr 2025 15:52:27 +0200
Message-ID: <c51f0f8b-99c2-49df-9112-650b3c5382f4@oracle.com>
In-Reply-To: <81c040d54209627de2d8b150822636b415834c7f.1742900213.git.karolina.stolarek@oracle.com>

Hi Bjorn,

On 25/03/2025 16:07, Karolina Stolarek wrote:
> Currently, the CXL and GHES features use the pci_print_aer() function to
> log AER errors. Its implementation is very similar to aer_print_error(),
> duplicating the way native PCIe devices report errors. We shouldn't
> log messages differently just because they come from a different
> code path.
> 
> Make CXL devices and GHES call aer_print_error() when reporting
> AER errors. Add a wrapper, aer_print_platform_error(), that translates
> aer_capability_regs to aer_err_info so we can use the aer_print_error()
> function.
> 
> Signed-off-by: Karolina Stolarek <karolina.stolarek@oracle.com>
> ---
> v2:
>    - Don't expose aer_err_info to the world; as aer_recover_queue()
>      is tightly connected to the GHES code, introduce a wrapper for
>      aer_print_error()
>    - Move the aer_err_info memset to the wrapper; don't expect the
>      caller to clear it for us
> 
>    I'm still working on the logs; in the meantime, I think we can
>    continue reviewing the patch.

I wasn't able to produce logs for the CXL path (that is, for Restricted CXL
Devices, as CXL 1.1 devices are not supported by the driver due to missing
functionality; confirmed by Terry), and I ran into issues when trying to
inject errors via GHES. Is the lack of logs a blocker for this patch? I
tested other CXL scenarios and, as far as I know, my changes didn't cause
any regressions.

All the best,
Karolina

> 
>   drivers/cxl/core/pci.c |  2 +-
>   drivers/pci/pcie/aer.c | 64 ++++++++++++++++++++----------------------
>   include/linux/aer.h    |  4 +--
>   3 files changed, 33 insertions(+), 37 deletions(-)
> 
> diff --git a/drivers/cxl/core/pci.c b/drivers/cxl/core/pci.c
> index 013b869b66cb..9ba711365388 100644
> --- a/drivers/cxl/core/pci.c
> +++ b/drivers/cxl/core/pci.c
> @@ -885,7 +885,7 @@ static void cxl_handle_rdport_errors(struct cxl_dev_state *cxlds)
>   	if (!cxl_rch_get_aer_severity(&aer_regs, &severity))
>   		return;
>   
> -	pci_print_aer(pdev, severity, &aer_regs);
> +	aer_print_platform_error(pdev, severity, &aer_regs);
>   
>   	if (severity == AER_CORRECTABLE)
>   		cxl_handle_rdport_cor_ras(cxlds, dport);
> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
> index a1cf8c7ef628..ec34bc9b2332 100644
> --- a/drivers/pci/pcie/aer.c
> +++ b/drivers/pci/pcie/aer.c
> @@ -760,47 +760,42 @@ int cper_severity_to_aer(int cper_severity)
>   EXPORT_SYMBOL_GPL(cper_severity_to_aer);
>   #endif
>   
> -void pci_print_aer(struct pci_dev *dev, int aer_severity,
> -		   struct aer_capability_regs *aer)
> +static void populate_aer_err_info(struct aer_err_info *info, int severity,
> +				  struct aer_capability_regs *aer_regs)
>   {
> -	int layer, agent, tlp_header_valid = 0;
> -	u32 status, mask;
> -	struct aer_err_info info;
> -
> -	if (aer_severity == AER_CORRECTABLE) {
> -		status = aer->cor_status;
> -		mask = aer->cor_mask;
> -	} else {
> -		status = aer->uncor_status;
> -		mask = aer->uncor_mask;
> -		tlp_header_valid = status & AER_LOG_TLP_MASKS;
> -	}
> -
> -	layer = AER_GET_LAYER_ERROR(aer_severity, status);
> -	agent = AER_GET_AGENT(aer_severity, status);
> +	int tlp_header_valid;
>   
> -	memset(&info, 0, sizeof(info));
> +	memset(info, 0, sizeof(*info));
> -	info.severity = aer_severity;
> -	info.status = status;
> -	info.mask = mask;
> -	info.first_error = PCI_ERR_CAP_FEP(aer->cap_control);
>   
> -	pci_err(dev, "aer_status: 0x%08x, aer_mask: 0x%08x\n", status, mask);
> -	__aer_print_error(dev, &info);
> -	pci_err(dev, "aer_layer=%s, aer_agent=%s\n",
> -		aer_error_layer[layer], aer_agent_string[agent]);
> +	info->severity = severity;
> +	info->first_error = PCI_ERR_CAP_FEP(aer_regs->cap_control);
>   
> -	if (aer_severity != AER_CORRECTABLE)
> -		pci_err(dev, "aer_uncor_severity: 0x%08x\n",
> -			aer->uncor_severity);
> +	if (severity == AER_CORRECTABLE) {
> +		info->id = aer_regs->cor_err_source;
> +		info->status = aer_regs->cor_status;
> +		info->mask = aer_regs->cor_mask;
> +	} else {
> +		info->id = aer_regs->uncor_err_source;
> +		info->status = aer_regs->uncor_status;
> +		info->mask = aer_regs->uncor_mask;
> +		tlp_header_valid = info->status & AER_LOG_TLP_MASKS;
> +
> +		if (tlp_header_valid) {
> +			info->tlp_header_valid = 1;
> +			info->tlp = aer_regs->header_log;
> +		}
> +	}
> +}
>   
> -	if (tlp_header_valid)
> -		pcie_print_tlp_log(dev, &aer->header_log, dev_fmt("  "));
> +void aer_print_platform_error(struct pci_dev *pdev, int severity,
> +			      struct aer_capability_regs *aer_regs)
> +{
> +	struct aer_err_info info;
>   
> -	trace_aer_event(dev_name(&dev->dev), (status & ~mask),
> -			aer_severity, tlp_header_valid, &aer->header_log);
> +	populate_aer_err_info(&info, severity, aer_regs);
> +	aer_print_error(pdev, &info);
>   }
> -EXPORT_SYMBOL_NS_GPL(pci_print_aer, "CXL");
> +EXPORT_SYMBOL_NS_GPL(aer_print_platform_error, "CXL");
>   
>   /**
>    * add_error_device - list device to be handled
> @@ -1146,7 +1141,8 @@ static void aer_recover_work_func(struct work_struct *work)
>   			       PCI_SLOT(entry.devfn), PCI_FUNC(entry.devfn));
>   			continue;
>   		}
> -		pci_print_aer(pdev, entry.severity, entry.regs);
> +
> +		aer_print_platform_error(pdev, entry.severity, entry.regs);
>   
>   		/*
>   		 * Memory for aer_capability_regs(entry.regs) is being
> diff --git a/include/linux/aer.h b/include/linux/aer.h
> index 02940be66324..5593352dfb51 100644
> --- a/include/linux/aer.h
> +++ b/include/linux/aer.h
> @@ -64,8 +64,8 @@ static inline int pci_aer_clear_nonfatal_status(struct pci_dev *dev)
>   static inline int pcie_aer_is_native(struct pci_dev *dev) { return 0; }
>   #endif
>   
> -void pci_print_aer(struct pci_dev *dev, int aer_severity,
> -		    struct aer_capability_regs *aer);
> +void aer_print_platform_error(struct pci_dev *pdev, int severity,
> +			      struct aer_capability_regs *aer_regs);
>   int cper_severity_to_aer(int cper_severity);
>   void aer_recover_queue(int domain, unsigned int bus, unsigned int devfn,
>   		       int severity, struct aer_capability_regs *aer_regs);
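For readers without a kernel tree at hand, the translation that populate_aer_err_info() performs can be sketched in plain user-space C. This is a minimal sketch under stated assumptions, not the kernel code: the struct names aer_regs_sketch, err_info_sketch and tlp_log_sketch, their simplified layouts, and the helper populate_err_info_sketch() are hypothetical stand-ins; the severity constants, PCI_ERR_CAP_FEP() and the TLP-logging status bits are written to mirror the kernel header values. It highlights two details that are easy to get wrong in this step: the destination is cleared through the pointer (memset(info, 0, sizeof(*info))), and the TLP header flag is set to 1 rather than assigned the masked status bits, which would truncate to 0 when the flag is a one-bit bitfield, as it is in the kernel's struct aer_err_info.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Severity values mirroring the kernel's AER code. */
#define AER_NONFATAL    0
#define AER_FATAL       1
#define AER_CORRECTABLE 2

/* First Error Pointer field of the AER Capabilities and Control register. */
#define PCI_ERR_CAP_FEP(x) ((x) & 0x1f)

/* Uncorrectable error status bits for which a TLP header is logged. */
#define PCI_ERR_UNC_POISON_TLP 0x00001000
#define PCI_ERR_UNC_COMP_ABORT 0x00008000
#define PCI_ERR_UNC_UNX_COMP   0x00010000
#define PCI_ERR_UNC_MALF_TLP   0x00040000
#define PCI_ERR_UNC_ECRC       0x00080000
#define PCI_ERR_UNC_UNSUP      0x00100000
#define AER_LOG_TLP_MASKS (PCI_ERR_UNC_POISON_TLP | PCI_ERR_UNC_ECRC | \
			   PCI_ERR_UNC_UNSUP | PCI_ERR_UNC_COMP_ABORT | \
			   PCI_ERR_UNC_UNX_COMP | PCI_ERR_UNC_MALF_TLP)

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct tlp_log_sketch {
	uint32_t dw[4];
};

struct aer_regs_sketch {
	uint32_t uncor_status, uncor_mask;
	uint32_t cor_status, cor_mask;
	uint32_t cap_control;
	struct tlp_log_sketch header_log;
	uint16_t cor_err_source, uncor_err_source;
};

struct err_info_sketch {
	uint16_t id;
	int severity;
	uint32_t status, mask;
	unsigned int first_error;
	int tlp_header_valid;
	struct tlp_log_sketch tlp;
};

/* Hypothetical helper: translate raw AER registers into an error record. */
static void populate_err_info_sketch(struct err_info_sketch *info, int severity,
				     const struct aer_regs_sketch *regs)
{
	assert(info && regs);

	/* Clear the caller's struct through the pointer, not the pointer itself. */
	memset(info, 0, sizeof(*info));

	info->severity = severity;
	info->first_error = PCI_ERR_CAP_FEP(regs->cap_control);

	if (severity == AER_CORRECTABLE) {
		info->id = regs->cor_err_source;
		info->status = regs->cor_status;
		info->mask = regs->cor_mask;
	} else {
		info->id = regs->uncor_err_source;
		info->status = regs->uncor_status;
		info->mask = regs->uncor_mask;
		/* Only some uncorrectable errors come with a logged TLP header. */
		if (info->status & AER_LOG_TLP_MASKS) {
			info->tlp_header_valid = 1;
			info->tlp = regs->header_log;
		}
	}
}
```

Given a correctable-error record, the sketch fills in the cor_* fields and leaves the TLP header invalid. Given an uncorrectable record whose status has a TLP-logging bit set (for example, a malformed TLP), it also copies header_log into info->tlp.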


Thread overview: 14+ messages
2025-03-25 15:07 [PATCH v2] PCI/AER: Consolidate CXL, ACPI GHES and native AER reporting paths Karolina Stolarek
2025-04-01  1:47 ` Jon Pan-Doh
2025-04-04  9:33   ` Karolina Stolarek
2025-04-23 13:52 ` Karolina Stolarek [this message]
2025-04-23 20:31   ` Bjorn Helgaas
2025-04-24  9:01     ` Karolina Stolarek
2025-04-24 17:28       ` Bjorn Helgaas
2025-04-25 10:32         ` Karolina Stolarek
2025-04-25 13:14           ` Jonathan Cameron
2025-04-25 14:12             ` Karolina Stolarek
2025-04-29 15:54               ` Jonathan Cameron
2025-05-05  9:58                 ` Karolina Stolarek
2025-05-05 17:45                   ` Bjorn Helgaas
2025-05-06 17:03                   ` Jonathan Cameron