linux-efi.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
Cc: <linux-efi@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
	<linux-cxl@vger.kernel.org>, Ard Biesheuvel <ardb@kernel.org>,
	"Alison Schofield" <alison.schofield@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Yazen Ghannam <yazen.ghannam@amd.com>
Subject: Re: [PATCH v2 3/4] acpi/ghes, efi/cper: Recognize and process CXL Protocol Errors.
Date: Thu, 15 Feb 2024 12:17:57 +0000	[thread overview]
Message-ID: <20240215121757.00005e72@Huawei.com> (raw)
In-Reply-To: <20240109034755.100555-4-Smita.KoralahalliChannabasappa@amd.com>

On Tue, 9 Jan 2024 03:47:54 +0000
Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com> wrote:

> UEFI v2.10 section N.2.13 defines a CPER record for CXL Protocol errors.
> 
> Add GHES support to detect CXL CPER Protocol record and cache error
> severity, device_id, serial number and CXL RAS capability struct in
> struct cxl_cper_event_info.
> 
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> ---
> v2:
> 	Change to sub-struct for protocol error specific elemenets.
> 	Set serial number unconditionally.
> 	Copy entire cxl_ras_capability_regs struct rather than pointer.
> 	Calculate error severity in efi/cper and change to enum.
> ---
>  drivers/acpi/apei/ghes.c        | 11 ++++++
>  drivers/firmware/efi/cper_cxl.c | 68 +++++++++++++++++++++++++++++++++
>  include/linux/cxl-event.h       | 13 +++++++
>  3 files changed, 92 insertions(+)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index 60b615d361d3..1d4f3d68a0bc 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -714,6 +714,14 @@ static void cxl_cper_post_event(enum cxl_event_type event_type,
>  		cper_callback(event_type, &info);
>  }
>  
> +void cxl_cper_handle_prot_err(struct acpi_hest_generic_data *gdata)
> +{
> +	struct cxl_cper_event_info info;
> +
> +	if (cxl_cper_handle_prot_err_info(gdata, &info))
> +		return;
> +}
> +
>  int cxl_cper_register_callback(cxl_cper_callback callback)
>  {
>  	guard(rwsem_write)(&cxl_cper_rw_sem);
> @@ -768,6 +776,9 @@ static bool ghes_do_proc(struct ghes *ghes,
>  		else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
>  			queued = ghes_handle_arm_hw_error(gdata, sev);
>  		}
> +		else if (guid_equal(sec_type, &CPER_SEC_CXL_PROT_ERR)) {
> +			cxl_cper_handle_prot_err(gdata);
> +		}
>  		else if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID)) {
>  			struct cxl_cper_event_rec *rec = acpi_hest_get_payload(gdata);
>  
> diff --git a/drivers/firmware/efi/cper_cxl.c b/drivers/firmware/efi/cper_cxl.c
> index 4fd8d783993e..9b9b8c8f1157 100644
> --- a/drivers/firmware/efi/cper_cxl.c
> +++ b/drivers/firmware/efi/cper_cxl.c
> @@ -8,6 +8,7 @@
>   */
>  
>  #include <linux/cper.h>
> +#include <acpi/ghes.h>
>  #include "cper_cxl.h"
>  
>  #define PROT_ERR_VALID_AGENT_TYPE		BIT_ULL(0)
> @@ -44,6 +45,17 @@ enum {
>  	USP,	/* CXL Upstream Switch Port */
>  };
>  
> +static enum cxl_aer_err_type cper_severity_cxl_aer(int cper_severity)
> +{
> +	switch (cper_severity) {
> +	case CPER_SEV_RECOVERABLE:
> +	case CPER_SEV_FATAL:
> +		return CXL_AER_UNCORRECTABLE;
> +	default:
> +		return CXL_AER_CORRECTABLE;
> +	}
> +}
> +
>  void cper_print_prot_err(const char *pfx, const struct cper_sec_prot_err *prot_err)
>  {
>  	if (prot_err->valid_bits & PROT_ERR_VALID_AGENT_TYPE)
> @@ -176,3 +188,59 @@ void cper_print_prot_err(const char *pfx, const struct cper_sec_prot_err *prot_e
>  			       sizeof(cxl_ras->header_log), 0);
>  	}
>  }
> +
> +int cxl_cper_handle_prot_err_info(struct acpi_hest_generic_data *gdata,
> +				  struct cxl_cper_event_info *info)
> +{
> +	struct cper_sec_prot_err *prot_err = acpi_hest_get_payload(gdata);
> +	struct cper_cxl_event_devid *device_id = &info->rec.hdr.device_id;
> +	struct cper_cxl_event_sn *dev_serial_num =  &info->rec.hdr.dev_serial_num;
> +	size_t size = sizeof(*prot_err) + prot_err->dvsec_len;

Not obvious what this is size of.  I'd rename it to reflect that's only
the distance to the end of the dvsec copy.
Or just compute the pointer below directly by putting this maths inline.

> +
> +	if (!(prot_err->valid_bits & PROT_ERR_VALID_ERROR_LOG)) {
> +		pr_err(FW_WARN "Not a valid protocol error log\n");
> +		return -EINVAL;
> +	}
> +
> +	if (!(prot_err->valid_bits & PROT_ERR_VALID_DEVICE_ID)) {
> +		pr_err(FW_WARN "Not a valid Device ID\n");
"No device ID\n"
is more accurate description.

I'd move this down to next to where we check the data is valid.
So keep each validity check next to where it matters rather than
a bunch of checks up here.  (mostly because I started writing you
didn't check it was valid down there before remembering this
earlier code :)

> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * Set device serial number unconditionally.
> +	 *
> +	 * Print a warning message if it is not valid. The device serial
> +	 * number is considered valid for CXL 1.1 device, CXL 2.0 device,
is required for
perhaps?  These all got renamed in the CXL spec.  We should use that naming
because it deliberately avoids limiting to particular spec versions.
	CXL RCD, CXL SLD, CXL LD, 
> +	 * CXL 2.0 Logical device, or CXL 2.0 Fabric Manager Managed
> +	 * Logical Device.

Not sure what this is now called.. :(


> +	 */
> +	if (!(prot_err->valid_bits & PROT_ERR_VALID_SERIAL_NUMBER) ||
> +	      prot_err->agent_type > 0x4 || prot_err->agent_type == RCH_DP)
> +		pr_warn(FW_WARN "Not a valid serial number\n");
> +
> +	dev_serial_num->lower_dw = prot_err->dev_serial_num.lower_dw;
> +	dev_serial_num->upper_dw = prot_err->dev_serial_num.upper_dw;
> +
> +	/*
> +	 * The device ID or agent address is only valid for CXL 1.1 device,
> +	 * CXL 2.0 device, CXL 2.0 Logical device, CXL 2.0 Fabric Manager
> +	 * Managed Logical Device, CXL Root Port, CXL Downstream Switch
> +	 * Port, or CXL Upstream Switch Port.
> +	 */
> +	if (prot_err->agent_type <= 0x7 && prot_err->agent_type != RCH_DP) {

> +		device_id->segment_num = prot_err->agent_addr.segment;
> +		device_id->bus_num = prot_err->agent_addr.bus;
> +		device_id->device_num = prot_err->agent_addr.device;
> +		device_id->func_num = prot_err->agent_addr.function;
> +	} else {
> +		pr_err(FW_WARN "Not a valid agent type\n");
> +		return -EINVAL;
> +	}
> +
> +	info->p_err.cxl_ras = *(struct cxl_ras_capability_regs *)((long)prot_err + size);

Casting to a long isn't nice. Keep it as a pointer for this maths
a u8 * or void * would work.  Particularly if you did it as something
a bit more self documenting like

u8 *dvsec_start = (u8 *)(prot_err + 1);
u8 *cap_start = dvsec_start + prot_err->dvsec_length;

info->p_err.cxl_ras = *(struct cxl_ras_capability_regs *)cap_start;

> +
> +	info->p_err.severity = cper_severity_cxl_aer(gdata->error_severity);
> +
> +	return 0;
> +}


  reply	other threads:[~2024-02-15 12:18 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-01-09  3:47 [PATCH v2 0/4] acpi/ghes, cper, cxl: Trace FW-First CXL Protocol Errors Smita Koralahalli
2024-01-09  3:47 ` [PATCH v2 1/4] acpi/ghes, cxl: Create a common CXL struct to handle different CXL CPER records Smita Koralahalli
2024-02-15 11:56   ` Jonathan Cameron
2024-01-09  3:47 ` [PATCH v2 2/4] efi/cper, cxl: Make definitions and structures global Smita Koralahalli
2024-02-15 11:58   ` Jonathan Cameron
2024-02-15 14:47   ` Ard Biesheuvel
2024-01-09  3:47 ` [PATCH v2 3/4] acpi/ghes, efi/cper: Recognize and process CXL Protocol Errors Smita Koralahalli
2024-02-15 12:17   ` Jonathan Cameron [this message]
2024-01-09  3:47 ` [PATCH v2 4/4] acpi/ghes, cxl/pci: Trace FW-First " Smita Koralahalli
2024-02-15 12:22   ` Jonathan Cameron
2024-05-07  9:35 ` [PATCH v2 0/4] acpi/ghes, cper, cxl: " Fabio M. De Francesco
2024-05-16 17:59   ` Smita Koralahalli
  -- strict thread matches above, loose matches on Subject: below --
2024-10-01  0:52 Smita Koralahalli
2024-10-01  0:52 ` [PATCH v2 3/4] acpi/ghes, efi/cper: Recognize and process " Smita Koralahalli
2024-10-01 15:47   ` Ira Weiny
2024-10-01 17:41   ` Fan Ni
2024-10-03 19:19     ` Smita Koralahalli
2024-10-02 23:47   ` Dan Williams
2024-10-03 19:15     ` Smita Koralahalli
2024-10-03 23:21       ` Dan Williams

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240215121757.00005e72@Huawei.com \
    --to=jonathan.cameron@huawei.com \
    --cc=Smita.KoralahalliChannabasappa@amd.com \
    --cc=alison.schofield@intel.com \
    --cc=ardb@kernel.org \
    --cc=dan.j.williams@intel.com \
    --cc=ira.weiny@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=linux-efi@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=vishal.l.verma@intel.com \
    --cc=yazen.ghannam@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).