From: Marc Herbert <marc.herbert@linux.intel.com>
To: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>,
	linux-efi@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-cxl@vger.kernel.org
Cc: Ard Biesheuvel <ardb@kernel.org>,
	Alison Schofield <alison.schofield@intel.com>,
	Vishal Verma <vishal.l.verma@intel.com>,
	Ira Weiny <ira.weiny@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Jonathan Cameron <Jonathan.Cameron@huawei.com>,
	Yazen Ghannam <yazen.ghannam@amd.com>,
	Terry Bowman <terry.bowman@amd.com>,
	Dave Jiang <dave.jiang@intel.com>,
	tony.luck@intel.com, Gregory Price <gourry@gourry.net>
Subject: "invalid agent type: 1" in acpi/ghes, cper: Recognize and cache CXL Protocol errors
Date: Tue, 22 Jul 2025 12:24:39 -0700
Message-ID: <074f5f77-7bef-4857-97fe-b68ee9b0afaf@linux.intel.com>
In-Reply-To: <20250123084421.127697-5-Smita.KoralahalliChannabasappa@amd.com>

Hi Smita,

The code below triggers the error "invalid agent type: 1" in Intel
validation (internal issue 15018133056). Agent type 1 is RCH_DP.

It's not clear to anyone we asked why you did not include RCH_DP in
the `switch (prot_err->agent_type)` in cxl_cper_post_prot_err() below.
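
For illustration, here is a minimal user-space sketch of that check. It
assumes the agent-type enum keeps the ordering used by the kernel's CXL
CPER definitions (RCD = 0, RCH_DP = 1, DEVICE = 2, and so on).
agent_type_is_accepted() is a hypothetical helper that only copies the
case labels of the quoted switch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed to mirror the ordering of the kernel's CXL CPER agent types. */
enum cxl_agent_type {
	RCD,	/* 0: Restricted CXL Device */
	RCH_DP,	/* 1: Restricted CXL Host Downstream Port */
	DEVICE,	/* 2: CXL Device */
	LD,	/* 3: CXL Logical Device */
	FMLD,	/* 4: Fabric Manager managed Logical Device */
	RP,	/* 5: CXL Root Port */
	DSP,	/* 6: CXL Downstream Switch Port */
	USP,	/* 7: CXL Upstream Switch Port */
};

/* The value in the reported message is expected to be RCH_DP. */
static_assert(RCH_DP == 1, "agent type 1 is expected to be RCH_DP");

/*
 * Hypothetical helper: same case labels as the switch in
 * cxl_cper_post_prot_err(), which lists every type except RCH_DP.
 */
static bool agent_type_is_accepted(int agent_type)
{
	switch (agent_type) {
	case RCD:
	case DEVICE:
	case LD:
	case FMLD:
	case RP:
	case DSP:
	case USP:
		return true;
	default:
		/* RCH_DP (1) and out-of-range values end up here. */
		fprintf(stderr, "CXL CPER invalid agent type: %d\n", agent_type);
		return false;
	}
}
```

With these labels, agent_type_is_accepted(1) falls through to the
default branch and prints the same "invalid agent type: 1" message.
Every other defined type is accepted.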

I can see how RCH_DP is special in cxl_cper_PRINT_prot_err() and I can
even understand (despite my near-zero CPER knowledge) some of the
special cases there. But in cxl_cper_post_prot_err() here, it's not
clear why RCH_DP would be rejected. Could this be an oversight? If not,
a comment with a short explanation would not hurt.
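
The sketch below shows one way RCH_DP differs, under stated assumptions.
It assumes the layout described for the CXL Agent Address in UEFI v2.10,
section N.2.13. Under that layout, the 8-byte field holds an RCRB base
address for an RCH_DP and a segment/bus/device/function for the other
agent types. The type struct agent_addr_view and the function
decode_agent_address() are hypothetical, and the bit layout should be
checked against the spec:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed numbering for the two agent types used here. */
enum { AGENT_RCD = 0, AGENT_RCH_DP = 1 };
static_assert(AGENT_RCH_DP == 1, "RCH_DP assumed to be agent type 1");

/* Decoded view of the 8-byte CXL Agent Address field (hypothetical type). */
struct agent_addr_view {
	bool is_rcrb;		/* true: field holds an RCRB base address */
	uint64_t rcrb_base;	/* valid only when is_rcrb is true */
	uint16_t segment;	/* segment/bus/device/function are valid */
	uint8_t bus;		/* only when is_rcrb is false */
	uint8_t device;
	uint8_t function;
};

/*
 * Assumed layout: for RCH_DP the whole field is the RCRB base address.
 * For the other agent types, bits 7:0 hold the function, 15:8 the
 * device, 23:16 the bus and 39:24 the segment.
 */
static struct agent_addr_view decode_agent_address(int agent_type, uint64_t raw)
{
	struct agent_addr_view v = { 0 };

	if (agent_type == AGENT_RCH_DP) {
		v.is_rcrb = true;
		v.rcrb_base = raw;
		return v;
	}
	v.function = (uint8_t)(raw & 0xff);
	v.device = (uint8_t)((raw >> 8) & 0xff);
	v.bus = (uint8_t)((raw >> 16) & 0xff);
	v.segment = (uint16_t)((raw >> 24) & 0xffff);
	return v;
}
```

For example, decode_agent_address(AGENT_RCD, 0x01120301) yields segment
1, bus 0x12, device 3, function 1. The same raw value passed with
AGENT_RCH_DP only fills rcrb_base.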

Marc

PS: the newer cxl_cper_post_prot_err() code is longer and does
something with `wd`. That's irrelevant for this test case, since the
function logs the error and returns before reaching that code anyway.


On 2025-01-23 00:44, Smita Koralahalli wrote:
> Add support in GHES to detect and process CXL CPER Protocol errors, as
> defined in UEFI v2.10, section N.2.13.
> 
> Define struct cxl_cper_prot_err_work_data to cache CXL protocol error
> information, including RAS capabilities and severity, for further
> handling.
> 
> These cached CXL CPER records will later be processed by workqueues
> within the CXL subsystem.
> 
> Signed-off-by: Smita Koralahalli <Smita.KoralahalliChannabasappa@amd.com>
> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> ---
>  drivers/acpi/apei/ghes.c | 54 ++++++++++++++++++++++++++++++++++++++++
>  include/cxl/event.h      |  6 +++++
>  2 files changed, 60 insertions(+)
> 
> diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
> index b72772494655..4d725d988c43 100644
> --- a/drivers/acpi/apei/ghes.c
> +++ b/drivers/acpi/apei/ghes.c
> @@ -674,6 +674,56 @@ static void ghes_defer_non_standard_event(struct acpi_hest_generic_data *gdata,
>  	schedule_work(&entry->work);
>  }
>  
> +static void cxl_cper_post_prot_err(struct cxl_cper_sec_prot_err *prot_err,
> +				   int severity)
> +{
> +#ifdef CONFIG_ACPI_APEI_PCIEAER
> +	struct cxl_cper_prot_err_work_data wd;
> +	u8 *dvsec_start, *cap_start;
> +
> +	if (!(prot_err->valid_bits & PROT_ERR_VALID_AGENT_ADDRESS)) {
> +		pr_err_ratelimited("CXL CPER invalid agent type\n");
> +		return;
> +	}
> +
> +	if (!(prot_err->valid_bits & PROT_ERR_VALID_ERROR_LOG)) {
> +		pr_err_ratelimited("CXL CPER invalid protocol error log\n");
> +		return;
> +	}
> +
> +	if (prot_err->err_len != sizeof(struct cxl_ras_capability_regs)) {
> +		pr_err_ratelimited("CXL CPER invalid RAS Cap size (%u)\n",
> +				   prot_err->err_len);
> +		return;
> +	}
> +
> +	if (!(prot_err->valid_bits & PROT_ERR_VALID_SERIAL_NUMBER))
> +		pr_warn(FW_WARN "CXL CPER no device serial number\n");
> +
> +	switch (prot_err->agent_type) {
> +	case RCD:
> +	case DEVICE:
> +	case LD:
> +	case FMLD:
> +	case RP:
> +	case DSP:
> +	case USP:
> +		memcpy(&wd.prot_err, prot_err, sizeof(wd.prot_err));
> +
> +		dvsec_start = (u8 *)(prot_err + 1);
> +		cap_start = dvsec_start + prot_err->dvsec_len;
> +
> +		memcpy(&wd.ras_cap, cap_start, sizeof(wd.ras_cap));
> +		wd.severity = cper_severity_to_aer(severity);
> +		break;
> +	default:
> +		pr_err_ratelimited("CXL CPER invalid agent type: %d\n",
> +				   prot_err->agent_type);
> +		return;
> +	}
> +#endif
> +}
> +
>  /* Room for 8 entries for each of the 4 event log queues */
>  #define CXL_CPER_FIFO_DEPTH 32
>  DEFINE_KFIFO(cxl_cper_fifo, struct cxl_cper_work_data, CXL_CPER_FIFO_DEPTH);
> @@ -777,6 +827,10 @@ static bool ghes_do_proc(struct ghes *ghes,
>  		}
>  		else if (guid_equal(sec_type, &CPER_SEC_PROC_ARM)) {
>  			queued = ghes_handle_arm_hw_error(gdata, sev, sync);
> +		} else if (guid_equal(sec_type, &CPER_SEC_CXL_PROT_ERR)) {
> +			struct cxl_cper_sec_prot_err *prot_err = acpi_hest_get_payload(gdata);
> +
> +			cxl_cper_post_prot_err(prot_err, gdata->error_severity);
>  		} else if (guid_equal(sec_type, &CPER_SEC_CXL_GEN_MEDIA_GUID)) {
>  			struct cxl_cper_event_rec *rec = acpi_hest_get_payload(gdata);
>  
> diff --git a/include/cxl/event.h b/include/cxl/event.h
> index 66d85fc87701..ee1c3dec62fa 100644
> --- a/include/cxl/event.h
> +++ b/include/cxl/event.h
> @@ -232,6 +232,12 @@ struct cxl_ras_capability_regs {
>  	u32 header_log[16];
>  };
>  
> +struct cxl_cper_prot_err_work_data {
> +	struct cxl_cper_sec_prot_err prot_err;
> +	struct cxl_ras_capability_regs ras_cap;
> +	int severity;
> +};
> +
>  #ifdef CONFIG_ACPI_APEI_GHES
>  int cxl_cper_register_work(struct work_struct *work);
>  int cxl_cper_unregister_work(struct work_struct *work);

Thread overview: 39+ messages
2025-01-23  8:44 [PATCH v6 0/6] acpi/ghes, cper, cxl: Process CXL CPER Protocol errors Smita Koralahalli
2025-01-23  8:44 ` [PATCH v6 1/6] efi/cper, cxl: Prefix protocol error struct and function names with cxl_ Smita Koralahalli
2025-02-04  0:12   ` Fan Ni
2025-02-05 19:17   ` Gregory Price
2025-01-23  8:44 ` [PATCH v6 2/6] efi/cper, cxl: Make definitions and structures global Smita Koralahalli
2025-02-04  0:16   ` Fan Ni
2025-02-05 19:16   ` Gregory Price
2025-02-06 10:54     ` Jonathan Cameron
2025-02-06 16:14       ` Gregory Price
2025-02-06 17:14         ` Konstantin Ryabitsev
2025-02-06 17:32           ` Gregory Price
2025-01-23  8:44 ` [PATCH v6 3/6] efi/cper, cxl: Remove cper_cxl.h Smita Koralahalli
2025-02-04  0:20   ` Fan Ni
2025-02-05 19:18   ` Gregory Price
2025-01-23  8:44 ` [PATCH v6 4/6] acpi/ghes, cper: Recognize and cache CXL Protocol errors Smita Koralahalli
2025-02-03 18:59   ` Luck, Tony
2025-02-05 19:35     ` Gregory Price
2025-02-05 22:21   ` Dan Williams
2025-07-22 19:24   ` Marc Herbert [this message]
2025-07-23  7:13     ` "invalid agent type: 1" in " Marc Herbert
2025-07-24 14:49       ` Fabio M. De Francesco
2025-07-25 11:04         ` Jonathan Cameron
2025-07-28 15:01       ` dan.j.williams
2025-07-28 16:25     ` Koralahalli Channabasappa, Smita
2025-07-29  5:41       ` Marc Herbert
2025-07-29 15:52         ` Koralahalli Channabasappa, Smita
2025-07-29 17:39           ` dan.j.williams
2025-01-23  8:44 ` [PATCH v6 5/6] acpi/ghes, cxl/pci: Process CXL CPER Protocol Errors Smita Koralahalli
2025-02-03 19:03   ` Luck, Tony
2025-02-12 21:04     ` Koralahalli Channabasappa, Smita
2025-02-05 19:50   ` Gregory Price
2025-02-05 22:58   ` Dan Williams
2025-02-12 20:57     ` Koralahalli Channabasappa, Smita
2025-01-23  8:44 ` [PATCH v6 6/6] cxl/pci: Add trace logging for CXL PCIe Port RAS errors Smita Koralahalli
2025-01-24 16:36   ` Ira Weiny
2025-02-05 20:01   ` Gregory Price
2025-02-05 23:06   ` Dan Williams
2025-02-03 17:09 ` [PATCH v6 0/6] acpi/ghes, cper, cxl: Process CXL CPER Protocol errors Dave Jiang
2025-02-06 18:38 ` Dave Jiang
