Re: [PATCH v6 10/10] RAS: add firmware-first CPER provider

All of lore.kernel.org
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Ahmed Tiba" <ahmed.tiba@arm.com>
Cc: conor+dt@kernel.org, robh@kernel.org, linux-cxl@vger.kernel.org,
	devicetree@vger.kernel.org
Subject: Re: [PATCH v6 10/10] RAS: add firmware-first CPER provider
Date: Wed, 17 Jun 2026 14:10:56 +0000	[thread overview]
Message-ID: <20260617141057.75CAF1F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260617-topics-ahmtib01-ras_ffh_arm_internal_review-v6-10-91f725174aa0@arm.com>

Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:

New issues:
- [High] Unbounded memory leak of the global GHES estatus pool backing memory on probe failure or repeated module bind/unbind.
- [High] Unnecessary `spinlock_irqsave` entirely defeats the threaded IRQ handler and causes high interrupt latency.
- [High] The threaded IRQ handler unconditionally returns `IRQ_HANDLED` even when no error block is present, bypassing the kernel's spurious interrupt detector.
- [Medium] Missing Kconfig dependency on `OF_RESERVED_MEM` causes guaranteed probe failure.

Pre-existing issues:
- [Critical] Integer overflow in `data_length` validation allows massive out-of-bounds kernel memory read when `raw_data_length` is non-zero.
--

commit d0750a54f0673f163ace66f8e9921ee0f1ba6bd4
Author: Ahmed Tiba <ahmed.tiba@arm.com>

RAS: add firmware-first CPER provider

Add a firmware-first CPER provider that reuses the shared GHES helpers,
wire it into the RAS Kconfig/Makefile and document it in the admin guide.

> diff --git a/drivers/ras/Kconfig b/drivers/ras/Kconfig
> index fc4f4bb94a4c6..3c1c63b2fefc3 100644
> --- a/drivers/ras/Kconfig
> +++ b/drivers/ras/Kconfig
> @@ -34,6 +34,17 @@ if RAS
>  source "arch/x86/ras/Kconfig"
>  source "drivers/ras/amd/atl/Kconfig"
>  
> +config RAS_CPER_ESOURCE
> +	bool "Firmware-first CPER error source block provider"
> +	select GHES_CPER_HELPERS

[Severity: Medium]
Will this fail to probe on configurations without OF_RESERVED_MEM?

The driver calls of_reserved_mem_region_to_resource() unconditionally, but
if CONFIG_OF_RESERVED_MEM is disabled, that function is an inline stub that
always returns -ENOSYS. Should there be a "depends on OF_RESERVED_MEM" here
so it doesn't build broken configurations?

> +	help
> +	  Enable support for firmware-first Common Platform Error Record

[ ... ]

> diff --git a/drivers/ras/cper-esource.c b/drivers/ras/cper-esource.c
> new file mode 100644
> index 0000000000000..cc9f5f522400d
> --- /dev/null
> +++ b/drivers/ras/cper-esource.c

[ ... ]

> +static int cper_esource_validate_status(struct cper_esource *ctx)
> +{
> +	size_t estatus_len;
> +
> +	if (!ctx->estatus->block_status)
> +		return -ENOENT;
> +
> +	if (cper_estatus_check_header(ctx->estatus))
> +		return -EINVAL;
> +
> +	if (ctx->estatus->raw_data_length &&
> +	    (ctx->estatus->raw_data_offset > ctx->status_len ||
> +	     ctx->estatus->raw_data_length >
> +	     ctx->status_len - ctx->estatus->raw_data_offset))
> +		return -EINVAL;

[Severity: Critical]
This isn't a bug introduced by this patch, but does the validation logic
have an integer overflow vulnerability?

When raw_data_length is non-zero, this function relies on
cper_estatus_check_header() to validate data_length.

In drivers/firmware/efi/cper.c:cper_estatus_check_header():
        if (estatus->raw_data_length &&
            estatus->raw_data_offset < sizeof(*estatus) + estatus->data_length)
                return -EINVAL;

If firmware provides a massive data_length (e.g., 0xFFFFFFF0), the addition
wraps around to a very small integer, bypassing the check. The subsequent
cper_estatus_check() will then iterate over this unvalidated length, allowing
a massive out-of-bounds kernel memory read past the ctx->estatus buffer.

Should this driver independently verify that
sizeof(*estatus) + data_length <= ctx->status_len to protect against this?

> +
> +	estatus_len = cper_esource_estatus_len(ctx->estatus);

[ ... ]

> +static void cper_esource_process(struct cper_esource *ctx)
> +{
> +	int rc;
> +	int sev;
> +
> +	guard(spinlock_irqsave)(&ctx->lock);

[Severity: High]
Does this unnecessarily disable local interrupts?

The cper_esource_irq() handler is requested as a threaded IRQ with
IRQF_ONESHOT, which naturally serializes execution. Disabling interrupts
across this entire function defeats the purpose of delegating work to a
threaded context and introduces high interrupt latency, especially since it
contains memory copies and potentially blocking reporting helpers.

> +
> +	memcpy(ctx->estatus, ctx->status, ctx->status_len);

[ ... ]

> +static irqreturn_t cper_esource_irq(int irq, void *data)
> +{
> +	struct cper_esource *ctx = data;
> +
> +	cper_esource_process(ctx);
> +
> +	return IRQ_HANDLED;
> +}

[Severity: High]
If cper_esource_process() returns early without doing any work, shouldn't
this propagate an IRQ_NONE return?

If the firmware hasn't set block_status, cper_esource_process() bails out
early without clearing device state or acknowledging the interrupt. By
always returning IRQ_HANDLED here, the spurious interrupt detector is
bypassed. If this is a level-triggered interrupt, won't this result in an
infinite loop of executing the threaded handler and cause a hard CPU lockup?

> +
> +static int cper_esource_init_ack(struct cper_esource *ctx)

[ ... ]

> +static int cper_esource_probe(struct platform_device *pdev)
> +{
> +	struct device *dev = &pdev->dev;
> +	struct cper_esource *ctx;
> +	size_t size;
> +	int source_id;
> +	int rc;

[ ... ]

> +	rc = cper_esource_init_ack(ctx);
> +	if (rc)
> +		return rc;
> +
> +	rc = cper_esource_init_pool();
> +	if (rc)
> +		return rc;

[Severity: High]
Does this cause an unbounded memory leak if the probe sequence fails or
the module is repeatedly bound/unbound?

cper_esource_init_pool() calls ghes_estatus_pool_init(1).

In drivers/acpi/apei/ghes_cper.c:ghes_estatus_pool_init():
        addr = (unsigned long)vmalloc(len);
        ...
        rc = gen_pool_add(ghes_estatus_pool, addr, len, -1);

This dynamically allocates and adds memory to the global pool on every call,
even if the pool already exists. If the driver is unbound and rebound, or if
platform_get_irq() later returns -EPROBE_DEFER, the allocated memory is
permanently leaked.

> +
> +	ctx->estatus = devm_kzalloc(dev, ctx->status_len, GFP_KERNEL);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260617-topics-ahmtib01-ras_ffh_arm_internal_review-v6-0-91f725174aa0@arm.com?part=10

next prev parent reply	other threads:[~2026-06-17 14:10 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-17 13:54 [PATCH v6 00/10] ACPI: APEI: share GHES CPER helpers and add DT FFH provider Ahmed Tiba
2026-06-17 13:54 ` [PATCH v6 01/10] ACPI: APEI: GHES: share macros via a private header Ahmed Tiba
2026-06-17 14:09   ` sashiko-bot
2026-06-17 13:54 ` [PATCH v6 02/10] ACPI: APEI: GHES: move CPER read helpers Ahmed Tiba
2026-06-17 14:15   ` sashiko-bot
2026-06-17 13:54 ` [PATCH v6 03/10] ACPI: APEI: GHES: move GHESv2 ack and alloc helpers Ahmed Tiba
2026-06-17 13:54 ` [PATCH v6 04/10] ACPI: APEI: GHES: move estatus cache helpers Ahmed Tiba
2026-06-17 14:10   ` sashiko-bot
2026-06-17 13:54 ` [PATCH v6 05/10] ACPI: APEI: GHES: move vendor record helpers Ahmed Tiba
2026-06-17 13:54 ` [PATCH v6 06/10] ACPI: APEI: GHES: move CXL CPER helpers Ahmed Tiba
2026-06-17 14:14   ` sashiko-bot
2026-06-17 13:54 ` [PATCH v6 07/10] ACPI: APEI: introduce GHES helper Ahmed Tiba
2026-06-17 17:17   ` Julian Braha
2026-06-17 13:54 ` [PATCH v6 08/10] ACPI: APEI: share GHES CPER helpers Ahmed Tiba
2026-06-17 14:11   ` sashiko-bot
2026-06-17 13:54 ` [PATCH v6 09/10] dt-bindings: firmware: add arm,ras-cper Ahmed Tiba
2026-06-17 13:54 ` [PATCH v6 10/10] RAS: add firmware-first CPER provider Ahmed Tiba
2026-06-17 14:10   ` sashiko-bot [this message]
2026-06-17 17:40   ` Julian Braha

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260617141057.75CAF1F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=ahmed.tiba@arm.com \
    --cc=conor+dt@kernel.org \
    --cc=devicetree@vger.kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=robh@kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.