Re: [PATCH v6 10/10] RAS: add firmware-first CPER provider

Devicetree
 help / color / mirror / Atom feed

From: sashiko-bot@kernel.org
To: "Ahmed Tiba" <ahmed.tiba@arm.com>
Cc: conor+dt@kernel.org, robh@kernel.org, linux-cxl@vger.kernel.org,
	devicetree@vger.kernel.org
Subject: Re: [PATCH v6 10/10] RAS: add firmware-first CPER provider
Date: Wed, 17 Jun 2026 14:10:56 +0000	[thread overview]
Message-ID: <20260617141057.75CAF1F000E9@smtp.kernel.org> (raw)
In-Reply-To: <20260617-topics-ahmtib01-ras_ffh_arm_internal_review-v6-10-91f725174aa0@arm.com>

Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:

New issues:
- [High] Unbounded memory leak of the global GHES estatus pool backing memory on probe failure or repeated module bind/unbind.
- [High] Unnecessary `spinlock_irqsave` entirely defeats the threaded IRQ handler and causes high interrupt latency.
- [High] The threaded IRQ handler unconditionally returns `IRQ_HANDLED` even when no error block is present, bypassing the kernel's spurious interrupt detector.
- [Medium] Missing Kconfig dependency on `OF_RESERVED_MEM` causes guaranteed probe failure.

Pre-existing issues:
- [Critical] Integer overflow in `data_length` validation allows massive out-of-bounds kernel memory read when `raw_data_length` is non-zero.
--

commit d0750a54f0673f163ace66f8e9921ee0f1ba6bd4
Author: Ahmed Tiba <ahmed.tiba@arm.com>

RAS: add firmware-first CPER provider

Add a firmware-first CPER provider that reuses the shared GHES helpers,
wire it into the RAS Kconfig/Makefile and document it in the admin guide.

> diff --git a/drivers/ras/Kconfig b/drivers/ras/Kconfig
> index fc4f4bb94a4c6..3c1c63b2fefc3 100644
> --- a/drivers/ras/Kconfig
> +++ b/drivers/ras/Kconfig
> @@ -34,6 +34,17 @@ if RAS
>  source "arch/x86/ras/Kconfig"
>  source "drivers/ras/amd/atl/Kconfig"
>  
> +config RAS_CPER_ESOURCE
> +	bool "Firmware-first CPER error source block provider"
> +	select GHES_CPER_HELPERS

[Severity: Medium]
Will this fail to probe on configurations without OF_RESERVED_MEM?

The driver calls of_reserved_mem_region_to_resource() unconditionally, but
if CONFIG_OF_RESERVED_MEM is disabled, that function is an inline stub that
always returns -ENOSYS. Should there be a "depends on OF_RESERVED_MEM" here
so it doesn't build broken configurations?

> +	help
> +	  Enable support for firmware-first Common Platform Error Record

[ ... ]

> diff --git a/drivers/ras/cper-esource.c b/drivers/ras/cper-esource.c
> new file mode 100644
> index 0000000000000..cc9f5f522400d
> --- /dev/null
> +++ b/drivers/ras/cper-esource.c

[ ... ]

> +static int cper_esource_validate_status(struct cper_esource *ctx)
> +{
> +	size_t estatus_len;
> +
> +	if (!ctx->estatus->block_status)
> +		return -ENOENT;
> +
> +	if (cper_estatus_check_header(ctx->estatus))
> +		return -EINVAL;
> +
> +	if (ctx->estatus->raw_data_length &&
> +	    (ctx->estatus->raw_data_offset > ctx->status_len ||
> +	     ctx->estatus->raw_data_length >
> +	     ctx->status_len - ctx->estatus->raw_data_offset))
> +		return -EINVAL;

[Severity: Critical]
This isn't a bug introduced by this patch, but does the validation logic
have an integer overflow vulnerability?

When raw_data_length is non-zero, this function relies on
cper_estatus_check_header() to validate data_length.

In drivers/firmware/efi/cper.c:cper_estatus_check_header():
        if (estatus->raw_data_length &&
            estatus->raw_data_offset < sizeof(*estatus) + estatus->data_length)
                return -EINVAL;

If firmware provides a massive data_length (e.g., 0xFFFFFFF0), the addition
wraps around to a very small integer, bypassing the check. The subsequent
cper_estatus_check() will then iterate over this unvalidated length, allowing
a massive out-of-bounds kernel memory read past the ctx->estatus buffer.

Should this driver independently verify that
sizeof(*estatus) + data_length <= ctx->status_len to protect against this?

> +
> +	estatus_len = cper_esource_estatus_len(ctx->estatus);

[ ... ]

> +static void cper_esource_process(struct cper_esource *ctx)
> +{
> +	int rc;
> +	int sev;
> +
> +	guard(spinlock_irqsave)(&ctx->lock);

[Severity: High]
Does this unnecessarily disable local interrupts?

The cper_esource_irq() handler is requested as a threaded IRQ with
IRQF_ONESHOT, which naturally serializes execution. Disabling interrupts
across this entire function defeats the purpose of delegating work to a
threaded context and introduces high interrupt latency, especially since it
contains memory copies and potentially blocking reporting helpers.

> +
> +	memcpy(ctx->estatus, ctx->status, ctx->status_len);

[ ... ]

> +static irqreturn_t cper_esource_irq(int irq, void *data)
> +{
> +	struct cper_esource *ctx = data;
> +
> +	cper_esource_process(ctx);
> +
> +	return IRQ_HANDLED;
> +}

[Severity: High]
If cper_esource_process() returns early without doing any work, shouldn't
this propagate an IRQ_NONE return?

If the firmware hasn't set block_status, cper_esource_process() bails out
early without clearing device state or acknowledging the interrupt. By
always returning IRQ_HANDLED here, the spurious interrupt detector is
bypassed. If this is a level-triggered interrupt, won't this result in an
infinite loop of executing the threaded handler and cause a hard CPU lockup?

> +
> +static int cper_esource_init_ack(struct cper_esource *ctx)

[ ... ]

> +static int cper_esource_probe(struct platform_device *pdev)
> +{
> +	struct device *dev = &pdev->dev;
> +	struct cper_esource *ctx;
> +	size_t size;
> +	int source_id;
> +	int rc;

[ ... ]

> +	rc = cper_esource_init_ack(ctx);
> +	if (rc)
> +		return rc;
> +
> +	rc = cper_esource_init_pool();
> +	if (rc)
> +		return rc;

[Severity: High]
Does this cause an unbounded memory leak if the probe sequence fails or
the module is repeatedly bound/unbound?

cper_esource_init_pool() calls ghes_estatus_pool_init(1).

In drivers/acpi/apei/ghes_cper.c:ghes_estatus_pool_init():
        addr = (unsigned long)vmalloc(len);
        ...
        rc = gen_pool_add(ghes_estatus_pool, addr, len, -1);

This dynamically allocates and adds memory to the global pool on every call,
even if the pool already exists. If the driver is unbound and rebound, or if
platform_get_irq() later returns -EPROBE_DEFER, the allocated memory is
permanently leaked.

> +
> +	ctx->estatus = devm_kzalloc(dev, ctx->status_len, GFP_KERNEL);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260617-topics-ahmtib01-ras_ffh_arm_internal_review-v6-0-91f725174aa0@arm.com?part=10

     prev parent reply	other threads:[~2026-06-17 14:10 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-17 13:54 [PATCH v6 00/10] ACPI: APEI: share GHES CPER helpers and add DT FFH provider Ahmed Tiba
2026-06-17 13:54 ` [PATCH v6 01/10] ACPI: APEI: GHES: share macros via a private header Ahmed Tiba
2026-06-17 14:09   ` sashiko-bot
2026-06-17 13:54 ` [PATCH v6 02/10] ACPI: APEI: GHES: move CPER read helpers Ahmed Tiba
2026-06-17 14:15   ` sashiko-bot
2026-06-17 13:54 ` [PATCH v6 03/10] ACPI: APEI: GHES: move GHESv2 ack and alloc helpers Ahmed Tiba
2026-06-17 13:54 ` [PATCH v6 04/10] ACPI: APEI: GHES: move estatus cache helpers Ahmed Tiba
2026-06-17 14:10   ` sashiko-bot
2026-06-17 13:54 ` [PATCH v6 05/10] ACPI: APEI: GHES: move vendor record helpers Ahmed Tiba
2026-06-17 13:54 ` [PATCH v6 06/10] ACPI: APEI: GHES: move CXL CPER helpers Ahmed Tiba
2026-06-17 14:14   ` sashiko-bot
2026-06-17 13:54 ` [PATCH v6 07/10] ACPI: APEI: introduce GHES helper Ahmed Tiba
2026-06-17 13:54 ` [PATCH v6 08/10] ACPI: APEI: share GHES CPER helpers Ahmed Tiba
2026-06-17 14:11   ` sashiko-bot
2026-06-17 13:54 ` [PATCH v6 09/10] dt-bindings: firmware: add arm,ras-cper Ahmed Tiba
2026-06-17 13:54 ` [PATCH v6 10/10] RAS: add firmware-first CPER provider Ahmed Tiba
2026-06-17 14:10   ` sashiko-bot [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260617141057.75CAF1F000E9@smtp.kernel.org \
    --to=sashiko-bot@kernel.org \
    --cc=ahmed.tiba@arm.com \
    --cc=conor+dt@kernel.org \
    --cc=devicetree@vger.kernel.org \
    --cc=linux-cxl@vger.kernel.org \
    --cc=robh@kernel.org \
    --cc=sashiko-reviews@lists.linux.dev \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox