From: sashiko-bot@kernel.org
To: "Ahmed Tiba" <ahmed.tiba@arm.com>
Cc: devicetree@vger.kernel.org, conor+dt@kernel.org, robh@kernel.org
Subject: Re: [PATCH v5 10/10] RAS: add firmware-first CPER provider
Date: Fri, 29 May 2026 11:07:08 +0000
Message-ID: <20260529110709.0B3C01F00893@smtp.kernel.org>
In-Reply-To: <20260529-topics-ahmtib01-ras_ffh_arm_internal_review-v5-10-2e0500d42642@arm.com>

Thank you for your contribution! Sashiko AI review found 3 potential issues to consider:
- [High] Handling synchronous errors via an asynchronous threaded IRQ causes force_sig(SIGBUS) to kill the kernel IRQ thread instead of the faulting user task.
- [High] The driver fails to validate or clear the hardware CPER 'block_status' field, leading to the processing of invalid/stale records and violating the firmware state machine.
- [High] The driver blindly trusts the memory-mapped region size and firmware-provided CPER record lengths, leading to out-of-bounds heap reads.
--

commit 079bcf67c79972332f188b664beb5efc21cf8796
Author: Ahmed Tiba <ahmed.tiba@arm.com>

RAS: add firmware-first CPER provider

Add a firmware-first CPER provider that reuses the shared GHES helpers, wire
it into the RAS Kconfig/Makefile and document it in the admin guide.

> diff --git a/drivers/ras/cper-esource.c b/drivers/ras/cper-esource.c
> new file mode 100644
> index 0000000000000..83f7a910e50a7
> --- /dev/null
> +++ b/drivers/ras/cper-esource.c
[ ... ]
> +static int cper_esource_copy_status(struct cper_esource *ctx)
> +{
> +	memcpy_fromio(ctx->estatus, ctx->status, ctx->status_len);
> +	return 0;
> +}
[ ... ]
> +static void cper_esource_process(struct cper_esource *ctx)
> +{
> +	int sev;
> +
> +	guard(spinlock_irqsave)(&ctx->lock);
> +
> +	if (cper_esource_copy_status(ctx))
> +		return;

[Severity: High]
Should this code check and clear the CPER block_status field?

The ACPI generic error status block layout uses block_status to indicate
whether a valid error record is present. Since cper_esource_copy_status()
unconditionally copies the buffer without checking
ctx->estatus->block_status, could a spurious interrupt cause the driver to
process invalid or stale records?

Also, does the firmware protocol require the OS to clear block_status in IO
memory after consumption, similar to ghes_clear_estatus()? Without doing so,
the firmware might consider the buffer permanently full, or the OS might
repeatedly process the same stale error on subsequent interrupts.
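
As a rough illustration of the check-then-clear pattern asked about above,
here is a userspace model. It is only a sketch: struct status_blk,
model_copy_status() and model_clear_status() are hypothetical names, plain
memcpy() stands in for the kernel's IO accessors, and whether the firmware
contract for this device really requires the clear is exactly the open
question.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for the status block header; the field names follow
 * struct acpi_hest_generic_status, block_status is the first field. */
struct status_blk {
	uint32_t block_status;
	uint32_t raw_data_offset;
	uint32_t raw_data_length;
	uint32_t data_length;
	uint32_t error_severity;
};

/* Hypothetical helper: copy the record only if firmware marked it valid. */
static int model_copy_status(void *dst, const void *shared, size_t len)
{
	uint32_t block_status;

	if (len < sizeof(struct status_blk))
		return -EINVAL;

	/* Peek at block_status before trusting the rest of the buffer. */
	memcpy(&block_status, shared, sizeof(block_status));
	if (!block_status)
		return -ENOENT;	/* empty buffer: spurious interrupt */

	memcpy(dst, shared, len);
	return 0;
}

/* Hypothetical helper: hand the buffer back by zeroing block_status. */
static void model_clear_status(void *shared)
{
	const uint32_t zero = 0;

	memcpy(shared, &zero, sizeof(zero));
}
```

In this model a second interrupt arriving after model_clear_status() sees
-ENOENT instead of reprocessing the previous record.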

> +
> +	sev = ghes_severity(ctx->estatus->error_severity);
> +	if (sev >= GHES_SEV_PANIC)
> +		cper_esource_fatal(ctx);
> +
> +	if (!ghes_estatus_cached(ctx->estatus) &&

[Severity: High]
Can this lead to out-of-bounds heap reads if the firmware provides
unexpected lengths?

The estatus buffer is dynamically allocated with a size of ctx->status_len,
but the ACPI GHES core helpers (like ghes_estatus_cached() and
apei_estatus_for_each_section()) rely on the firmware-provided data_length
and raw_data_length fields to calculate the total record size.

If cper_estatus_len(ctx->estatus) evaluates to a size larger than
ctx->status_len, these helpers could perform out-of-bounds reads when
comparing memory or iterating over sections. Should the total length
be validated against ctx->status_len before passing it to these helpers?
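
A minimal userspace sketch of such a bounds check follows. The names
struct status_hdr, model_estatus_len() and model_estatus_fits() are
hypothetical, and the length rule (raw data, when present, ends the record,
otherwise header plus data_length) is an assumption based on my reading of
cper_estatus_len(); please check it against the real helper.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the status block header (five 32-bit fields). */
struct status_hdr {
	uint32_t block_status;
	uint32_t raw_data_offset;
	uint32_t raw_data_length;
	uint32_t data_length;
	uint32_t error_severity;
};

/* Total record length as claimed by firmware, computed in 64 bits so that
 * hostile 32-bit values cannot wrap around. */
static uint64_t model_estatus_len(const struct status_hdr *h)
{
	if (h->raw_data_length)
		return (uint64_t)h->raw_data_offset + h->raw_data_length;
	return (uint64_t)sizeof(*h) + h->data_length;
}

/* Hypothetical check: true only if the claimed record fits in buf_len. */
static bool model_estatus_fits(const struct status_hdr *h, size_t buf_len)
{
	if (buf_len < sizeof(*h))
		return false;
	/* The section area must fit even when raw data is also present. */
	if ((uint64_t)sizeof(*h) + h->data_length > buf_len)
		return false;
	return model_estatus_len(h) <= buf_len;
}
```

In the driver, a check of this shape would presumably run right after the
copy, with ctx->status_len as buf_len, and drop the record on failure before
any GHES helper walks it.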

> +	    ghes_print_estatus(NULL, ctx->generic, ctx->estatus))
> +		ghes_estatus_cache_add(ctx->generic, ctx->estatus);
> +
> +	ghes_cper_handle_status(ctx->dev, ctx->generic, ctx->estatus, ctx->sync);

[Severity: High]
For synchronous errors (when ctx->sync is true), will this signal the wrong
task?

ghes_cper_handle_status() uses force_sig(SIGBUS) to kill the current process
for unrecoverable synchronous errors:

drivers/acpi/apei/ghes_cper.c:ghes_cper_handle_status() {
    ...
    if (sync && !queued) {
        ...
        force_sig(SIGBUS);
    }
}

Because cper_esource_irq() is registered below as a threaded IRQ via
devm_request_threaded_irq(), it executes in the context of a kernel IRQ
thread. Doesn't this mean force_sig(SIGBUS) will kill the IRQ thread instead
of the user task that actually triggered the fault?
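
To make the concern concrete, here is a toy userspace model, not kernel
code. struct model_task, model_force_sig() and model_signal_faulting_task()
are hypothetical names; the only property borrowed from the kernel is that
force_sig() acts on whichever task is current.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy task descriptor; only tracks whether SIGBUS was delivered. */
struct model_task {
	const char *comm;
	bool sigbus_pending;
};

/* Stand-in for force_sig(SIGBUS): always lands on the "current" task. */
static void model_force_sig(struct model_task *current_task)
{
	current_task->sigbus_pending = true;
}

/*
 * Hypothetical guard: signal only when the handler runs in the context of
 * the task that took the fault. Returns false when delivery would hit the
 * wrong task and therefore has to be routed some other way.
 */
static bool model_signal_faulting_task(struct model_task *current_task,
				       struct model_task *faulting)
{
	if (current_task != faulting)
		return false;

	model_force_sig(current_task);
	return true;
}
```

With current set to the IRQ thread, the unguarded call marks the IRQ thread
and leaves the user task untouched, which is the behaviour questioned above.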

> +	cper_esource_ack(ctx);
> +}
[ ... ]

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260529-topics-ahmtib01-ras_ffh_arm_internal_review-v5-0-2e0500d42642@arm.com?part=10

