public inbox for linux-acpi@vger.kernel.org
 help / color / mirror / Atom feed
From: Jonathan Cameron <jonathan.cameron@huawei.com>
To: Ahmed Tiba <ahmed.tiba@arm.com>
Cc: <devicetree@vger.kernel.org>, <linux-acpi@vger.kernel.org>,
	<Dmitry.Lamerov@arm.com>, <catalin.marinas@arm.com>,
	<bp@alien8.de>, <robh@kernel.org>, <rafael@kernel.org>,
	<will@kernel.org>, <conor@kernel.org>,
	<linux-arm-kernel@lists.infradead.org>,
	<linux-doc@vger.kernel.org>, <krzk+dt@kernel.org>,
	<Michael.Zhao2@arm.com>, <tony.luck@intel.com>
Subject: Re: [PATCH v2 11/11] RAS: add DeviceTree firmware-first CPER provider
Date: Tue, 24 Feb 2026 15:55:20 +0000	[thread overview]
Message-ID: <20260224155520.00004e92@huawei.com> (raw)
In-Reply-To: <20260220-topics-ahmtib01-ras_ffh_arm_internal_review-v2-11-347fa2d7351b@arm.com>

On Fri, 20 Feb 2026 13:42:29 +0000
Ahmed Tiba <ahmed.tiba@arm.com> wrote:

> Add a DeviceTree firmware-first CPER provider that reuses the shared
> GHES helpers, wire it into the RAS Kconfig/Makefile and document it in
> the admin guide. Update MAINTAINERS now that the driver exists.
> 
> Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
Hi Ahmed,

Various comments inline.

Jonathan

> ---
>  Documentation/admin-guide/RAS/main.rst |  18 +++
>  MAINTAINERS                            |   1 +
>  drivers/acpi/apei/apei-internal.h      |  10 +-
>  drivers/acpi/apei/ghes_cper.c          |   2 +
>  drivers/ras/Kconfig                    |  12 ++
>  drivers/ras/Makefile                   |   1 +
>  drivers/ras/esource-dt.c               | 264 +++++++++++++++++++++++++++++++++
>  include/acpi/ghes_cper.h               |   9 ++
>  8 files changed, 308 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/admin-guide/RAS/main.rst b/Documentation/admin-guide/RAS/main.rst
> index 5a45db32c49b..4ffabaaeabb1 100644
> --- a/Documentation/admin-guide/RAS/main.rst
> +++ b/Documentation/admin-guide/RAS/main.rst
> @@ -205,6 +205,24 @@ Architecture (MCA)\ [#f3]_.
>  .. [#f3] For more details about the Machine Check Architecture (MCA),
>    please read Documentation/arch/x86/x86_64/machinecheck.rst at the Kernel tree.
>  
> +Firmware-first CPER via DeviceTree
> +----------------------------------
> +
> +Some systems expose Common Platform Error Record (CPER) data
> +via DeviceTree instead of ACPI HEST tables.

I'd argue this isn't really DT specific, it's just not ACPI table.
You could for instance use PRP0001 and wire this up on ACPI with only
one trivial change to generic property.h accessor for the boolean.

Or use another firmware information source entirely.

> +Enable ``CONFIG_RAS_ESOURCE_DT`` to build the ``drivers/ras/esource-dt.c``
> +driver and describe the CPER error source buffer with the
> +``Documentation/devicetree/bindings/firmware/arm,ras-ffh.yaml`` binding.
> +The driver reuses the GHES CPER helper object in
> +``drivers/acpi/apei/ghes_cper.c`` so the logging, notifier chains, and
> +memory failure handling match the ACPI GHES behaviour even when
> +ACPI is disabled.
> +
> +Once a platform describes a firmware-first provider, both ACPI GHES and the
> +DeviceTree driver reuse the same code paths. This keeps the behaviour
> +consistent regardless of whether the error source is described via ACPI
> +tables or DeviceTree.

> diff --git a/drivers/ras/Kconfig b/drivers/ras/Kconfig
> index fc4f4bb94a4c..ea6d96713020 100644
> --- a/drivers/ras/Kconfig
> +++ b/drivers/ras/Kconfig
> @@ -34,6 +34,18 @@ if RAS
>  source "arch/x86/ras/Kconfig"
>  source "drivers/ras/amd/atl/Kconfig"
>  
> +config RAS_ESOURCE_DT
> +	bool "DeviceTree firmware-first CPER error source block provider"
It isn't really DT specific other than one call that I've suggested you
replace with a generic firmware accessor.

> +	depends on OF

Generally we don't gate on OF unless there are OF specific calls. Here there
aren't so you are just reducing build coverage. || COMPILE_TEST 
maybe.

> +	depends on ARM64

Likewise, nothing in here is arm64 specific that I can spot.

> +	select GHES_CPER_HELPERS
> +	help
> +	  Enable support for firmware-first Common Platform Error Record (CPER)
> +	  error source block providers that are described via DeviceTree
> +	  instead of ACPI HEST tables. The driver reuses the existing GHES
> +	  CPER helpers so the error processing matches the ACPI code paths,
> +	  but it can be built even when ACPI is disabled.
> +

> diff --git a/drivers/ras/esource-dt.c b/drivers/ras/esource-dt.c
> new file mode 100644
> index 000000000000..b575a2258536
> --- /dev/null
> +++ b/drivers/ras/esource-dt.c
> @@ -0,0 +1,264 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * DeviceTree provider for firmware-first CPER error source block.
> + *
> + * This driver shares the GHES CPER helpers so we keep the reporting and
> + * notifier behaviour identical to ACPI GHES
> + *
> + * Copyright (C) 2025 ARM Ltd.
> + * Author: Ahmed Tiba <ahmed.tiba@arm.com>
> + */
> +
> +#include <linux/atomic.h>
> +#include <linux/bitops.h>
> +#include <linux/device.h>
> +#include <linux/interrupt.h>
> +#include <linux/io.h>
> +#include <linux/io-64-nonatomic-lo-hi.h>
Used?
> +#include <linux/module.h>
mod_devicetable.h for of_device_id definition.

> +#include <linux/of_address.h>
> +#include <linux/of_irq.h>
Generally very little reason to include these.  Not sure why you need
them here.

> +#include <linux/panic.h>
> +#include <linux/platform_device.h>
> +#include <linux/slab.h>
> +#include <linux/spinlock.h>
> +
> +#include <acpi/ghes.h>
> +#include <acpi/ghes_cper.h>
> +
> +static atomic_t ghes_ffh_source_ids = ATOMIC_INIT(0);
I'd normally expect an IDA or similar. If nothing else it clearly
indicates we only want a unique ID.
> +
> +struct ghes_ffh_ack {
> +	void __iomem *addr;
> +	u64 preserve;
> +	u64 set;
> +	u8 width;
> +	bool present;
> +};
> +
> +struct ghes_ffh {
> +	struct device *dev;
> +	void __iomem *status;
> +	size_t status_len;
> +
> +	struct ghes_ffh_ack ack;
> +
> +	struct acpi_hest_generic *generic;
> +	struct acpi_hest_generic_status *estatus;
> +
> +	bool sync;
> +	int irq;
> +
> +	/* Serializes access to the firmware-owned buffer. */
If we are serializing it, in what sense is it owned by the firmware?

> +	spinlock_t lock;
> +};


> +
> +static void ghes_ffh_process(struct ghes_ffh *ctx)
> +{
> +	unsigned long flags;
> +	int sev;
> +
> +	spin_lock_irqsave(&ctx->lock, flags);

guard() + include cleanup.h. Then can do returns in error paths.

> +
> +	if (ghes_ffh_copy_status(ctx))
> +		goto out;
Like here to give simpler lfow.


> +
> +	sev = ghes_severity(ctx->estatus->error_severity);
> +	if (sev >= GHES_SEV_PANIC)
> +		ghes_ffh_fatal(ctx);
> +
> +	if (!ghes_estatus_cached(ctx->estatus)) {
> +		if (ghes_print_estatus(NULL, ctx->generic, ctx->estatus))

Combine the two if statements with &&

> +			ghes_estatus_cache_add(ctx->generic, ctx->estatus);
> +	}
> +
> +	ghes_cper_handle_status(ctx->dev, ctx->generic, ctx->estatus, ctx->sync);
> +
> +	ghes_ffh_ack(ctx);
> +
> +out:
> +	spin_unlock_irqrestore(&ctx->lock, flags);
> +}
> +
> +static irqreturn_t ghes_ffh_irq(int irq, void *data)
> +{
> +	struct ghes_ffh *ctx = data;
> +
> +	ghes_ffh_process(ctx);
> +
> +	return IRQ_HANDLED;
> +}
> +
> +static int ghes_ffh_init_ack(struct platform_device *pdev,
> +			     struct ghes_ffh *ctx)
> +{
> +	struct resource *res;
> +	size_t size;
> +
> +	res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
> +	if (!res)
> +		return 0;
> +
> +	ctx->ack.addr = devm_ioremap_resource(&pdev->dev, res);
Why not devm_platform_get_and_ioremap_resource()?

> +	if (IS_ERR(ctx->ack.addr))
> +		return PTR_ERR(ctx->ack.addr);
> +
> +	size = resource_size(res);
> +	switch (size) {
> +	case 4:
> +		ctx->ack.width = 32;
> +		ctx->ack.preserve = ~0U;
> +		break;
> +	case 8:
> +		ctx->ack.width = 64;
> +		ctx->ack.preserve = ~0ULL;
> +		break;
> +	default:
> +		dev_err(&pdev->dev, "Unsupported ack resource size %zu\n", size);
> +		return -EINVAL;
> +	}
> +
> +	ctx->ack.set = BIT_ULL(0);
> +	ctx->ack.present = true;
> +	return 0;
> +}
> +
> +static int ghes_ffh_probe(struct platform_device *pdev)

Consider using a 
	struct device *dev = &pdev->dev;
given there is only one device around and it will shorten a bunch of
lines a little.

> +{
> +	struct ghes_ffh *ctx;
> +	struct resource *res;
> +	int rc;
> +
> +	ctx = devm_kzalloc(&pdev->dev, sizeof(*ctx), GFP_KERNEL);
> +	if (!ctx)
> +		return -ENOMEM;
> +
> +	spin_lock_init(&ctx->lock);
> +	ctx->dev = &pdev->dev;
> +	ctx->sync = of_property_read_bool(pdev->dev.of_node, "arm,sea-notify");
Hmm. I'd allow for other firmware types with
	device_property_read_bool() instead.

> +
> +	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> +	if (!res) {
> +		dev_err(&pdev->dev, "status region missing\n");
In probe you can always use dev_err_probe. It pretty prints the return value etc and
saves lines of code.
		return dev_err_probe(&pdev->dev, -EINVAL, "status region missing\n");

Don't worry about slightly long line.

> +		return -EINVAL;
> +	}
> +
> +	ctx->status_len = resource_size(res);
> +	if (!ctx->status_len) {
> +		dev_err(&pdev->dev, "Status region has zero length\n");
As above, use dev_err_probe()

> +		return -EINVAL;
> +	}
> +
> +	ctx->status = devm_ioremap_resource(&pdev->dev, res);
I'd be tempted to use devm_platform_get_and_ioremap_resource() and just
not worry about mapping and unmapping that will unnecessarily occur in the
case of error.

> +	if (IS_ERR(ctx->status))
> +		return PTR_ERR(ctx->status);
> +
> +	rc = ghes_ffh_init_ack(pdev, ctx);
> +	if (rc)
> +		return rc;
> +
> +	rc = ghes_ffh_init_pool();
> +	if (rc)
> +		return rc;
> +
> +	ctx->estatus = devm_kzalloc(&pdev->dev, ctx->status_len, GFP_KERNEL);
> +	if (!ctx->estatus)
> +		return -ENOMEM;
> +
> +	ctx->generic = devm_kzalloc(&pdev->dev, sizeof(*ctx->generic), GFP_KERNEL);
> +	if (!ctx->generic)
> +		return -ENOMEM;
> +
> +	ctx->generic->header.type = ACPI_HEST_TYPE_GENERIC_ERROR;
> +	ctx->generic->header.source_id =
> +		atomic_inc_return(&ghes_ffh_source_ids);
> +	ctx->generic->notify.type = ctx->sync ?
> +		ACPI_HEST_NOTIFY_SEA : ACPI_HEST_NOTIFY_EXTERNAL;
> +	ctx->generic->error_block_length = ctx->status_len;
> +
> +	ctx->irq = platform_get_irq_optional(pdev, 0);
> +	if (ctx->irq <= 0) {
> +		if (ctx->irq == -EPROBE_DEFER)
> +			return ctx->irq;
> +		dev_err(&pdev->dev, "interrupt is required (%d)\n", ctx->irq);
If it's required, why call get_irq_optional?
That only serves to suppress the error message inside the call.  Use
the non optional version and drop this.

> +		return -EINVAL;
> +	}
> +
> +	rc = devm_request_threaded_irq(&pdev->dev, ctx->irq,
> +				       NULL, ghes_ffh_irq,
> +				       IRQF_ONESHOT,
> +				       dev_name(&pdev->dev), ctx);
> +	if (rc)
> +		return rc;
> +
> +	platform_set_drvdata(pdev, ctx);

I can't immediately spot where this is used.  If it isn't don't set it as that
will mislead people into thinking it's needed.

> +	dev_info(&pdev->dev, "Firmware-first CPER status provider (interrupt)\n");

Krysztof already commented on this one.

> +	return 0;
> +}
> +
> +static void ghes_ffh_remove(struct platform_device *pdev)
> +{

If nothing to do, platform drivers don't need a remove so get rid of it.

> +}
> +
> +static const struct of_device_id ghes_ffh_of_match[] = {
> +	{ .compatible = "arm,ras-ffh" },
> +	{ /* sentinel */ }
> +};
> +MODULE_DEVICE_TABLE(of, ghes_ffh_of_match);
> +
> +static struct platform_driver ghes_ffh_driver = {
> +	.driver = {
> +		.name = "esource-dt",
> +		.of_match_table = ghes_ffh_of_match,
> +	},
> +	.probe = ghes_ffh_probe,
> +	.remove = ghes_ffh_remove,
> +};
> +
Common convention is keep this tightly coupled with the
struct platform_driver but not having a blank line here.

> +module_platform_driver(ghes_ffh_driver);
> +
> +MODULE_AUTHOR("Ahmed Tiba <ahmed.tiba@arm.com>");
> +MODULE_DESCRIPTION("Firmware-first CPER provider for DeviceTree platforms");
> +MODULE_LICENSE("GPL");



  parent reply	other threads:[~2026-02-24 15:55 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-20 13:42 [PATCH v2 00/11] ACPI: APEI: share GHES CPER helpers and add DT FFH provider Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 01/11] ACPI: APEI: GHES: share macros via a private header Ahmed Tiba
2026-02-24 15:22   ` Jonathan Cameron
2026-03-11 11:39     ` Ahmed Tiba
2026-03-11 12:39       ` Jonathan Cameron
2026-03-11 12:56         ` Ahmed Tiba
2026-02-26  6:44   ` Himanshu Chauhan
2026-03-11 11:55     ` Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 02/11] ACPI: APEI: GHES: add ghes_cper.o stub Ahmed Tiba
2026-02-24 15:25   ` Jonathan Cameron
2026-03-11 12:19     ` Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 03/11] ACPI: APEI: GHES: move CPER read helpers Ahmed Tiba
2026-02-24 15:32   ` Jonathan Cameron
2026-03-11 12:38     ` Ahmed Tiba
2026-02-26  5:58   ` Himanshu Chauhan
2026-03-11 13:18     ` Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 04/11] ACPI: APEI: GHES: move GHESv2 ack and alloc helpers Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 05/11] ACPI: APEI: GHES: move estatus cache helpers Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 06/11] ACPI: APEI: GHES: move vendor record helpers Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 07/11] ACPI: APEI: GHES: move CXL CPER helpers Ahmed Tiba
2026-02-24 15:34   ` Jonathan Cameron
2026-02-20 13:42 ` [PATCH v2 08/11] ACPI: APEI: introduce GHES helper Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 09/11] ACPI: APEI: share GHES CPER helpers Ahmed Tiba
2026-02-20 19:19   ` kernel test robot
2026-02-20 19:24   ` kernel test robot
2026-02-20 20:37   ` kernel test robot
2026-02-20 21:16   ` kernel test robot
2026-02-20 13:42 ` [PATCH v2 10/11] dt-bindings: firmware: add arm,ras-ffh Ahmed Tiba
2026-02-26  7:03   ` Himanshu Chauhan
2026-03-11 13:41     ` Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 11/11] RAS: add DeviceTree firmware-first CPER provider Ahmed Tiba
2026-02-21  9:06   ` Krzysztof Kozlowski
2026-02-23 19:10     ` Ahmed Tiba
2026-02-24 15:55   ` Jonathan Cameron [this message]
2026-03-12 12:23     ` Ahmed Tiba
2026-03-12 14:50       ` Jonathan Cameron
2026-02-26  7:01   ` Himanshu Chauhan
2026-02-26  7:05 ` [PATCH v2 00/11] ACPI: APEI: share GHES CPER helpers and add DT FFH provider Himanshu Chauhan
2026-03-11 10:44   ` Ahmed Tiba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260224155520.00004e92@huawei.com \
    --to=jonathan.cameron@huawei.com \
    --cc=Dmitry.Lamerov@arm.com \
    --cc=Michael.Zhao2@arm.com \
    --cc=ahmed.tiba@arm.com \
    --cc=bp@alien8.de \
    --cc=catalin.marinas@arm.com \
    --cc=conor@kernel.org \
    --cc=devicetree@vger.kernel.org \
    --cc=krzk+dt@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=rafael@kernel.org \
    --cc=robh@kernel.org \
    --cc=tony.luck@intel.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox