public inbox for devicetree@vger.kernel.org
 help / color / mirror / Atom feed
From: Ahmed Tiba <ahmed.tiba@arm.com>
To: Jonathan Cameron <jonathan.cameron@huawei.com>
Cc: devicetree@vger.kernel.org, linux-acpi@vger.kernel.org,
	Dmitry.Lamerov@arm.com, catalin.marinas@arm.com, bp@alien8.de,
	robh@kernel.org, rafael@kernel.org, will@kernel.org,
	conor@kernel.org, linux-arm-kernel@lists.infradead.org,
	linux-doc@vger.kernel.org, krzk+dt@kernel.org,
	Michael.Zhao2@arm.com, tony.luck@intel.com
Subject: Re: [PATCH v2 11/11] RAS: add DeviceTree firmware-first CPER provider
Date: Thu, 12 Mar 2026 12:23:41 +0000	[thread overview]
Message-ID: <4a193f85-e0ed-4367-b69b-587b2cbc977f@arm.com> (raw)
In-Reply-To: <20260224155520.00004e92@huawei.com>

On 24/02/2026 15:55, Jonathan Cameron wrote:
> On Fri, 20 Feb 2026 13:42:29 +0000
> Ahmed Tiba <ahmed.tiba@arm.com> wrote:
> 
>> Add a DeviceTree firmware-first CPER provider that reuses the shared
>> GHES helpers, wire it into the RAS Kconfig/Makefile and document it in
>> the admin guide. Update MAINTAINERS now that the driver exists.
>>
>> Signed-off-by: Ahmed Tiba <ahmed.tiba@arm.com>
> Hi Ahmed,
> 
> Various comments inline.
> 
> Jonathan
> 
>> ---
>>   Documentation/admin-guide/RAS/main.rst |  18 +++
>>   MAINTAINERS                            |   1 +
>>   drivers/acpi/apei/apei-internal.h      |  10 +-
>>   drivers/acpi/apei/ghes_cper.c          |   2 +
>>   drivers/ras/Kconfig                    |  12 ++
>>   drivers/ras/Makefile                   |   1 +
>>   drivers/ras/esource-dt.c               | 264 +++++++++++++++++++++++++++++++++
>>   include/acpi/ghes_cper.h               |   9 ++
>>   8 files changed, 308 insertions(+), 9 deletions(-)
>>
>> diff --git a/Documentation/admin-guide/RAS/main.rst b/Documentation/admin-guide/RAS/main.rst
>> index 5a45db32c49b..4ffabaaeabb1 100644
>> --- a/Documentation/admin-guide/RAS/main.rst
>> +++ b/Documentation/admin-guide/RAS/main.rst
>> @@ -205,6 +205,24 @@ Architecture (MCA)\ [#f3]_.
>>   .. [#f3] For more details about the Machine Check Architecture (MCA),
>>     please read Documentation/arch/x86/x86_64/machinecheck.rst at the Kernel tree.
>>   
>> +Firmware-first CPER via DeviceTree
>> +----------------------------------
>> +
>> +Some systems expose Common Platform Error Record (CPER) data
>> +via DeviceTree instead of ACPI HEST tables.
> 
> I'd argue this isn't really DT specific, it's just not ACPI table.
> You could for instance use PRP0001 and wire this up on ACPI with only
> one trivial change to generic property.h accessor for the boolean.
> 
> Or use another firmware information source entirely.

I'm intentionally keeping the scope DT-only for this series,
so I'll keep the wording DT-focused.

>> +Enable ``CONFIG_RAS_ESOURCE_DT`` to build the ``drivers/ras/esource-dt.c``
>> +driver and describe the CPER error source buffer with the
>> +``Documentation/devicetree/bindings/firmware/arm,ras-ffh.yaml`` binding.
>> +The driver reuses the GHES CPER helper object in
>> +``drivers/acpi/apei/ghes_cper.c`` so the logging, notifier chains, and
>> +memory failure handling match the ACPI GHES behaviour even when
>> +ACPI is disabled.
>> +
>> +Once a platform describes a firmware-first provider, both ACPI GHES and the
>> +DeviceTree driver reuse the same code paths. This keeps the behaviour
>> +consistent regardless of whether the error source is described via ACPI
>> +tables or DeviceTree.
> 
>> diff --git a/drivers/ras/Kconfig b/drivers/ras/Kconfig
>> index fc4f4bb94a4c..ea6d96713020 100644
>> --- a/drivers/ras/Kconfig
>> +++ b/drivers/ras/Kconfig
>> @@ -34,6 +34,18 @@ if RAS
>>   source "arch/x86/ras/Kconfig"
>>   source "drivers/ras/amd/atl/Kconfig"
>>   
>> +config RAS_ESOURCE_DT
>> +	bool "DeviceTree firmware-first CPER error source block provider"
> It isn't really DT specific other than one call that I've suggested you
> replace with a generic firmware accessor.
> 

I'll keep it DT-specific for this series.

>> +	depends on OF
> 
> Generally we don't gate on OF unless there are OF specific calls. Here there
> aren't so you are just reducing build coverage. || COMPILE_TEST
> maybe.
> 
Agreed. I'll drop OF and add COMPILE_TEST.


>> +	depends on ARM64
> 
> Likewise, nothing in here is arm64 specific that I can spot.
> 

Agreed. I'll drop ARM64.

>> +	select GHES_CPER_HELPERS
>> +	help
>> +	  Enable support for firmware-first Common Platform Error Record (CPER)
>> +	  error source block providers that are described via DeviceTree
>> +	  instead of ACPI HEST tables. The driver reuses the existing GHES
>> +	  CPER helpers so the error processing matches the ACPI code paths,
>> +	  but it can be built even when ACPI is disabled.
>> +
> 
>> diff --git a/drivers/ras/esource-dt.c b/drivers/ras/esource-dt.c
>> new file mode 100644
>> index 000000000000..b575a2258536
>> --- /dev/null
>> +++ b/drivers/ras/esource-dt.c
>> @@ -0,0 +1,264 @@
>> +// SPDX-License-Identifier: GPL-2.0-only
>> +/*
>> + * DeviceTree provider for firmware-first CPER error source block.
>> + *
>> + * This driver shares the GHES CPER helpers so we keep the reporting and
>> + * notifier behaviour identical to ACPI GHES
>> + *
>> + * Copyright (C) 2025 ARM Ltd.
>> + * Author: Ahmed Tiba <ahmed.tiba@arm.com>
>> + */
>> +
>> +#include <linux/atomic.h>
>> +#include <linux/bitops.h>
>> +#include <linux/device.h>
>> +#include <linux/interrupt.h>
>> +#include <linux/io.h>
>> +#include <linux/io-64-nonatomic-lo-hi.h>
> Used?

No, I'll drop it.

>> +#include <linux/module.h>
> mod_devicetable.h for of_device_id definition.
> 

Ack. I'll add <linux/mod_devicetable.h> and keep module.h.

>> +#include <linux/of_address.h>
>> +#include <linux/of_irq.h>
> Generally very little reason to include these.  Not sure why you need
> them here.
> 

Agreed, I'll drop both.

>> +#include <linux/panic.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/slab.h>
>> +#include <linux/spinlock.h>
>> +
>> +#include <acpi/ghes.h>
>> +#include <acpi/ghes_cper.h>
>> +
>> +static atomic_t ghes_ffh_source_ids = ATOMIC_INIT(0);
> I'd normally expect an IDA or similar. If nothing else it clearly
> indicates we only want a unique ID.

I'll keep the atomic for now; it's just a monotonic unique ID with no
lifetime tracking. If you strongly prefer IDA I can switch.

>> +
>> +struct ghes_ffh_ack {
>> +	void __iomem *addr;
>> +	u64 preserve;
>> +	u64 set;
>> +	u8 width;
>> +	bool present;
>> +};
>> +
>> +struct ghes_ffh {
>> +	struct device *dev;
>> +	void __iomem *status;
>> +	size_t status_len;
>> +
>> +	struct ghes_ffh_ack ack;
>> +
>> +	struct acpi_hest_generic *generic;
>> +	struct acpi_hest_generic_status *estatus;
>> +
>> +	bool sync;
>> +	int irq;
>> +
>> +	/* Serializes access to the firmware-owned buffer. */
> If we are serializing it, in what sense is it owned by the firmware?
> 

I'll clarify the comment:
firmware owns the buffer contents and the OS only serializes access.

>> +	spinlock_t lock;
>> +};
> 
> 
>> +
>> +static void ghes_ffh_process(struct ghes_ffh *ctx)
>> +{
>> +	unsigned long flags;
>> +	int sev;
>> +
>> +	spin_lock_irqsave(&ctx->lock, flags);
> 
> guard() + include cleanup.h. Then can do returns in error paths.

Agreed. I'll switch to guard() and include <linux/cleanup.h>.

>> +
>> +	if (ghes_ffh_copy_status(ctx))
>> +		goto out;
> Like here to give simpler lfow.
> 
> 
>> +
>> +	sev = ghes_severity(ctx->estatus->error_severity);
>> +	if (sev >= GHES_SEV_PANIC)
>> +		ghes_ffh_fatal(ctx);
>> +
>> +	if (!ghes_estatus_cached(ctx->estatus)) {
>> +		if (ghes_print_estatus(NULL, ctx->generic, ctx->estatus))
> 
> Combine the two if statements with &&
>

Will do.

>> +			ghes_estatus_cache_add(ctx->generic, ctx->estatus);
>> +	}
>> +
>> +	ghes_cper_handle_status(ctx->dev, ctx->generic, ctx->estatus, ctx->sync);
>> +
>> +	ghes_ffh_ack(ctx);
>> +
>> +out:
>> +	spin_unlock_irqrestore(&ctx->lock, flags);
>> +}
>> +
>> +static irqreturn_t ghes_ffh_irq(int irq, void *data)
>> +{
>> +	struct ghes_ffh *ctx = data;
>> +
>> +	ghes_ffh_process(ctx);
>> +
>> +	return IRQ_HANDLED;
>> +}
>> +
>> +static int ghes_ffh_init_ack(struct platform_device *pdev,
>> +			     struct ghes_ffh *ctx)
>> +{
>> +	struct resource *res;
>> +	size_t size;
>> +
>> +	res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
>> +	if (!res)
>> +		return 0;
>> +
>> +	ctx->ack.addr = devm_ioremap_resource(&pdev->dev, res);
> Why not devm_platform_get_and_ioremap_resource()?

Will switch to devm_platform_get_and_ioremap_resource().

>> +	if (IS_ERR(ctx->ack.addr))
>> +		return PTR_ERR(ctx->ack.addr);
>> +
>> +	size = resource_size(res);
>> +	switch (size) {
>> +	case 4:
>> +		ctx->ack.width = 32;
>> +		ctx->ack.preserve = ~0U;
>> +		break;
>> +	case 8:
>> +		ctx->ack.width = 64;
>> +		ctx->ack.preserve = ~0ULL;
>> +		break;
>> +	default:
>> +		dev_err(&pdev->dev, "Unsupported ack resource size %zu\n", size);
>> +		return -EINVAL;
>> +	}
>> +
>> +	ctx->ack.set = BIT_ULL(0);
>> +	ctx->ack.present = true;
>> +	return 0;
>> +}
>> +
>> +static int ghes_ffh_probe(struct platform_device *pdev)
> 
> Consider using a
> 	struct device *dev = &pdev->dev;
> given there is only one device around and it will shorten a bunch of
> lines a little.

I'll use a local dev pointer.

>> +{
>> +	struct ghes_ffh *ctx;
>> +	struct resource *res;
>> +	int rc;
>> +
>> +	ctx = devm_kzalloc(&pdev->dev, sizeof(*ctx), GFP_KERNEL);
>> +	if (!ctx)
>> +		return -ENOMEM;
>> +
>> +	spin_lock_init(&ctx->lock);
>> +	ctx->dev = &pdev->dev;
>> +	ctx->sync = of_property_read_bool(pdev->dev.of_node, "arm,sea-notify");
> Hmm. I'd allow for other firmware types with
> 	device_property_read_bool() instead.

Given DT-only scope, I'll keep of_property_read_bool() here.

>> +
>> +	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
>> +	if (!res) {
>> +		dev_err(&pdev->dev, "status region missing\n");
> In probe you can always use dev_err_probe. It pretty prints the return value etc and
> saves lines of code.
> 		return dev_err_probe(&pdev->dev, -EINVAL, "status region missing\n");

Agreed. I'll use dev_err_probe() here and for zero length.

> Don't worry about slightly long line.
> 
>> +		return -EINVAL;
>> +	}
>> +
>> +	ctx->status_len = resource_size(res);
>> +	if (!ctx->status_len) {
>> +		dev_err(&pdev->dev, "Status region has zero length\n");
> As above, use dev_err_probe()
> 
>> +		return -EINVAL;
>> +	}
>> +
>> +	ctx->status = devm_ioremap_resource(&pdev->dev, res);
> I'd be tempted to use devm_platform_get_and_ioremap_resource() and just
> not worry about mapping and unmapping that will unnecessarily occur in the
> case of error.

Will do (as above).

>> +	if (IS_ERR(ctx->status))
>> +		return PTR_ERR(ctx->status);
>> +
>> +	rc = ghes_ffh_init_ack(pdev, ctx);
>> +	if (rc)
>> +		return rc;
>> +
>> +	rc = ghes_ffh_init_pool();
>> +	if (rc)
>> +		return rc;
>> +
>> +	ctx->estatus = devm_kzalloc(&pdev->dev, ctx->status_len, GFP_KERNEL);
>> +	if (!ctx->estatus)
>> +		return -ENOMEM;
>> +
>> +	ctx->generic = devm_kzalloc(&pdev->dev, sizeof(*ctx->generic), GFP_KERNEL);
>> +	if (!ctx->generic)
>> +		return -ENOMEM;
>> +
>> +	ctx->generic->header.type = ACPI_HEST_TYPE_GENERIC_ERROR;
>> +	ctx->generic->header.source_id =
>> +		atomic_inc_return(&ghes_ffh_source_ids);
>> +	ctx->generic->notify.type = ctx->sync ?
>> +		ACPI_HEST_NOTIFY_SEA : ACPI_HEST_NOTIFY_EXTERNAL;
>> +	ctx->generic->error_block_length = ctx->status_len;
>> +
>> +	ctx->irq = platform_get_irq_optional(pdev, 0);
>> +	if (ctx->irq <= 0) {
>> +		if (ctx->irq == -EPROBE_DEFER)
>> +			return ctx->irq;
>> +		dev_err(&pdev->dev, "interrupt is required (%d)\n", ctx->irq);
> If it's required, why call get_irq_optional?
> That only serves to suppress the error message inside the call.  Use
> the non optional version and drop this.

I'll use platform_get_irq().

>> +		return -EINVAL;
>> +	}
>> +
>> +	rc = devm_request_threaded_irq(&pdev->dev, ctx->irq,
>> +				       NULL, ghes_ffh_irq,
>> +				       IRQF_ONESHOT,
>> +				       dev_name(&pdev->dev), ctx);
>> +	if (rc)
>> +		return rc;
>> +
>> +	platform_set_drvdata(pdev, ctx);
> 
> I can't immediately spot where this is used.  If it isn't don't set it as that
> will mislead people into thinking it's needed.

Agreed. I'll drop it.

>> +	dev_info(&pdev->dev, "Firmware-first CPER status provider (interrupt)\n");
> 
> Krysztof already commented on this one.
> 
>> +	return 0;
>> +}
>> +
>> +static void ghes_ffh_remove(struct platform_device *pdev)
>> +{
> 
> If nothing to do, platform drivers don't need a remove so get rid of it.

Agreed. I'll remove it.

>> +}
>> +
>> +static const struct of_device_id ghes_ffh_of_match[] = {
>> +	{ .compatible = "arm,ras-ffh" },
>> +	{ /* sentinel */ }
>> +};
>> +MODULE_DEVICE_TABLE(of, ghes_ffh_of_match);
>> +
>> +static struct platform_driver ghes_ffh_driver = {
>> +	.driver = {
>> +		.name = "esource-dt",
>> +		.of_match_table = ghes_ffh_of_match,
>> +	},
>> +	.probe = ghes_ffh_probe,
>> +	.remove = ghes_ffh_remove,
>> +};
>> +
> Common convention is keep this tightly coupled with the
> struct platform_driver but not having a blank line here.

I'll drop the blank line.

>> +module_platform_driver(ghes_ffh_driver);
>> +
>> +MODULE_AUTHOR("Ahmed Tiba <ahmed.tiba@arm.com>");
>> +MODULE_DESCRIPTION("Firmware-first CPER provider for DeviceTree platforms");
>> +MODULE_LICENSE("GPL");
> 
> 


  reply	other threads:[~2026-03-12 12:25 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-02-20 13:42 [PATCH v2 00/11] ACPI: APEI: share GHES CPER helpers and add DT FFH provider Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 01/11] ACPI: APEI: GHES: share macros via a private header Ahmed Tiba
2026-02-24 15:22   ` Jonathan Cameron
2026-03-11 11:39     ` Ahmed Tiba
2026-03-11 12:39       ` Jonathan Cameron
2026-03-11 12:56         ` Ahmed Tiba
2026-02-26  6:44   ` Himanshu Chauhan
2026-03-11 11:55     ` Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 02/11] ACPI: APEI: GHES: add ghes_cper.o stub Ahmed Tiba
2026-02-24 15:25   ` Jonathan Cameron
2026-03-11 12:19     ` Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 03/11] ACPI: APEI: GHES: move CPER read helpers Ahmed Tiba
2026-02-24 15:32   ` Jonathan Cameron
2026-03-11 12:38     ` Ahmed Tiba
2026-02-26  5:58   ` Himanshu Chauhan
2026-03-11 13:18     ` Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 04/11] ACPI: APEI: GHES: move GHESv2 ack and alloc helpers Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 05/11] ACPI: APEI: GHES: move estatus cache helpers Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 06/11] ACPI: APEI: GHES: move vendor record helpers Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 07/11] ACPI: APEI: GHES: move CXL CPER helpers Ahmed Tiba
2026-02-24 15:34   ` Jonathan Cameron
2026-02-20 13:42 ` [PATCH v2 08/11] ACPI: APEI: introduce GHES helper Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 09/11] ACPI: APEI: share GHES CPER helpers Ahmed Tiba
2026-02-20 19:19   ` kernel test robot
2026-02-20 19:24   ` kernel test robot
2026-02-20 20:37   ` kernel test robot
2026-02-20 21:16   ` kernel test robot
2026-02-20 13:42 ` [PATCH v2 10/11] dt-bindings: firmware: add arm,ras-ffh Ahmed Tiba
2026-02-26  7:03   ` Himanshu Chauhan
2026-03-11 13:41     ` Ahmed Tiba
2026-02-20 13:42 ` [PATCH v2 11/11] RAS: add DeviceTree firmware-first CPER provider Ahmed Tiba
2026-02-21  9:06   ` Krzysztof Kozlowski
2026-02-23 19:10     ` Ahmed Tiba
2026-02-24 15:55   ` Jonathan Cameron
2026-03-12 12:23     ` Ahmed Tiba [this message]
2026-03-12 14:50       ` Jonathan Cameron
2026-02-26  7:01   ` Himanshu Chauhan
2026-02-26  7:05 ` [PATCH v2 00/11] ACPI: APEI: share GHES CPER helpers and add DT FFH provider Himanshu Chauhan
2026-03-11 10:44   ` Ahmed Tiba

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4a193f85-e0ed-4367-b69b-587b2cbc977f@arm.com \
    --to=ahmed.tiba@arm.com \
    --cc=Dmitry.Lamerov@arm.com \
    --cc=Michael.Zhao2@arm.com \
    --cc=bp@alien8.de \
    --cc=catalin.marinas@arm.com \
    --cc=conor@kernel.org \
    --cc=devicetree@vger.kernel.org \
    --cc=jonathan.cameron@huawei.com \
    --cc=krzk+dt@kernel.org \
    --cc=linux-acpi@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=rafael@kernel.org \
    --cc=robh@kernel.org \
    --cc=tony.luck@intel.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox