linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH v4 00/17] ARM Error Source Table V2 Support
@ 2025-12-22  9:43 Ruidong Tian
  2025-12-22  9:43 ` [PATCH v4 01/17] ACPI/AEST: Parse the AEST table Ruidong Tian
                   ` (17 more replies)
  0 siblings, 18 replies; 22+ messages in thread
From: Ruidong Tian @ 2025-12-22  9:43 UTC
  To: catalin.marinas, will, lpieralisi, guohanjun, sudeep.holla,
	xueshuai, linux-kernel, linux-acpi, linux-arm-kernel, rafael,
	lenb, tony.luck, bp, yazen.ghannam, misono.tomohiro
  Cc: tianruidong

This series introduces support for the ARM Error Source Table (AEST), aligning
with version 2.0 of ACPI for Armv8 RAS Extensions [0].

AEST provides a mechanism for hardware to notify the operating system kernel
about RAS errors directly via interrupts, a model known as kernel-first error
handling. Compared to firmware-first error handling (e.g., GHES), AEST is more
lightweight. This efficiency allows the OS to potentially report every
Corrected Error (CE), enabling upper-layer applications to use CE information
for error prediction [1][2].

This series is based on Tyler Baicar's preliminary patches [3], which have not
yet been sent to the mailing list as v2.

AEST Driver Architecture
========================

The AEST driver is structured into three primary components:
  - AEST device: Responsible for handling interrupts, managing the lifecycle
                 of AEST nodes, and processing error records.
  - AEST node: Corresponds directly to a RAS node in the hardware.
  - AEST record: Represents a set of RAS registers associated with a specific
                 error source.

These components are organized hierarchically as follows:

 ┌──────────────────────────────────────────────────┐
 │             AEST Driver Device Management        │
 │┌─────────────┐    ┌──────────┐     ┌───────────┐ │
 ││ AEST Device ├─┬─►│AEST Node ├──┬─►│AEST Record│ │
 │└─────────────┘ │  └──────────┘  │  └───────────┘ │
 │                │       .        │  ┌───────────┐ │
 │                │       .        ├─►│AEST Record│ │
 │                │       .        │  └───────────┘ │
 │                │  ┌──────────┐  │        .       │
 │                ├─►│AEST Node │  │        .       │
 │                │  └──────────┘  │        .       │
 │                │                │  ┌───────────┐ │
 │                │  ┌──────────┐  └─►│AEST Record│ │
 │                └─►│AEST Node │     └───────────┘ │
 │                   └──────────┘                   │
 └──────────────────────────────────────────────────┘
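
As a rough illustration of the hierarchy above, the following userspace C
sketch models a device that owns nodes, each of which owns records. The
sketch_* types and the function are hypothetical and are not the structures
defined by this series; the has_error flag merely stands in for reading a
valid ERR<n>STATUS register.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified types: illustrative only, not the driver's. */
struct sketch_record {
	int index;
	int has_error;		/* stand-in for a valid ERR<n>STATUS */
};

struct sketch_node {
	struct sketch_record *records;
	size_t record_count;
};

struct sketch_device {
	struct sketch_node *nodes;
	size_t node_count;
};

/* Walk device -> node -> record and count records flagged with an error. */
static size_t sketch_count_errors(const struct sketch_device *dev)
{
	size_t n, r, hits = 0;

	assert(dev != NULL);
	for (n = 0; n < dev->node_count; n++) {
		const struct sketch_node *node = &dev->nodes[n];

		for (r = 0; r < node->record_count; r++)
			if (node->records[r].has_error)
				hits++;
	}
	return hits;
}
```

The nested loop mirrors the one-to-many relationships in the diagram: one
device fans out to several nodes, and one node fans out to several records.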

AEST Interrupt Handling
=======================

Upon an AEST interrupt, the driver performs the following sequence:
1. The AEST device iterates through all registered AEST nodes to identify the
   specific node(s) and record(s) that reported an error.
2. Each node typically contains two types of records:
      - report record: Errors can be located efficiently through a bitmap
                       in the `ERRGSR` register.
      - poll record: The node must individually poll all records to determine
                     if an error has occurred.
3. process record:
      - If the error is corrected, the CE threshold is reset and the error
        event is logged.
      - If the error is deferred, the relevant registers are dumped and
        `memory_failure()` is invoked.
      - If the error is uncorrected, the system panics. UEs typically
        trigger an exception rather than an interrupt, but if one is
        detected here, the system will still panic.
4. decode record: The AEST driver notifies other relevant drivers, such as
   EDAC, to further decode the reported RAS register information.
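
The steps above can be sketched in userspace C as follows. This is only an
illustration, not the code in this series: the sketch_* names are
hypothetical, and the bit positions are the ERR<n>STATUS fields as I
understand them from the Arm RAS architecture (V = bit 30, UE = bit 29,
CE = bits [25:24], DE = bit 23). The severity order (UE, then DE, then CE) is
an assumption about how a handler might prioritise them.

```c
#include <assert.h>
#include <stdint.h>

/* ERR<n>STATUS fields (assumed from the Arm RAS architecture). */
#define SKETCH_STATUS_V		(UINT64_C(1) << 30)	/* record valid */
#define SKETCH_STATUS_UE	(UINT64_C(1) << 29)	/* uncorrected */
#define SKETCH_STATUS_CE	(UINT64_C(3) << 24)	/* corrected */
#define SKETCH_STATUS_DE	(UINT64_C(1) << 23)	/* deferred */

enum sketch_severity {
	SKETCH_NONE,
	SKETCH_CORRECTED,
	SKETCH_DEFERRED,
	SKETCH_UNCORRECTED,
};

/* Classify a raw status value, most severe condition first. */
static enum sketch_severity sketch_classify(uint64_t status)
{
	if (!(status & SKETCH_STATUS_V))
		return SKETCH_NONE;
	if (status & SKETCH_STATUS_UE)
		return SKETCH_UNCORRECTED;
	if (status & SKETCH_STATUS_DE)
		return SKETCH_DEFERRED;
	if (status & SKETCH_STATUS_CE)
		return SKETCH_CORRECTED;
	return SKETCH_NONE;
}

/*
 * "Report" records: return the index of the first set bit in an
 * ERRGSR-style bitmap at or after 'start', or -1 if there is none.
 */
static int sketch_next_record(uint64_t errgsr, int start)
{
	int i;

	for (i = start < 0 ? 0 : start; i < 64; i++)
		if (errgsr & (UINT64_C(1) << i))
			return i;
	return -1;
}
```

Under these assumptions, the status value 0xc4800390 used in the Testing
section has V and DE set with UE clear, so it would be classified as a
deferred error.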

Address Translation
===================

As described in section 2.2 [0], error addresses reported by AEST records
may be "node-specific Logical Addresses" rather than the "System Physical
Addresses" (SPA) used by the kernel. Therefore, the driver needs to translate
these Logical Addresses (LA) to SPA. This translation mechanism is conceptually
similar to AMD's Address Translation Library (ATL) [4], leading patch 14 to
introduce a common translation function for both AMD and ARM architectures.
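
To make the idea of a shared translation hook concrete, here is a minimal
userspace C sketch. It is not the interface added by patch 14: every sketch_*
name is hypothetical, and the fixed-offset translator is a toy stand-in for a
real node-specific LA-to-SPA decoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: translate a logical address to an SPA. */
typedef int (*sketch_translate_fn)(uint64_t la, uint64_t *spa);

static sketch_translate_fn sketch_translator;

/* An architecture-specific backend would register its decoder here. */
static void sketch_register_translator(sketch_translate_fn fn)
{
	sketch_translator = fn;
}

/* Common entry point; returns -1 when no backend is registered. */
static int sketch_la_to_spa(uint64_t la, uint64_t *spa)
{
	assert(spa != NULL);
	if (!sketch_translator)
		return -1;
	return sketch_translator(la, spa);
}

/* Toy backend: assumes the node's memory starts at a fixed SPA offset. */
static int sketch_offset_translate(uint64_t la, uint64_t *spa)
{
	*spa = la + UINT64_C(0x80000000);
	return 0;
}
```

The point of the indirection is that callers only see one entry point, while
each platform supplies its own decoder.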

Testing
=======
I have tested this series on a THead Yitian710 SoC with a customized BIOS.
QEMU [5] can also be used for preliminary driver testing.

1. Boot QEMU

qemu-system-aarch64 -smp 4 -m 32G \
  -cpu host --enable-kvm -machine virt,gic-version=3 \
  -kernel Image -initrd initrd.cpio.gz \
  -device virtio-net-pci,netdev=t0 -netdev user,id=t0 \
  -bios /usr/share/edk2/aarch64/QEMU_EFI.fd  \
  -append "rdinit=/sbin/init earlycon verbose debug console=ttyAMA0 aest.dyndbg='+pt'" \
  -nographic -d guest_errors -D qemu.log

2. Inject an error
devmem 0x90d0808 l 0xc4800390

2.1 Memory error
[   64.959849] AEST: {1}[Hardware Error]: Hardware error from AEST memory.90d0000
[   64.959852] AEST: {1}[Hardware Error]:  Error from memory at SRAT proximity domain 0x0
[   64.959855] AEST: {1}[Hardware Error]:   ERR0FR: 0x40000080044081
[   64.959858] AEST: {1}[Hardware Error]:   ERR0CTRL: 0x108
[   64.959859] AEST: {1}[Hardware Error]:   ERR0STATUS: 0xc4800390
[   64.959860] AEST: {1}[Hardware Error]:   ERR0ADDR: 0x8400000043344521
[   64.959861] AEST: {1}[Hardware Error]:   ERR0MISC0: 0x7fff00000000
[   64.959861] AEST: {1}[Hardware Error]:   ERR0MISC1: 0x0
[   64.959862] AEST: {1}[Hardware Error]:   ERR0MISC2: 0x0
[   64.959863] AEST: {1}[Hardware Error]:   ERR0MISC3: 0x0
[   64.959873] Memory failure: 0x43344: recovery action for free buddy page: Recovered

2.2 CMN error
[  132.044283] AEST: {2}[Hardware Error]: Hardware error from AEST XP
[  132.044286] AEST: {2}[Hardware Error]:  Error from vendor hid ARMHC700 uid 0x0
[  132.044288] AEST: {2}[Hardware Error]:   ERR0FR: 0x48a5
[  132.044290] AEST: {2}[Hardware Error]:   ERR0CTRL: 0x108
[  132.044292] AEST: {2}[Hardware Error]:   ERR0STATUS: 0xc4800390
[  132.044293] AEST: {2}[Hardware Error]:   ERR0ADDR: 0x8400000043344521
[  132.044295] AEST: {2}[Hardware Error]:   ERR0MISC0: 0x0
[  132.044296] AEST: {2}[Hardware Error]:   ERR0MISC1: 0x0
[  132.044298] AEST: {2}[Hardware Error]:   ERR0MISC2: 0x0
[  132.044299] AEST: {2}[Hardware Error]:   ERR0MISC3: 0x0
[  132.044302] Memory failure: 0x43344: recovery action for already poisoned page: Failed
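
The page frame number in the "Memory failure" lines can be related to the
ERR0ADDR value in the logs with a small calculation. The sketch below is
illustrative only: it assumes that the physical address occupies
ERR<n>ADDR bits [55:0] and that the kernel uses 4 KiB pages, and the
sketch_* names are hypothetical rather than taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: ERR<n>ADDR.PADDR is bits [55:0]; upper bits are attributes. */
#define SKETCH_PADDR_MASK	((UINT64_C(1) << 56) - 1)
/* Assumed: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT	12

/* Strip the attribute bits and convert the address to a page frame number. */
static uint64_t sketch_addr_to_pfn(uint64_t err_addr)
{
	return (err_addr & SKETCH_PADDR_MASK) >> SKETCH_PAGE_SHIFT;
}
```

With these assumptions, 0x8400000043344521 masks to 0x43344521, and shifting
right by 12 gives 0x43344, which matches the PFN printed in the logs.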

[0]: https://developer.arm.com/documentation/den0085/0101/
[1]: Intel: Predicting Uncorrectable Memory Errors from the Correctable Error History
[2]: Alibaba. Predicting DRAM-Caused Risky VMs in Large-Scale Clouds. Published in HPCA2025
[3]: https://lore.kernel.org/all/20211124170708.3874-1-baicar@os.amperecomputing.com/
[4]: https://lore.kernel.org/all/20240123041401.79812-2-yazen.ghannam@amd.com/
[5]: https://github.com/winterddd/qemu/tree/error_record

Changes from V3:
https://lore.kernel.org/all/20250115084228.107573-1-tianruidong@linux.alibaba.com/
1. Add a vendor AEST node framework and support CMN700.
2. Borislav Petkov
    - Split into multiple smaller patches for easier review.
    - Refined the English in the cover letter for better flow.
3. Address Tomohiro Misono's comments.

Changes from V2:
https://lore.kernel.org/all/20240321025317.114621-1-tianruidong@linux.alibaba.com/
1. Tomohiro Misono
    - Dump registers before panic.
2. Baolin Wang & Shuai Xue: address all comments.
3. Support AEST V2.

Changes from V1:
https://lore.kernel.org/all/20240304111517.33001-1-tianruidong@linux.alibaba.com/
1. Marc Zyngier
  - Use readq/writeq_relaxed instead of readq/writeq for MMIO address.
  - Add sync for system register operation.
  - Use irq_is_percpu_devid() helper to identify a per-CPU interrupt.
  - Other fixes.
2. Set RAS CE threshold in AEST driver.
3. Enable RAS interrupt explicitly in driver.
4. UER and UEO trigger memory_failure() rather than panic.

Ruidong Tian (17):
  ACPI/AEST: Parse the AEST table
  ras: AEST: Add probe/remove for AEST driver
  ras: AEST: support different group format
  ras: AEST: Unify the read/write interface for system and MMIO register
  ras: AEST: Probe RAS system architecture version
  ras: AEST: Support RAS Common Fault Injection Model Extension
  ras: AEST: Support CE threshold of error record
  ras: AEST: Enable and register IRQs
  ras: AEST: Add cpuhp callback
  ras: AEST: Introduce AEST driver sysfs interface
  ras: AEST: Add error count tracking and debugfs interface
  ras: AEST: Allow configuring CE threshold via debugfs
  ras: AEST: Introduce AEST inject interface to test AEST driver
  ras: ATL: Unify ATL interface for ARM64 and AMD
  ras: AEST: Add framework to process AEST vendor node
  ras: AEST: support vendor node CMN700
  trace, ras: add ARM RAS extension trace event

 Documentation/ABI/testing/debugfs-aest |   98 +++
 MAINTAINERS                            |   11 +
 arch/arm64/include/asm/arm-cmn.h       |   47 ++
 arch/arm64/include/asm/ras.h           |   95 +++
 drivers/acpi/arm64/Kconfig             |   11 +
 drivers/acpi/arm64/Makefile            |    1 +
 drivers/acpi/arm64/aest.c              |  311 +++++++
 drivers/edac/amd64_edac.c              |    2 +-
 drivers/perf/arm-cmn.c                 |   37 +-
 drivers/ras/Kconfig                    |    1 +
 drivers/ras/Makefile                   |    1 +
 drivers/ras/aest/Kconfig               |   17 +
 drivers/ras/aest/Makefile              |    8 +
 drivers/ras/aest/aest-cmn.c            |  332 ++++++++
 drivers/ras/aest/aest-core.c           | 1057 ++++++++++++++++++++++++
 drivers/ras/aest/aest-inject.c         |  131 +++
 drivers/ras/aest/aest-sysfs.c          |  228 +++++
 drivers/ras/aest/aest.h                |  410 +++++++++
 drivers/ras/amd/atl/core.c             |    4 +-
 drivers/ras/amd/atl/internal.h         |    2 +-
 drivers/ras/amd/atl/umc.c              |    3 +-
 drivers/ras/ras.c                      |   27 +-
 include/linux/acpi_aest.h              |   75 ++
 include/linux/cpuhotplug.h             |    1 +
 include/linux/ras.h                    |   17 +-
 include/ras/ras_event.h                |   71 ++
 26 files changed, 2939 insertions(+), 59 deletions(-)
 create mode 100644 Documentation/ABI/testing/debugfs-aest
 create mode 100644 arch/arm64/include/asm/arm-cmn.h
 create mode 100644 arch/arm64/include/asm/ras.h
 create mode 100644 drivers/acpi/arm64/aest.c
 create mode 100644 drivers/ras/aest/Kconfig
 create mode 100644 drivers/ras/aest/Makefile
 create mode 100644 drivers/ras/aest/aest-cmn.c
 create mode 100644 drivers/ras/aest/aest-core.c
 create mode 100644 drivers/ras/aest/aest-inject.c
 create mode 100644 drivers/ras/aest/aest-sysfs.c
 create mode 100644 drivers/ras/aest/aest.h
 create mode 100644 include/linux/acpi_aest.h

-- 
2.51.2.612.gdc70283dfc




* [PATCH v4 01/17] ACPI/AEST: Parse the AEST table
  2025-12-22  9:43 [PATCH v4 00/17] ARM Error Source Table V2 Support Ruidong Tian
@ 2025-12-22  9:43 ` Ruidong Tian
  2025-12-22  9:43 ` [PATCH v4 02/17] ras: AEST: Add probe/remove for AEST driver Ruidong Tian
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Ruidong Tian @ 2025-12-22  9:43 UTC
  To: catalin.marinas, will, lpieralisi, guohanjun, sudeep.holla,
	xueshuai, linux-kernel, linux-acpi, linux-arm-kernel, rafael,
	lenb, tony.luck, bp, yazen.ghannam, misono.tomohiro
  Cc: tianruidong

This patch introduces the creation of AEST platform devices, where each
device represents a logical "error node device" grouping one or more
AEST nodes from the ACPI table.

Instead of relying on the optional 'error_node_device' field in the AEST
table[1], this commit uses the interrupt number as the sole identifier for
the parent device. This design simplifies the driver logic by providing a
single, consistent mechanism for grouping nodes.

The 'error_node_device' field can be unspecified, but an AEST node is
always physically associated with a parent component. The interrupt
number serves as a reliable proxy for this association. This approach
is based on the safe assumption that distinct hardware components (e.g.,
SMMU, CMN, GIC) are assigned unique error interrupts and do not share
them.
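
The grouping rule described above can be sketched in userspace C as follows.
This is a simplified illustration, not the patch's implementation (which
keys an xarray on the GSIV of the node's first interrupt): the sketch_*
names and the fixed-size table are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_GROUPS 8

/* One group per distinct interrupt number (GSIV). */
struct sketch_group {
	uint32_t gsiv;
	int node_count;
};

struct sketch_groups {
	struct sketch_group g[SKETCH_MAX_GROUPS];
	size_t used;
};

/*
 * Attach a node to the group that matches its first interrupt's GSIV,
 * creating the group on first use. Returns the group index, or -1 if the
 * table is full.
 */
static int sketch_add_node(struct sketch_groups *gs, uint32_t gsiv)
{
	size_t i;

	assert(gs != NULL);
	for (i = 0; i < gs->used; i++) {
		if (gs->g[i].gsiv == gsiv) {
			gs->g[i].node_count++;
			return (int)i;
		}
	}
	if (gs->used == SKETCH_MAX_GROUPS)
		return -1;

	gs->g[gs->used].gsiv = gsiv;
	gs->g[gs->used].node_count = 1;
	return (int)gs->used++;
}
```

Nodes that report the same GSIV land in the same group, which corresponds to
one platform device in the design described above.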

[1]: https://developer.arm.com/documentation/den0085/latest

Signed-off-by: Ruidong Tian <tianruidong@linux.alibaba.com>
---
 MAINTAINERS                  |   8 +
 arch/arm64/include/asm/ras.h |  15 ++
 drivers/acpi/arm64/Kconfig   |  11 ++
 drivers/acpi/arm64/Makefile  |   1 +
 drivers/acpi/arm64/aest.c    | 311 +++++++++++++++++++++++++++++++++++
 include/linux/acpi_aest.h    |  56 +++++++
 6 files changed, 402 insertions(+)
 create mode 100644 arch/arm64/include/asm/ras.h
 create mode 100644 drivers/acpi/arm64/aest.c
 create mode 100644 include/linux/acpi_aest.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 84cda6701685..d14e16c3a93b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -341,6 +341,14 @@ S:	Maintained
 F:	drivers/acpi/arm64
 F:	include/linux/acpi_iort.h
 
+ACPI AEST
+M:	Ruidong Tian <tianruidong@linux.alibaba.com>
+L:	linux-acpi@vger.kernel.org
+L:	linux-arm-kernel@lists.infradead.org
+S:	Supported
+F:	drivers/acpi/arm64/aest.c
+F:	include/linux/acpi_aest.h
+
 ACPI FOR RISC-V (ACPI/riscv)
 M:	Sunil V L <sunilvl@ventanamicro.com>
 L:	linux-acpi@vger.kernel.org
diff --git a/arch/arm64/include/asm/ras.h b/arch/arm64/include/asm/ras.h
new file mode 100644
index 000000000000..b6640b9972bf
--- /dev/null
+++ b/arch/arm64/include/asm/ras.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_RAS_H
+#define __ASM_RAS_H
+
+#include <linux/types.h>
+
+struct ras_ext_regs {
+	u64 err_fr;
+	u64 err_ctlr;
+	u64 err_status;
+	u64 err_addr;
+	u64 err_misc[4];
+};
+
+#endif /* __ASM_RAS_H */
diff --git a/drivers/acpi/arm64/Kconfig b/drivers/acpi/arm64/Kconfig
index f2fd79f22e7d..52df190356c8 100644
--- a/drivers/acpi/arm64/Kconfig
+++ b/drivers/acpi/arm64/Kconfig
@@ -24,3 +24,14 @@ config ACPI_APMT
 
 config ACPI_MPAM
 	bool
+
+config ACPI_AEST
+	bool "ARM Error Source Table Support"
+	depends on ARM64_RAS_EXTN
+
+	help
+	  The Arm Error Source Table (AEST) provides details on ACPI
+	  extensions that enable kernel-first handling of errors in a
+	  system that supports the Armv8 RAS extensions.
+
+	  If set, the kernel will report and log hardware errors.
diff --git a/drivers/acpi/arm64/Makefile b/drivers/acpi/arm64/Makefile
index 9390b57cb564..bad77fdbf8dd 100644
--- a/drivers/acpi/arm64/Makefile
+++ b/drivers/acpi/arm64/Makefile
@@ -7,5 +7,6 @@ obj-$(CONFIG_ACPI_IORT) 	+= iort.o
 obj-$(CONFIG_ACPI_MPAM) 	+= mpam.o
 obj-$(CONFIG_ACPI_PROCESSOR_IDLE) += cpuidle.o
 obj-$(CONFIG_ARM_AMBA)		+= amba.o
+obj-$(CONFIG_ACPI_AEST) 	+= aest.o
 obj-y				+= dma.o init.o
 obj-y				+= thermal_cpufreq.o
diff --git a/drivers/acpi/arm64/aest.c b/drivers/acpi/arm64/aest.c
new file mode 100644
index 000000000000..b8359b95f40f
--- /dev/null
+++ b/drivers/acpi/arm64/aest.c
@@ -0,0 +1,311 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ARM Error Source Table Support
+ *
+ * Copyright (c) 2025, Alibaba Group.
+ */
+
+#include <linux/xarray.h>
+#include <linux/platform_device.h>
+#include <linux/acpi_aest.h>
+
+#include "init.h"
+
+#undef pr_fmt
+#define pr_fmt(fmt) "ACPI AEST: " fmt
+
+static struct xarray *aest_array;
+
+static void __init aest_init_interface(struct acpi_aest_hdr *hdr,
+				       struct acpi_aest_node *node)
+{
+	struct acpi_aest_node_interface_header *interface;
+
+	interface = ACPI_ADD_PTR(struct acpi_aest_node_interface_header, hdr,
+				 hdr->node_interface_offset);
+
+	node->type = hdr->type;
+	node->interface_hdr = interface;
+
+	switch (interface->group_format) {
+	case ACPI_AEST_NODE_GROUP_FORMAT_4K: {
+		struct acpi_aest_node_interface_4k *interface_4k =
+			(struct acpi_aest_node_interface_4k *)(interface + 1);
+
+		node->common = &interface_4k->common;
+		node->record_implemented =
+			(unsigned long *)&interface_4k->error_record_implemented;
+		node->status_reporting =
+			(unsigned long *)&interface_4k->error_status_reporting;
+		node->addressing_mode =
+			(unsigned long *)&interface_4k->addressing_mode;
+		break;
+	}
+	case ACPI_AEST_NODE_GROUP_FORMAT_16K: {
+		struct acpi_aest_node_interface_16k *interface_16k =
+			(struct acpi_aest_node_interface_16k *)(interface + 1);
+
+		node->common = &interface_16k->common;
+		node->record_implemented =
+			(unsigned long *)interface_16k->error_record_implemented;
+		node->status_reporting =
+			(unsigned long *)interface_16k->error_status_reporting;
+		node->addressing_mode =
+			(unsigned long *)interface_16k->addressing_mode;
+		break;
+	}
+	case ACPI_AEST_NODE_GROUP_FORMAT_64K: {
+		struct acpi_aest_node_interface_64k *interface_64k =
+			(struct acpi_aest_node_interface_64k *)(interface + 1);
+
+		node->common = &interface_64k->common;
+		node->record_implemented =
+			(unsigned long *)interface_64k->error_record_implemented;
+		node->status_reporting =
+			(unsigned long *)interface_64k->error_status_reporting;
+		node->addressing_mode =
+			(unsigned long *)interface_64k->addressing_mode;
+		break;
+	}
+	default:
+		pr_err("invalid group format: %d\n", interface->group_format);
+	}
+
+	node->interrupt = ACPI_ADD_PTR(struct acpi_aest_node_interrupt_v2, hdr,
+				       hdr->node_interrupt_offset);
+
+	node->interrupt_count = hdr->node_interrupt_count;
+}
+
+static struct aest_hnode *__init
+acpi_aest_alloc_ahnode(struct acpi_aest_node *node, u64 error_device_id)
+{
+	struct aest_hnode *ahnode __free(kfree) = NULL;
+
+	ahnode = kzalloc(sizeof(*ahnode), GFP_KERNEL);
+	if (!ahnode)
+		return NULL;
+
+	INIT_LIST_HEAD(&ahnode->list);
+	ahnode->id = error_device_id;
+	ahnode->count = 0;
+	ahnode->type = node->type;
+
+	return_ptr(ahnode);
+}
+static int __init acpi_aest_init_node(struct acpi_aest_hdr *aest_hdr)
+{
+	struct aest_hnode *ahnode;
+	u64 error_device_id;
+	struct acpi_aest_node *node;
+
+	node = kzalloc(sizeof(*node), GFP_KERNEL);
+	if (!node)
+		return -ENOMEM;
+
+	node->spec_pointer =
+		ACPI_ADD_PTR(void, aest_hdr, aest_hdr->node_specific_offset);
+	if (aest_hdr->type == ACPI_AEST_PROCESSOR_ERROR_NODE)
+		node->processor_spec_pointer =
+			ACPI_ADD_PTR(void, node->spec_pointer,
+				     sizeof(struct acpi_aest_processor));
+
+	aest_init_interface(aest_hdr, node);
+
+	if (node->interrupt_count <= 0)
+		return -EINVAL;
+
+	error_device_id = node->interrupt[0].gsiv;
+	ahnode = xa_load(aest_array, error_device_id);
+	if (!ahnode) {
+		ahnode = acpi_aest_alloc_ahnode(node, error_device_id);
+		if (!ahnode)
+			return -ENOMEM;
+		xa_store(aest_array, error_device_id, ahnode, GFP_KERNEL);
+	}
+
+	list_add_tail(&node->list, &ahnode->list);
+	ahnode->count++;
+
+	return 0;
+}
+
+static int __init acpi_aest_init_nodes(struct acpi_table_header *aest_table)
+{
+	struct acpi_aest_hdr *aest_node, *aest_end;
+	struct acpi_table_aest *aest;
+	int rc;
+
+	aest = (struct acpi_table_aest *)aest_table;
+	aest_node = ACPI_ADD_PTR(struct acpi_aest_hdr, aest,
+				 sizeof(struct acpi_table_header));
+	aest_end = ACPI_ADD_PTR(struct acpi_aest_hdr, aest, aest_table->length);
+
+	while (aest_node < aest_end) {
+		if (((u64)aest_node + aest_node->length) > (u64)aest_end) {
+			pr_warn(FW_WARN
+				"AEST node pointer overflow, bad table.\n");
+			return -EINVAL;
+		}
+
+		rc = acpi_aest_init_node(aest_node);
+		if (rc)
+			return rc;
+
+		aest_node = ACPI_ADD_PTR(struct acpi_aest_hdr, aest_node,
+					 aest_node->length);
+	}
+
+	return 0;
+}
+
+static int acpi_aest_parse_irqs(struct platform_device *pdev,
+				struct acpi_aest_node *anode,
+				struct resource *res, int *res_idx, int irqs[2])
+{
+	int i;
+	struct acpi_aest_node_interrupt_v2 *interrupt;
+	int trigger, irq;
+
+	for (i = 0; i < anode->interrupt_count; i++) {
+		interrupt = &anode->interrupt[i];
+		if (irqs[interrupt->type])
+			continue;
+
+		trigger = (interrupt->flags & AEST_INTERRUPT_MODE) ?
+				  ACPI_LEVEL_SENSITIVE :
+				  ACPI_EDGE_SENSITIVE;
+
+		irq = acpi_register_gsi(&pdev->dev, interrupt->gsiv, trigger,
+					ACPI_ACTIVE_HIGH);
+		if (irq <= 0) {
+			pr_err("failed to map AEST GSI %d\n", interrupt->gsiv);
+			return irq;
+		}
+
+		res[*res_idx].start = irq;
+		res[*res_idx].end = irq;
+		res[*res_idx].flags = IORESOURCE_IRQ;
+		res[*res_idx].name = interrupt->type ? AEST_ERI_NAME :
+						       AEST_FHI_NAME;
+
+		(*res_idx)++;
+
+		irqs[interrupt->type] = irq;
+	}
+
+	return 0;
+}
+
+DEFINE_FREE(res, struct resource *, if (_T) kfree(_T))
+static struct platform_device *__init
+acpi_aest_alloc_pdev(struct aest_hnode *ahnode, int index)
+{
+	struct platform_device *pdev __free(platform_device_put) =
+		platform_device_alloc("AEST", index++);
+	struct resource *res __free(res);
+	struct acpi_aest_node *anode;
+	int ret, size, j, irq[AEST_MAX_INTERRUPT_PER_NODE] = { 0 };
+
+	if (!pdev)
+		return ERR_PTR(-ENOMEM);
+
+	res = kcalloc(ahnode->count + AEST_MAX_INTERRUPT_PER_NODE, sizeof(*res),
+		      GFP_KERNEL);
+	if (!res)
+		return ERR_PTR(-ENOMEM);
+
+	j = 0;
+	list_for_each_entry(anode, &ahnode->list, list) {
+		if (anode->interface_hdr->type !=
+		    ACPI_AEST_NODE_SYSTEM_REGISTER) {
+			res[j].name = AEST_NODE_NAME;
+			res[j].start = anode->interface_hdr->address;
+			switch (anode->interface_hdr->group_format) {
+			case ACPI_AEST_NODE_GROUP_FORMAT_4K:
+				size = 4 * KB;
+				break;
+			case ACPI_AEST_NODE_GROUP_FORMAT_16K:
+				size = 16 * KB;
+				break;
+			case ACPI_AEST_NODE_GROUP_FORMAT_64K:
+				size = 64 * KB;
+				break;
+			default:
+				size = 4 * KB;
+			}
+			res[j].end = res[j].start + size - 1;
+			res[j].flags = IORESOURCE_MEM;
+		}
+
+		ret = acpi_aest_parse_irqs(pdev, anode, res, &j, irq);
+		if (ret)
+			return ERR_PTR(ret);
+	}
+
+	ret = platform_device_add_resources(pdev, res, j);
+	if (ret)
+		return ERR_PTR(ret);
+
+	ret = platform_device_add_data(pdev, &ahnode, sizeof(ahnode));
+	if (ret)
+		return ERR_PTR(ret);
+
+	ret = platform_device_add(pdev);
+	if (ret)
+		return ERR_PTR(ret);
+
+	return_ptr(pdev);
+}
+static int __init acpi_aest_alloc_pdevs(void)
+{
+	int ret = 0, index = 0;
+	struct aest_hnode *ahnode = NULL;
+	unsigned long i;
+
+	xa_for_each(aest_array, i, ahnode) {
+		struct platform_device *pdev =
+			acpi_aest_alloc_pdev(ahnode, index++);
+
+		if (IS_ERR(pdev)) {
+			ret = PTR_ERR(pdev);
+			break;
+		}
+	}
+
+	return ret;
+}
+
+static int __init acpi_aest_init(void)
+{
+	int ret;
+
+	if (acpi_disabled)
+		return 0;
+
+	struct acpi_table_header *aest_table __free(acpi_put_table) =
+		acpi_get_table_pointer(ACPI_SIG_AEST, 0);
+	if (IS_ERR(aest_table))
+		return 0;
+
+	aest_array = kzalloc(sizeof(struct xarray), GFP_KERNEL);
+	if (!aest_array)
+		return -ENOMEM;
+
+	xa_init(aest_array);
+
+	ret = acpi_aest_init_nodes(aest_table);
+	if (ret) {
+		pr_err("Failed init aest node %d\n", ret);
+		return ret;
+	}
+
+	ret = acpi_aest_alloc_pdevs();
+	if (ret) {
+		pr_err("Failed alloc pdev %d\n", ret);
+		return ret;
+	}
+
+	return 0;
+}
+subsys_initcall_sync(acpi_aest_init);
diff --git a/include/linux/acpi_aest.h b/include/linux/acpi_aest.h
new file mode 100644
index 000000000000..53c1970e7583
--- /dev/null
+++ b/include/linux/acpi_aest.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ACPI_AEST_H__
+#define __ACPI_AEST_H__
+
+#include <asm/ras.h>
+#include <linux/acpi.h>
+
+/* AEST resource name */
+#define AEST_NODE_NAME "AEST:NODE"
+#define AEST_FHI_NAME "AEST:FHI"
+#define AEST_ERI_NAME "AEST:ERI"
+
+/* AEST interrupt */
+#define AEST_INTERRUPT_MODE BIT(0)
+
+#define AEST_MAX_INTERRUPT_PER_NODE 2
+
+#define KB 1024
+#define MB (1024 * KB)
+#define GB (1024 * MB)
+
+struct aest_hnode {
+	struct list_head list;
+	int count;
+	u32 id;
+	int type;
+};
+
+struct acpi_aest_node {
+	struct list_head list;
+	int type;
+	struct acpi_aest_node_interface_header *interface_hdr;
+	unsigned long *record_implemented;
+	unsigned long *status_reporting;
+	unsigned long *addressing_mode;
+	struct acpi_aest_node_interface_common *common;
+	union {
+		struct acpi_aest_processor *processor;
+		struct acpi_aest_memory *memory;
+		struct acpi_aest_smmu *smmu;
+		struct acpi_aest_vendor_v2 *vendor;
+		struct acpi_aest_gic *gic;
+		struct acpi_aest_pcie *pcie;
+		struct acpi_aest_proxy *proxy;
+		void *spec_pointer;
+	};
+	union {
+		struct acpi_aest_processor_cache *cache;
+		struct acpi_aest_processor_tlb *tlb;
+		struct acpi_aest_processor_generic *generic;
+		void *processor_spec_pointer;
+	};
+	struct acpi_aest_node_interrupt_v2 *interrupt;
+	int interrupt_count;
+};
+#endif /* __ACPI_AEST_H__ */
-- 
2.51.2.612.gdc70283dfc




* [PATCH v4 02/17] ras: AEST: Add probe/remove for AEST driver
  2025-12-22  9:43 [PATCH v4 00/17] ARM Error Source Table V2 Support Ruidong Tian
  2025-12-22  9:43 ` [PATCH v4 01/17] ACPI/AEST: Parse the AEST table Ruidong Tian
@ 2025-12-22  9:43 ` Ruidong Tian
  2025-12-22  9:43 ` [PATCH v4 03/17] ras: AEST: support different group format Ruidong Tian
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Ruidong Tian @ 2025-12-22  9:43 UTC
  To: catalin.marinas, will, lpieralisi, guohanjun, sudeep.holla,
	xueshuai, linux-kernel, linux-acpi, linux-arm-kernel, rafael,
	lenb, tony.luck, bp, yazen.ghannam, misono.tomohiro
  Cc: tianruidong

Parse register information from the AEST table in the probe function,
create the corresponding structures, and map the AEST records.
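
As a hedged illustration of the record mapping, the userspace C sketch below
mirrors the register layout of struct ras_ext_regs from patch 01 and computes
the byte offset of record i from the node base, in the same way
aest_init_record() derives regs_base in this patch. The sketch_* names are
hypothetical, and the real driver works on an ioremapped base rather than a
plain offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same field layout as struct ras_ext_regs: eight 64-bit registers. */
struct sketch_ras_ext_regs {
	uint64_t err_fr;
	uint64_t err_ctlr;
	uint64_t err_status;
	uint64_t err_addr;
	uint64_t err_misc[4];
};

/* Byte offset of record 'index' relative to the node's register base. */
static size_t sketch_record_offset(unsigned int index)
{
	return sizeof(struct sketch_ras_ext_regs) * index;
}
```

Each record therefore spans 64 bytes, so record 3 would start 192 bytes past
the node base under this layout.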

Signed-off-by: Ruidong Tian <tianruidong@linux.alibaba.com>
---
 MAINTAINERS                  |   2 +
 drivers/ras/Kconfig          |   1 +
 drivers/ras/Makefile         |   1 +
 drivers/ras/aest/Kconfig     |  17 +++
 drivers/ras/aest/Makefile    |   5 +
 drivers/ras/aest/aest-core.c | 217 +++++++++++++++++++++++++++++++++++
 drivers/ras/aest/aest.h      | 124 ++++++++++++++++++++
 include/linux/acpi_aest.h    |   9 ++
 8 files changed, 376 insertions(+)
 create mode 100644 drivers/ras/aest/Kconfig
 create mode 100644 drivers/ras/aest/Makefile
 create mode 100644 drivers/ras/aest/aest-core.c
 create mode 100644 drivers/ras/aest/aest.h

diff --git a/MAINTAINERS b/MAINTAINERS
index d14e16c3a93b..fd4c40c4607c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -346,7 +346,9 @@ M:	Ruidong Tian <tianruidong@linux.alibaba.com>
 L:	linux-acpi@vger.kernel.org
 L:	linux-arm-kernel@lists.infradead.org
 S:	Supported
+F:	arch/arm64/include/asm/ras.h
 F:	drivers/acpi/arm64/aest.c
+F:	drivers/ras/aest/
 F:	include/linux/acpi_aest.h
 
 ACPI FOR RISC-V (ACPI/riscv)
diff --git a/drivers/ras/Kconfig b/drivers/ras/Kconfig
index fc4f4bb94a4c..61a2a05d9c94 100644
--- a/drivers/ras/Kconfig
+++ b/drivers/ras/Kconfig
@@ -33,6 +33,7 @@ if RAS
 
 source "arch/x86/ras/Kconfig"
 source "drivers/ras/amd/atl/Kconfig"
+source "drivers/ras/aest/Kconfig"
 
 config RAS_FMPM
 	tristate "FRU Memory Poison Manager"
diff --git a/drivers/ras/Makefile b/drivers/ras/Makefile
index 11f95d59d397..72411ee9deaf 100644
--- a/drivers/ras/Makefile
+++ b/drivers/ras/Makefile
@@ -5,3 +5,4 @@ obj-$(CONFIG_RAS_CEC)	+= cec.o
 
 obj-$(CONFIG_RAS_FMPM)	+= amd/fmpm.o
 obj-y			+= amd/atl/
+obj-y 			+= aest/
diff --git a/drivers/ras/aest/Kconfig b/drivers/ras/aest/Kconfig
new file mode 100644
index 000000000000..0b09a5d5acce
--- /dev/null
+++ b/drivers/ras/aest/Kconfig
@@ -0,0 +1,17 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# ARM Error Source Table Support
+#
+# Copyright (c) 2025, Alibaba Group.
+#
+
+config AEST
+	tristate "ARM AEST Driver"
+	depends on ACPI_AEST && RAS
+
+	help
+	  The Arm Error Source Table (AEST) provides details on ACPI
+	  extensions that enable kernel-first handling of errors in a
+	  system that supports the Armv8 RAS extensions.
+
+	  If set, the kernel will report and log hardware errors.
diff --git a/drivers/ras/aest/Makefile b/drivers/ras/aest/Makefile
new file mode 100644
index 000000000000..a6ba7e36fb43
--- /dev/null
+++ b/drivers/ras/aest/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0-only
+
+obj-$(CONFIG_AEST) 	+= aest.o
+
+aest-y		:= aest-core.o
diff --git a/drivers/ras/aest/aest-core.c b/drivers/ras/aest/aest-core.c
new file mode 100644
index 000000000000..c7ef6c13fd44
--- /dev/null
+++ b/drivers/ras/aest/aest-core.c
@@ -0,0 +1,217 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ARM Error Source Table Support
+ *
+ * Copyright (c) 2025, Alibaba Group.
+ */
+
+#include <linux/platform_device.h>
+#include <linux/xarray.h>
+#include <linux/ras.h>
+
+#include "aest.h"
+
+DEFINE_PER_CPU(struct aest_device, percpu_adev);
+
+#undef pr_fmt
+#define pr_fmt(fmt) "AEST: " fmt
+
+static int aest_init_record(struct aest_record *record, int i,
+			    struct aest_node *node)
+{
+	struct device *dev = node->adev->dev;
+
+	record->name = devm_kasprintf(dev, GFP_KERNEL, "record%d", i);
+	if (!record->name)
+		return -ENOMEM;
+
+	if (node->base)
+		record->regs_base =
+			node->base + sizeof(struct ras_ext_regs) * i;
+
+	record->addressing_mode = test_bit(i, node->info->addressing_mode);
+	record->index = i;
+	record->node = node;
+
+	aest_record_dbg(record, "base: %p, index: %d, address mode: %x\n",
+			record->regs_base, record->index,
+			record->addressing_mode);
+	return 0;
+}
+
+static void aest_device_remove(struct platform_device *pdev)
+{
+	platform_set_drvdata(pdev, NULL);
+}
+
+static char *alloc_aest_node_name(struct aest_node *node)
+{
+	char *name;
+
+	switch (node->type) {
+	case ACPI_AEST_PROCESSOR_ERROR_NODE:
+		name = devm_kasprintf(node->adev->dev, GFP_KERNEL, "%s.%d",
+				      aest_node_name[node->type],
+				      node->info->processor->processor_id);
+		break;
+	case ACPI_AEST_MEMORY_ERROR_NODE:
+	case ACPI_AEST_SMMU_ERROR_NODE:
+	case ACPI_AEST_VENDOR_ERROR_NODE:
+	case ACPI_AEST_GIC_ERROR_NODE:
+	case ACPI_AEST_PCIE_ERROR_NODE:
+	case ACPI_AEST_PROXY_ERROR_NODE:
+		name = devm_kasprintf(node->adev->dev, GFP_KERNEL, "%s.%llx",
+				      aest_node_name[node->type],
+				      node->info->interface_hdr->address);
+		break;
+	default:
+		name = devm_kasprintf(node->adev->dev, GFP_KERNEL, "Unknown");
+	}
+
+	return name;
+}
+
+static int aest_node_set_errgsr(struct aest_device *adev,
+				struct aest_node *node)
+{
+	struct acpi_aest_node *anode = node->info;
+	u64 errgsr_base = anode->common->error_group_register_base;
+
+	if (anode->interface_hdr->type != ACPI_AEST_NODE_MEMORY_MAPPED)
+		return 0;
+
+	if (!node->base)
+		return 0;
+
+	if (!(anode->interface_hdr->flags & AEST_XFACE_FLAG_ERROR_GROUP)) {
+		node->errgsr = node->base + ERXGROUP;
+		return 0;
+	}
+
+	if (!errgsr_base)
+		return -EINVAL;
+
+	node->errgsr = devm_ioremap(adev->dev, errgsr_base, PAGE_SIZE);
+	if (!node->errgsr)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static int aest_init_node(struct aest_device *adev, struct aest_node *node,
+			  struct acpi_aest_node *anode)
+{
+	int i, ret;
+	u64 address;
+
+	node->adev = adev;
+	node->info = anode;
+	node->type = anode->type;
+	node->name = alloc_aest_node_name(node);
+	if (!node->name)
+		return -ENOMEM;
+	node->record_implemented = anode->record_implemented;
+	node->status_reporting = anode->status_reporting;
+
+	address = anode->interface_hdr->address;
+	if (address) {
+		node->base = devm_ioremap(adev->dev, address, PAGE_SIZE);
+		if (!node->base)
+			return -ENOMEM;
+	}
+
+	ret = aest_node_set_errgsr(adev, node);
+	if (ret)
+		return ret;
+
+	node->record_count = anode->interface_hdr->error_record_count;
+	node->records = devm_kcalloc(adev->dev, node->record_count,
+				     sizeof(struct aest_record), GFP_KERNEL);
+	if (!node->records)
+		return -ENOMEM;
+
+	for (i = 0; i < node->record_count; i++) {
+		ret = aest_init_record(&node->records[i], i, node);
+		if (ret)
+			return ret;
+	}
+	aest_node_dbg(node, "%d records, base: %llx, errgsr: %llx\n",
+		      node->record_count, (u64)node->base, (u64)node->errgsr);
+	return 0;
+}
+
+static int aest_init_nodes(struct aest_device *adev, struct aest_hnode *ahnode)
+{
+	struct acpi_aest_node *anode;
+	struct aest_node *node;
+	int ret, i = 0;
+
+	adev->node_cnt = ahnode->count;
+	adev->nodes = devm_kcalloc(adev->dev, adev->node_cnt,
+				   sizeof(struct aest_node), GFP_KERNEL);
+	if (!adev->nodes)
+		return -ENOMEM;
+
+	list_for_each_entry(anode, &ahnode->list, list) {
+		adev->type = anode->type;
+
+		node = &adev->nodes[i++];
+		ret = aest_init_node(adev, node, anode);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+static int aest_device_probe(struct platform_device *pdev)
+{
+	int ret;
+	struct aest_device *adev;
+	struct aest_hnode *ahnode;
+
+	ahnode = *((struct aest_hnode **)pdev->dev.platform_data);
+	if (!ahnode)
+		return -ENODEV;
+
+	adev = devm_kzalloc(&pdev->dev, sizeof(*adev), GFP_KERNEL);
+	if (!adev)
+		return -ENOMEM;
+
+	adev->dev = &pdev->dev;
+	adev->id = pdev->id;
+	aest_set_name(adev, ahnode);
+	ret = aest_init_nodes(adev, ahnode);
+	if (ret)
+		return ret;
+
+	platform_set_drvdata(pdev, adev);
+
+	aest_dev_dbg(adev, "Node cnt: %x, id: %x\n", adev->node_cnt, adev->id);
+
+	return 0;
+}
+
+static struct platform_driver aest_driver = {
+	.driver	= {
+		.name	= "AEST",
+	},
+	.probe	= aest_device_probe,
+	.remove = aest_device_remove,
+};
+
+static int __init aest_init(void)
+{
+	return platform_driver_register(&aest_driver);
+}
+module_init(aest_init);
+
+static void __exit aest_exit(void)
+{
+	platform_driver_unregister(&aest_driver);
+}
+module_exit(aest_exit);
+
+MODULE_DESCRIPTION("ARM AEST Driver");
+MODULE_AUTHOR("Ruidong Tian <tianruidong@linux.alibaba.com>");
+MODULE_LICENSE("GPL");
diff --git a/drivers/ras/aest/aest.h b/drivers/ras/aest/aest.h
new file mode 100644
index 000000000000..d918240c3f57
--- /dev/null
+++ b/drivers/ras/aest/aest.h
@@ -0,0 +1,124 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * ARM Error Source Table Support
+ *
+ * Copyright (c) 2025, Alibaba Group.
+ */
+
+#include <linux/acpi_aest.h>
+#include <asm/ras.h>
+
+#define MAX_GSI_PER_NODE 2
+
+#define aest_dev_err(__adev, format, ...) \
+	dev_err((__adev)->dev, format, ##__VA_ARGS__)
+#define aest_dev_info(__adev, format, ...) \
+	dev_info((__adev)->dev, format, ##__VA_ARGS__)
+#define aest_dev_dbg(__adev, format, ...) \
+	dev_dbg((__adev)->dev, format, ##__VA_ARGS__)
+
+#define aest_node_err(__node, format, ...)                          \
+	dev_err((__node)->adev->dev, "%s: " format, (__node)->name, \
+		##__VA_ARGS__)
+#define aest_node_info(__node, format, ...)                          \
+	dev_info((__node)->adev->dev, "%s: " format, (__node)->name, \
+		 ##__VA_ARGS__)
+#define aest_node_dbg(__node, format, ...)                          \
+	dev_dbg((__node)->adev->dev, "%s: " format, (__node)->name, \
+		##__VA_ARGS__)
+
+#define aest_record_err(__record, format, ...)                  \
+	dev_err((__record)->node->adev->dev, "%s: %s: " format, \
+		(__record)->node->name, (__record)->name, ##__VA_ARGS__)
+#define aest_record_info(__record, format, ...)                  \
+	dev_info((__record)->node->adev->dev, "%s: %s: " format, \
+		 (__record)->node->name, (__record)->name, ##__VA_ARGS__)
+#define aest_record_dbg(__record, format, ...)                  \
+	dev_dbg((__record)->node->adev->dev, "%s: %s: " format, \
+		(__record)->node->name, (__record)->name, ##__VA_ARGS__)
+
+#define ERXGROUP 0xE00
+
+struct aest_record {
+	char *name;
+	int index;
+	void __iomem *regs_base;
+
+	/*
+	 * This bit specifies the addressing mode used to populate the ERR_ADDR
+	 * register:
+	 *   0b: Error record reports System Physical Addresses (SPA) in
+	 *       the ERR_ADDR register.
+	 *   1b: Error record reports error node-specific Logical Addresses(LA)
+	 *       in the ERR_ADDR register. OS must use other means to translate
+	 *       the reported LA into SPA.
+	 */
+	int addressing_mode;
+	struct aest_node *node;
+};
+
+struct aest_node {
+	char *name;
+	u8 type;
+	void *errgsr;
+	void *base;
+
+	/*
+	 * This bitmap indicates which of the error records within this error
+	 * node must be polled for error status.
+	 * Bit[n] of this field pertains to error record corresponding to
+	 * index n in this error group.
+	 * Bit[n] = 0b: Error record at index n needs to be polled.
+	 * Bit[n] = 1b: Error record at index n does not need to be polled.
+	 */
+	unsigned long *record_implemented;
+	/*
+	 * This bitmap indicates which of the error records within this error
+	 * node support error status reporting using ERRGSR register.
+	 * Bit[n] of this field pertains to error record corresponding to
+	 * index n in this error group.
+	 * Bit[n] = 0b: Error record at index n supports error status reporting
+	 *              through ERRGSR.S.
+	 * Bit[n] = 1b: Error record at index n does not support error reporting
+	 *              through the ERRGSR.S bit. If this error record is
+	 *              implemented, then it must be polled explicitly for
+	 *              error events.
+	 */
+	unsigned long *status_reporting;
+
+	struct aest_device *adev;
+	struct acpi_aest_node *info;
+
+	int record_count;
+	struct aest_record *records;
+};
+
+struct aest_device {
+	struct device *dev;
+	u32 type;
+	int node_cnt;
+	struct aest_node *nodes;
+	u32 id;
+};
+
+static const char *const aest_node_name[] = {
+	[ACPI_AEST_PROCESSOR_ERROR_NODE] = "processor",
+	[ACPI_AEST_MEMORY_ERROR_NODE] = "memory",
+	[ACPI_AEST_SMMU_ERROR_NODE] = "smmu",
+	[ACPI_AEST_VENDOR_ERROR_NODE] = "vendor",
+	[ACPI_AEST_GIC_ERROR_NODE] = "gic",
+	[ACPI_AEST_PCIE_ERROR_NODE] = "pcie",
+	[ACPI_AEST_PROXY_ERROR_NODE] = "proxy",
+};
+
+static inline int aest_set_name(struct aest_device *adev,
+				struct aest_hnode *ahnode)
+{
+	adev->dev->init_name = devm_kasprintf(adev->dev, GFP_KERNEL, "%s%d",
+					      aest_node_name[ahnode->type],
+					      adev->id);
+	if (!adev->dev->init_name)
+		return -ENOMEM;
+
+	return 0;
+}
diff --git a/include/linux/acpi_aest.h b/include/linux/acpi_aest.h
index 53c1970e7583..77187ce43d44 100644
--- a/include/linux/acpi_aest.h
+++ b/include/linux/acpi_aest.h
@@ -15,6 +15,15 @@
 
 #define AEST_MAX_INTERRUPT_PER_NODE 2
 
+/* AEST interface */
+#define AEST_XFACE_FLAG_SHARED (1 << 0)
+#define AEST_XFACE_FLAG_CLEAR_MISC (1 << 1)
+#define AEST_XFACE_FLAG_ERROR_DEVICE (1 << 2)
+#define AEST_XFACE_FLAG_AFFINITY (1 << 3)
+#define AEST_XFACE_FLAG_ERROR_GROUP (1 << 4)
+#define AEST_XFACE_FLAG_FAULT_INJECT (1 << 5)
+#define AEST_XFACE_FLAG_INT_CONFIG (1 << 6)
+
 #define KB 1024
 #define MB (1024 * KB)
 #define GB (1024 * MB)
-- 
2.51.2.612.gdc70283dfc




* [PATCH v4 03/17] ras: AEST: support different group format
  2025-12-22  9:43 [PATCH v4 00/17] ARM Error Source Table V2 Support Ruidong Tian
  2025-12-22  9:43 ` [PATCH v4 01/17] ACPI/AEST: Parse the AEST table Ruidong Tian
  2025-12-22  9:43 ` [PATCH v4 02/17] ras: AEST: Add probe/remove for AEST driver Ruidong Tian
@ 2025-12-22  9:43 ` Ruidong Tian
  2025-12-22  9:43 ` [PATCH v4 04/17] ras: AEST: Unify the read/write interface for system and MMIO register Ruidong Tian
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Ruidong Tian @ 2025-12-22  9:43 UTC (permalink / raw)
  To: catalin.marinas, will, lpieralisi, guohanjun, sudeep.holla,
	xueshuai, linux-kernel, linux-acpi, linux-arm-kernel, rafael,
	lenb, tony.luck, bp, yazen.ghannam, misono.tomohiro
  Cc: tianruidong

Support the 4K, 16K and 64K AEST group formats. The group format
determines the size of a node's address space and the maximum number of
records per group.
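
As a rough illustration of the table-driven lookup this patch introduces, the
sketch below maps a group format to its window size and ERRGSR offset. The
enum values and the group_lookup() helper are hypothetical stand-ins for the
ACPI_AEST_NODE_GROUP_FORMAT_* constants and aest_group_config[], and the
bounds check is an assumption added for the example rather than something the
patch does.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for ACPI_AEST_NODE_GROUP_FORMAT_{4K,16K,64K}. */
enum group_format { GROUP_4K, GROUP_16K, GROUP_64K, GROUP_FORMAT_COUNT };

struct group_cfg {
	int errgsr_num;         /* number of ERRGSR registers in the group */
	size_t size;            /* size of the node's register window */
	uint64_t errgsr_offset; /* offset of ERRGSR from the node base */
};

/* Values copied from the ERXGROUP_* defines in this patch. */
static const struct group_cfg group_cfg[GROUP_FORMAT_COUNT] = {
	[GROUP_4K]  = { 1,  4 * 1024,  0xE00 },
	[GROUP_16K] = { 4,  16 * 1024, 0x3800 },
	[GROUP_64K] = { 14, 64 * 1024, 0xE000 },
};

/* Return the configuration for a format, or NULL if it is out of range. */
static const struct group_cfg *group_lookup(unsigned int format)
{
	if (format >= GROUP_FORMAT_COUNT)
		return NULL;
	return &group_cfg[format];
}
```

A caller would map node->group->size bytes at the node base and find ERRGSR at
base + errgsr_offset, as the aest-core.c hunk below does.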

Signed-off-by: Ruidong Tian <tianruidong@linux.alibaba.com>
---
 drivers/ras/aest/aest-core.c |  6 ++++--
 drivers/ras/aest/aest.h      | 39 +++++++++++++++++++++++++++++++++++-
 2 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/drivers/ras/aest/aest-core.c b/drivers/ras/aest/aest-core.c
index c7ef6c13fd44..acebb293ac75 100644
--- a/drivers/ras/aest/aest-core.c
+++ b/drivers/ras/aest/aest-core.c
@@ -84,7 +84,7 @@ static int aest_node_set_errgsr(struct aest_device *adev,
 		return 0;
 
 	if (!(anode->interface_hdr->flags & AEST_XFACE_FLAG_ERROR_GROUP)) {
-		node->errgsr = node->base + ERXGROUP;
+		node->errgsr = node->base + node->group->errgsr_offset;
 		return 0;
 	}
 
@@ -112,10 +112,12 @@ static int aest_init_node(struct aest_device *adev, struct aest_node *node,
 		return -ENOMEM;
 	node->record_implemented = anode->record_implemented;
 	node->status_reporting = anode->status_reporting;
+	node->group = &aest_group_config[anode->interface_hdr->group_format];
 
 	address = anode->interface_hdr->address;
 	if (address) {
-		node->base = devm_ioremap(adev->dev, address, PAGE_SIZE);
+		node->base =
+			devm_ioremap(adev->dev, address, node->group->size);
 		if (!node->base)
 			return -ENOMEM;
 	}
diff --git a/drivers/ras/aest/aest.h b/drivers/ras/aest/aest.h
index d918240c3f57..3250675e99b7 100644
--- a/drivers/ras/aest/aest.h
+++ b/drivers/ras/aest/aest.h
@@ -37,7 +37,15 @@
 	dev_dbg((__record)->node->adev->dev, "%s: %s: " format, \
 		(__record)->node->name, (__record)->name, ##__VA_ARGS__)
 
-#define ERXGROUP 0xE00
+#define ERXGROUP_4K_OFFSET 0xE00
+#define ERXGROUP_16K_OFFSET 0x3800
+#define ERXGROUP_64K_OFFSET 0xE000
+#define ERXGROUP_4K_SIZE (4 * KB)
+#define ERXGROUP_16K_SIZE (16 * KB)
+#define ERXGROUP_64K_SIZE (64 * KB)
+#define ERXGROUP_4K_ERRGSR_NUM 1
+#define ERXGROUP_16K_ERRGSR_NUM 4
+#define ERXGROUP_64K_ERRGSR_NUM 14
 
 struct aest_record {
 	char *name;
@@ -57,6 +65,34 @@ struct aest_record {
 	struct aest_node *node;
 };
 
+struct aest_group {
+	int type;
+	int errgsr_num;
+	size_t size;
+	u64 errgsr_offset;
+};
+
+static const struct aest_group aest_group_config[] = {
+	[ACPI_AEST_NODE_GROUP_FORMAT_4K] = {
+		.type = ACPI_AEST_NODE_GROUP_FORMAT_4K,
+		.errgsr_num = ERXGROUP_4K_ERRGSR_NUM,
+		.size = ERXGROUP_4K_SIZE,
+		.errgsr_offset = ERXGROUP_4K_OFFSET,
+	},
+	[ACPI_AEST_NODE_GROUP_FORMAT_16K] = {
+		.type = ACPI_AEST_NODE_GROUP_FORMAT_16K,
+		.errgsr_num = ERXGROUP_16K_ERRGSR_NUM,
+		.size = ERXGROUP_16K_SIZE,
+		.errgsr_offset = ERXGROUP_16K_OFFSET,
+	},
+	[ACPI_AEST_NODE_GROUP_FORMAT_64K] = {
+		.type = ACPI_AEST_NODE_GROUP_FORMAT_64K,
+		.errgsr_num = ERXGROUP_64K_ERRGSR_NUM,
+		.size = ERXGROUP_64K_SIZE,
+		.errgsr_offset = ERXGROUP_64K_OFFSET,
+	},
+};
+
 struct aest_node {
 	char *name;
 	u8 type;
@@ -86,6 +122,7 @@ struct aest_node {
 	 */
 	unsigned long *status_reporting;
 
+	const struct aest_group *group;
 	struct aest_device *adev;
 	struct acpi_aest_node *info;
 
-- 
2.51.2.612.gdc70283dfc




* [PATCH v4 04/17] ras: AEST: Unify the read/write interface for system and MMIO register
  2025-12-22  9:43 [PATCH v4 00/17] ARM Error Source Table V2 Support Ruidong Tian
                   ` (2 preceding siblings ...)
  2025-12-22  9:43 ` [PATCH v4 03/17] ras: AEST: support different group format Ruidong Tian
@ 2025-12-22  9:43 ` Ruidong Tian
  2025-12-22  9:43 ` [PATCH v4 05/17] ras: AEST: Probe RAS system architecture version Ruidong Tian
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Ruidong Tian @ 2025-12-22  9:43 UTC (permalink / raw)
  To: catalin.marinas, will, lpieralisi, guohanjun, sudeep.holla,
	xueshuai, linux-kernel, linux-acpi, linux-arm-kernel, rafael,
	lenb, tony.luck, bp, yazen.ghannam, misono.tomohiro
  Cc: tianruidong

Add record_read()/record_write() as a single accessor pair for both
system-register and MMIO error records. The access method is selected
per record from the AEST interface type, which keeps callers concise.
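
The dispatch idea can be sketched in userspace as follows. This is only an
illustration under stated assumptions: plain memory stands in for
readq_relaxed()/writeq_relaxed(), a static array stands in for the ERX*
system registers, and every name here (access_ops, rec_read, fake_sysregs, and
so on) is hypothetical rather than part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the shape of struct aest_access in this patch. */
struct access_ops {
	uint64_t (*read)(void *base, uint32_t offset);
	void (*write)(void *base, uint32_t offset, uint64_t val);
};

/* Hypothetical MMIO backend: plain memory stands in for readq/writeq. */
static uint64_t mem_read(void *base, uint32_t offset)
{
	return *(volatile uint64_t *)((char *)base + offset);
}

static void mem_write(void *base, uint32_t offset, uint64_t val)
{
	*(volatile uint64_t *)((char *)base + offset) = val;
}

/* Hypothetical system-register backend: a static bank indexed by offset. */
static uint64_t fake_sysregs[8];

static uint64_t sys_read(void *unused, uint32_t offset)
{
	(void)unused;
	/* Unknown offsets read as zero, like the default case in the patch. */
	return offset / 8 < 8 ? fake_sysregs[offset / 8] : 0;
}

static void sys_write(void *unused, uint32_t offset, uint64_t val)
{
	(void)unused;
	if (offset / 8 < 8)
		fake_sysregs[offset / 8] = val;
}

static const struct access_ops mmio_ops = { mem_read, mem_write };
static const struct access_ops sysreg_ops = { sys_read, sys_write };

struct record {
	void *regs_base;
	const struct access_ops *access;
};

/* Callers use one pair of helpers regardless of the backend. */
static uint64_t rec_read(const struct record *r, uint32_t offset)
{
	assert(r->access);
	return r->access->read(r->regs_base, offset);
}

static void rec_write(const struct record *r, uint32_t offset, uint64_t val)
{
	assert(r->access);
	r->access->write(r->regs_base, offset, val);
}
```

The point of the indirection is that code such as the CE threshold setup can
call the read/write helpers without checking the interface type first.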

Signed-off-by: Ruidong Tian <tianruidong@linux.alibaba.com>
---
 drivers/ras/aest/aest-core.c |  1 +
 drivers/ras/aest/aest.h      | 94 ++++++++++++++++++++++++++++++++++++
 2 files changed, 95 insertions(+)

diff --git a/drivers/ras/aest/aest-core.c b/drivers/ras/aest/aest-core.c
index acebb293ac75..f4a5119dc513 100644
--- a/drivers/ras/aest/aest-core.c
+++ b/drivers/ras/aest/aest-core.c
@@ -29,6 +29,7 @@ static int aest_init_record(struct aest_record *record, int i,
 		record->regs_base =
 			node->base + sizeof(struct ras_ext_regs) * i;
 
+	record->access = &aest_access[node->info->interface_hdr->type];
 	record->addressing_mode = test_bit(i, node->info->addressing_mode);
 	record->index = i;
 	record->node = node;
diff --git a/drivers/ras/aest/aest.h b/drivers/ras/aest/aest.h
index 3250675e99b7..31131cce9928 100644
--- a/drivers/ras/aest/aest.h
+++ b/drivers/ras/aest/aest.h
@@ -10,6 +10,11 @@
 
 #define MAX_GSI_PER_NODE 2
 
+#define record_read(record, offset) \
+	record->access->read(record->regs_base, offset)
+#define record_write(record, offset, val) \
+	record->access->write(record->regs_base, offset, val)
+
 #define aest_dev_err(__adev, format, ...) \
 	dev_err((__adev)->dev, format, ##__VA_ARGS__)
 #define aest_dev_info(__adev, format, ...) \
@@ -47,6 +52,20 @@
 #define ERXGROUP_16K_ERRGSR_NUM 4
 #define ERXGROUP_64K_ERRGSR_NUM 14
 
+#define ERXFR 0x0
+#define ERXCTLR 0x8
+#define ERXSTATUS 0x10
+#define ERXADDR 0x18
+#define ERXMISC0 0x20
+#define ERXMISC1 0x28
+#define ERXMISC2 0x30
+#define ERXMISC3 0x38
+
+struct aest_access {
+	u64 (*read)(void *base, u32 offset);
+	void (*write)(void *base, u32 offset, u64 val);
+};
+
 struct aest_record {
 	char *name;
 	int index;
@@ -63,6 +82,7 @@ struct aest_record {
 	 */
 	int addressing_mode;
 	struct aest_node *node;
+	const struct aest_access *access;
 };
 
 struct aest_group {
@@ -159,3 +179,77 @@ static inline int aest_set_name(struct aest_device *adev,
 
 	return 0;
 }
+
+#define CASE_READ(res, x)                           \
+	case (x): {                                 \
+		res = read_sysreg_s(SYS_##x##_EL1); \
+		break;                              \
+	}
+
+#define CASE_WRITE(val, x)                            \
+	case (x): {                                   \
+		write_sysreg_s((val), SYS_##x##_EL1); \
+		break;                                \
+	}
+
+static inline u64 aest_sysreg_read(void *__unused, u32 offset)
+{
+	u64 res;
+
+	switch (offset) {
+		CASE_READ(res, ERXFR)
+		CASE_READ(res, ERXCTLR)
+		CASE_READ(res, ERXSTATUS)
+		CASE_READ(res, ERXADDR)
+		CASE_READ(res, ERXMISC0)
+		CASE_READ(res, ERXMISC1)
+		CASE_READ(res, ERXMISC2)
+		CASE_READ(res, ERXMISC3)
+	default :
+		res = 0;
+	}
+	return res;
+}
+
+static inline void aest_sysreg_write(void *base, u32 offset, u64 val)
+{
+	switch (offset) {
+		CASE_WRITE(val, ERXFR)
+		CASE_WRITE(val, ERXCTLR)
+		CASE_WRITE(val, ERXSTATUS)
+		CASE_WRITE(val, ERXADDR)
+		CASE_WRITE(val, ERXMISC0)
+		CASE_WRITE(val, ERXMISC1)
+		CASE_WRITE(val, ERXMISC2)
+		CASE_WRITE(val, ERXMISC3)
+	default :
+		return;
+	}
+}
+
+static inline u64 aest_iomem_read(void *base, u32 offset)
+{
+	return readq_relaxed(base + offset);
+}
+
+static inline void aest_iomem_write(void *base, u32 offset, u64 val)
+{
+	writeq_relaxed(val, base + offset);
+}
+
+/* access type is decided by AEST interface type. */
+static const struct aest_access aest_access[] = {
+	[ACPI_AEST_NODE_SYSTEM_REGISTER] = {
+		.read = aest_sysreg_read,
+		.write = aest_sysreg_write,
+	},
+	[ACPI_AEST_NODE_MEMORY_MAPPED] = {
+		.read = aest_iomem_read,
+		.write = aest_iomem_write,
+	},
+	[ACPI_AEST_NODE_SINGLE_RECORD_MEMORY_MAPPED] = {
+		.read = aest_iomem_read,
+		.write = aest_iomem_write,
+	},
+	{ }
+};
-- 
2.51.2.612.gdc70283dfc




* [PATCH v4 05/17] ras: AEST: Probe RAS system architecture version
  2025-12-22  9:43 [PATCH v4 00/17] ARM Error Source Table V2 Support Ruidong Tian
                   ` (3 preceding siblings ...)
  2025-12-22  9:43 ` [PATCH v4 04/17] ras: AEST: Unify the read/write interface for system and MMIO register Ruidong Tian
@ 2025-12-22  9:43 ` Ruidong Tian
  2025-12-22  9:43 ` [PATCH v4 06/17] ras: AEST: Support RAS Common Fault Injection Model Extension Ruidong Tian
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Ruidong Tian @ 2025-12-22  9:43 UTC (permalink / raw)
  To: catalin.marinas, will, lpieralisi, guohanjun, sudeep.holla,
	xueshuai, linux-kernel, linux-acpi, linux-arm-kernel, rafael,
	lenb, tony.luck, bp, yazen.ghannam, misono.tomohiro
  Cc: tianruidong

The RAS version of a component can be probed via its ERRDEVARCH register.

In cases where a component (e.g., SMMU) does not implement an ERRDEVARCH
register, the driver falls back to using the RAS version of the Processing
Element (PE).
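
A minimal sketch of the selection logic described above, assuming the revision
lives in bits [19:16] as ERRDEVARCH_REV in this patch defines it. The
node_ras_version() helper and its parameters are hypothetical; in the driver
the register value comes from an MMIO read and the fallback from
ID_AA64PFR0_EL1.

```c
#include <stdint.h>

/* Same bit range as ERRDEVARCH_REV in this patch: bits [19:16]. */
#define DEVARCH_REV_SHIFT 16
#define DEVARCH_REV_MASK  (UINT32_C(0xF) << DEVARCH_REV_SHIFT)

/*
 * Hypothetical helper: take the revision from ERRDEVARCH when the
 * component has one, otherwise fall back to the PE's RAS version.
 */
static unsigned int node_ras_version(int has_devarch, uint32_t devarch,
				     unsigned int pe_ras_version)
{
	if (!has_devarch)
		return pe_ras_version;
	return (devarch & DEVARCH_REV_MASK) >> DEVARCH_REV_SHIFT;
}
```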

Signed-off-by: Ruidong Tian <tianruidong@linux.alibaba.com>
---
 arch/arm64/include/asm/ras.h |  3 +++
 drivers/ras/aest/aest-core.c | 22 ++++++++++++++++++++++
 drivers/ras/aest/aest.h      |  3 +++
 3 files changed, 28 insertions(+)

diff --git a/arch/arm64/include/asm/ras.h b/arch/arm64/include/asm/ras.h
index b6640b9972bf..da7c441252fe 100644
--- a/arch/arm64/include/asm/ras.h
+++ b/arch/arm64/include/asm/ras.h
@@ -4,6 +4,9 @@
 
 #include <linux/types.h>
 
+/* ERRDEVARCH */
+#define ERRDEVARCH_REV GENMASK(19, 16)
+
 struct ras_ext_regs {
 	u64 err_fr;
 	u64 err_ctlr;
diff --git a/drivers/ras/aest/aest-core.c b/drivers/ras/aest/aest-core.c
index f4a5119dc513..84b2fb8127ff 100644
--- a/drivers/ras/aest/aest-core.c
+++ b/drivers/ras/aest/aest-core.c
@@ -16,6 +16,27 @@ DEFINE_PER_CPU(struct aest_device, percpu_adev);
 #undef pr_fmt
 #define pr_fmt(fmt) "AEST: " fmt
 
+static int get_aest_node_ver(struct aest_node *node)
+{
+	u64 reg;
+	void *devarch_base;
+
+	if (node->type == ACPI_AEST_GIC_ERROR_NODE) {
+		devarch_base = ioremap(node->info->interface_hdr->address +
+					       GIC_ERRDEVARCH,
+				       PAGE_SIZE);
+		if (!devarch_base)
+			return 0;
+
+		reg = readl_relaxed(devarch_base);
+		iounmap(devarch_base);
+
+		return FIELD_GET(ERRDEVARCH_REV, reg);
+	}
+
+	return FIELD_GET(ID_AA64PFR0_EL1_RAS_MASK, read_cpuid(ID_AA64PFR0_EL1));
+}
+
 static int aest_init_record(struct aest_record *record, int i,
 			    struct aest_node *node)
 {
@@ -108,6 +129,7 @@ static int aest_init_node(struct aest_device *adev, struct aest_node *node,
 	node->adev = adev;
 	node->info = anode;
 	node->type = anode->type;
+	node->version = get_aest_node_ver(node);
 	node->name = alloc_aest_node_name(node);
 	if (!node->name)
 		return -ENOMEM;
diff --git a/drivers/ras/aest/aest.h b/drivers/ras/aest/aest.h
index 31131cce9928..bf0b9a49fdaa 100644
--- a/drivers/ras/aest/aest.h
+++ b/drivers/ras/aest/aest.h
@@ -61,6 +61,8 @@
 #define ERXMISC2 0x30
 #define ERXMISC3 0x38
 
+#define GIC_ERRDEVARCH 0xFFBC
+
 struct aest_access {
 	u64 (*read)(void *base, u32 offset);
 	void (*write)(void *base, u32 offset, u64 val);
@@ -141,6 +143,7 @@ struct aest_node {
 	 *              error events.
 	 */
 	unsigned long *status_reporting;
+	int version;
 
 	const struct aest_group *group;
 	struct aest_device *adev;
-- 
2.51.2.612.gdc70283dfc




* [PATCH v4 06/17] ras: AEST: Support RAS Common Fault Injection Model Extension
  2025-12-22  9:43 [PATCH v4 00/17] ARM Error Source Table V2 Support Ruidong Tian
                   ` (4 preceding siblings ...)
  2025-12-22  9:43 ` [PATCH v4 05/17] ras: AEST: Probe RAS system architecture version Ruidong Tian
@ 2025-12-22  9:43 ` Ruidong Tian
  2025-12-22  9:43 ` [PATCH v4 07/17] ras: AEST: Support CE threshold of error record Ruidong Tian
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Ruidong Tian @ 2025-12-22  9:43 UTC (permalink / raw)
  To: catalin.marinas, will, lpieralisi, guohanjun, sudeep.holla,
	xueshuai, linux-kernel, linux-acpi, linux-arm-kernel, rafael,
	lenb, tony.luck, bp, yazen.ghannam, misono.tomohiro
  Cc: tianruidong

Add the fault injection registers described in the Common Fault
Injection Model Extension.

Signed-off-by: Ruidong Tian <tianruidong@linux.alibaba.com>
---
 drivers/ras/aest/aest-core.c | 15 ++++++++++++++-
 drivers/ras/aest/aest.h      | 10 ++++++++++
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/ras/aest/aest-core.c b/drivers/ras/aest/aest-core.c
index 84b2fb8127ff..1218ae51079c 100644
--- a/drivers/ras/aest/aest-core.c
+++ b/drivers/ras/aest/aest-core.c
@@ -124,7 +124,7 @@ static int aest_init_node(struct aest_device *adev, struct aest_node *node,
 			  struct acpi_aest_node *anode)
 {
 	int i, ret;
-	u64 address;
+	u64 address, flags;
 
 	node->adev = adev;
 	node->info = anode;
@@ -145,6 +145,19 @@ static int aest_init_node(struct aest_device *adev, struct aest_node *node,
 			return -ENOMEM;
 	}
 
+	flags = anode->interface_hdr->flags;
+	address = node->info->common->fault_inject_register_base;
+	if ((flags & AEST_XFACE_FLAG_FAULT_INJECT) && address) {
+		if (address - anode->interface_hdr->address < node->group->size)
+			node->inj = node->base +
+				    (address - anode->interface_hdr->address);
+		else {
+			node->inj = devm_ioremap(adev->dev, address, PAGE_SIZE);
+			if (!node->inj)
+				return -ENOMEM;
+		}
+	}
+
 	ret = aest_node_set_errgsr(adev, node);
 	if (ret)
 		return ret;
diff --git a/drivers/ras/aest/aest.h b/drivers/ras/aest/aest.h
index bf0b9a49fdaa..505ecd9635bc 100644
--- a/drivers/ras/aest/aest.h
+++ b/drivers/ras/aest/aest.h
@@ -60,6 +60,9 @@
 #define ERXMISC1 0x28
 #define ERXMISC2 0x30
 #define ERXMISC3 0x38
+#define ERXPFGF 0x800
+#define ERXPFGCTL 0x808
+#define ERXPFGCDN 0x810
 
 #define GIC_ERRDEVARCH 0xFFBC
 
@@ -120,6 +123,7 @@ struct aest_node {
 	u8 type;
 	void *errgsr;
 	void *base;
+	void *inj;
 
 	/*
 	 * This bitmap indicates which of the error records within this error
@@ -208,6 +212,9 @@ static inline u64 aest_sysreg_read(void *__unused, u32 offset)
 		CASE_READ(res, ERXMISC1)
 		CASE_READ(res, ERXMISC2)
 		CASE_READ(res, ERXMISC3)
+		CASE_READ(res, ERXPFGF)
+		CASE_READ(res, ERXPFGCTL)
+		CASE_READ(res, ERXPFGCDN)
 	default :
 		res = 0;
 	}
@@ -225,6 +232,9 @@ static inline void aest_sysreg_write(void *base, u32 offset, u64 val)
 		CASE_WRITE(val, ERXMISC1)
 		CASE_WRITE(val, ERXMISC2)
 		CASE_WRITE(val, ERXMISC3)
+		CASE_WRITE(val, ERXPFGF)
+		CASE_WRITE(val, ERXPFGCTL)
+		CASE_WRITE(val, ERXPFGCDN)
 	default :
 		return;
 	}
-- 
2.51.2.612.gdc70283dfc




* [PATCH v4 07/17] ras: AEST: Support CE threshold of error record
  2025-12-22  9:43 [PATCH v4 00/17] ARM Error Source Table V2 Support Ruidong Tian
                   ` (5 preceding siblings ...)
  2025-12-22  9:43 ` [PATCH v4 06/17] ras: AEST: Support RAS Common Fault Injection Model Extension Ruidong Tian
@ 2025-12-22  9:43 ` Ruidong Tian
  2025-12-22  9:43 ` [PATCH v4 08/17] ras: AEST: Enable and register IRQs Ruidong Tian
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Ruidong Tian @ 2025-12-22  9:43 UTC (permalink / raw)
  To: catalin.marinas, will, lpieralisi, guohanjun, sudeep.holla,
	xueshuai, linux-kernel, linux-acpi, linux-arm-kernel, rafael,
	lenb, tony.luck, bp, yazen.ghannam, misono.tomohiro
  Cc: tianruidong

The CE threshold defines the number of Corrected Errors (CE) that must
occur in a record before an interrupt is triggered. Error records
support several counter widths: 8-bit (8B), 16-bit (16B), and 32-bit
(32B). This patch detects the counter width supported by each error
record and sets the default threshold to 1, so that an interrupt is
generated for every CE.
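
To make the relationship between the threshold and the counter concrete, here
is a hedged sketch of how a preset value could be computed. It assumes the
counter signals overflow when it increments past its maximum, so presetting
it to max + 1 - threshold makes the threshold-th CE overflow; with a
threshold of 1 this reduces to setting every counter bit, which matches the
err_misc0 | info->mask write in this patch. The cec_max() and cec_preset()
helpers are hypothetical, and __builtin_ctzll() is a GCC/Clang builtin.

```c
#include <stdint.h>

/* Field positions taken from ERR_MISC0_8B_CEC / ERR_MISC0_16B_CEC. */
#define MISC0_8B_CEC  (UINT64_C(0x7F) << 32)   /* bits [38:32] */
#define MISC0_16B_CEC (UINT64_C(0x7FFF) << 32) /* bits [46:32] */

/* Largest value the counter field described by mask can hold. */
static uint64_t cec_max(uint64_t mask)
{
	return mask >> __builtin_ctzll(mask);
}

/*
 * Hypothetical helper: return misc0 with the counter preset so that it
 * overflows on the threshold-th corrected error. Out-of-range
 * thresholds are clamped to [1, max + 1].
 */
static uint64_t cec_preset(uint64_t misc0, uint64_t mask, uint64_t threshold)
{
	uint64_t max = cec_max(mask);
	uint64_t start;

	if (threshold < 1)
		threshold = 1;
	if (threshold > max + 1)
		threshold = max + 1;

	start = max + 1 - threshold;
	return (misc0 & ~mask) | (start << __builtin_ctzll(mask));
}
```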

Signed-off-by: Ruidong Tian <tianruidong@linux.alibaba.com>
---
 arch/arm64/include/asm/ras.h | 41 ++++++++++++++++++++
 drivers/ras/aest/aest-core.c | 74 ++++++++++++++++++++++++++++++++++++
 drivers/ras/aest/aest.h      | 17 +++++++++
 include/linux/acpi_aest.h    |  3 ++
 4 files changed, 135 insertions(+)

diff --git a/arch/arm64/include/asm/ras.h b/arch/arm64/include/asm/ras.h
index da7c441252fe..6c51d27520c0 100644
--- a/arch/arm64/include/asm/ras.h
+++ b/arch/arm64/include/asm/ras.h
@@ -4,9 +4,50 @@
 
 #include <linux/types.h>
 
+/* ERR<n>FR */
+#define ERR_FR_CE GENMASK_ULL(54, 53)
+#define ERR_FR_RP BIT(15)
+#define ERR_FR_CEC GENMASK_ULL(14, 12)
+
+#define ERR_FR_RP_SINGLE_COUNTER 0
+#define ERR_FR_RP_DOUBLE_COUNTER 1
+
+#define ERR_FR_CEC_0B_COUNTER 0
+#define ERR_FR_CEC_8B_COUNTER BIT(1)
+#define ERR_FR_CEC_16B_COUNTER BIT(2)
+
+/* ERR<n>MISC0 */
+
+/* ERR<n>FR.CEC == 0b010, ERR<n>FR.RP == 0  */
+#define ERR_MISC0_8B_OF BIT(39)
+#define ERR_MISC0_8B_CEC GENMASK_ULL(38, 32)
+
+/* ERR<n>FR.CEC == 0b100, ERR<n>FR.RP == 0  */
+#define ERR_MISC0_16B_OF BIT(47)
+#define ERR_MISC0_16B_CEC GENMASK_ULL(46, 32)
+
+#define ERR_MISC0_CEC_SHIFT 31
+
+#define ERR_8B_CEC_MAX (ERR_MISC0_8B_CEC >> ERR_MISC0_CEC_SHIFT)
+#define ERR_16B_CEC_MAX (ERR_MISC0_16B_CEC >> ERR_MISC0_CEC_SHIFT)
+
+/* ERR<n>FR.CEC == 0b100, ERR<n>FR.RP == 1  */
+#define ERR_MISC0_16B_OFO BIT(63)
+#define ERR_MISC0_16B_CECO GENMASK_ULL(62, 48)
+#define ERR_MISC0_16B_OFR BIT(47)
+#define ERR_MISC0_16B_CECR GENMASK_ULL(46, 32)
+
 /* ERRDEVARCH */
 #define ERRDEVARCH_REV GENMASK(19, 16)
 
+enum ras_ce_threshold {
+	RAS_CE_THRESHOLD_0B,
+	RAS_CE_THRESHOLD_8B,
+	RAS_CE_THRESHOLD_16B,
+	RAS_CE_THRESHOLD_32B,
+	UNKNOWN,
+};
+
 struct ras_ext_regs {
 	u64 err_fr;
 	u64 err_ctlr;
diff --git a/drivers/ras/aest/aest-core.c b/drivers/ras/aest/aest-core.c
index 1218ae51079c..5cfe91a6d72a 100644
--- a/drivers/ras/aest/aest-core.c
+++ b/drivers/ras/aest/aest-core.c
@@ -16,6 +16,79 @@ DEFINE_PER_CPU(struct aest_device, percpu_adev);
 #undef pr_fmt
 #define pr_fmt(fmt) "AEST: " fmt
 
+static enum ras_ce_threshold aest_get_ce_threshold(struct aest_record *record)
+{
+	u64 err_fr, err_fr_cec, err_fr_rp = -1;
+
+	err_fr = record_read(record, ERXFR);
+	err_fr_cec = FIELD_GET(ERR_FR_CEC, err_fr);
+	err_fr_rp = FIELD_GET(ERR_FR_RP, err_fr);
+
+	if (err_fr_cec == ERR_FR_CEC_0B_COUNTER)
+		return RAS_CE_THRESHOLD_0B;
+	else if (err_fr_rp == ERR_FR_RP_DOUBLE_COUNTER)
+		return RAS_CE_THRESHOLD_32B;
+	else if (err_fr_cec == ERR_FR_CEC_8B_COUNTER)
+		return RAS_CE_THRESHOLD_8B;
+	else if (err_fr_cec == ERR_FR_CEC_16B_COUNTER)
+		return RAS_CE_THRESHOLD_16B;
+	else
+		return UNKNOWN;
+}
+
+static const struct ce_threshold_info ce_info[] = {
+	[RAS_CE_THRESHOLD_0B] = { 0 },
+	[RAS_CE_THRESHOLD_8B] = {
+		.max_count = ERR_8B_CEC_MAX,
+		.mask = ERR_MISC0_8B_CEC,
+		.shift = ERR_MISC0_CEC_SHIFT,
+	},
+	[RAS_CE_THRESHOLD_16B] = {
+		.max_count = ERR_16B_CEC_MAX,
+		.mask = ERR_MISC0_16B_CEC,
+		.shift = ERR_MISC0_CEC_SHIFT,
+	},
+};
+
+static void aest_set_ce_threshold(struct aest_record *record)
+{
+	u64 err_misc0;
+	struct ce_threshold *ce = &record->ce;
+	const struct ce_threshold_info *info;
+
+	record->threshold_type = aest_get_ce_threshold(record);
+
+	switch (record->threshold_type) {
+	case RAS_CE_THRESHOLD_0B:
+		aest_record_dbg(record, "do not support CE threshold!\n");
+		return;
+	case RAS_CE_THRESHOLD_8B:
+		aest_record_dbg(record, "support 8 bit CE threshold!\n");
+		break;
+	case RAS_CE_THRESHOLD_16B:
+		aest_record_dbg(record, "support 16 bit CE threshold!\n");
+		break;
+	case RAS_CE_THRESHOLD_32B:
+		aest_record_dbg(record, "not support 32 bit CE threshold!\n");
+		break;
+	default:
+		aest_record_dbg(record, "Unknown misc0 ce threshold!\n");
+	}
+
+	err_misc0 = record_read(record, ERXMISC0);
+	info = &ce_info[record->threshold_type];
+	ce->info = info;
+
+	// Default CE threshold is 1.
+	ce->count = info->max_count;
+	ce->threshold = DEFAULT_CE_THRESHOLD;
+	ce->reg_val = err_misc0 | info->mask;
+
+	record_write(record, ERXMISC0, ce->reg_val);
+	aest_record_dbg(record, "CE threshold is %llx, controlled by Kernel",
+			ce->threshold);
+}
+
 static int get_aest_node_ver(struct aest_node *node)
 {
 	u64 reg;
@@ -54,6 +127,7 @@ static int aest_init_record(struct aest_record *record, int i,
 	record->addressing_mode = test_bit(i, node->info->addressing_mode);
 	record->index = i;
 	record->node = node;
+	aest_set_ce_threshold(record);
 
 	aest_record_dbg(record, "base: %p, index: %d, address mode: %x\n",
 			record->regs_base, record->index,
diff --git a/drivers/ras/aest/aest.h b/drivers/ras/aest/aest.h
index 505ecd9635bc..85eeed79bcbe 100644
--- a/drivers/ras/aest/aest.h
+++ b/drivers/ras/aest/aest.h
@@ -9,6 +9,7 @@
 #include <asm/ras.h>
 
 #define MAX_GSI_PER_NODE 2
+#define DEFAULT_CE_THRESHOLD 1
 
 #define record_read(record, offset) \
 	record->access->read(record->regs_base, offset)
@@ -71,6 +72,19 @@ struct aest_access {
 	void (*write)(void *base, u32 offset, u64 val);
 };
 
+struct ce_threshold_info {
+	const u64 max_count;
+	const u64 mask;
+	const u64 shift;
+};
+
+struct ce_threshold {
+	const struct ce_threshold_info *info;
+	u64 count;
+	u64 threshold;
+	u64 reg_val;
+};
+
 struct aest_record {
 	char *name;
 	int index;
@@ -88,6 +102,9 @@ struct aest_record {
 	int addressing_mode;
 	struct aest_node *node;
 	const struct aest_access *access;
+
+	struct ce_threshold ce;
+	enum ras_ce_threshold threshold_type;
 };
 
 struct aest_group {
diff --git a/include/linux/acpi_aest.h b/include/linux/acpi_aest.h
index 77187ce43d44..a7898c643896 100644
--- a/include/linux/acpi_aest.h
+++ b/include/linux/acpi_aest.h
@@ -13,6 +13,9 @@
 /* AEST interrupt */
 #define AEST_INTERRUPT_MODE BIT(0)
 
+#define AEST_INTERRUPT_FHI_UE_SUPPORT		BIT(0)
+#define AEST_INTERRUPT_FHI_UE_NO_SUPPORT		BIT(1)
+
 #define AEST_MAX_INTERRUPT_PER_NODE 2
 
 /* AEST interface */
-- 
2.51.2.612.gdc70283dfc




* [PATCH v4 08/17] ras: AEST: Enable and register IRQs
  2025-12-22  9:43 [PATCH v4 00/17] ARM Error Source Table V2 Support Ruidong Tian
                   ` (6 preceding siblings ...)
  2025-12-22  9:43 ` [PATCH v4 07/17] ras: AEST: Support CE threshold of error record Ruidong Tian
@ 2025-12-22  9:43 ` Ruidong Tian
  2025-12-22  9:43 ` [PATCH v4 09/17] ras: AEST: Add cpuhp callback Ruidong Tian
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Ruidong Tian @ 2025-12-22  9:43 UTC (permalink / raw)
  To: catalin.marinas, will, lpieralisi, guohanjun, sudeep.holla,
	xueshuai, linux-kernel, linux-acpi, linux-arm-kernel, rafael,
	lenb, tony.luck, bp, yazen.ghannam, misono.tomohiro
  Cc: tianruidong

The interrupt numbers for certain error records may be explicitly
programmed into their configuration register.

For PPIs, each core maintains its own copy of the aest_device
structure.

Because handling RAS errors involves heavyweight paths such as EDAC and
memory_failure(), all handling is deferred to a bottom-half context.

Signed-off-by: Ruidong Tian <tianruidong@linux.alibaba.com>
---
 arch/arm64/include/asm/ras.h |  36 +++
 drivers/ras/aest/aest-core.c | 531 ++++++++++++++++++++++++++++++++++-
 drivers/ras/aest/aest.h      |  56 ++++
 include/linux/acpi_aest.h    |   7 +
 include/linux/ras.h          |   8 +
 5 files changed, 637 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/ras.h b/arch/arm64/include/asm/ras.h
index 6c51d27520c0..02cf15278d9f 100644
--- a/arch/arm64/include/asm/ras.h
+++ b/arch/arm64/include/asm/ras.h
@@ -2,6 +2,7 @@
 #ifndef __ASM_RAS_H
 #define __ASM_RAS_H
 
+#include <linux/bits.h>
 #include <linux/types.h>
 
 /* ERR<n>FR */
@@ -37,6 +38,41 @@
 #define ERR_MISC0_16B_OFR BIT(47)
 #define ERR_MISC0_16B_CECR GENMASK_ULL(46, 32)
 
+/* ERR<n>STATUS */
+#define ERR_STATUS_AV BIT(31)
+#define ERR_STATUS_V BIT(30)
+#define ERR_STATUS_UE BIT(29)
+#define ERR_STATUS_ER BIT(28)
+#define ERR_STATUS_OF BIT(27)
+#define ERR_STATUS_MV BIT(26)
+#define ERR_STATUS_CE (BIT(25) | BIT(24))
+#define ERR_STATUS_DE BIT(23)
+#define ERR_STATUS_PN BIT(22)
+#define ERR_STATUS_UET (BIT(21) | BIT(20))
+#define ERR_STATUS_CI BIT(19)
+#define ERR_STATUS_IERR GENMASK_ULL(15, 8)
+#define ERR_STATUS_SERR GENMASK_ULL(7, 0)
+
+/* These bits are write-one-to-clear */
+#define ERR_STATUS_W1TC                                                  \
+	(ERR_STATUS_AV | ERR_STATUS_V | ERR_STATUS_UE | ERR_STATUS_ER |  \
+	 ERR_STATUS_OF | ERR_STATUS_MV | ERR_STATUS_CE | ERR_STATUS_DE | \
+	 ERR_STATUS_PN | ERR_STATUS_UET | ERR_STATUS_CI)
+
+#define ERR_STATUS_UET_UC 0
+#define ERR_STATUS_UET_UEU 1
+#define ERR_STATUS_UET_UEO 2
+#define ERR_STATUS_UET_UER 3
+
+/* ERR<n>ADDR */
+#define ERR_ADDR_AI BIT(61)
+#define ERR_ADDR_PADDR GENMASK_ULL(55, 0)
+
+/* ERR<n>CTLR */
+#define ERR_CTLR_CFI BIT(8)
+#define ERR_CTLR_FI BIT(3)
+#define ERR_CTLR_UI BIT(2)
+
 /* ERRDEVARCH */
 #define ERRDEVARCH_REV GENMASK(19, 16)
 
diff --git a/drivers/ras/aest/aest-core.c b/drivers/ras/aest/aest-core.c
index 5cfe91a6d72a..5ec0ba38f51b 100644
--- a/drivers/ras/aest/aest-core.c
+++ b/drivers/ras/aest/aest-core.c
@@ -5,8 +5,11 @@
  * Copyright (c) 2025, Alibaba Group.
  */
 
+#include <linux/interrupt.h>
+#include <linux/panic.h>
 #include <linux/platform_device.h>
 #include <linux/xarray.h>
+#include <linux/genalloc.h>
 #include <linux/ras.h>
 
 #include "aest.h"
@@ -16,6 +19,439 @@ DEFINE_PER_CPU(struct aest_device, percpu_adev);
 #undef pr_fmt
 #define pr_fmt(fmt) "AEST: " fmt
 
+/*
+ * This memory pool is only to be used to save AEST node in AEST irq context.
+ * There can be 500 AEST node at most.
+ */
+#define AEST_NODE_ALLOCED_MAX 500
+
+#define AEST_LOG_PREFIX_BUFFER 64
+
+BLOCKING_NOTIFIER_HEAD(aest_decoder_chain);
+
+static void aest_print(struct aest_event *event)
+{
+	static atomic_t seqno = { 0 };
+	unsigned int curr_seqno;
+	char pfx_seq[AEST_LOG_PREFIX_BUFFER];
+	int index;
+	struct ras_ext_regs *regs;
+
+	curr_seqno = atomic_inc_return(&seqno);
+	snprintf(pfx_seq, sizeof(pfx_seq), "{%u}" HW_ERR, curr_seqno);
+	pr_info("%sHardware error from AEST %s\n", pfx_seq, event->node_name);
+
+	switch (event->type) {
+	case ACPI_AEST_PROCESSOR_ERROR_NODE:
+		pr_err("%s Error from CPU%d\n", pfx_seq, event->id0);
+		break;
+	case ACPI_AEST_MEMORY_ERROR_NODE:
+		pr_err("%s Error from memory at SRAT proximity domain %#x\n",
+		       pfx_seq, event->id0);
+		break;
+	case ACPI_AEST_SMMU_ERROR_NODE:
+		pr_err("%s Error from SMMU IORT node %#x subcomponent %#x\n",
+		       pfx_seq, event->id0, event->id1);
+		break;
+	case ACPI_AEST_VENDOR_ERROR_NODE:
+		pr_err("%s Error from vendor hid %8.8s uid %#x\n", pfx_seq,
+		       event->hid, event->id1);
+		break;
+	case ACPI_AEST_GIC_ERROR_NODE:
+		pr_err("%s Error from GIC type %#x instance %#x\n", pfx_seq,
+		       event->id0, event->id1);
+		break;
+	default:
+		pr_err("%s Unknown AEST node type\n", pfx_seq);
+		return;
+	}
+
+	index = event->index;
+	regs = &event->regs;
+
+	pr_err("%s  ERR%dFR: 0x%llx\n", pfx_seq, index, regs->err_fr);
+	pr_err("%s  ERR%dCTLR: 0x%llx\n", pfx_seq, index, regs->err_ctlr);
+	pr_err("%s  ERR%dSTATUS: 0x%llx\n", pfx_seq, index, regs->err_status);
+	if (regs->err_status & ERR_STATUS_AV)
+		pr_err("%s  ERR%dADDR: 0x%llx\n", pfx_seq, index,
+		       regs->err_addr);
+
+	if (regs->err_status & ERR_STATUS_MV) {
+		pr_err("%s  ERR%dMISC0: 0x%llx\n", pfx_seq, index,
+		       regs->err_misc[0]);
+		pr_err("%s  ERR%dMISC1: 0x%llx\n", pfx_seq, index,
+		       regs->err_misc[1]);
+		pr_err("%s  ERR%dMISC2: 0x%llx\n", pfx_seq, index,
+		       regs->err_misc[2]);
+		pr_err("%s  ERR%dMISC3: 0x%llx\n", pfx_seq, index,
+		       regs->err_misc[3]);
+	}
+}
+
+static void aest_handle_memory_failure(u64 addr)
+{
+	unsigned long pfn;
+
+	pfn = PHYS_PFN(addr);
+
+	if (!pfn_valid(pfn)) {
+		pr_warn(HW_ERR "Invalid physical address: %#llx\n", addr);
+		return;
+	}
+
+#ifdef CONFIG_MEMORY_FAILURE
+	memory_failure(pfn, 0);
+#endif
+}
+
+static void init_aest_event(struct aest_event *event,
+			    struct aest_record *record,
+			    struct ras_ext_regs *regs)
+{
+	struct aest_node *node = record->node;
+	struct acpi_aest_node *info = node->info;
+
+	event->type = node->type;
+	event->node_name = node->name;
+	switch (node->type) {
+	case ACPI_AEST_PROCESSOR_ERROR_NODE:
+		if (info->processor->flags &
+		    (ACPI_AEST_PROC_FLAG_SHARED | ACPI_AEST_PROC_FLAG_GLOBAL))
+			event->id0 = smp_processor_id();
+		else
+			event->id0 = get_cpu_for_acpi_id(
+				info->processor->processor_id);
+
+		event->id1 = info->processor->resource_type;
+		break;
+	case ACPI_AEST_MEMORY_ERROR_NODE:
+		event->id0 = info->memory->srat_proximity_domain;
+		break;
+	case ACPI_AEST_SMMU_ERROR_NODE:
+		event->id0 = info->smmu->iort_node_reference;
+		event->id1 = info->smmu->subcomponent_reference;
+		break;
+	case ACPI_AEST_VENDOR_ERROR_NODE:
+		event->id0 = 0;
+		event->id1 = info->vendor->acpi_uid;
+		event->hid = info->vendor->acpi_hid;
+		break;
+	case ACPI_AEST_GIC_ERROR_NODE:
+		event->id0 = info->gic->interface_type;
+		event->id1 = info->gic->instance_id;
+		break;
+	default:
+		event->id0 = 0;
+		event->id1 = 0;
+	}
+
+	memcpy(&event->regs, regs, sizeof(*regs));
+	event->index = record->index;
+	event->addressing_mode = record->addressing_mode;
+}
+
+static int aest_node_gen_pool_add(struct aest_device *adev,
+				  struct aest_record *record,
+				  struct ras_ext_regs *regs)
+{
+	struct aest_event *event;
+
+	if (!adev->pool)
+		return -EINVAL;
+
+	event = (void *)gen_pool_alloc(adev->pool, sizeof(*event));
+	if (!event)
+		return -ENOMEM;
+
+	init_aest_event(event, record, regs);
+	llist_add(&event->llnode, &adev->event_list);
+
+	return 0;
+}
+
+static void aest_log(struct aest_record *record, struct ras_ext_regs *regs)
+{
+	struct aest_device *adev = record->node->adev;
+
+	if (!aest_node_gen_pool_add(adev, record, regs))
+		schedule_work(&adev->aest_work);
+}
+
+void aest_register_decode_chain(struct notifier_block *nb)
+{
+	blocking_notifier_chain_register(&aest_decoder_chain, nb);
+}
+EXPORT_SYMBOL_GPL(aest_register_decode_chain);
+
+void aest_unregister_decode_chain(struct notifier_block *nb)
+{
+	blocking_notifier_chain_unregister(&aest_decoder_chain, nb);
+}
+EXPORT_SYMBOL_GPL(aest_unregister_decode_chain);
+
+static void aest_node_pool_process(struct work_struct *work)
+{
+	struct llist_node *head;
+	struct aest_event *event;
+	struct aest_device *adev =
+		container_of(work, struct aest_device, aest_work);
+	u64 status, addr;
+
+	head = llist_del_all(&adev->event_list);
+	if (!head)
+		return;
+
+	head = llist_reverse_order(head);
+	llist_for_each_entry(event, head, llnode) {
+		aest_print(event);
+
+		status = event->regs.err_status;
+		if (event->addressing_mode == AEST_ADDREESS_SPA &&
+		    !(event->regs.err_addr & ERR_ADDR_AI) &&
+		    (status & (ERR_STATUS_UE | ERR_STATUS_DE))) {
+			addr = event->regs.err_addr & PHYS_MASK;
+			aest_handle_memory_failure(addr);
+		}
+
+		blocking_notifier_call_chain(&aest_decoder_chain, 0, event);
+		gen_pool_free(adev->pool, (unsigned long)event, sizeof(*event));
+	}
+}
+
+static int aest_node_pool_init(struct aest_device *adev)
+{
+	unsigned long addr, size;
+
+	size = ilog2(sizeof(struct aest_event));
+	adev->pool =
+		devm_gen_pool_create(adev->dev, size, -1, dev_name(adev->dev));
+	if (!adev->pool)
+		return -ENOMEM;
+
+	size = PAGE_ALIGN(sizeof(struct aest_event) * AEST_NODE_ALLOCED_MAX);
+	addr = (unsigned long)devm_kzalloc(adev->dev, size, GFP_KERNEL);
+	if (!addr)
+		return -ENOMEM;
+
+	return gen_pool_add(adev->pool, addr, size, -1);
+}
+
+static void aest_panic(struct aest_record *record, struct ras_ext_regs *regs,
+		       char *msg)
+{
+	struct aest_event event = { 0 };
+
+	init_aest_event(&event, record, regs);
+
+	aest_print(&event);
+
+	panic("%s", msg);
+}
+
+static void aest_proc_record(struct aest_record *record, void *data)
+{
+	struct ras_ext_regs regs = { 0 };
+	int *count = data;
+	u64 ue;
+
+	regs.err_status = record_read(record, ERXSTATUS);
+	if (!(regs.err_status & ERR_STATUS_V))
+		return;
+
+	(*count)++;
+
+	if (regs.err_status & ERR_STATUS_AV)
+		regs.err_addr = record_read(record, ERXADDR);
+
+	regs.err_fr = record_read(record, ERXFR);
+	regs.err_ctlr = record_read(record, ERXCTLR);
+
+	if (regs.err_status & ERR_STATUS_MV) {
+		regs.err_misc[0] = record_read(record, ERXMISC0);
+		regs.err_misc[1] = record_read(record, ERXMISC1);
+		if (record->node->version >= ID_AA64PFR0_EL1_RAS_V1P1) {
+			regs.err_misc[2] = record_read(record, ERXMISC2);
+			regs.err_misc[3] = record_read(record, ERXMISC3);
+		}
+
+		if (record->node->info->interface_hdr->flags &
+		    AEST_XFACE_FLAG_CLEAR_MISC) {
+			record_write(record, ERXMISC0, 0);
+			record_write(record, ERXMISC1, 0);
+			if (record->node->version >= ID_AA64PFR0_EL1_RAS_V1P1) {
+				record_write(record, ERXMISC2, 0);
+				record_write(record, ERXMISC3, 0);
+			}
+			/* CE count is 0 if the record does not support CE */
+		} else if (record->ce.count > 0)
+			record_write(record, ERXMISC0, record->ce.reg_val);
+	}
+
+	/* Panic on an uncontainable or unrecoverable uncorrected error */
+	ue = FIELD_GET(ERR_STATUS_UET, regs.err_status);
+	if ((regs.err_status & ERR_STATUS_UE) &&
+	    (ue == ERR_STATUS_UET_UC || ue == ERR_STATUS_UET_UEU))
+		aest_panic(record, &regs,
+			   "AEST: unrecoverable error encountered");
+
+	aest_log(record, &regs);
+
+	/* Write-one-to-clear the bits we've seen */
+	regs.err_status &= ERR_STATUS_W1TC;
+
+	/* Multi-bit fields must be written as all-ones to clear. */
+	if (regs.err_status & ERR_STATUS_CE)
+		regs.err_status |= ERR_STATUS_CE;
+
+	/* Multi-bit fields must be written as all-ones to clear. */
+	if (regs.err_status & ERR_STATUS_UET)
+		regs.err_status |= ERR_STATUS_UET;
+
+	record_write(record, ERXSTATUS, regs.err_status);
+}
+
+static void aest_node_foreach_record(void (*func)(struct aest_record *, void *),
+				     struct aest_node *node, void *data,
+				     unsigned long *bitmap)
+{
+	int i;
+
+	for_each_clear_bit(i, bitmap, node->record_count) {
+		aest_select_record(node, i);
+
+		func(&node->records[i], data);
+
+		aest_sync(node);
+	}
+}
+
+static int aest_proc(struct aest_node *node)
+{
+	int count = 0, i, j, size = node->record_count;
+	u64 err_group = 0;
+
+	aest_node_dbg(node, "Poll bitmap %*pb\n", size,
+		      node->record_implemented);
+	aest_node_foreach_record(aest_proc_record, node, &count,
+				 node->record_implemented);
+
+	if (!node->errgsr)
+		return count;
+
+	aest_node_dbg(node, "Report bitmap %*pb\n", size,
+		      node->status_reporting);
+	for (i = 0; i < BITS_TO_U64(size); i++) {
+		err_group = readq_relaxed((void *)node->errgsr + i * 8);
+		aest_node_dbg(node, "errgsr[%d]: 0x%llx\n", i, err_group);
+
+		for_each_set_bit(j, (unsigned long *)&err_group,
+				 BITS_PER_LONG) {
+			/*
+			 * The error group base is only valid for memory-mapped
+			 * nodes, so the driver does not need to write the
+			 * select register or sync here.
+			 */
+			if (test_bit(i * BITS_PER_LONG + j,
+				     node->status_reporting))
+				continue;
+			aest_proc_record(&node->records[j], &count);
+		}
+	}
+
+	return count;
+}
+
+static irqreturn_t aest_irq_func(int irq, void *input)
+{
+	struct aest_device *adev = input;
+	int i;
+
+	for (i = 0; i < adev->node_cnt; i++)
+		aest_proc(&adev->nodes[i]);
+
+	return IRQ_HANDLED;
+}
+
+static int aest_register_irq(struct aest_device *adev)
+{
+	int i, irq, ret;
+	char *irq_desc;
+
+	irq_desc = devm_kasprintf(adev->dev, GFP_KERNEL, "%s.%s.",
+				  dev_driver_string(adev->dev),
+				  dev_name(adev->dev));
+	if (!irq_desc)
+		return -ENOMEM;
+
+	for (i = 0; i < MAX_GSI_PER_NODE; i++) {
+		irq = adev->irq[i];
+
+		if (!irq)
+			continue;
+
+		if (irq_is_percpu_devid(irq)) {
+			ret = request_percpu_irq(irq, aest_irq_func, irq_desc,
+						 adev->adev_oncore);
+			if (ret)
+				goto free;
+		} else {
+			ret = devm_request_irq(adev->dev, irq, aest_irq_func, 0,
+					       irq_desc, adev);
+			if (ret)
+				return ret;
+		}
+	}
+	return 0;
+
+free:
+	for (; i >= 0; i--) {
+		irq = adev->irq[i];
+
+		if (irq_is_percpu_devid(irq))
+			free_percpu_irq(irq, adev->adev_oncore);
+	}
+
+	return ret;
+}
+
+static void aest_enable_irq(struct aest_record *record)
+{
+	u64 err_ctlr;
+	struct aest_device *adev = record->node->adev;
+
+	err_ctlr = record_read(record, ERXCTLR);
+
+	if (adev->irq[ACPI_AEST_NODE_FAULT_HANDLING])
+		err_ctlr |= (ERR_CTLR_FI | ERR_CTLR_CFI);
+	if (adev->irq[ACPI_AEST_NODE_ERROR_RECOVERY])
+		err_ctlr |= ERR_CTLR_UI;
+
+	record_write(record, ERXCTLR, err_ctlr);
+}
+
+static void aest_config_irq(struct aest_node *node)
+{
+	int i;
+	struct acpi_aest_node_interrupt_v2 *interrupt;
+
+	if (!node->irq_config)
+		return;
+
+	for (i = 0; i < node->info->interrupt_count; i++) {
+		interrupt = &node->info->interrupt[i];
+
+		if (interrupt->type == ACPI_AEST_NODE_FAULT_HANDLING)
+			writeq_relaxed(interrupt->gsiv, node->irq_config);
+
+		if (interrupt->type == ACPI_AEST_NODE_ERROR_RECOVERY)
+			writeq_relaxed(interrupt->gsiv, node->irq_config + 8);
+
+		aest_node_dbg(node, "config irq type %d gsiv %d at %llx",
+			      interrupt->type, interrupt->gsiv,
+			      (u64)node->irq_config);
+	}
+}
+
 static enum ras_ce_threshold aest_get_ce_threshold(struct aest_record *record)
 {
 	u64 err_fr, err_fr_cec, err_fr_rp = -1;
@@ -128,6 +564,7 @@ static int aest_init_record(struct aest_record *record, int i,
 	record->index = i;
 	record->node = node;
 	aest_set_ce_threshold(record);
+	aest_enable_irq(record);
 
 	aest_record_dbg(record, "base: %p, index: %d, address mode: %x\n",
 			record->regs_base, record->index,
@@ -232,6 +669,21 @@ static int aest_init_node(struct aest_device *adev, struct aest_node *node,
 		}
 	}
 
+	address = node->info->common->interrupt_config_register_base;
+	if ((flags & AEST_XFACE_FLAG_INT_CONFIG) && address) {
+		if (address - anode->interface_hdr->address < node->group->size)
+			node->irq_config =
+				node->base +
+				(address - anode->interface_hdr->address);
+		else {
+			node->irq_config =
+				devm_ioremap(adev->dev, address, PAGE_SIZE);
+			if (!node->irq_config)
+				return -ENOMEM;
+		}
+	}
+	aest_config_irq(node);
+
 	ret = aest_node_set_errgsr(adev, node);
 	if (ret)
 		return ret;
@@ -276,6 +728,66 @@ static int aest_init_nodes(struct aest_device *adev, struct aest_hnode *ahnode)
 	return 0;
 }
 
+static int __setup_ppi(struct aest_device *adev)
+{
+	int cpu, i;
+	struct aest_device *oncore_adev;
+	struct aest_node *oncore_node;
+	size_t size;
+
+	adev->adev_oncore = &percpu_adev;
+	for_each_possible_cpu(cpu) {
+		oncore_adev = per_cpu_ptr(&percpu_adev, cpu);
+		memcpy(oncore_adev, adev, sizeof(struct aest_device));
+
+		oncore_adev->nodes =
+			devm_kcalloc(adev->dev, oncore_adev->node_cnt,
+				     sizeof(struct aest_node), GFP_KERNEL);
+		if (!oncore_adev->nodes)
+			return -ENOMEM;
+
+		size = adev->node_cnt * sizeof(struct aest_node);
+		memcpy(oncore_adev->nodes, adev->nodes, size);
+		for (i = 0; i < oncore_adev->node_cnt; i++) {
+			oncore_node = &oncore_adev->nodes[i];
+			oncore_node->records = devm_kcalloc(
+				adev->dev, oncore_node->record_count,
+				sizeof(struct aest_record), GFP_KERNEL);
+			if (!oncore_node->records)
+				return -ENOMEM;
+
+			size = oncore_node->record_count *
+			       sizeof(struct aest_record);
+			memcpy(oncore_node->records, adev->nodes[i].records,
+			       size);
+		}
+
+		aest_dev_dbg(adev, "Init device on CPU%d.\n", cpu);
+	}
+
+	return 0;
+}
+
+static int aest_setup_irq(struct platform_device *pdev,
+			  struct aest_device *adev)
+{
+	int fhi_irq, eri_irq;
+
+	fhi_irq = platform_get_irq_byname_optional(pdev, AEST_FHI_NAME);
+	if (fhi_irq > 0)
+		adev->irq[0] = fhi_irq;
+
+	eri_irq = platform_get_irq_byname_optional(pdev, AEST_ERI_NAME);
+	if (eri_irq > 0)
+		adev->irq[1] = eri_irq;
+
+	/* Allocate and initialise the percpu device pointer for PPI */
+	if (irq_is_percpu(fhi_irq) || irq_is_percpu(eri_irq))
+		return __setup_ppi(adev);
+
+	return 0;
+}
+
 static int aest_device_probe(struct platform_device *pdev)
 {
 	int ret;
@@ -289,14 +801,31 @@ static int aest_device_probe(struct platform_device *pdev)
 	adev = devm_kzalloc(&pdev->dev, sizeof(*adev), GFP_KERNEL);
 	if (!adev)
 		return -ENOMEM;
-
 	adev->dev = &pdev->dev;
 	adev->id = pdev->id;
 	aest_set_name(adev, ahnode);
+
+	INIT_WORK(&adev->aest_work, aest_node_pool_process);
+	ret = aest_node_pool_init(adev);
+	if (ret) {
+		aest_dev_err(adev, "Failed init aest node pool.\n");
+		return ret;
+	}
+	init_llist_head(&adev->event_list);
+
 	ret = aest_init_nodes(adev, ahnode);
 	if (ret)
 		return ret;
 
+	ret = aest_setup_irq(pdev, adev);
+	if (ret)
+		return ret;
+
+	ret = aest_register_irq(adev);
+	if (ret) {
+		aest_dev_err(adev, "register irq failed\n");
+		return ret;
+	}
 	platform_set_drvdata(pdev, adev);
 
 	aest_dev_dbg(adev, "Node cnt: %x, id: %x\n", adev->node_cnt, adev->id);
diff --git a/drivers/ras/aest/aest.h b/drivers/ras/aest/aest.h
index 85eeed79bcbe..a5e43b2a2e90 100644
--- a/drivers/ras/aest/aest.h
+++ b/drivers/ras/aest/aest.h
@@ -67,6 +67,34 @@
 
 #define GIC_ERRDEVARCH 0xFFBC
 
+struct aest_event {
+	struct llist_node llnode;
+	char *node_name;
+	u32 type;
+	/*
+	 * Different nodes have different meanings:
+	 *   - Processor node	: processor number.
+	 *   - Memory node	: SRAT proximity domain.
+	 *   - SMMU node	: IORT node reference.
+	 *   - GIC node		: interface type.
+	 */
+	u32 id0;
+	/*
+	 * Different nodes have different meanings:
+	 *   - Processor node	: processor resource type.
+	 *   - Memory node	: none.
+	 *   - SMMU node	: subcomponent reference.
+	 *   - Vendor node	: Unique ID.
+	 *   - GIC node		: instance identifier.
+	 */
+	u32 id1;
+	/* Vendor node	: hardware ID. */
+	char *hid;
+	u32 index;
+	int addressing_mode;
+	struct ras_ext_regs regs;
+};
+
 struct aest_access {
 	u64 (*read)(void *base, u32 offset);
 	void (*write)(void *base, u32 offset, u64 val);
@@ -141,6 +169,7 @@ struct aest_node {
 	void *errgsr;
 	void *base;
 	void *inj;
+	void *irq_config;
 
 	/*
 	 * This bitmap indicates which of the error records within this error
@@ -172,6 +201,7 @@ struct aest_node {
 
 	int record_count;
 	struct aest_record *records;
+	struct aest_node __percpu *oncore_node;
 };
 
 struct aest_device {
@@ -180,6 +210,12 @@ struct aest_device {
 	int node_cnt;
 	struct aest_node *nodes;
 	u32 id;
+	int irq[MAX_GSI_PER_NODE];
+
+	struct work_struct aest_work;
+	struct gen_pool *pool;
+	struct llist_head event_list;
+	struct aest_device __percpu *adev_oncore;
 };
 
 static const char *const aest_node_name[] = {
@@ -283,3 +319,23 @@ static const struct aest_access aest_access[] = {
 	},
 	{ }
 };
+
+/*
+ * Each PE may have multiple error records; one must be selected
+ * before it can be accessed through the Error Record System
+ * registers.
+ */
+static inline void aest_select_record(struct aest_node *node, int index)
+{
+	if (node->type == ACPI_AEST_PROCESSOR_ERROR_NODE) {
+		write_sysreg_s(index, SYS_ERRSELR_EL1);
+		isb();
+	}
+}
+
+/* Ensure all writes have taken effect. */
+static inline void aest_sync(struct aest_node *node)
+{
+	if (node->type == ACPI_AEST_PROCESSOR_ERROR_NODE)
+		isb();
+}
diff --git a/include/linux/acpi_aest.h b/include/linux/acpi_aest.h
index a7898c643896..3a899f57f92f 100644
--- a/include/linux/acpi_aest.h
+++ b/include/linux/acpi_aest.h
@@ -10,6 +10,13 @@
 #define AEST_FHI_NAME "AEST:FHI"
 #define AEST_ERI_NAME "AEST:ERI"
 
+/* AEST component */
+#define ACPI_AEST_PROC_FLAG_GLOBAL	(1<<0)
+#define ACPI_AEST_PROC_FLAG_SHARED	(1<<1)
+
+#define AEST_ADDREESS_SPA	0
+#define AEST_ADDREESS_LA	1
+
 /* AEST interrupt */
 #define AEST_INTERRUPT_MODE BIT(0)
 
diff --git a/include/linux/ras.h b/include/linux/ras.h
index 468941bfe855..05096f049dac 100644
--- a/include/linux/ras.h
+++ b/include/linux/ras.h
@@ -63,4 +63,12 @@ amd_convert_umc_mca_addr_to_sys_addr(struct atl_err *err) { return -EINVAL; }
 #define GET_LOGICAL_INDEX(mpidr) -EINVAL
 #endif /* CONFIG_ARM || CONFIG_ARM64 */
 
+#if IS_ENABLED(CONFIG_AEST)
+void aest_register_decode_chain(struct notifier_block *nb);
+void aest_unregister_decode_chain(struct notifier_block *nb);
+#else
+static inline void aest_register_decode_chain(struct notifier_block *nb) {}
+static inline void aest_unregister_decode_chain(struct notifier_block *nb) {}
+#endif /* CONFIG_AEST */
+
 #endif /* __RAS_H__ */
-- 
2.51.2.612.gdc70283dfc



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v4 09/17] ras: AEST: Add cpuhp callback
  2025-12-22  9:43 [PATCH v4 00/17] ARM Error Source Table V2 Support Ruidong Tian
                   ` (7 preceding siblings ...)
  2025-12-22  9:43 ` [PATCH v4 08/17] ras: AEST: Enable and register IRQs Ruidong Tian
@ 2025-12-22  9:43 ` Ruidong Tian
  2025-12-22  9:43 ` [PATCH v4 10/17] ras: AEST: Introduce AEST driver sysfs interface Ruidong Tian
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Ruidong Tian @ 2025-12-22  9:43 UTC (permalink / raw)
  To: catalin.marinas, will, lpieralisi, guohanjun, sudeep.holla,
	xueshuai, linux-kernel, linux-acpi, linux-arm-kernel, rafael,
	lenb, tony.luck, bp, yazen.ghannam, misono.tomohiro
  Cc: tianruidong

Move the configuration of interrupts and CE thresholds
into the CPU hotplug callbacks for the per-CPU AEST node.
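
The ordering used by the starting callback in this patch (handle errors that
were already latched, then program the CE threshold, then enable the
interrupt) can be mimicked in userspace as below. This is only a toy model:
the per-CPU state, the function names and the threshold value are
hypothetical and stand in for register accesses that must run on the CPU
that owns them.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_FAKE_CPUS 4

/* Hypothetical per-CPU state standing in for the error record registers. */
struct fake_cpu_state {
	bool irq_enabled;
	unsigned int ce_threshold;
	unsigned int pending_errors;
	unsigned int handled_errors;
};

static struct fake_cpu_state fake_cpus[NR_FAKE_CPUS];

static struct fake_cpu_state *fake_cpu_state_of(unsigned int cpu)
{
	assert(cpu < NR_FAKE_CPUS);
	return &fake_cpus[cpu];
}

/* Startup half of the pair: runs when a CPU comes online. */
static int fake_starting_cpu(unsigned int cpu)
{
	struct fake_cpu_state *st = fake_cpu_state_of(cpu);

	/* 1. Consume errors latched before the interrupt was enabled. */
	st->handled_errors += st->pending_errors;
	st->pending_errors = 0;

	/* 2. Program the corrected-error threshold (value is arbitrary). */
	st->ce_threshold = 1;

	/* 3. Only now let the interrupt fire on this CPU. */
	st->irq_enabled = true;

	return 0;
}

/* Teardown half of the pair: runs when a CPU goes offline. */
static int fake_dying_cpu(unsigned int cpu)
{
	fake_cpu_state_of(cpu)->irq_enabled = false;

	return 0;
}
```

Keeping the two functions symmetric means a CPU that goes offline and comes
back is reconfigured from scratch rather than relying on state left behind.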

Signed-off-by: Ruidong Tian <tianruidong@linux.alibaba.com>
---
 drivers/ras/aest/aest-core.c | 118 ++++++++++++++++++++++++++++++++++-
 drivers/ras/aest/aest.h      |   5 ++
 include/linux/cpuhotplug.h   |   1 +
 3 files changed, 121 insertions(+), 3 deletions(-)

diff --git a/drivers/ras/aest/aest-core.c b/drivers/ras/aest/aest-core.c
index 5ec0ba38f51b..686dde6f2e68 100644
--- a/drivers/ras/aest/aest-core.c
+++ b/drivers/ras/aest/aest-core.c
@@ -5,6 +5,7 @@
  * Copyright (c) 2025, Alibaba Group.
  */
 
+#include <linux/cpu.h>
 #include <linux/interrupt.h>
 #include <linux/panic.h>
 #include <linux/platform_device.h>
@@ -563,8 +564,6 @@ static int aest_init_record(struct aest_record *record, int i,
 	record->addressing_mode = test_bit(i, node->info->addressing_mode);
 	record->index = i;
 	record->node = node;
-	aest_set_ce_threshold(record);
-	aest_enable_irq(record);
 
 	aest_record_dbg(record, "base: %p, index: %d, address mode: %x\n",
 			record->regs_base, record->index,
@@ -572,9 +571,113 @@ static int aest_init_record(struct aest_record *record, int i,
 	return 0;
 }
 
+static void aest_online_record(struct aest_record *record, void *data)
+{
+	if (record_read(record, ERXFR) & ERR_FR_CE)
+		aest_set_ce_threshold(record);
+
+	aest_enable_irq(record);
+}
+
+static void aest_online_oncore_node(struct aest_node *node)
+{
+	int count;
+
+	count = aest_proc(node);
+	aest_node_dbg(node, "Find %d error on CPU%d before AEST probe\n", count,
+		      smp_processor_id());
+
+	aest_node_foreach_record(aest_online_record, node, NULL,
+				 node->record_implemented);
+
+	aest_node_foreach_record(aest_online_record, node, NULL,
+				 node->status_reporting);
+}
+
+static void aest_online_oncore_dev(void *data)
+{
+	int fhi_irq, eri_irq, i;
+	struct aest_device *adev = this_cpu_ptr(data);
+
+	for (i = 0; i < adev->node_cnt; i++)
+		aest_online_oncore_node(&adev->nodes[i]);
+
+	fhi_irq = adev->irq[ACPI_AEST_NODE_FAULT_HANDLING];
+	if (fhi_irq > 0)
+		enable_percpu_irq(fhi_irq, IRQ_TYPE_NONE);
+	eri_irq = adev->irq[ACPI_AEST_NODE_ERROR_RECOVERY];
+	if (eri_irq > 0)
+		enable_percpu_irq(eri_irq, IRQ_TYPE_NONE);
+}
+
+static void aest_offline_oncore_dev(void *data)
+{
+	int fhi_irq, eri_irq;
+	struct aest_device *adev = this_cpu_ptr(data);
+
+	fhi_irq = adev->irq[ACPI_AEST_NODE_FAULT_HANDLING];
+	if (fhi_irq > 0)
+		disable_percpu_irq(fhi_irq);
+	eri_irq = adev->irq[ACPI_AEST_NODE_ERROR_RECOVERY];
+	if (eri_irq > 0)
+		disable_percpu_irq(eri_irq);
+}
+
+static void aest_online_dev(struct aest_device *adev)
+{
+	int count, i;
+	struct aest_node *node;
+
+	for (i = 0; i < adev->node_cnt; i++) {
+		node = &adev->nodes[i];
+
+		if (!node->name)
+			continue;
+
+		count = aest_proc(node);
+		aest_node_dbg(node, "Find %d error before AEST probe\n", count);
+
+		aest_config_irq(node);
+
+		aest_node_foreach_record(aest_online_record, node, NULL,
+					 node->record_implemented);
+		aest_node_foreach_record(aest_online_record, node, NULL,
+					 node->status_reporting);
+	}
+}
+
+static int aest_starting_cpu(unsigned int cpu)
+{
+	pr_debug("CPU%d starting\n", cpu);
+	aest_online_oncore_dev(&percpu_adev);
+
+	return 0;
+}
+
+static int aest_dying_cpu(unsigned int cpu)
+{
+	pr_debug("CPU%d dying\n", cpu);
+	aest_offline_oncore_dev(&percpu_adev);
+
+	return 0;
+}
+
 static void aest_device_remove(struct platform_device *pdev)
 {
+	struct aest_device *adev = platform_get_drvdata(pdev);
+	int i;
+
 	platform_set_drvdata(pdev, NULL);
+
+	if (adev->type != ACPI_AEST_PROCESSOR_ERROR_NODE)
+		return;
+
+	on_each_cpu(aest_offline_oncore_dev, adev->adev_oncore, 1);
+
+	for (i = 0; i < MAX_GSI_PER_NODE; i++) {
+		if (adev->irq[i])
+			free_percpu_irq(adev->irq[i], adev->adev_oncore);
+	}
 }
 
 static char *alloc_aest_node_name(struct aest_node *node)
@@ -682,7 +785,6 @@ static int aest_init_node(struct aest_device *adev, struct aest_node *node,
 				return -ENOMEM;
 		}
 	}
-	aest_config_irq(node);
 
 	ret = aest_node_set_errgsr(adev, node);
 	if (ret)
@@ -826,6 +928,16 @@ static int aest_device_probe(struct platform_device *pdev)
 		aest_dev_err(adev, "register irq failed\n");
 		return ret;
 	}
+
+	if (aest_dev_is_oncore(adev))
+		ret = cpuhp_setup_state(CPUHP_AP_ARM_AEST_STARTING,
+					"drivers/acpi/arm64/aest:starting",
+					aest_starting_cpu, aest_dying_cpu);
+	else
+		aest_online_dev(adev);
+	if (ret)
+		return ret;
+
 	platform_set_drvdata(pdev, adev);
 
 	aest_dev_dbg(adev, "Node cnt: %x, id: %x\n", adev->node_cnt, adev->id);
diff --git a/drivers/ras/aest/aest.h b/drivers/ras/aest/aest.h
index a5e43b2a2e90..f85e81ff35a6 100644
--- a/drivers/ras/aest/aest.h
+++ b/drivers/ras/aest/aest.h
@@ -339,3 +339,8 @@ static inline void aest_sync(struct aest_node *node)
 	if (node->type == ACPI_AEST_PROCESSOR_ERROR_NODE)
 		isb();
 }
+
+static inline bool aest_dev_is_oncore(struct aest_device *adev)
+{
+	return adev->type == ACPI_AEST_PROCESSOR_ERROR_NODE;
+}
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 62cd7b35a29c..831fe9011943 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -179,6 +179,7 @@ enum cpuhp_state {
 	CPUHP_AP_HYPERV_TIMER_STARTING,
 	/* Must be the last timer callback */
 	CPUHP_AP_DUMMY_TIMER_STARTING,
+	CPUHP_AP_ARM_AEST_STARTING,
 	CPUHP_AP_ARM_XEN_STARTING,
 	CPUHP_AP_ARM_XEN_RUNSTATE_STARTING,
 	CPUHP_AP_ARM_CORESIGHT_STARTING,
-- 
2.51.2.612.gdc70283dfc



^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v4 10/17] ras: AEST: Introduce AEST driver sysfs interface
  2025-12-22  9:43 [PATCH v4 00/17] ARM Error Source Table V2 Support Ruidong Tian
                   ` (8 preceding siblings ...)
  2025-12-22  9:43 ` [PATCH v4 09/17] ras: AEST: Add cpuhp callback Ruidong Tian
@ 2025-12-22  9:43 ` Ruidong Tian
  2025-12-23  0:57   ` kernel test robot
  2025-12-22  9:43 ` [PATCH v4 11/17] ras: AEST: Add error count tracking and debugfs interface Ruidong Tian
                   ` (7 subsequent siblings)
  17 siblings, 1 reply; 22+ messages in thread
From: Ruidong Tian @ 2025-12-22  9:43 UTC (permalink / raw)
  To: catalin.marinas, will, lpieralisi, guohanjun, sudeep.holla,
	xueshuai, linux-kernel, linux-acpi, linux-arm-kernel, rafael,
	lenb, tony.luck, bp, yazen.ghannam, misono.tomohiro
  Cc: tianruidong

Expose certain AEST driver information to userspace through debugfs.

Only root can access these interfaces because they include
hardware-sensitive information:

  ls /sys/kernel/debug/aest/
  memory<id> smmu<id> ...

  ls /sys/kernel/debug/aest/memory<id>/
  record0 record1 ...

All details at:
        Documentation/ABI/testing/debugfs-aest
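
As a hedged example of consuming one of these files from userspace, the
sketch below decodes the text read from an err_status file. It assumes the
file content is a single number in the "%#llx\n" format used by the debugfs
attributes in this patch, and that the bit positions match the ERR_STATUS_*
definitions added earlier in this series. The struct and function names are
made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Decoded view of an ERR<n>STATUS value. */
struct err_status_fields {
	bool addr_valid;	/* AV, bit 31 */
	bool valid;		/* V, bit 30 */
	bool uncorrected;	/* UE, bit 29 */
	bool misc_valid;	/* MV, bit 26 */
	bool deferred;		/* DE, bit 23 */
	unsigned int ce;	/* CE, bits 25:24 */
	unsigned int uet;	/* UET, bits 21:20 */
	unsigned int ierr;	/* IERR, bits 15:8 */
	unsigned int serr;	/* SERR, bits 7:0 */
};

/*
 * Parse the text of an err_status file (for example "0xe4203412\n")
 * and split it into fields. Returns 0 on success, -1 if the text does
 * not start with a number.
 */
static int parse_err_status(const char *text, struct err_status_fields *out)
{
	char *end;
	uint64_t v;

	assert(text && out);

	errno = 0;
	v = strtoull(text, &end, 0);	/* base 0 accepts "0x..." and "0" */
	if (errno || end == text)
		return -1;

	out->addr_valid = (v >> 31) & 1;
	out->valid = (v >> 30) & 1;
	out->uncorrected = (v >> 29) & 1;
	out->misc_valid = (v >> 26) & 1;
	out->deferred = (v >> 23) & 1;
	out->ce = (v >> 24) & 0x3;
	out->uet = (v >> 20) & 0x3;
	out->ierr = (v >> 8) & 0xff;
	out->serr = v & 0xff;

	return 0;
}
```

A caller would read the file into a buffer as root and pass the buffer to
parse_err_status(); the other fields are only meaningful when valid is set.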

Signed-off-by: Ruidong Tian <tianruidong@linux.alibaba.com>
---
 Documentation/ABI/testing/debugfs-aest |  31 +++++++
 MAINTAINERS                            |   1 +
 drivers/ras/aest/Makefile              |   1 +
 drivers/ras/aest/aest-core.c           |  13 +++
 drivers/ras/aest/aest-sysfs.c          | 118 +++++++++++++++++++++++++
 drivers/ras/aest/aest.h                |   8 ++
 6 files changed, 172 insertions(+)
 create mode 100644 Documentation/ABI/testing/debugfs-aest
 create mode 100644 drivers/ras/aest/aest-sysfs.c

diff --git a/Documentation/ABI/testing/debugfs-aest b/Documentation/ABI/testing/debugfs-aest
new file mode 100644
index 000000000000..1152fc83c3fc
--- /dev/null
+++ b/Documentation/ABI/testing/debugfs-aest
@@ -0,0 +1,31 @@
+What:		/sys/kernel/debug/aest/<name>.<id>/
+Date:		Dec 2025
+KernelVersion:	6.19
+Contact:	Ruidong Tian <tianruidong@linux.alibaba.com>
+Description:
+		Directory representing an AEST device. <name> is the device
+		type, for example:
+
+			processor
+			memory
+			smmu
+			...
+		<id> is the unique ID for this device.
+
+What:		/sys/kernel/debug/aest/<name>.<id>/<node_name>/*
+Date:		Dec 2025
+KernelVersion:	6.19
+Contact:	Ruidong Tian <tianruidong@linux.alibaba.com>
+Description:
+		Attributes of an AEST node that belongs to this device. The
+		format of the node name is: <Node Type>-<Node Address>
+
+		See more at:
+			https://developer.arm.com/documentation/den0085/latest/
+
+What:		/sys/kernel/debug/aest/<name>.<id>/<node_name>/record<index>/err_*
+Date:		Dec 2025
+KernelVersion:	6.19
+Contact:	Ruidong Tian <tianruidong@linux.alibaba.com>
+Description:
+		(RO) Read the err_* register and return its value.
diff --git a/MAINTAINERS b/MAINTAINERS
index fd4c40c4607c..2c148b7ab4b2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -346,6 +346,7 @@ M:	Ruidong Tian <tianruidond@linux.alibaba.com>
 L:	linux-acpi@vger.kernel.org
 L:	linux-arm-kernel@lists.infradead.org
 S:	Supported
+F:	Documentation/ABI/testing/debugfs-aest
 F:	arch/arm64/include/asm/ras.h
 F:	drivers/acpi/arm64/aest.c
 F:	drivers/ras/aest/
diff --git a/drivers/ras/aest/Makefile b/drivers/ras/aest/Makefile
index a6ba7e36fb43..75495413d2b6 100644
--- a/drivers/ras/aest/Makefile
+++ b/drivers/ras/aest/Makefile
@@ -3,3 +3,4 @@
 obj-$(CONFIG_AEST) 	+= aest.o
 
 aest-y		:= aest-core.o
+aest-y		+= aest-sysfs.o
diff --git a/drivers/ras/aest/aest-core.c b/drivers/ras/aest/aest-core.c
index 686dde6f2e68..3bcc635cf8e4 100644
--- a/drivers/ras/aest/aest-core.c
+++ b/drivers/ras/aest/aest-core.c
@@ -20,6 +20,9 @@ DEFINE_PER_CPU(struct aest_device, percpu_adev);
 #undef pr_fmt
 #define pr_fmt(fmt) "AEST: " fmt
 
+#ifdef CONFIG_DEBUG_FS
+struct dentry *aest_debugfs;
+#endif
 /*
  * This memory pool is only to be used to save AEST node in AEST irq context.
  * There can be 500 AEST node at most.
@@ -940,6 +943,8 @@ static int aest_device_probe(struct platform_device *pdev)
 
 	platform_set_drvdata(pdev, adev);
 
+	aest_dev_init_debugfs(adev);
+
 	aest_dev_dbg(adev, "Node cnt: %x, id: %x\n", adev->node_cnt, adev->id);
 
 	return 0;
@@ -955,12 +960,20 @@ static struct platform_driver aest_driver = {
 
 static int __init aest_init(void)
 {
+#ifdef CONFIG_DEBUG_FS
+	aest_debugfs = debugfs_create_dir("aest", NULL);
+#endif
+
 	return platform_driver_register(&aest_driver);
 }
 module_init(aest_init);
 
 static void __exit aest_exit(void)
 {
+#ifdef CONFIG_DEBUG_FS
+	debugfs_remove(aest_debugfs);
+#endif
+
 	platform_driver_unregister(&aest_driver);
 }
 module_exit(aest_exit);
diff --git a/drivers/ras/aest/aest-sysfs.c b/drivers/ras/aest/aest-sysfs.c
new file mode 100644
index 000000000000..f3b5427ff4f0
--- /dev/null
+++ b/drivers/ras/aest/aest-sysfs.c
@@ -0,0 +1,118 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ARM Error Source Table Support
+ *
+ * Copyright (c) 2025, Alibaba Group.
+ */
+
+#include "aest.h"
+
+/*******************************************************************************
+ *
+ * Attribute for AEST record
+ *
+ ******************************************************************************/
+
+#define DEFINE_AEST_DEBUGFS_ATTR(name, offset) \
+static int name##_get(void *data, u64 *val) \
+{ \
+	struct aest_record *record = data; \
+	*val = record_read(record, offset); \
+	return 0; \
+} \
+static int name##_set(void *data, u64 val) \
+{ \
+	struct aest_record *record = data; \
+	record_write(record, offset, val); \
+	return 0; \
+} \
+DEFINE_DEBUGFS_ATTRIBUTE(name##_ops, name##_get, name##_set, "%#llx\n")
+
+DEFINE_AEST_DEBUGFS_ATTR(err_fr, ERXFR);
+DEFINE_AEST_DEBUGFS_ATTR(err_ctrl, ERXCTLR);
+DEFINE_AEST_DEBUGFS_ATTR(err_status, ERXSTATUS);
+DEFINE_AEST_DEBUGFS_ATTR(err_addr, ERXADDR);
+DEFINE_AEST_DEBUGFS_ATTR(err_misc0, ERXMISC0);
+DEFINE_AEST_DEBUGFS_ATTR(err_misc1, ERXMISC1);
+DEFINE_AEST_DEBUGFS_ATTR(err_misc2, ERXMISC2);
+DEFINE_AEST_DEBUGFS_ATTR(err_misc3, ERXMISC3);
+
+static void aest_record_init_debugfs(struct aest_record *record)
+{
+	debugfs_create_file("err_fr", 0600, record->debugfs, record,
+								&err_fr_ops);
+	debugfs_create_file("err_ctrl", 0600, record->debugfs, record,
+								&err_ctrl_ops);
+	debugfs_create_file("err_status", 0600, record->debugfs, record,
+								&err_status_ops);
+	debugfs_create_file("err_addr", 0600, record->debugfs, record,
+								&err_addr_ops);
+	debugfs_create_file("err_misc0", 0600, record->debugfs, record,
+								&err_misc0_ops);
+	debugfs_create_file("err_misc1", 0600, record->debugfs, record,
+								&err_misc1_ops);
+	debugfs_create_file("err_misc2", 0600, record->debugfs, record,
+								&err_misc2_ops);
+	debugfs_create_file("err_misc3", 0600, record->debugfs, record,
+								&err_misc3_ops);
+}
+
+static void
+aest_node_init_debugfs(struct aest_node *node)
+{
+	int i;
+	struct aest_record *record;
+
+	for (i = 0; i < node->record_count; i++) {
+		record = &node->records[i];
+		if (!record->name)
+			continue;
+		record->debugfs = debugfs_create_dir(record->name,
+								node->debugfs);
+		aest_record_init_debugfs(record);
+	}
+}
+
+static void
+aest_oncore_dev_init_debugfs(struct aest_device *adev)
+{
+	int cpu, i;
+	struct aest_node *node;
+	struct aest_device *percpu_dev;
+	char name[16];
+
+	for_each_possible_cpu(cpu) {
+		percpu_dev = per_cpu_ptr(adev->adev_oncore, cpu);
+
+		snprintf(name, sizeof(name), "processor%u", cpu);
+		percpu_dev->debugfs = debugfs_create_dir(name, aest_debugfs);
+
+		for (i = 0; i < adev->node_cnt; i++) {
+			node = &adev->nodes[i];
+
+			node->debugfs = debugfs_create_dir(node->name,
+							percpu_dev->debugfs);
+			aest_node_init_debugfs(node);
+		}
+	}
+}
+
+void aest_dev_init_debugfs(struct aest_device *adev)
+{
+	int i;
+	struct aest_node *node;
+
+	adev->debugfs = debugfs_create_dir(dev_name(adev->dev), aest_debugfs);
+	if (aest_dev_is_oncore(adev)) {
+		aest_oncore_dev_init_debugfs(adev);
+		return;
+	}
+
+	for (i = 0; i < adev->node_cnt; i++) {
+		node = &adev->nodes[i];
+		if (!node->name)
+			continue;
+		node->debugfs = debugfs_create_dir(node->name, adev->debugfs);
+		aest_node_init_debugfs(node);
+	}
+}
diff --git a/drivers/ras/aest/aest.h b/drivers/ras/aest/aest.h
index f85e81ff35a6..ceb9e32bcee3 100644
--- a/drivers/ras/aest/aest.h
+++ b/drivers/ras/aest/aest.h
@@ -7,6 +7,7 @@
 
 #include <linux/acpi_aest.h>
 #include <asm/ras.h>
+#include <linux/debugfs.h>
 
 #define MAX_GSI_PER_NODE 2
 #define DEFAULT_CE_THRESHOLD 1
@@ -67,6 +68,8 @@
 
 #define GIC_ERRDEVARCH 0xFFBC
 
+extern struct dentry *aest_debugfs;
+
 struct aest_event {
 	struct llist_node llnode;
 	char *node_name;
@@ -133,6 +136,7 @@ struct aest_record {
 
 	struct ce_threshold ce;
 	enum ras_ce_threshold threshold_type;
+	struct dentry *debugfs;
 };
 
 struct aest_group {
@@ -201,6 +205,7 @@ struct aest_node {
 
 	int record_count;
 	struct aest_record *records;
+	struct dentry *debugfs;
 	struct aest_node __percpu *oncore_node;
 };
 
@@ -215,6 +220,7 @@ struct aest_device {
 	struct work_struct aest_work;
 	struct gen_pool *pool;
 	struct llist_head event_list;
+	struct dentry *debugfs;
 	struct aest_device __percpu *adev_oncore;
 };
 
@@ -344,3 +350,5 @@ static inline bool aest_dev_is_oncore(struct aest_device *adev)
 {
 	return adev->type == ACPI_AEST_PROCESSOR_ERROR_NODE;
 }
+
+void aest_dev_init_debugfs(struct aest_device *adev);
-- 
2.51.2.612.gdc70283dfc




* [PATCH v4 11/17] ras: AEST: Add error count tracking and debugfs interface
  2025-12-22  9:43 [PATCH v4 00/17] ARM Error Source Table V2 Support Ruidong Tian
                   ` (9 preceding siblings ...)
  2025-12-22  9:43 ` [PATCH v4 10/17] ras: AEST: Introduce AEST driver sysfs interface Ruidong Tian
@ 2025-12-22  9:43 ` Ruidong Tian
  2025-12-22  9:43 ` [PATCH v4 12/17] ras: AEST: Allow configuring CE threshold via debugfs Ruidong Tian
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Ruidong Tian @ 2025-12-22  9:43 UTC (permalink / raw)
  To: catalin.marinas, will, lpieralisi, guohanjun, sudeep.holla,
	xueshuai, linux-kernel, linux-acpi, linux-arm-kernel, rafael,
	lenb, tony.luck, bp, yazen.ghannam, misono.tomohiro
  Cc: tianruidong

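Add per-record error counters to the AEST driver. Each time an error record
is queued for processing, the driver increments the counter that matches the
reported error type (CE, DE, UC, UEU, UER or UEO). The counters are exposed
through debugfs err_count files, both per record and summed per node.
Previously, no error statistics were available for individual error records
or AEST nodes.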

Signed-off-by: Ruidong Tian <tianruidong@linux.alibaba.com>
---
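Note: the counting rule described above can be sketched in plain C as
follows. This is only an illustration. The bit positions and UET encodings
are assumptions based on one reading of the Arm RAS architecture, and every
name (ST_*, UET_*, struct err_counts, count_error) is hypothetical, not one
of the driver's macros. The UET field is shifted down before it is compared
with the encoding values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed ERR<n>STATUS bit positions; illustrative, not the driver's macros. */
#define ST_UE		(UINT64_C(1) << 29)	/* uncorrected error */
#define ST_CE		(UINT64_C(3) << 24)	/* corrected error, 2-bit field */
#define ST_DE		(UINT64_C(1) << 23)	/* deferred error */
#define ST_UET_SHIFT	20
#define ST_UET		(UINT64_C(3) << ST_UET_SHIFT)

/* Assumed UET encodings, valid only for the shifted-down field value. */
enum uet { UET_UC = 0, UET_UEU = 1, UET_UEO = 2, UET_UER = 3 };

struct err_counts {
	uint64_t ce, de, uc, ueu, ueo, uer;
};

/* Bump the counter(s) matching the error type(s) flagged in status. */
static void count_error(struct err_counts *c, uint64_t status)
{
	assert(c != NULL);

	if (status & ST_CE)
		c->ce++;
	if (status & ST_DE)
		c->de++;
	if (status & ST_UE) {
		/* Extract the 2-bit subtype before comparing. */
		switch ((status & ST_UET) >> ST_UET_SHIFT) {
		case UET_UC:
			c->uc++;
			break;
		case UET_UEU:
			c->ueu++;
			break;
		case UET_UEO:
			c->ueo++;
			break;
		case UET_UER:
			c->uer++;
			break;
		}
	}
}
```

A single status value can bump more than one counter, since CE, DE and UE
are tested independently.
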
 Documentation/ABI/testing/debugfs-aest | 14 ++++++
 drivers/ras/aest/aest-core.c           | 21 +++++++++
 drivers/ras/aest/aest-sysfs.c          | 64 ++++++++++++++++++++++++++
 drivers/ras/aest/aest.h                | 10 ++++
 4 files changed, 109 insertions(+)

diff --git a/Documentation/ABI/testing/debugfs-aest b/Documentation/ABI/testing/debugfs-aest
index 1152fc83c3fc..a984fcedede2 100644
--- a/Documentation/ABI/testing/debugfs-aest
+++ b/Documentation/ABI/testing/debugfs-aest
@@ -23,9 +23,23 @@ Description:
 		See more at:
 			https://developer.arm.com/documentation/den0085/latest/
 
+What:		/sys/kernel/debug/aest/<name>.<id>/<node_name>/err_count
+Date:		Dec 2025
+KernelVersion	6.19
+Contact:	Ruidong Tian <tianruidong@linux.alibaba.com>
+Description:
+		(RO) Outputs error statistics for all error records of this node.
+
 What:		/sys/kernel/debug/aest/<name>.<id>/<node_name>/record<index>/err_*
 Date:		Dec 2025
 KernelVersion	6.19
 Contact:	Ruidong Tian <tianruidong@linux.alibaba.com>
 Description:
 		(RO) Read err_* register and return val.
+
+What:		/sys/kernel/debug/aest/<name>.<id>/<node_name>/record<index>/err_count
+Date:		Dec 2025
+KernelVersion	6.19
+Contact:	Ruidong Tian <tianruidong@linux.alibaba.com>
+Description:
+		(RO) Outputs error statistics for this record.
diff --git a/drivers/ras/aest/aest-core.c b/drivers/ras/aest/aest-core.c
index 3bcc635cf8e4..75cca98024ad 100644
--- a/drivers/ras/aest/aest-core.c
+++ b/drivers/ras/aest/aest-core.c
@@ -170,6 +170,27 @@ static int aest_node_gen_pool_add(struct aest_device *adev,
 	init_aest_event(event, record, regs);
 	llist_add(&event->llnode, &adev->event_list);
 
+	if (regs->err_status & ERR_STATUS_CE)
+		record->count.ce++;
+	if (regs->err_status & ERR_STATUS_DE)
+		record->count.de++;
+	if (regs->err_status & ERR_STATUS_UE) {
+		switch (FIELD_GET(ERR_STATUS_UET, regs->err_status)) {
+		case ERR_STATUS_UET_UC:
+			record->count.uc++;
+			break;
+		case ERR_STATUS_UET_UEU:
+			record->count.ueu++;
+			break;
+		case ERR_STATUS_UET_UER:
+			record->count.uer++;
+			break;
+		case ERR_STATUS_UET_UEO:
+			record->count.ueo++;
+			break;
+		}
+	}
+
 	return 0;
 }
 
diff --git a/drivers/ras/aest/aest-sysfs.c b/drivers/ras/aest/aest-sysfs.c
index f3b5427ff4f0..b54e879506aa 100644
--- a/drivers/ras/aest/aest-sysfs.c
+++ b/drivers/ras/aest/aest-sysfs.c
@@ -7,6 +7,46 @@
 
 #include "aest.h"
 
+static void
+aest_error_count(struct aest_record *record, void *data)
+{
+	struct record_count *count = data;
+
+	count->ce += record->count.ce;
+	count->de += record->count.de;
+	count->uc += record->count.uc;
+	count->ueu += record->count.ueu;
+	count->uer += record->count.uer;
+	count->ueo += record->count.ueo;
+}
+
+/*******************************************************************************
+ *
+ * Debugfs for AEST node
+ *
+ ******************************************************************************/
+
+static int aest_node_err_count_show(struct seq_file *m, void *data)
+{
+	struct aest_node *node = m->private;
+	struct record_count count = { 0 };
+	int i;
+
+	for (i = 0; i < node->record_count; i++)
+		aest_error_count(&node->records[i], &count);
+
+	seq_printf(m, "CE: %llu\n"
+				"DE: %llu\n"
+				"UC: %llu\n"
+				"UEU: %llu\n"
+				"UER: %llu\n"
+				"UEO: %llu\n",
+				count.ce, count.de, count.uc, count.ueu,
+				count.uer, count.ueo);
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(aest_node_err_count);
+
 /*******************************************************************************
  *
  * Attribute for AEST record
@@ -37,6 +77,25 @@ DEFINE_AEST_DEBUGFS_ATTR(err_misc1, ERXMISC1);
 DEFINE_AEST_DEBUGFS_ATTR(err_misc2, ERXMISC2);
 DEFINE_AEST_DEBUGFS_ATTR(err_misc3, ERXMISC3);
 
+static int aest_record_err_count_show(struct seq_file *m, void *data)
+{
+	struct aest_record *record = m->private;
+	struct record_count count = { 0 };
+
+	aest_error_count(record, &count);
+
+	seq_printf(m, "CE: %llu\n"
+				"DE: %llu\n"
+				"UC: %llu\n"
+				"UEU: %llu\n"
+				"UER: %llu\n"
+				"UEO: %llu\n",
+				count.ce, count.de, count.uc, count.ueu,
+				count.uer, count.ueo);
+	return 0;
+}
+DEFINE_SHOW_ATTRIBUTE(aest_record_err_count);
+
 static void aest_record_init_debugfs(struct aest_record *record)
 {
 	debugfs_create_file("err_fr", 0600, record->debugfs, record,
@@ -55,6 +114,8 @@ static void aest_record_init_debugfs(struct aest_record *record)
 								&err_misc2_ops);
 	debugfs_create_file("err_misc3", 0600, record->debugfs, record,
 								&err_misc3_ops);
+	debugfs_create_file("err_count", 0400, record->debugfs, record,
+						&aest_record_err_count_fops);
 }
 
 static void
@@ -63,6 +124,9 @@ aest_node_init_debugfs(struct aest_node *node)
 	int i;
 	struct aest_record *record;
 
+	debugfs_create_file("err_count", 0400, node->debugfs, node,
+					&aest_node_err_count_fops);
+
 	for (i = 0; i < node->record_count; i++) {
 		record = &node->records[i];
 		if (!record->name)
diff --git a/drivers/ras/aest/aest.h b/drivers/ras/aest/aest.h
index ceb9e32bcee3..802430857dc4 100644
--- a/drivers/ras/aest/aest.h
+++ b/drivers/ras/aest/aest.h
@@ -116,6 +116,15 @@ struct ce_threshold {
 	u64 reg_val;
 };
 
+struct record_count {
+	u64 ce;
+	u64 de;
+	u64 uc;
+	u64 uer;
+	u64 ueo;
+	u64 ueu;
+};
+
 struct aest_record {
 	char *name;
 	int index;
@@ -136,6 +145,7 @@ struct aest_record {
 
 	struct ce_threshold ce;
 	enum ras_ce_threshold threshold_type;
+	struct record_count count;
 	struct dentry *debugfs;
 };
 
-- 
2.51.2.612.gdc70283dfc




* [PATCH v4 12/17] ras: AEST: Allow configuring CE threshold via debugfs
  2025-12-22  9:43 [PATCH v4 00/17] ARM Error Source Table V2 Support Ruidong Tian
                   ` (10 preceding siblings ...)
  2025-12-22  9:43 ` [PATCH v4 11/17] ras: AEST: Add error count tracking and debugfs interface Ruidong Tian
@ 2025-12-22  9:43 ` Ruidong Tian
  2025-12-22  9:43 ` [PATCH v4 13/17] ras: AEST: Introduce AEST inject interface to test AEST driver Ruidong Tian
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Ruidong Tian @ 2025-12-22  9:43 UTC (permalink / raw)
  To: catalin.marinas, will, lpieralisi, guohanjun, sudeep.holla,
	xueshuai, linux-kernel, linux-acpi, linux-arm-kernel, rafael,
	lenb, tony.luck, bp, yazen.ghannam, misono.tomohiro
  Cc: tianruidong

This commit introduces the ability to configure the Corrected Error (CE)
threshold for AEST records through debugfs. This allows administrators to
dynamically adjust the CE threshold for error reporting.

Signed-off-by: Ruidong Tian <tianruidong@linux.alibaba.com>
---
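Note: the threshold-to-register arithmetic used by this patch can be
sketched in plain C as follows. The counter field is preloaded with
max_count - threshold + 1, presumably so that it overflows after
"threshold" further corrected errors. The field geometry in struct ce_field
and the function name are hypothetical. Rejecting a threshold of 0 and
masking the shifted count are additions of this sketch, not behaviour of
the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the CE counter field inside ERR<n>MISC0. */
struct ce_field {
	uint64_t max_count;	/* largest value the counter can hold */
	uint64_t mask;		/* bits of MISC0 occupied by the counter */
	unsigned int shift;	/* position of the counter's lowest bit */
};

/*
 * Compute the MISC0 value that arms the counter for the given threshold.
 * Returns 0 on success, -1 if the threshold is out of range.
 */
static int ce_threshold_to_misc0(const struct ce_field *f, uint64_t threshold,
				 uint64_t misc0, uint64_t *out)
{
	uint64_t count;

	assert(f != NULL && out != NULL);

	/* A threshold of 0 would preload max_count + 1, which does not fit. */
	if (threshold == 0 || threshold > f->max_count)
		return -1;

	count = f->max_count - threshold + 1;

	/* Keep all other MISC0 bits and replace only the counter field. */
	*out = (misc0 & ~f->mask) | ((count << f->shift) & f->mask);
	return 0;
}
```

With an 8-bit counter at bits [39:32], a threshold of 1 preloads 0xFF and
a threshold of 0xFF preloads 1.
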
 Documentation/ABI/testing/debugfs-aest | 16 ++++++++++
 drivers/ras/aest/aest-sysfs.c          | 42 ++++++++++++++++++++++++++
 2 files changed, 58 insertions(+)

diff --git a/Documentation/ABI/testing/debugfs-aest b/Documentation/ABI/testing/debugfs-aest
index a984fcedede2..76ba1b77b274 100644
--- a/Documentation/ABI/testing/debugfs-aest
+++ b/Documentation/ABI/testing/debugfs-aest
@@ -23,6 +23,14 @@ Description:
 		See more at:
 			https://developer.arm.com/documentation/den0085/latest/
 
+What:		/sys/kernel/debug/aest/<name>.<id>/<node_name>/ce_threshold
+Date:		Dec 2025
+KernelVersion	6.19
+Contact:	Ruidong Tian <tianruidong@linux.alibaba.com>
+Description:
+		(WO) Write the CE threshold to all records of this node. The
+		write fails if the input exceeds the maximum threshold.
+
 What:		/sys/kernel/debug/aest/<name>.<id>/<node_name>/err_count
 Date:		Dec 2025
 KernelVersion	6.19
@@ -37,6 +45,14 @@ Contact:	Ruidong Tian <tianruidong@linux.alibaba.com>
 Description:
 		(RO) Read err_* register and return val.
 
+What:		/sys/kernel/debug/aest/<name>.<id>/<node_name>/record<index>/ce_threshold
+Date:		Dec 2025
+KernelVersion	6.19
+Contact:	Ruidong Tian <tianruidong@linux.alibaba.com>
+Description:
+		(RW) Read or write the CE threshold of this record. The write
+		fails if the input exceeds the maximum threshold.
+
 What:		/sys/kernel/debug/aest/<name>.<id>/<node_name>/record<index>/err_count
 Date:		Dec 2025
 KernelVersion	6.19
diff --git a/drivers/ras/aest/aest-sysfs.c b/drivers/ras/aest/aest-sysfs.c
index b54e879506aa..392e7ad8328e 100644
--- a/drivers/ras/aest/aest-sysfs.c
+++ b/drivers/ras/aest/aest-sysfs.c
@@ -7,6 +7,25 @@
 
 #include "aest.h"
 
+static void
+aest_store_threshold(struct aest_record *record, void *data)
+{
+	u64 err_misc0, *threshold = data;
+	struct ce_threshold *ce = &record->ce;
+
+	if (*threshold > ce->info->max_count)
+		return;
+
+	ce->threshold = *threshold;
+	ce->count = ce->info->max_count - ce->threshold + 1;
+
+	err_misc0 = record_read(record, ERXMISC0);
+	ce->reg_val = (err_misc0 & ~ce->info->mask) |
+			(ce->count << ce->info->shift);
+
+	record_write(record, ERXMISC0, ce->reg_val);
+}
+
 static void
 aest_error_count(struct aest_record *record, void *data)
 {
@@ -77,6 +96,27 @@ DEFINE_AEST_DEBUGFS_ATTR(err_misc1, ERXMISC1);
 DEFINE_AEST_DEBUGFS_ATTR(err_misc2, ERXMISC2);
 DEFINE_AEST_DEBUGFS_ATTR(err_misc3, ERXMISC3);
 
+static int record_ce_threshold_get(void *data, u64 *val)
+{
+	struct aest_record *record = data;
+
+	*val = record->ce.threshold;
+	return 0;
+}
+
+static int record_ce_threshold_set(void *data, u64 val)
+{
+	u64 threshold = val;
+	struct aest_record *record = data;
+
+	aest_store_threshold(record, &threshold);
+
+	return 0;
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(record_ce_threshold_ops, record_ce_threshold_get,
+					record_ce_threshold_set, "%llu\n");
+
 static int aest_record_err_count_show(struct seq_file *m, void *data)
 {
 	struct aest_record *record = m->private;
@@ -116,6 +156,8 @@ static void aest_record_init_debugfs(struct aest_record *record)
 								&err_misc3_ops);
 	debugfs_create_file("err_count", 0400, record->debugfs, record,
 						&aest_record_err_count_fops);
+	debugfs_create_file("ce_threshold", 0600, record->debugfs, record,
+						&record_ce_threshold_ops);
 }
 
 static void
-- 
2.51.2.612.gdc70283dfc




* [PATCH v4 13/17] ras: AEST: Introduce AEST inject interface to test AEST driver
  2025-12-22  9:43 [PATCH v4 00/17] ARM Error Source Table V2 Support Ruidong Tian
                   ` (11 preceding siblings ...)
  2025-12-22  9:43 ` [PATCH v4 12/17] ras: AEST: Allow configuring CE threshold via debugfs Ruidong Tian
@ 2025-12-22  9:43 ` Ruidong Tian
  2025-12-22  9:43 ` [PATCH v4 14/17] ras: ATL: Unified ATL interface for ARM64 and AMD Ruidong Tian
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Ruidong Tian @ 2025-12-22  9:43 UTC (permalink / raw)
  To: catalin.marinas, will, lpieralisi, guohanjun, sudeep.holla,
	xueshuai, linux-kernel, linux-acpi, linux-arm-kernel, rafael,
	lenb, tony.luck, bp, yazen.ghannam, misono.tomohiro
  Cc: tianruidong

AEST offers both soft and hard injection. Soft injection simulates errors
in software, providing flexibility to define the error register content.
Hard injection, on the other hand, utilizes error injection registers to
introduce hardware faults, strictly requiring values that adhere to their
specifications.

Read Documentation/ABI/testing/debugfs-aest to learn how to use them.

Signed-off-by: Ruidong Tian <tianruidong@linux.alibaba.com>
---
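Note: soft injection redirects register accesses to a software copy of the
error record, indexed by register offset. A minimal sketch of that idea in
plain C follows. The struct layout and the OFF_* offsets are assumptions
modelled on the memory-mapped record layout (8 bytes per register), and all
names are hypothetical. Unlike the patch, the sketch bounds-checks the
offset and copies with memcpy() rather than casting the struct to a u64
array.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Software copy of one error record; every register is 64 bits wide. */
struct shadow_regs {
	uint64_t fr, ctlr, status, addr, misc[4];
};

/* Assumed byte offsets of the registers within a record. */
enum {
	OFF_FR = 0x00,
	OFF_CTLR = 0x08,
	OFF_STATUS = 0x10,
	OFF_ADDR = 0x18,
	OFF_MISC0 = 0x20,
};

/* Read the 64-bit register at byte offset off from the shadow copy. */
static uint64_t shadow_read(const struct shadow_regs *r, uint32_t off)
{
	uint64_t val;

	assert(off % 8 == 0 && off + 8 <= sizeof(*r));
	memcpy(&val, (const unsigned char *)r + off, sizeof(val));
	return val;
}

/* Write the 64-bit register at byte offset off in the shadow copy. */
static void shadow_write(struct shadow_regs *r, uint32_t off, uint64_t val)
{
	assert(off % 8 == 0 && off + 8 <= sizeof(*r));
	memcpy((unsigned char *)r + off, &val, sizeof(val));
}
```

Because the accessors take the same offsets as the real ones, the normal
record-processing path can run unchanged against the shadow copy.
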
 Documentation/ABI/testing/debugfs-aest |  37 +++++++
 drivers/ras/aest/Makefile              |   1 +
 drivers/ras/aest/aest-core.c           |  24 +++--
 drivers/ras/aest/aest-inject.c         | 131 +++++++++++++++++++++++++
 drivers/ras/aest/aest-sysfs.c          |   8 +-
 drivers/ras/aest/aest.h                |   2 +
 6 files changed, 193 insertions(+), 10 deletions(-)
 create mode 100644 drivers/ras/aest/aest-inject.c

diff --git a/Documentation/ABI/testing/debugfs-aest b/Documentation/ABI/testing/debugfs-aest
index 76ba1b77b274..bd7742a36321 100644
--- a/Documentation/ABI/testing/debugfs-aest
+++ b/Documentation/ABI/testing/debugfs-aest
@@ -59,3 +59,40 @@ KernelVersion	6.19
 Contact:	Ruidong Tian <tianruidong@linux.alibaba.com>
 Description:
 		(RO) Outputs error statistics for this record.
+
+What:		/sys/kernel/debug/aest/<name>.<id>/<node_name>/record<index>/inject/err_*
+Date:		Dec 2025
+KernelVersion	6.19
+Contact:	Ruidong Tian <tianruidong@linux.alibaba.com>
+Description:
+		(RW) These files hold the error register values used for a
+		soft (simulated) injection. Any value may be written to them.
+		To trigger the injection, write to soft_inject as the final
+		step. Whether the injected error is treated as valid depends
+		on the value written to err_status.
+
+		Accepts values - any.
+
+What:		/sys/kernel/debug/aest/<name>.<id>/<node_name>/record<index>/inject/soft_inject
+Date:		Dec 2025
+KernelVersion	6.19
+Contact:	Ruidong Tian <tianruidong@linux.alibaba.com>
+Description:
+		(WO) Write any value to this file to trigger the error
+		injection. Make sure you have specified all necessary error
+		parameters, i.e. this write should be the last step when
+		injecting errors.
+
+		Accepts values -  any.
+
+What:		/sys/kernel/debug/aest/<name>.<id>/<node_name>/record<index>/inject/hard_inject
+Date:		Dec 2025
+KernelVersion	6.19
+Contact:	Ruidong Tian <tianruidong@linux.alibaba.com>
+Description:
+		(WO) If the AEST table provides error injection registers,
+		you can write to them via this interface. For instance,
+		values can be written to the ERXPFGCTL register. The post-injection
+		behavior is then determined by the hardware specification.
+
+		Accepts values - any.
diff --git a/drivers/ras/aest/Makefile b/drivers/ras/aest/Makefile
index 75495413d2b6..5ee10fc8b2e9 100644
--- a/drivers/ras/aest/Makefile
+++ b/drivers/ras/aest/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_AEST) 	+= aest.o
 
 aest-y		:= aest-core.o
 aest-y		+= aest-sysfs.o
+aest-y		+= aest-inject.o
diff --git a/drivers/ras/aest/aest-core.c b/drivers/ras/aest/aest-core.c
index 75cca98024ad..a290b482bf8b 100644
--- a/drivers/ras/aest/aest-core.c
+++ b/drivers/ras/aest/aest-core.c
@@ -273,7 +273,7 @@ static void aest_panic(struct aest_record *record, struct ras_ext_regs *regs,
 	panic(msg);
 }
 
-static void aest_proc_record(struct aest_record *record, void *data)
+void aest_proc_record(struct aest_record *record, void *data, bool fake)
 {
 	struct ras_ext_regs regs = { 0 };
 	int *count = data;
@@ -315,9 +315,15 @@ static void aest_proc_record(struct aest_record *record, void *data)
 	/* panic if unrecoverable and uncontainable error encountered */
 	ue = FIELD_GET(ERR_STATUS_UET, regs.err_status);
 	if ((regs.err_status & ERR_STATUS_UE) &&
-	    (ue == ERR_STATUS_UET_UC || ue == ERR_STATUS_UET_UEU))
-		aest_panic(record, &regs,
-			   "AEST: unrecoverable error encountered");
+	    (ue == ERR_STATUS_UET_UC || ue == ERR_STATUS_UET_UEU)) {
+		if (fake)
+			aest_record_info(
+				record,
+				"Simulated error! Skip panic due to fault injection\n");
+		else
+			aest_panic(record, &regs,
+				   "AEST: unrecoverable error encountered");
+	}
 
 	aest_log(record, &regs);
 
@@ -335,7 +341,8 @@ static void aest_proc_record(struct aest_record *record, void *data)
 	record_write(record, ERXSTATUS, regs.err_status);
 }
 
-static void aest_node_foreach_record(void (*func)(struct aest_record *, void *),
+static void aest_node_foreach_record(void (*func)(struct aest_record *, void *,
+						  bool),
 				     struct aest_node *node, void *data,
 				     unsigned long *bitmap)
 {
@@ -344,7 +351,7 @@ static void aest_node_foreach_record(void (*func)(struct aest_record *, void *),
 	for_each_clear_bit(i, bitmap, node->record_count) {
 		aest_select_record(node, i);
 
-		func(&node->records[i], data);
+		func(&node->records[i], data, false);
 
 		aest_sync(node);
 	}
@@ -379,7 +386,7 @@ static int aest_proc(struct aest_node *node)
 			if (test_bit(i * BITS_PER_LONG + j,
 				     node->status_reporting))
 				continue;
-			aest_proc_record(&node->records[j], &count);
+			aest_proc_record(&node->records[j], &count, false);
 		}
 	}
 
@@ -595,7 +602,8 @@ static int aest_init_record(struct aest_record *record, int i,
 	return 0;
 }
 
-static void aest_online_record(struct aest_record *record, void *data)
+static void aest_online_record(struct aest_record *record, void *data,
+			       bool __unused)
 {
 	if (record_read(record, ERXFR) & ERR_FR_CE)
 		aest_set_ce_threshold(record);
diff --git a/drivers/ras/aest/aest-inject.c b/drivers/ras/aest/aest-inject.c
new file mode 100644
index 000000000000..fe6ccac8338e
--- /dev/null
+++ b/drivers/ras/aest/aest-inject.c
@@ -0,0 +1,131 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ARM Error Source Table Support
+ *
+ * Copyright (c) 2024, Alibaba Group.
+ */
+
+#include "aest.h"
+
+static struct ras_ext_regs regs_inj;
+
+struct inj_attr {
+	struct attribute attr;
+	ssize_t (*show)(struct aest_node *n, struct inj_attr *a, char *b);
+	ssize_t (*store)(struct aest_node *n, struct inj_attr *a, const char *b,
+				size_t c);
+};
+
+struct aest_inject {
+	struct aest_node *node;
+	struct kobject kobj;
+};
+
+#define to_inj(k)	container_of(k, struct aest_inject, kobj)
+#define to_inj_attr(a)	container_of(a, struct inj_attr, attr)
+
+static u64 aest_sysreg_read_inject(void *__unused, u32 offset)
+{
+	u64 *p = (u64 *)&regs_inj;
+
+	return p[offset/8];
+}
+
+static void aest_sysreg_write_inject(void *base, u32 offset, u64 val)
+{
+	u64 *p = (u64 *)&regs_inj;
+
+	p[offset/8] = val;
+}
+
+static u64 aest_iomem_read_inject(void *base, u32 offset)
+{
+	u64 *p = (u64 *)&regs_inj;
+
+	return p[offset/8];
+}
+
+static void aest_iomem_write_inject(void *base, u32 offset, u64 val)
+{
+	u64 *p = (u64 *)&regs_inj;
+
+	p[offset/8] = val;
+}
+
+static struct aest_access aest_access_inject[] = {
+	[ACPI_AEST_NODE_SYSTEM_REGISTER] = {
+		.read = aest_sysreg_read_inject,
+		.write = aest_sysreg_write_inject,
+	},
+
+	[ACPI_AEST_NODE_MEMORY_MAPPED] = {
+		.read = aest_iomem_read_inject,
+		.write = aest_iomem_write_inject,
+	},
+	[ACPI_AEST_NODE_SINGLE_RECORD_MEMORY_MAPPED] = {
+		.read = aest_iomem_read_inject,
+		.write = aest_iomem_write_inject,
+	},
+	{ }
+};
+
+static int soft_inject_store(void *data, u64 val)
+{
+	int count = 0;
+	struct aest_record record_inj, *record = data;
+	struct aest_node node_inj, *node = record->node;
+
+	memcpy(&node_inj, node, sizeof(*node));
+	node_inj.name = "AEST-injection";
+
+	record_inj.access = &aest_access_inject[node->info->interface_hdr->type];
+	record_inj.node = &node_inj;
+	record_inj.index = record->index;
+
+	regs_inj.err_status |= ERR_STATUS_V;
+
+	aest_proc_record(&record_inj, &count, true);
+
+	if (count != 1)
+		return -EIO;
+
+	return 0;
+}
+DEFINE_DEBUGFS_ATTRIBUTE(soft_inject_ops, NULL, soft_inject_store, "%llu\n");
+
+static int hard_inject_store(void *data, u64 val)
+{
+	struct aest_record *record = data;
+	struct aest_node *node = record->node;
+
+	if (!node->inj)
+		return -EPERM;
+
+	aest_select_record(node, record->index);
+	record_write(record, ERXPFGCTL, val);
+	record_write(record, ERXPFGCDN, 0x100);
+	aest_sync(node);
+
+	return 0;
+}
+DEFINE_DEBUGFS_ATTRIBUTE(hard_inject_ops, NULL, hard_inject_store, "%llu\n");
+
+void aest_inject_init_debugfs(struct aest_record *record)
+{
+	struct dentry *inj;
+
+	inj = debugfs_create_dir("inject", record->debugfs);
+
+	debugfs_create_u64("err_fr", 0600, inj, &regs_inj.err_fr);
+	debugfs_create_u64("err_ctrl", 0600, inj, &regs_inj.err_ctlr);
+	debugfs_create_u64("err_status", 0600, inj, &regs_inj.err_status);
+	debugfs_create_u64("err_addr", 0600, inj, &regs_inj.err_addr);
+	debugfs_create_u64("err_misc0", 0600, inj, &regs_inj.err_misc[0]);
+	debugfs_create_u64("err_misc1", 0600, inj, &regs_inj.err_misc[1]);
+	debugfs_create_u64("err_misc2", 0600, inj, &regs_inj.err_misc[2]);
+	debugfs_create_u64("err_misc3", 0600, inj, &regs_inj.err_misc[3]);
+	debugfs_create_file("soft_inject", 0200, inj, record, &soft_inject_ops);
+
+	if (record->node->inj)
+		debugfs_create_file("hard_inject", 0200, inj, record, &hard_inject_ops);
+}
diff --git a/drivers/ras/aest/aest-sysfs.c b/drivers/ras/aest/aest-sysfs.c
index 392e7ad8328e..66e9c1103f99 100644
--- a/drivers/ras/aest/aest-sysfs.c
+++ b/drivers/ras/aest/aest-sysfs.c
@@ -158,6 +158,7 @@ static void aest_record_init_debugfs(struct aest_record *record)
 						&aest_record_err_count_fops);
 	debugfs_create_file("ce_threshold", 0600, record->debugfs, record,
 						&record_ce_threshold_ops);
+	aest_inject_init_debugfs(record);
 }
 
 static void
@@ -190,8 +191,8 @@ aest_oncore_dev_init_debugfs(struct aest_device *adev)
 	for_each_possible_cpu(cpu) {
 		percpu_dev = per_cpu_ptr(adev->adev_oncore, cpu);
 
-		snprintf(name, sizeof(name), "processor%u", cpu);
-		percpu_dev->debugfs = debugfs_create_dir(name, aest_debugfs);
+		snprintf(name, sizeof(name), "processor%u", cpu);
+		percpu_dev->debugfs = debugfs_create_dir(name, adev->debugfs);
 
 		for (i = 0; i < adev->node_cnt; i++) {
 			node = &adev->nodes[i];
@@ -208,6 +209,9 @@ void aest_dev_init_debugfs(struct aest_device *adev)
 	int i;
 	struct aest_node *node;
 
+	if (!aest_debugfs)
+		dev_err(adev->dev, "debugfs not enabled\n");
+
 	adev->debugfs = debugfs_create_dir(dev_name(adev->dev), aest_debugfs);
 	if (aest_dev_is_oncore(adev)) {
 		aest_oncore_dev_init_debugfs(adev);
diff --git a/drivers/ras/aest/aest.h b/drivers/ras/aest/aest.h
index 802430857dc4..2f6a7b9ca4ef 100644
--- a/drivers/ras/aest/aest.h
+++ b/drivers/ras/aest/aest.h
@@ -362,3 +362,5 @@ static inline bool aest_dev_is_oncore(struct aest_device *adev)
 }
 
 void aest_dev_init_debugfs(struct aest_device *adev);
+void aest_inject_init_debugfs(struct aest_record *record);
+void aest_proc_record(struct aest_record *record, void *data, bool fake);
-- 
2.51.2.612.gdc70283dfc




* [PATCH v4 14/17] ras: ATL: Unified ATL interface for ARM64 and AMD
  2025-12-22  9:43 [PATCH v4 00/17] ARM Error Source Table V2 Support Ruidong Tian
                   ` (12 preceding siblings ...)
  2025-12-22  9:43 ` [PATCH v4 13/17] ras: AEST: Introduce AEST inject interface to test AEST driver Ruidong Tian
@ 2025-12-22  9:43 ` Ruidong Tian
  2025-12-22  9:43 ` [PATCH v4 14/17] ras: ATL: Unify " Ruidong Tian
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 22+ messages in thread
From: Ruidong Tian @ 2025-12-22  9:43 UTC (permalink / raw)
  To: catalin.marinas, will, lpieralisi, guohanjun, sudeep.holla,
	xueshuai, linux-kernel, linux-acpi, linux-arm-kernel, rafael,
	lenb, tony.luck, bp, yazen.ghannam, misono.tomohiro
  Cc: tianruidong

Translating a device normalized address (AMD's term, also known as a logical
address) to a system physical address is a common operation in RAS. Provide
a common interface for this translation that both AMD and ARM can use.

Signed-off-by: Ruidong Tian <tianruidong@linux.alibaba.com>
---
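Note: the unified interface is a single registered function pointer that
takes an opaque error record. A minimal userspace sketch of that pattern
follows. All names are hypothetical stand-ins for the kernel symbols, and
demo_decode() is an invented decoder used only to exercise the pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Decoder signature: opaque error record in, system physical address out. */
typedef unsigned long (*la_to_spa_fn)(void *err);

/* The single registered decoder; NULL until a library registers one. */
static la_to_spa_fn decoder;

static void register_decoder(la_to_spa_fn f)
{
	decoder = f;
}

static void unregister_decoder(void)
{
	decoder = NULL;
}

/* Translate through the registered decoder, or fail with -EINVAL. */
static unsigned long convert_la_to_spa(void *err)
{
	if (!decoder)
		return (unsigned long)-EINVAL;

	return decoder(err);
}

/* Hypothetical vendor record and decoder, for illustration only. */
struct demo_err {
	unsigned long la;	/* logical (normalized) address */
	unsigned long base;	/* base of the region it falls in */
};

static unsigned long demo_decode(void *data)
{
	struct demo_err *e = data;

	assert(e != NULL);
	return e->base + e->la;
}
```

The void * argument removes compile-time type checking, so the caller and
the registered decoder must agree on the record type being passed.
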
 drivers/edac/amd64_edac.c      |  2 +-
 drivers/ras/aest/aest-core.c   |  3 +++
 drivers/ras/amd/atl/core.c     |  4 ++--
 drivers/ras/amd/atl/internal.h |  2 +-
 drivers/ras/amd/atl/umc.c      |  3 ++-
 drivers/ras/ras.c              | 24 +++++++++++-------------
 include/linux/ras.h            |  9 ++++-----
 7 files changed, 24 insertions(+), 23 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 2391f3469961..478cfef37892 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2847,7 +2847,7 @@ static void decode_umc_error(int node_id, struct mce *m)
 	a_err.ipid = m->ipid;
 	a_err.cpu  = m->extcpu;
 
-	sys_addr = amd_convert_umc_mca_addr_to_sys_addr(&a_err);
+	sys_addr = convert_ras_la_to_spa(&a_err);
 	if (IS_ERR_VALUE(sys_addr)) {
 		err.err_code = ERR_NORM_ADDR;
 		goto log_error;
diff --git a/drivers/ras/aest/aest-core.c b/drivers/ras/aest/aest-core.c
index a290b482bf8b..052211ca3e2a 100644
--- a/drivers/ras/aest/aest-core.c
+++ b/drivers/ras/aest/aest-core.c
@@ -235,6 +235,9 @@ static void aest_node_pool_process(struct work_struct *work)
 		    (status & (ERR_STATUS_UE | ERR_STATUS_DE))) {
 			if (event->addressing_mode == AEST_ADDREESS_SPA)
 				addr = event->regs.err_addr & PHYS_MASK;
+			else
+				addr = convert_ras_la_to_spa(event);
+
 			aest_handle_memory_failure(addr);
 		}
 
diff --git a/drivers/ras/amd/atl/core.c b/drivers/ras/amd/atl/core.c
index 0f7cd6dab0b0..4f44c0ce97ec 100644
--- a/drivers/ras/amd/atl/core.c
+++ b/drivers/ras/amd/atl/core.c
@@ -210,7 +210,7 @@ static int __init amd_atl_init(void)
 
 	/* Increment this module's recount so that it can't be easily unloaded. */
 	__module_get(THIS_MODULE);
-	amd_atl_register_decoder(convert_umc_mca_addr_to_sys_addr);
+	atl_register_decoder(convert_umc_mca_addr_to_sys_addr);
 
 	pr_info("AMD Address Translation Library initialized\n");
 	return 0;
@@ -222,7 +222,7 @@ static int __init amd_atl_init(void)
  */
 static void __exit amd_atl_exit(void)
 {
-	amd_atl_unregister_decoder();
+	atl_unregister_decoder();
 }
 
 module_init(amd_atl_init);
diff --git a/drivers/ras/amd/atl/internal.h b/drivers/ras/amd/atl/internal.h
index 82a56d9c2be1..423a6193fdc7 100644
--- a/drivers/ras/amd/atl/internal.h
+++ b/drivers/ras/amd/atl/internal.h
@@ -279,7 +279,7 @@ int denormalize_address(struct addr_ctx *ctx);
 int dehash_address(struct addr_ctx *ctx);
 
 unsigned long norm_to_sys_addr(u8 socket_id, u8 die_id, u8 coh_st_inst_id, unsigned long addr);
-unsigned long convert_umc_mca_addr_to_sys_addr(struct atl_err *err);
+unsigned long convert_umc_mca_addr_to_sys_addr(void *data);
 
 u64 add_base_and_hole(struct addr_ctx *ctx, u64 addr);
 u64 remove_base_and_hole(struct addr_ctx *ctx, u64 addr);
diff --git a/drivers/ras/amd/atl/umc.c b/drivers/ras/amd/atl/umc.c
index befc616d5e8a..57a78c380467 100644
--- a/drivers/ras/amd/atl/umc.c
+++ b/drivers/ras/amd/atl/umc.c
@@ -399,8 +399,9 @@ static u8 get_coh_st_inst_id(struct atl_err *err)
 	return FIELD_GET(UMC_CHANNEL_NUM, err->ipid);
 }
 
-unsigned long convert_umc_mca_addr_to_sys_addr(struct atl_err *err)
+unsigned long convert_umc_mca_addr_to_sys_addr(void *data)
 {
+	struct atl_err *err = data;
 	u8 socket_id = topology_physical_package_id(err->cpu);
 	u8 coh_st_inst_id = get_coh_st_inst_id(err);
 	unsigned long addr = get_addr(err->addr);
diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
index 2a5b5a9fdcb3..050b49466a18 100644
--- a/drivers/ras/ras.c
+++ b/drivers/ras/ras.c
@@ -10,36 +10,34 @@
 #include <linux/ras.h>
 #include <linux/uuid.h>
 
-#if IS_ENABLED(CONFIG_AMD_ATL)
 /*
  * Once set, this function pointer should never be unset.
  *
  * The library module will set this pointer if it successfully loads. The module
  * should not be unloaded except for testing and debug purposes.
  */
-static unsigned long (*amd_atl_umc_na_to_spa)(struct atl_err *err);
+static unsigned long (*atl_ras_la_to_spa)(void *err);
 
-void amd_atl_register_decoder(unsigned long (*f)(struct atl_err *))
+void atl_register_decoder(unsigned long (*f)(void *))
 {
-	amd_atl_umc_na_to_spa = f;
+	atl_ras_la_to_spa = f;
 }
-EXPORT_SYMBOL_GPL(amd_atl_register_decoder);
+EXPORT_SYMBOL_GPL(atl_register_decoder);
 
-void amd_atl_unregister_decoder(void)
+void atl_unregister_decoder(void)
 {
-	amd_atl_umc_na_to_spa = NULL;
+	atl_ras_la_to_spa = NULL;
 }
-EXPORT_SYMBOL_GPL(amd_atl_unregister_decoder);
+EXPORT_SYMBOL_GPL(atl_unregister_decoder);
 
-unsigned long amd_convert_umc_mca_addr_to_sys_addr(struct atl_err *err)
+unsigned long convert_ras_la_to_spa(void *err)
 {
-	if (!amd_atl_umc_na_to_spa)
+	if (!atl_ras_la_to_spa)
 		return -EINVAL;
 
-	return amd_atl_umc_na_to_spa(err);
+	return atl_ras_la_to_spa(err);
 }
-EXPORT_SYMBOL_GPL(amd_convert_umc_mca_addr_to_sys_addr);
-#endif /* CONFIG_AMD_ATL */
+EXPORT_SYMBOL_GPL(convert_ras_la_to_spa);
 
 #define CREATE_TRACE_POINTS
 #define TRACE_INCLUDE_PATH ../../include/ras
diff --git a/include/linux/ras.h b/include/linux/ras.h
index 05096f049dac..2270a8eb1038 100644
--- a/include/linux/ras.h
+++ b/include/linux/ras.h
@@ -42,14 +42,9 @@ struct atl_err {
 };
 
 #if IS_ENABLED(CONFIG_AMD_ATL)
-void amd_atl_register_decoder(unsigned long (*f)(struct atl_err *));
-void amd_atl_unregister_decoder(void);
 void amd_retire_dram_row(struct atl_err *err);
-unsigned long amd_convert_umc_mca_addr_to_sys_addr(struct atl_err *err);
 #else
 static inline void amd_retire_dram_row(struct atl_err *err) { }
-static inline unsigned long
-amd_convert_umc_mca_addr_to_sys_addr(struct atl_err *err) { return -EINVAL; }
 #endif /* CONFIG_AMD_ATL */
 
 #if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
@@ -63,6 +58,10 @@ amd_convert_umc_mca_addr_to_sys_addr(struct atl_err *err) { return -EINVAL; }
 #define GET_LOGICAL_INDEX(mpidr) -EINVAL
 #endif /* CONFIG_ARM || CONFIG_ARM64 */
 
+void atl_register_decoder(unsigned long (*f)(void *));
+void atl_unregister_decoder(void);
+unsigned long convert_ras_la_to_spa(void *err);
+
 #if IS_ENABLED(CONFIG_AEST)
 void aest_register_decode_chain(struct notifier_block *nb);
 void aest_unregister_decode_chain(struct notifier_block *nb);
-- 
2.51.2.612.gdc70283dfc




* [PATCH v4 14/17] ras: ATL: Unify ATL interface for ARM64 and AMD
  2025-12-22  9:43 [PATCH v4 00/17] ARM Error Source Table V2 Support Ruidong Tian
                   ` (13 preceding siblings ...)
  2025-12-22  9:43 ` [PATCH v4 14/17] ras: ATL: Unified ATL interface for ARM64 and AMD Ruidong Tian
@ 2025-12-22  9:43 ` Ruidong Tian
  2025-12-22 23:34   ` kernel test robot
  2025-12-25  6:58   ` kernel test robot
  2025-12-22  9:43 ` [PATCH v4 15/17] ras: AEST: Add framework to process AEST vendor node Ruidong Tian
                   ` (2 subsequent siblings)
  17 siblings, 2 replies; 22+ messages in thread
From: Ruidong Tian @ 2025-12-22  9:43 UTC (permalink / raw)
  To: catalin.marinas, will, lpieralisi, guohanjun, sudeep.holla,
	xueshuai, linux-kernel, linux-acpi, linux-arm-kernel, rafael,
	lenb, tony.luck, bp, yazen.ghannam, misono.tomohiro
  Cc: tianruidong

Certain components report errors using their internal physical addresses.
For instance, on AMD platforms, the UMC (Unified Memory Controller) reports
normalized addresses, while on Arm systems, memory controllers may report
logical addresses. These addresses must be translated into System Physical
Addresses (SPA) before the OS can utilize them effectively.

AMD already provides the amd_atl_umc_na_to_spa hook for this translation.
This patch replaces it with a common hook, atl_ras_la_to_spa, which callers
reach through convert_ras_la_to_spa() and which is intended for use by both
AMD and Arm64. The hook takes a void pointer to the architecture-specific
data required for the address translation.
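
The registration scheme described above can be sketched in user space as
follows. This is a minimal illustration, not the kernel code: every demo_*
name, the struct layout and the fixed 0x1000 "translation" are assumptions
made up for the example. Only the single function-pointer slot and the
-EINVAL fallback mirror what the diff below does.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct atl_err. */
struct demo_atl_err {
	unsigned long addr;
	unsigned int cpu;
};

/* One global decoder slot, mirroring atl_ras_la_to_spa in the patch. */
static unsigned long (*demo_decoder)(void *data);

static void demo_register_decoder(unsigned long (*f)(void *))
{
	demo_decoder = f;
}

static void demo_unregister_decoder(void)
{
	demo_decoder = NULL;
}

/* Returns the translated address, or (unsigned long)-EINVAL without a decoder. */
static unsigned long demo_convert_la_to_spa(void *data)
{
	if (!demo_decoder)
		return (unsigned long)-EINVAL;

	return demo_decoder(data);
}

/* Hypothetical backend: casts the opaque pointer back to its own type. */
static unsigned long demo_amd_decode(void *data)
{
	struct demo_atl_err *err = data;

	return err->addr + 0x1000; /* made-up translation, for illustration only */
}
```

Because there is a single slot and the argument is opaque, the caller and the
registered decoder have to agree on the type behind the void pointer. In this
series the AMD path passes a struct atl_err while the AEST path passes its
event structure, so each platform is expected to register only its own
decoder.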

Signed-off-by: Ruidong Tian <tianruidong@linux.alibaba.com>
---
 drivers/edac/amd64_edac.c      |  2 +-
 drivers/ras/aest/aest-core.c   |  3 +++
 drivers/ras/amd/atl/core.c     |  4 ++--
 drivers/ras/amd/atl/internal.h |  2 +-
 drivers/ras/amd/atl/umc.c      |  3 ++-
 drivers/ras/ras.c              | 24 +++++++++++-------------
 include/linux/ras.h            |  9 ++++-----
 7 files changed, 24 insertions(+), 23 deletions(-)

diff --git a/drivers/edac/amd64_edac.c b/drivers/edac/amd64_edac.c
index 2391f3469961..478cfef37892 100644
--- a/drivers/edac/amd64_edac.c
+++ b/drivers/edac/amd64_edac.c
@@ -2847,7 +2847,7 @@ static void decode_umc_error(int node_id, struct mce *m)
 	a_err.ipid = m->ipid;
 	a_err.cpu  = m->extcpu;
 
-	sys_addr = amd_convert_umc_mca_addr_to_sys_addr(&a_err);
+	sys_addr = convert_ras_la_to_spa(&a_err);
 	if (IS_ERR_VALUE(sys_addr)) {
 		err.err_code = ERR_NORM_ADDR;
 		goto log_error;
diff --git a/drivers/ras/aest/aest-core.c b/drivers/ras/aest/aest-core.c
index a290b482bf8b..052211ca3e2a 100644
--- a/drivers/ras/aest/aest-core.c
+++ b/drivers/ras/aest/aest-core.c
@@ -235,6 +235,9 @@ static void aest_node_pool_process(struct work_struct *work)
 		    (status & (ERR_STATUS_UE | ERR_STATUS_DE))) {
 			if (event->addressing_mode == AEST_ADDREESS_SPA)
 				addr = event->regs.err_addr & PHYS_MASK;
+			else
+				addr = convert_ras_la_to_spa(event);
+
 			aest_handle_memory_failure(addr);
 		}
 
diff --git a/drivers/ras/amd/atl/core.c b/drivers/ras/amd/atl/core.c
index 0f7cd6dab0b0..4f44c0ce97ec 100644
--- a/drivers/ras/amd/atl/core.c
+++ b/drivers/ras/amd/atl/core.c
@@ -210,7 +210,7 @@ static int __init amd_atl_init(void)
 
 	/* Increment this module's recount so that it can't be easily unloaded. */
 	__module_get(THIS_MODULE);
-	amd_atl_register_decoder(convert_umc_mca_addr_to_sys_addr);
+	atl_register_decoder(convert_umc_mca_addr_to_sys_addr);
 
 	pr_info("AMD Address Translation Library initialized\n");
 	return 0;
@@ -222,7 +222,7 @@ static int __init amd_atl_init(void)
  */
 static void __exit amd_atl_exit(void)
 {
-	amd_atl_unregister_decoder();
+	atl_unregister_decoder();
 }
 
 module_init(amd_atl_init);
diff --git a/drivers/ras/amd/atl/internal.h b/drivers/ras/amd/atl/internal.h
index 82a56d9c2be1..423a6193fdc7 100644
--- a/drivers/ras/amd/atl/internal.h
+++ b/drivers/ras/amd/atl/internal.h
@@ -279,7 +279,7 @@ int denormalize_address(struct addr_ctx *ctx);
 int dehash_address(struct addr_ctx *ctx);
 
 unsigned long norm_to_sys_addr(u8 socket_id, u8 die_id, u8 coh_st_inst_id, unsigned long addr);
-unsigned long convert_umc_mca_addr_to_sys_addr(struct atl_err *err);
+unsigned long convert_umc_mca_addr_to_sys_addr(void *data);
 
 u64 add_base_and_hole(struct addr_ctx *ctx, u64 addr);
 u64 remove_base_and_hole(struct addr_ctx *ctx, u64 addr);
diff --git a/drivers/ras/amd/atl/umc.c b/drivers/ras/amd/atl/umc.c
index befc616d5e8a..57a78c380467 100644
--- a/drivers/ras/amd/atl/umc.c
+++ b/drivers/ras/amd/atl/umc.c
@@ -399,8 +399,9 @@ static u8 get_coh_st_inst_id(struct atl_err *err)
 	return FIELD_GET(UMC_CHANNEL_NUM, err->ipid);
 }
 
-unsigned long convert_umc_mca_addr_to_sys_addr(struct atl_err *err)
+unsigned long convert_umc_mca_addr_to_sys_addr(void *data)
 {
+	struct atl_err *err = data;
 	u8 socket_id = topology_physical_package_id(err->cpu);
 	u8 coh_st_inst_id = get_coh_st_inst_id(err);
 	unsigned long addr = get_addr(err->addr);
diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
index 2a5b5a9fdcb3..050b49466a18 100644
--- a/drivers/ras/ras.c
+++ b/drivers/ras/ras.c
@@ -10,36 +10,34 @@
 #include <linux/ras.h>
 #include <linux/uuid.h>
 
-#if IS_ENABLED(CONFIG_AMD_ATL)
 /*
  * Once set, this function pointer should never be unset.
  *
  * The library module will set this pointer if it successfully loads. The module
  * should not be unloaded except for testing and debug purposes.
  */
-static unsigned long (*amd_atl_umc_na_to_spa)(struct atl_err *err);
+static unsigned long (*atl_ras_la_to_spa)(void *err);
 
-void amd_atl_register_decoder(unsigned long (*f)(struct atl_err *))
+void atl_register_decoder(unsigned long (*f)(void *))
 {
-	amd_atl_umc_na_to_spa = f;
+	atl_ras_la_to_spa = f;
 }
-EXPORT_SYMBOL_GPL(amd_atl_register_decoder);
+EXPORT_SYMBOL_GPL(atl_register_decoder);
 
-void amd_atl_unregister_decoder(void)
+void atl_unregister_decoder(void)
 {
-	amd_atl_umc_na_to_spa = NULL;
+	atl_ras_la_to_spa = NULL;
 }
-EXPORT_SYMBOL_GPL(amd_atl_unregister_decoder);
+EXPORT_SYMBOL_GPL(atl_unregister_decoder);
 
-unsigned long amd_convert_umc_mca_addr_to_sys_addr(struct atl_err *err)
+unsigned long convert_ras_la_to_spa(void *err)
 {
-	if (!amd_atl_umc_na_to_spa)
+	if (!atl_ras_la_to_spa)
 		return -EINVAL;
 
-	return amd_atl_umc_na_to_spa(err);
+	return atl_ras_la_to_spa(err);
 }
-EXPORT_SYMBOL_GPL(amd_convert_umc_mca_addr_to_sys_addr);
-#endif /* CONFIG_AMD_ATL */
+EXPORT_SYMBOL_GPL(convert_ras_la_to_spa);
 
 #define CREATE_TRACE_POINTS
 #define TRACE_INCLUDE_PATH ../../include/ras
diff --git a/include/linux/ras.h b/include/linux/ras.h
index 05096f049dac..2270a8eb1038 100644
--- a/include/linux/ras.h
+++ b/include/linux/ras.h
@@ -42,14 +42,9 @@ struct atl_err {
 };
 
 #if IS_ENABLED(CONFIG_AMD_ATL)
-void amd_atl_register_decoder(unsigned long (*f)(struct atl_err *));
-void amd_atl_unregister_decoder(void);
 void amd_retire_dram_row(struct atl_err *err);
-unsigned long amd_convert_umc_mca_addr_to_sys_addr(struct atl_err *err);
 #else
 static inline void amd_retire_dram_row(struct atl_err *err) { }
-static inline unsigned long
-amd_convert_umc_mca_addr_to_sys_addr(struct atl_err *err) { return -EINVAL; }
 #endif /* CONFIG_AMD_ATL */
 
 #if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
@@ -63,6 +58,10 @@ amd_convert_umc_mca_addr_to_sys_addr(struct atl_err *err) { return -EINVAL; }
 #define GET_LOGICAL_INDEX(mpidr) -EINVAL
 #endif /* CONFIG_ARM || CONFIG_ARM64 */
 
+void atl_register_decoder(unsigned long (*f)(void *));
+void atl_unregister_decoder(void);
+unsigned long convert_ras_la_to_spa(void *err);
+
 #if IS_ENABLED(CONFIG_AEST)
 void aest_register_decode_chain(struct notifier_block *nb);
 void aest_unregister_decode_chain(struct notifier_block *nb);
-- 
2.51.2.612.gdc70283dfc




* [PATCH v4 15/17] ras: AEST: Add framework to process AEST vendor node
  2025-12-22  9:43 [PATCH v4 00/17] ARM Error Source Table V2 Support Ruidong Tian
                   ` (14 preceding siblings ...)
  2025-12-22  9:43 ` [PATCH v4 14/17] ras: ATL: Unify " Ruidong Tian
@ 2025-12-22  9:43 ` Ruidong Tian
  2025-12-22  9:43 ` [PATCH v4 16/17] ras: AEST: support vendor node CMN700 Ruidong Tian
  2025-12-22  9:43 ` [PATCH v4 17/17] trace, ras: add ARM RAS extension trace event Ruidong Tian
  17 siblings, 0 replies; 22+ messages in thread
From: Ruidong Tian @ 2025-12-22  9:43 UTC (permalink / raw)
  To: catalin.marinas, will, lpieralisi, guohanjun, sudeep.holla,
	xueshuai, linux-kernel, linux-acpi, linux-arm-kernel, rafael,
	lenb, tony.luck, bp, yazen.ghannam, misono.tomohiro
  Cc: tianruidong

The AEST table includes vendor error nodes to support components that do
not implement the standard Arm RAS architecture [1]. Each vendor node may
have its own initialization and interrupt handling functions. This patch
provides a framework for processing vendor error nodes; each vendor's
probe function is bound to the vendor's HID.
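
The HID-to-probe binding can be sketched in user space as below. This is a
hedged illustration rather than the driver code: the demo_* names, the
"DEMO0001" HID and the void pointer context are hypothetical. The sketch
walks the table up to an explicit NULL sentinel so the terminating entry is
never compared or called.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_HID_LEN 8 /* ACPI HIDs are compared over 8 characters */

struct demo_vendor_match {
	const char *hid;
	int (*probe)(void *ctx);
};

/* Hypothetical vendor probe: a real one would set up the vendor's nodes. */
static int demo_vendor_a_probe(void *ctx)
{
	(void)ctx;
	return 0;
}

static const struct demo_vendor_match demo_match[] = {
	{ "DEMO0001", demo_vendor_a_probe }, /* made-up HID */
	{ NULL, NULL },                      /* sentinel, never dereferenced */
};

/* Dispatch to the probe bound to this HID, or -ENODEV if none matches. */
static int demo_vendor_probe(const char *hid, void *ctx)
{
	const struct demo_vendor_match *m;

	for (m = demo_match; m->hid; m++) {
		if (!strncmp(m->hid, hid, DEMO_HID_LEN))
			return m->probe(ctx);
	}

	return -ENODEV;
}
```

Stopping at the sentinel, instead of iterating over every array element,
avoids calling a NULL probe pointer if a node ever reports an empty HID.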

[1]: https://developer.arm.com/documentation/ddi0587/latest/

Signed-off-by: Ruidong Tian <tianruidong@linux.alibaba.com>
---
 drivers/ras/aest/aest-core.c | 28 +++++++++++++++++++++++++++-
 drivers/ras/aest/aest.h      |  5 +++++
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/drivers/ras/aest/aest-core.c b/drivers/ras/aest/aest-core.c
index 052211ca3e2a..4d20e54832fd 100644
--- a/drivers/ras/aest/aest-core.c
+++ b/drivers/ras/aest/aest-core.c
@@ -925,6 +925,29 @@ static int aest_setup_irq(struct platform_device *pdev,
 	return 0;
 }
 
+static struct aest_vendor_match vendor_match[] = {
+	{  },
+};
+
+static int
+aest_vendor_probe(struct aest_device *adev, struct aest_hnode *ahnode)
+{
+	int i;
+	struct acpi_aest_node *anode;
+
+	anode = list_first_entry(&ahnode->list, struct acpi_aest_node, list);
+	if (!anode)
+		return -ENODEV;
+
+	aest_dev_dbg(adev, "Try to probe vendor node %s\n", anode->vendor->acpi_hid);
+	for (i = 0; i < ARRAY_SIZE(vendor_match); i++) {
+		if (!strncmp(vendor_match[i].hid, anode->vendor->acpi_hid, 8))
+			return vendor_match[i].probe(adev, ahnode);
+	}
+
+	return -ENODEV;
+}
+
 static int aest_device_probe(struct platform_device *pdev)
 {
 	int ret;
@@ -950,7 +973,10 @@ static int aest_device_probe(struct platform_device *pdev)
 	}
 	init_llist_head(&adev->event_list);
 
-	ret = aest_init_nodes(adev, ahnode);
+	if (ahnode->type == ACPI_AEST_VENDOR_ERROR_NODE)
+		ret = aest_vendor_probe(adev, ahnode);
+	else
+		ret = aest_init_nodes(adev, ahnode);
 	if (ret)
 		return ret;
 
diff --git a/drivers/ras/aest/aest.h b/drivers/ras/aest/aest.h
index 2f6a7b9ca4ef..304c03839d31 100644
--- a/drivers/ras/aest/aest.h
+++ b/drivers/ras/aest/aest.h
@@ -244,6 +244,11 @@ static const char *const aest_node_name[] = {
 	[ACPI_AEST_PROXY_ERROR_NODE] = "proxy",
 };
 
+struct aest_vendor_match {
+	char hid[ACPI_ID_LEN];
+	int (*probe)(struct aest_device *adev, struct aest_hnode *anode);
+};
+
 static inline int aest_set_name(struct aest_device *adev,
 				struct aest_hnode *ahnode)
 {
-- 
2.51.2.612.gdc70283dfc




* [PATCH v4 16/17] ras: AEST: support vendor node CMN700
  2025-12-22  9:43 [PATCH v4 00/17] ARM Error Source Table V2 Support Ruidong Tian
                   ` (15 preceding siblings ...)
  2025-12-22  9:43 ` [PATCH v4 15/17] ras: AEST: Add framework to process AEST vendor node Ruidong Tian
@ 2025-12-22  9:43 ` Ruidong Tian
  2025-12-22  9:43 ` [PATCH v4 17/17] trace, ras: add ARM RAS extension trace event Ruidong Tian
  17 siblings, 0 replies; 22+ messages in thread
From: Ruidong Tian @ 2025-12-22  9:43 UTC (permalink / raw)
  To: catalin.marinas, will, lpieralisi, guohanjun, sudeep.holla,
	xueshuai, linux-kernel, linux-acpi, linux-arm-kernel, rafael,
	lenb, tony.luck, bp, yazen.ghannam, misono.tomohiro
  Cc: tianruidong

The CMN (Coherent Mesh Network) architecture incorporates five distinct
device types. Each device type is associated with an error group register
set. The struct aest_cmn_700 models a single CMN instance, while struct
aest_cmn_700_child represents an individual CMN device.

CMN's error records utilize a memory-mapped single error record view [1].
Critically, one error record corresponds to one AEST node, implying that
a single CMN instance can generate hundreds of AEST nodes. To manage this
scale, this driver introduces a virtual AEST node, which represents an
entire CMN device, such as an HNI or HNF. This allows an HNF AEST node,
for instance, to leverage its errgsr register to pinpoint which specific
error record has reported an error.
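
As a rough sketch of how a virtual node could use its errgsr registers, the
helpers below compute the errgsr base offset the same way
cmn700_errgsr_offset() in this patch does, and map an errgsr bit to a record
index. The mapping is an assumption: cmn700_errgsr_mapping() is not shown in
this message, so the sketch assumes that even bits signal an error and odd
bits a fault for record bit / 2, which matches the record_count computation
in the patch. All demo_* names are hypothetical.

```c
#include <stdint.h>

#define DEMO_ERRGSR_OFFSET 0x3000u /* same value as CMN_ERRGSR_OFFSET */
#define DEMO_ERRGSR_NUM    8u      /* same value as CMN700_ERRGSR_NUM */

/* ASSUMPTION: bits 2n and 2n + 1 both belong to record n. */
static int demo_errgsr_to_record(int errgsr_bit)
{
	return errgsr_bit / 2;
}

/* ASSUMPTION: an odd bit reports a fault, an even bit reports an error. */
static int demo_errgsr_is_fault(int errgsr_bit)
{
	return errgsr_bit & 1;
}

/* Mirrors the arithmetic of cmn700_errgsr_offset() in the patch. */
static uint64_t demo_errgsr_offset(uint64_t hnd_offset, int node_idx)
{
	return hnd_offset + DEMO_ERRGSR_OFFSET +
	       (uint64_t)(node_idx * 2) * DEMO_ERRGSR_NUM * 8;
}
```

With these numbers each device type owns two banks of eight 64-bit
registers, so device index 2 starts 0x100 bytes after the first errgsr bank.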

During the AEST probe phase, the CMN AEST driver identifies the CMN node
type using the cmn_node_info register. It then reorganizes all AEST nodes
belonging to the same CMN node type into a cohesive CMN AEST node
structure. To locate the relevant CMN register addresses, the CMN's
presence in the DSDT is required, along with the CMN node offset
specified in the AEST vendor specification data [1].
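
Identifying the node type from cmn_node_info amounts to three bitfield
extractions. The sketch below uses the field positions from the
CMN_NI_NODE_TYPE, CMN_NI_NODE_ID and CMN_NI_LOGICAL_ID masks in this patch;
the struct and function names are hypothetical, and plain shifts stand in
for the kernel's FIELD_GET().

```c
#include <stdint.h>

/* Hypothetical container for the decoded cmn_node_info fields. */
struct demo_node_info {
	uint16_t type;       /* bits [15:0],  CMN_NI_NODE_TYPE  */
	uint16_t node_id;    /* bits [31:16], CMN_NI_NODE_ID    */
	uint16_t logical_id; /* bits [47:32], CMN_NI_LOGICAL_ID */
};

/* Split a raw 64-bit cmn_node_info value into its three fields. */
static struct demo_node_info demo_decode_node_info(uint64_t reg)
{
	struct demo_node_info info = {
		.type = (uint16_t)(reg & 0xffff),
		.node_id = (uint16_t)((reg >> 16) & 0xffff),
		.logical_id = (uint16_t)((reg >> 32) & 0xffff),
	};

	return info;
}
```

In the driver, the type selects the virtual node through node_id_map and the
logical ID selects the record slot inside that node.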

[1]: https://developer.arm.com/documentation/102308/latest/

Signed-off-by: Ruidong Tian <tianruidong@linux.alibaba.com>
---
 arch/arm64/include/asm/arm-cmn.h |  47 +++++
 drivers/perf/arm-cmn.c           |  37 +---
 drivers/ras/aest/Makefile        |   1 +
 drivers/ras/aest/aest-cmn.c      | 332 +++++++++++++++++++++++++++++++
 drivers/ras/aest/aest-core.c     |  42 ++--
 drivers/ras/aest/aest.h          |  39 ++++
 6 files changed, 446 insertions(+), 52 deletions(-)
 create mode 100644 arch/arm64/include/asm/arm-cmn.h
 create mode 100644 drivers/ras/aest/aest-cmn.c

diff --git a/arch/arm64/include/asm/arm-cmn.h b/arch/arm64/include/asm/arm-cmn.h
new file mode 100644
index 000000000000..1b9f50679794
--- /dev/null
+++ b/arch/arm64/include/asm/arm-cmn.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2015 ARM Ltd.
+ */
+#ifndef __ASM_ARM_CMN_H
+#define __ASM_ARM_CMN_H
+
+#include <asm/sysreg.h>
+
+/* Common register stuff */
+#define CMN_NODE_INFO			0x0000
+#define CMN_NI_NODE_TYPE		GENMASK_ULL(15, 0)
+#define CMN_NI_NODE_ID			GENMASK_ULL(31, 16)
+#define CMN_NI_LOGICAL_ID		GENMASK_ULL(47, 32)
+
+enum cmn_node_type {
+	CMN_TYPE_INVALID,
+	CMN_TYPE_DVM,
+	CMN_TYPE_CFG,
+	CMN_TYPE_DTC,
+	CMN_TYPE_HNI,
+	CMN_TYPE_HNF,
+	CMN_TYPE_XP,
+	CMN_TYPE_SBSX,
+	CMN_TYPE_MPAM_S,
+	CMN_TYPE_MPAM_NS,
+	CMN_TYPE_RNI,
+	CMN_TYPE_RND = 0xd,
+	CMN_TYPE_RNSAM = 0xf,
+	CMN_TYPE_MTSX,
+	CMN_TYPE_HNP,
+	CMN_TYPE_CXRA = 0x100,
+	CMN_TYPE_CXHA,
+	CMN_TYPE_CXLA,
+	CMN_TYPE_CCRA,
+	CMN_TYPE_CCHA,
+	CMN_TYPE_CCLA,
+	CMN_TYPE_CCLA_RNI,
+	CMN_TYPE_HNS = 0x200,
+	CMN_TYPE_HNS_MPAM_S,
+	CMN_TYPE_HNS_MPAM_NS,
+	CMN_TYPE_APB = 0x1000,
+	/* Not a real node type */
+	CMN_TYPE_WP = 0x7770
+};
+
+#endif /* __ASM_ARM_CMN_H */
diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index 23245352a3fc..989482096dfb 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -2,6 +2,7 @@
 // Copyright (C) 2016-2020 Arm Limited
 // ARM CMN/CI interconnect PMU driver
 
+#include <asm/arm-cmn.h>
 #include <linux/acpi.h>
 #include <linux/bitfield.h>
 #include <linux/bitops.h>
@@ -19,11 +20,6 @@
 #include <linux/sort.h>
 
 /* Common register stuff */
-#define CMN_NODE_INFO			0x0000
-#define CMN_NI_NODE_TYPE		GENMASK_ULL(15, 0)
-#define CMN_NI_NODE_ID			GENMASK_ULL(31, 16)
-#define CMN_NI_LOGICAL_ID		GENMASK_ULL(47, 32)
-
 #define CMN_CHILD_INFO			0x0080
 #define CMN_CI_CHILD_COUNT		GENMASK_ULL(15, 0)
 #define CMN_CI_CHILD_PTR_OFFSET		GENMASK_ULL(31, 16)
@@ -241,37 +237,6 @@ enum cmn_revision {
 	REV_CI700_R2P0,
 };
 
-enum cmn_node_type {
-	CMN_TYPE_INVALID,
-	CMN_TYPE_DVM,
-	CMN_TYPE_CFG,
-	CMN_TYPE_DTC,
-	CMN_TYPE_HNI,
-	CMN_TYPE_HNF,
-	CMN_TYPE_XP,
-	CMN_TYPE_SBSX,
-	CMN_TYPE_MPAM_S,
-	CMN_TYPE_MPAM_NS,
-	CMN_TYPE_RNI,
-	CMN_TYPE_RND = 0xd,
-	CMN_TYPE_RNSAM = 0xf,
-	CMN_TYPE_MTSX,
-	CMN_TYPE_HNP,
-	CMN_TYPE_CXRA = 0x100,
-	CMN_TYPE_CXHA,
-	CMN_TYPE_CXLA,
-	CMN_TYPE_CCRA,
-	CMN_TYPE_CCHA,
-	CMN_TYPE_CCLA,
-	CMN_TYPE_CCLA_RNI,
-	CMN_TYPE_HNS = 0x200,
-	CMN_TYPE_HNS_MPAM_S,
-	CMN_TYPE_HNS_MPAM_NS,
-	CMN_TYPE_APB = 0x1000,
-	/* Not a real node type */
-	CMN_TYPE_WP = 0x7770
-};
-
 enum cmn_filter_select {
 	SEL_NONE = -1,
 	SEL_OCCUP1ID,
diff --git a/drivers/ras/aest/Makefile b/drivers/ras/aest/Makefile
index 5ee10fc8b2e9..e5a45fde6d36 100644
--- a/drivers/ras/aest/Makefile
+++ b/drivers/ras/aest/Makefile
@@ -5,3 +5,4 @@ obj-$(CONFIG_AEST) 	+= aest.o
 aest-y		:= aest-core.o
 aest-y		+= aest-sysfs.o
 aest-y		+= aest-inject.o
+aest-y		+= aest-cmn.o
diff --git a/drivers/ras/aest/aest-cmn.c b/drivers/ras/aest/aest-cmn.c
new file mode 100644
index 000000000000..456203377c79
--- /dev/null
+++ b/drivers/ras/aest/aest-cmn.c
@@ -0,0 +1,332 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ARM Error Source Table CMN700 Support
+ *
+ * Copyright (c) 2025, Alibaba Inc
+ */
+
+#include "linux/bitfield.h"
+#include "linux/io.h"
+#include <asm/arm-cmn.h>
+
+#include "aest.h"
+
+/*
+ * CMN includes 5 device types. Each device type has an error group register
+ * set that covers a set of error records. The struct aest_cmn_700 represents
+ * one CMN instance, and the struct aest_cmn_700_child represents one CMN
+ * device. CMN error records use the memory-mapped single error record view, so
+ * one record corresponds to one AEST node, which means a CMN can have hundreds
+ * of AEST nodes. As described in section 2.6.3.4 of the Arm ACPI spec [1], the
+ * vendor-defined data identifies the device type of an AEST node. The AEST
+ * driver can therefore enumerate all CMN AEST nodes and initialize struct
+ * aest_cmn_700 and aest_cmn_700_child with the HID, UID and other CMN info
+ * described in the AEST or in CMN registers.
+ *
+ * Each CMN instance has its own error interrupt, and the struct aest_cmn_700
+ * is passed to the interrupt context. The OS checks the error group register
+ * set to locate the record that reported the error. The procedure is similar
+ * to section 3.8 of the Arm CMN spec [2].
+ *
+ * The CMN RAS architecture is shown below:
+ *
+ *                     +----+
+ *                  -->|XP  |     ......
+ *                  |  +----+
+ *                  |
+ *                  |  +----+     ......
+ *                  |  |HNI |     +----------------+
+ *                  |  +----+   ->|record/AEST node|
+ *                  |           | +----------------+
+ *  +------------+  |  +----+   |    .
+ *  |CMN Instance|--|  |HNF |---|    .
+ *  +------------+  |  +----+   |    .
+ *                  |           | +----------------+
+ *                  |  +----+   ->|record/AEST node|
+ *                  |  |SBSX|     +----------------+
+ *                  |  +----+     ......
+ *                  |
+ *                  |  +----+
+ *                  -->|CCG |     ......
+ *                     +----+
+ *
+ * [1]: https://developer.arm.com/documentation/den0093/latest
+ * [2]: https://developer.arm.com/documentation/102308/latest
+ */
+
+#define CMN_RAS_DEV_NUM 6
+#define CMN700_ERRGSR_NUM 8
+#define CMN_MAX_UID 8
+#define CMN_ERRDEVARCH 0x3FB8
+#define CMN_ERRDEVARCH_REV GENMASK(19, 16)
+#define CMN_ERRGSR_OFFSET 0x3000
+
+struct cmn_vendor_data {
+	int node_type;
+	int node_id;
+	int logic_id;
+};
+
+struct cmn_config {
+	int errgsr_num;
+	int dev_num;
+	int ras_ver;
+	const int *node_id_map;
+	const char *const *node_name;
+	int (*errgsr_mapping)(int errgsr_bit);
+	u64 (*errgsr_offset)(u64 hnd_offset, int node_idx);
+};
+
+static const char *const cmn700_node_name[] = {
+	[CMN_TYPE_HNI] = "HNI",	 [CMN_TYPE_HNF] = "HNF",
+	[CMN_TYPE_XP] = "XP",	 [CMN_TYPE_SBSX] = "SBSX",
+	[CMN_TYPE_CXRA] = "RND", [CMN_TYPE_MTSX] = "MTSX",
+};
+
+static const int cmn700_node_id_map[] = {
+	[CMN_TYPE_HNI] = 1,  [CMN_TYPE_HNF] = 2,  [CMN_TYPE_XP] = 0,
+	[CMN_TYPE_SBSX] = 3, [CMN_TYPE_CXRA] = 4, [CMN_TYPE_MTSX] = 5,
+};
+
+static u64 cmn_dev_array[CMN_MAX_UID];
+static struct cmn_config *cmn_config;
+
+static u64 cmn700_errgsr_offset(u64 hnd_offset, int node_idx)
+{
+	return hnd_offset + CMN_ERRGSR_OFFSET +
+	       (node_idx * 2) * CMN700_ERRGSR_NUM * 8;
+}
+
+static struct cmn_config cmn700_config = {
+	.errgsr_num = CMN700_ERRGSR_NUM,
+	.dev_num = CMN_RAS_DEV_NUM,
+	.ras_ver = 1,
+	.node_name = cmn700_node_name,
+	.node_id_map = cmn700_node_id_map,
+	.errgsr_mapping = cmn700_errgsr_mapping,
+	.errgsr_offset = cmn700_errgsr_offset,
+};
+
+static acpi_status aest_cmn_700_resource_ioremap(struct acpi_resource *res,
+						 void *data)
+{
+	struct acpi_resource_address64 addr64;
+	u32 *uid = data;
+	acpi_status status;
+
+	status = acpi_resource_to_address64(res, &addr64);
+	if (ACPI_FAILURE(status) || (addr64.resource_type != ACPI_MEMORY_RANGE))
+		return AE_OK;
+
+	cmn_dev_array[*uid] = (u64)ioremap(addr64.address.minimum,
+					   addr64.address.address_length);
+
+	pr_debug("CMN device resource [%llx-%llx] ioremap to %llx\n",
+		 addr64.address.minimum, addr64.address.maximum,
+		 cmn_dev_array[*uid]);
+
+	return AE_CTRL_TERMINATE;
+}
+
+static acpi_status aest_cmn_get_dev_by_uid(acpi_handle handle, u32 level,
+					   void *data, void **return_value)
+{
+	u32 *match_uid = data;
+	acpi_status status;
+	unsigned long long uid;
+
+	status = acpi_evaluate_integer(handle, METHOD_NAME__UID, NULL, &uid);
+	if (ACPI_FAILURE(status)) {
+		pr_err("Device not found\n");
+		return_ACPI_STATUS(status);
+	}
+
+	if (uid != *match_uid)
+		return AE_OK;
+
+	pr_debug("CMN device instance %llx, walk through resource\n", uid);
+
+	status = acpi_walk_resources(handle, METHOD_NAME__CRS,
+				     aest_cmn_700_resource_ioremap, data);
+
+	if (ACPI_FAILURE(status)) {
+		pr_err("Device does not have resources\n");
+		return_ACPI_STATUS(status);
+	}
+
+	return AE_CTRL_TERMINATE;
+}
+
+static inline int aest_cmn_node_ver(void *base)
+{
+	return FIELD_GET(CMN_ERRDEVARCH_REV,
+			 readl_relaxed(base + CMN_ERRDEVARCH));
+}
+
+static int aest_cmn_init_node(struct aest_device *adev,
+			      struct aest_node *cmn_node,
+			      struct acpi_aest_node *anode, u64 type,
+			      u64 errgsr_addr)
+{
+	cmn_node->info = anode;
+	cmn_node->name = devm_kasprintf(adev->dev, GFP_KERNEL, "%s",
+					cmn_config->node_name[type]);
+	if (!cmn_node->name)
+		return -ENOMEM;
+	cmn_node->errgsr = (void *)errgsr_addr;
+	cmn_node->type = anode->type;
+	cmn_node->adev = adev;
+	cmn_node->version = cmn_config->ras_ver;
+	cmn_node->errgsr_num = cmn_config->errgsr_num;
+	cmn_node->errgsr_mapping = cmn_config->errgsr_mapping;
+	cmn_node->record_count = cmn_node->errgsr_num * BITS_PER_LONG / 2;
+	cmn_node->record_implemented = devm_bitmap_zalloc(
+		adev->dev, cmn_node->record_count, GFP_KERNEL);
+	if (!cmn_node->record_implemented)
+		return -ENOMEM;
+	bitmap_set(cmn_node->record_implemented, 0, cmn_node->record_count);
+
+	cmn_node->status_reporting = devm_bitmap_zalloc(
+		adev->dev, cmn_node->record_count, GFP_KERNEL);
+	if (!cmn_node->status_reporting)
+		return -ENOMEM;
+	bitmap_set(cmn_node->status_reporting, 0, cmn_node->record_count);
+
+	cmn_node->records = devm_kcalloc(adev->dev, cmn_node->record_count,
+					 sizeof(struct aest_record),
+					 GFP_KERNEL);
+	if (!cmn_node->records)
+		return -ENOMEM;
+
+	aest_node_dbg(cmn_node, "Node init with errgsr %llx\n", errgsr_addr);
+
+	return 0;
+}
+
+static int aest_cmn_reorgnize_node(struct aest_device *adev,
+				   struct acpi_aest_node *anode, u64 base)
+{
+	struct aest_node *cmn_node;
+	u64 hnd_offset, cmn_node_offset, reg, logic_id, type, node_id;
+	u64 errgsr_addr, hnd_base;
+	struct aest_record *record;
+	int ret, node_index;
+	struct cmn_vendor_data *vendor_data;
+
+	if (anode->interface_hdr->type !=
+	    ACPI_AEST_NODE_SINGLE_RECORD_MEMORY_MAPPED) {
+		aest_dev_err(adev, "CMN only uses single-record memory mapping\n");
+		return -ENODEV;
+	}
+
+	hnd_offset = *((u64 *)anode->vendor->vendor_specific_data);
+	cmn_node_offset = *((u64 *)&anode->vendor->vendor_specific_data[8]);
+
+	reg = readq_relaxed((void *)base + cmn_node_offset + CMN_NODE_INFO);
+
+	logic_id = FIELD_GET(CMN_NI_LOGICAL_ID, reg);
+	type = FIELD_GET(CMN_NI_NODE_TYPE, reg);
+	node_id = FIELD_GET(CMN_NI_NODE_ID, reg);
+
+	hnd_base = base + hnd_offset;
+	node_index = cmn_config->node_id_map[type];
+	errgsr_addr = base + cmn_config->errgsr_offset(hnd_offset, node_index);
+
+	// node not registered yet, create it
+	cmn_node = &adev->nodes[node_index];
+	if (!cmn_node->errgsr) {
+		ret = aest_cmn_init_node(adev, cmn_node, anode, type,
+					 errgsr_addr);
+		if (ret)
+			return -ENOMEM;
+	}
+
+	aest_dev_dbg(adev, "node type %llx, id %llx, offset %llx\n", type,
+		     logic_id, cmn_node_offset);
+
+	if (!test_bit(0, anode->record_implemented))
+		clear_bit(logic_id, cmn_node->record_implemented);
+
+	if (!test_bit(0, anode->status_reporting))
+		clear_bit(logic_id, cmn_node->status_reporting);
+
+	record = &cmn_node->records[logic_id];
+	record->name =
+		devm_kasprintf(adev->dev, GFP_KERNEL, "record%lld", logic_id);
+	if (!record->name)
+		return -ENOMEM;
+	record->regs_base = devm_ioremap(
+		adev->dev, (resource_size_t)anode->interface_hdr->address,
+		sizeof(struct ras_ext_regs));
+	if (!record->regs_base)
+		return -ENOMEM;
+	record->addressing_mode = test_bit(0, anode->addressing_mode);
+	record->node = cmn_node;
+	record->index = logic_id;
+	record->access = &aest_access[anode->interface_hdr->type];
+
+	vendor_data = devm_kzalloc(adev->dev, sizeof(struct cmn_vendor_data),
+				   GFP_KERNEL);
+	vendor_data->node_type = type;
+	vendor_data->node_id = node_id;
+	vendor_data->logic_id = logic_id;
+
+	record->vendor_data = vendor_data;
+	record->vendor_data_size = sizeof(struct cmn_vendor_data);
+
+	aest_record_dbg(record, "base %llx\n", anode->interface_hdr->address);
+
+	return 0;
+}
+
+// reorganize CMN nodes
+static int aest_cmn_probe(struct aest_device *adev, struct aest_hnode *ahnode)
+{
+	acpi_status status;
+	u64 base;
+	int ret = 0;
+	struct acpi_aest_node *anode;
+	char name[9];
+
+	anode = list_first_entry(&ahnode->list, struct acpi_aest_node, list);
+	if (!anode)
+		return -ENODEV;
+
+	if (!cmn_dev_array[anode->vendor->acpi_uid]) {
+		snprintf(name, 9, "%s", anode->vendor->acpi_hid);
+		status = acpi_get_devices(name, aest_cmn_get_dev_by_uid,
+					  &anode->vendor->acpi_uid, NULL);
+		if (ACPI_FAILURE(status)) {
+			aest_dev_err(adev, "Can not find base\n");
+			return_ACPI_STATUS(status);
+		}
+	}
+	base = cmn_dev_array[anode->vendor->acpi_uid];
+	if (!base) {
+		aest_dev_err(adev, "Device base invalid\n");
+		return -ENODEV;
+	}
+
+	adev->type = anode->type;
+	adev->node_cnt = cmn_config->dev_num;
+	adev->nodes = devm_kcalloc(adev->dev, adev->node_cnt,
+				   sizeof(struct aest_node), GFP_KERNEL);
+	if (!adev->nodes)
+		return -ENOMEM;
+	aest_set_name(adev, ahnode);
+
+	list_for_each_entry(anode, &ahnode->list, list) {
+		ret = aest_cmn_reorgnize_node(adev, anode, base);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
+
+int aest_cmn700_probe(struct aest_device *adev, struct aest_hnode *ahnode)
+{
+	cmn_config = &cmn700_config;
+
+	return aest_cmn_probe(adev, ahnode);
+}
diff --git a/drivers/ras/aest/aest-core.c b/drivers/ras/aest/aest-core.c
index 4d20e54832fd..33e1f32c5892 100644
--- a/drivers/ras/aest/aest-core.c
+++ b/drivers/ras/aest/aest-core.c
@@ -152,6 +152,8 @@ static void init_aest_event(struct aest_event *event,
 	memcpy(&event->regs, regs, sizeof(*regs));
 	event->index = record->index;
 	event->addressing_mode = record->addressing_mode;
+	event->vendor_data_size = record->vendor_data_size;
+	event->vendor_data = record->vendor_data;
 }
 
 static int aest_node_gen_pool_add(struct aest_device *adev,
@@ -344,10 +346,9 @@ void aest_proc_record(struct aest_record *record, void *data, bool fake)
 	record_write(record, ERXSTATUS, regs.err_status);
 }
 
-static void aest_node_foreach_record(void (*func)(struct aest_record *, void *,
-						  bool),
-				     struct aest_node *node, void *data,
-				     unsigned long *bitmap)
+void aest_node_foreach_record(void (*func)(struct aest_record *, void *, bool),
+			      struct aest_node *node, void *data,
+			      unsigned long *bitmap)
 {
 	int i;
 
@@ -362,7 +363,7 @@ static void aest_node_foreach_record(void (*func)(struct aest_record *, void *,
 
 static int aest_proc(struct aest_node *node)
 {
-	int count = 0, i, j, size = node->record_count;
+	int count = 0, i, j, size = node->record_count, record_idx;
 	u64 err_group = 0;
 
 	aest_node_dbg(node, "Poll bitmap %*pb\n", size,
@@ -377,19 +378,21 @@ static int aest_proc(struct aest_node *node)
 		      node->status_reporting);
 	for (i = 0; i < BITS_TO_U64(size); i++) {
 		err_group = readq_relaxed((void *)node->errgsr + i * 8);
-		aest_node_dbg(node, "errgsr[%d]: 0x%llx\n", i, err_group);
-
 		for_each_set_bit(j, (unsigned long *)&err_group,
 				 BITS_PER_LONG) {
+			record_idx =
+				node->errgsr_mapping(i * BITS_PER_LONG + j);
+			aest_node_dbg(node, "errgsr[%d]: record %d reported error\n",
+				      i, record_idx);
 			/*
 			 * Error group base is only valid in Memory Map node,
 			 * so driver do not need to write select register and
 			 * sync.
 			 */
-			if (test_bit(i * BITS_PER_LONG + j,
-				     node->status_reporting))
+			if (test_bit(record_idx, node->status_reporting))
 				continue;
-			aest_proc_record(&node->records[j], &count, false);
+			aest_proc_record(&node->records[record_idx], &count,
+					 false);
 		}
 	}
 
@@ -401,8 +404,11 @@ static irqreturn_t aest_irq_func(int irq, void *input)
 	struct aest_device *adev = input;
 	int i;
 
-	for (i = 0; i < adev->node_cnt; i++)
+	for (i = 0; i < adev->node_cnt; i++) {
+		if (!adev->nodes[i].record_count)
+			continue;
 		aest_proc(&adev->nodes[i]);
+	}
 
 	return IRQ_HANDLED;
 }
@@ -779,6 +785,7 @@ static int aest_init_node(struct aest_device *adev, struct aest_node *node,
 	node->info = anode;
 	node->type = anode->type;
 	node->version = get_aest_node_ver(node);
+	node->errgsr_mapping = default_errgsr_mapping;
 	node->name = alloc_aest_node_name(node);
 	if (!node->name)
 		return -ENOMEM;
@@ -831,6 +838,7 @@ static int aest_init_node(struct aest_device *adev, struct aest_node *node,
 	if (!node->records)
 		return -ENOMEM;
 
+	node->errgsr_num = DIV_ROUND_UP(node->record_count, BITS_PER_LONG);
 	for (i = 0; i < node->record_count; i++) {
 		ret = aest_init_record(&node->records[i], i, node);
 		if (ret)
@@ -926,11 +934,12 @@ static int aest_setup_irq(struct platform_device *pdev,
 }
 
 static struct aest_vendor_match vendor_match[] = {
-	{  },
+	{ "ARMHC700", &aest_cmn700_probe },
+	{},
 };
 
-static int
-aest_vendor_probe(struct aest_device *adev, struct aest_hnode *ahnode)
+static int aest_vendor_probe(struct aest_device *adev,
+			     struct aest_hnode *ahnode)
 {
 	int i;
 	struct acpi_aest_node *anode;
@@ -939,13 +948,14 @@ aest_vendor_probe(struct aest_device *adev, struct aest_hnode *ahnode)
 	if (!anode)
 		return -ENODEV;
 
-	aest_dev_dbg(adev, "Try to probe vendor node %s\n", anode->vendor->acpi_hid);
+	aest_dev_dbg(adev, "Try to probe vendor node %s\n",
+		     anode->vendor->acpi_hid);
 	for (i = 0; i < ARRAY_SIZE(vendor_match); i++) {
 		if (!strncmp(vendor_match[i].hid, anode->vendor->acpi_hid, 8))
 			return vendor_match[i].probe(adev, ahnode);
 	}
 
-	return -ENODEV;
+	return 0;
 }
 
 static int aest_device_probe(struct platform_device *pdev)
diff --git a/drivers/ras/aest/aest.h b/drivers/ras/aest/aest.h
index 304c03839d31..9d67d79eb4a2 100644
--- a/drivers/ras/aest/aest.h
+++ b/drivers/ras/aest/aest.h
@@ -94,8 +94,16 @@ struct aest_event {
 	/* Vendor node	: hardware ID. */
 	char *hid;
 	u32 index;
+	u64 ce_threshold;
 	int addressing_mode;
 	struct ras_ext_regs regs;
+
+	/*
+	 * Vendor-specific data that the EDAC driver uses to decode the error
+	 * record.
+	 */
+	void *vendor_data;
+	size_t vendor_data_size;
 };
 
 struct aest_access {
@@ -147,6 +155,9 @@ struct aest_record {
 	enum ras_ce_threshold threshold_type;
 	struct record_count count;
 	struct dentry *debugfs;
+
+	void *vendor_data;
+	size_t vendor_data_size;
 };
 
 struct aest_group {
@@ -208,6 +219,19 @@ struct aest_node {
 	 */
 	unsigned long *status_reporting;
 	int version;
+	/*
+	 * Usually bit[n] in errgsr indicates that the nth error record within
+	 * this error node reported an error, but some components use different
+	 * rules. For example, CMN700 TRM 4.3.5.12 says:
+	 *	``` Error occurs when the index is even and Fault
+	 *	    occurs when the index is odd. ```
+	 *	Bit[2n]: record[n] reports ERROR.
+	 *	Bit[2n + 1]: record[n] reports FAULT.
+	 * The errgsr_mapping callback maps an errgsr bit to a record index
+	 * for such components.
+	 */
+	int (*errgsr_mapping)(int errgsr_bit);
+	int errgsr_num;
 
 	const struct aest_group *group;
 	struct aest_device *adev;
@@ -366,6 +390,21 @@ static inline bool aest_dev_is_oncore(struct aest_device *adev)
 	return adev->type == ACPI_AEST_PROCESSOR_ERROR_NODE;
 }
 
+static inline int default_errgsr_mapping(int errgsr_bit)
+{
+	return errgsr_bit;
+}
+
+static inline int cmn700_errgsr_mapping(int errgsr_bit)
+{
+	return errgsr_bit / 2;
+}
+
 void aest_dev_init_debugfs(struct aest_device *adev);
 void aest_inject_init_debugfs(struct aest_record *record);
 void aest_proc_record(struct aest_record *record, void *data, bool fake);
+void aest_node_foreach_record(void (*func)(struct aest_record *, void *, bool),
+			      struct aest_node *node, void *data,
+			      unsigned long *bitmap);
+
+int aest_cmn700_probe(struct aest_device *adev, struct aest_hnode *ahnode);
-- 
2.51.2.612.gdc70283dfc
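
The errgsr handling in the patch above can be sketched outside the kernel as
follows. This is a minimal userspace illustration, not the driver code: the
two mapping functions mirror the ones added to aest.h, while
collect_records(), its parameters, and the bounds check on the mapped index
are assumptions made for the example.

```c
#include <limits.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's BITS_PER_LONG. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

typedef int (*errgsr_mapping_fn)(int errgsr_bit);

/* Mirrors the patch: bit n maps to record n. */
static int default_errgsr_mapping(int errgsr_bit)
{
	return errgsr_bit;
}

/* Mirrors the patch: bits 2n (error) and 2n + 1 (fault) map to record n. */
static int cmn700_errgsr_mapping(int errgsr_bit)
{
	return errgsr_bit / 2;
}

/*
 * Hypothetical helper: walk every errgsr word, map each set bit to a record
 * index and store it in out[]. Returns the number of indexes stored.
 */
static size_t collect_records(const unsigned long *errgsr, size_t nwords,
			      errgsr_mapping_fn map, int record_count,
			      int *out, size_t out_len)
{
	size_t n = 0;

	for (size_t i = 0; i < nwords; i++) {
		for (size_t j = 0; j < BITS_PER_LONG; j++) {
			int idx;

			if (!(errgsr[i] & (1UL << j)))
				continue;

			idx = map((int)(i * BITS_PER_LONG + j));
			/* Assumed safety check; the patch does not show one. */
			if (idx < 0 || idx >= record_count)
				continue;
			if (n < out_len)
				out[n++] = idx;
		}
	}

	return n;
}
```

With an errgsr word of 0x30 (bits 4 and 5 set), the default mapping yields
records 4 and 5, while the CMN700 mapping yields record 2 twice, once for the
error bit and once for the fault bit.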




* [PATCH v4 17/17] trace, ras: add ARM RAS extension trace event
  2025-12-22  9:43 [PATCH v4 00/17] ARM Error Source Table V2 Support Ruidong Tian
                   ` (16 preceding siblings ...)
  2025-12-22  9:43 ` [PATCH v4 16/17] ras: AEST: support vendor node CMN700 Ruidong Tian
@ 2025-12-22  9:43 ` Ruidong Tian
  17 siblings, 0 replies; 22+ messages in thread
From: Ruidong Tian @ 2025-12-22  9:43 UTC (permalink / raw)
  To: catalin.marinas, will, lpieralisi, guohanjun, sudeep.holla,
	xueshuai, linux-kernel, linux-acpi, linux-arm-kernel, rafael,
	lenb, tony.luck, bp, yazen.ghannam, misono.tomohiro
  Cc: tianruidong

Add a trace event for hardware errors reported by the ARMv8
RAS extension registers. Userspace applications can monitor this
trace event and decode the error information.

Signed-off-by: Ruidong Tian <tianruidong@linux.alibaba.com>
---
 drivers/ras/aest/aest-core.c |  6 +++
 drivers/ras/ras.c            |  3 ++
 include/ras/ras_event.h      | 71 ++++++++++++++++++++++++++++++++++++
 3 files changed, 80 insertions(+)

diff --git a/drivers/ras/aest/aest-core.c b/drivers/ras/aest/aest-core.c
index 33e1f32c5892..9f06ac0b7c16 100644
--- a/drivers/ras/aest/aest-core.c
+++ b/drivers/ras/aest/aest-core.c
@@ -13,6 +13,8 @@
 #include <linux/genalloc.h>
 #include <linux/ras.h>
 
+#include <ras/ras_event.h>
+
 #include "aest.h"
 
 DEFINE_PER_CPU(struct aest_device, percpu_adev);
@@ -90,6 +92,10 @@ static void aest_print(struct aest_event *event)
 		pr_err("%s  ERR%dMISC3: 0x%llx\n", pfx_seq, index,
 		       regs->err_misc[3]);
 	}
+
+	trace_arm_ras_ext_event(event->type, event->id0, event->id1,
+				event->index, event->hid, &event->regs,
+				event->vendor_data, event->vendor_data_size);
 }
 
 static void aest_handle_memory_failure(u64 addr)
diff --git a/drivers/ras/ras.c b/drivers/ras/ras.c
index 050b49466a18..3c0ba6c02d27 100644
--- a/drivers/ras/ras.c
+++ b/drivers/ras/ras.c
@@ -109,6 +109,9 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(extlog_mem_event);
 EXPORT_TRACEPOINT_SYMBOL_GPL(mc_event);
 EXPORT_TRACEPOINT_SYMBOL_GPL(non_standard_event);
 EXPORT_TRACEPOINT_SYMBOL_GPL(arm_event);
+#ifdef CONFIG_ARM64_RAS_EXTN
+EXPORT_TRACEPOINT_SYMBOL_GPL(arm_ras_ext_event);
+#endif
 
 static int __init parse_ras_param(char *str)
 {
diff --git a/include/ras/ras_event.h b/include/ras/ras_event.h
index eaecc3c5f772..3a4a0c0e4dbe 100644
--- a/include/ras/ras_event.h
+++ b/include/ras/ras_event.h
@@ -377,6 +377,77 @@ TRACE_EVENT(aer_event,
 			"Not available")
 );
 #endif /* CONFIG_PCIEAER */
+
+/*
+ * ARM RAS Extension Events Report
+ *
+ * This event is generated when an error reported by the ARM RAS extension
+ * hardware is detected.
+ */
+
+#ifdef CONFIG_ARM64_RAS_EXTN
+#include <asm/ras.h>
+TRACE_EVENT(arm_ras_ext_event,
+
+	TP_PROTO(const u8 type,
+		 const u32 id0,
+		 const u32 id1,
+		 const u32 index,
+		 char *hid,
+		 struct ras_ext_regs *regs,
+		 const u8 *data,
+		 const u32 len),
+
+	TP_ARGS(type, id0, id1, index, hid, regs, data, len),
+
+	TP_STRUCT__entry(
+		__field(u8,  type)
+		__field(u32, id0)
+		__field(u32, id1)
+		__field(u32, index)
+		__field(char *, hid)
+		__field(u64, err_fr)
+		__field(u64, err_ctlr)
+		__field(u64, err_status)
+		__field(u64, err_addr)
+		__field(u64, err_misc0)
+		__field(u64, err_misc1)
+		__field(u64, err_misc2)
+		__field(u64, err_misc3)
+		__field(u32, len)
+		__dynamic_array(u8, buf, len)
+	),
+
+	TP_fast_assign(
+		__entry->type = type;
+		__entry->id0 = id0;
+		__entry->id1 = id1;
+		__entry->index = index;
+		__entry->hid = hid;
+		__entry->err_fr = regs->err_fr;
+		__entry->err_ctlr = regs->err_ctlr;
+		__entry->err_status = regs->err_status;
+		__entry->err_addr = regs->err_addr;
+		__entry->err_misc0 = regs->err_misc[0];
+		__entry->err_misc1 = regs->err_misc[1];
+		__entry->err_misc2 = regs->err_misc[2];
+		__entry->err_misc3 = regs->err_misc[3];
+		__entry->len = len;
+		memcpy(__get_dynamic_array(buf), data, len);
+	),
+
+	TP_printk("type: %d; id0: %d; id1: %d; index: %d; hid: %s; "
+		  "ERR_FR: %llx; ERR_CTLR: %llx; ERR_STATUS: %llx; "
+		  "ERR_ADDR: %llx; ERR_MISC0: %llx; ERR_MISC1: %llx; "
+		  "ERR_MISC2: %llx; ERR_MISC3: %llx; data len:%d; raw data:%s",
+		  __entry->type, __entry->id0, __entry->id1, __entry->index,
+		  __entry->hid, __entry->err_fr, __entry->err_ctlr,
+		  __entry->err_status, __entry->err_addr, __entry->err_misc0,
+		  __entry->err_misc1, __entry->err_misc2, __entry->err_misc3,
+		  __entry->len,
+		  __print_hex(__get_dynamic_array(buf), __entry->len))
+);
+#endif /* CONFIG_ARM64_RAS_EXTN */
 #endif /* _TRACE_HW_EVENT_MC_H */
 
 /* This part must be outside protection */
-- 
2.51.2.612.gdc70283dfc
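
As a rough illustration of how a userspace tool might consume this event, the
sketch below parses one text line in the layout given by the TP_printk()
format string in the patch above. The struct ras_ext_sample type and the
parse_ras_ext_line() helper are hypothetical names invented for this example,
and it assumes the hid string contains no ';' character. The trailing
"data len" and "raw data" fields are not parsed here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fixed fields of one event line. */
struct ras_ext_sample {
	int type, id0, id1, index;
	char hid[16];
	unsigned long long err_fr, err_ctlr, err_status, err_addr;
	unsigned long long err_misc[4];
};

/*
 * Parse the fixed fields of one arm_ras_ext_event line.
 * Returns 0 on success, -1 if the line does not match the expected layout.
 */
static int parse_ras_ext_line(const char *line, struct ras_ext_sample *s)
{
	/* Skip any ftrace prefix (task, CPU, timestamp, event name). */
	const char *p = strstr(line, "type: ");
	int n;

	if (!p)
		return -1;

	n = sscanf(p,
		   "type: %d; id0: %d; id1: %d; index: %d; hid: %15[^;]; "
		   "ERR_FR: %llx; ERR_CTLR: %llx; ERR_STATUS: %llx; "
		   "ERR_ADDR: %llx; ERR_MISC0: %llx; ERR_MISC1: %llx; "
		   "ERR_MISC2: %llx; ERR_MISC3: %llx;",
		   &s->type, &s->id0, &s->id1, &s->index, s->hid,
		   &s->err_fr, &s->err_ctlr, &s->err_status, &s->err_addr,
		   &s->err_misc[0], &s->err_misc[1], &s->err_misc[2],
		   &s->err_misc[3]);

	/* 13 conversions: 4 integers, the hid string, 8 registers. */
	return n == 13 ? 0 : -1;
}
```

A caller would typically read lines from the tracing pipe, pass each one to
parse_ras_ext_line(), and ignore lines for which it returns -1.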




* Re: [PATCH v4 14/17] ras: ATL: Unify ATL interface for ARM64 and AMD
  2025-12-22  9:43 ` [PATCH v4 14/17] ras: ATL: Unify " Ruidong Tian
@ 2025-12-22 23:34   ` kernel test robot
  2025-12-25  6:58   ` kernel test robot
  1 sibling, 0 replies; 22+ messages in thread
From: kernel test robot @ 2025-12-22 23:34 UTC (permalink / raw)
  To: Ruidong Tian, catalin.marinas, will, lpieralisi, guohanjun,
	sudeep.holla, xueshuai, linux-kernel, linux-acpi,
	linux-arm-kernel, rafael, lenb, tony.luck, bp, yazen.ghannam,
	misono.tomohiro
  Cc: oe-kbuild-all, tianruidong

Hi Ruidong,

kernel test robot noticed the following build errors:

[auto build test ERROR on rafael-pm/linux-next]
[also build test ERROR on rafael-pm/bleeding-edge ras/edac-for-next linus/master v6.19-rc2 next-20251219]
[cannot apply to arm64/for-next/core tip/smp/core]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Ruidong-Tian/ACPI-AEST-Parse-the-AEST-table/20251222-175211
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
patch link:    https://lore.kernel.org/r/20251222094351.38792-16-tianruidong%40linux.alibaba.com
patch subject: [PATCH v4 14/17] ras: ATL: Unify ATL interface for ARM64 and AMD
config: x86_64-rhel-9.4 (https://download.01.org/0day-ci/archive/20251223/202512230007.Vs6IvFVD-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251223/202512230007.Vs6IvFVD-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202512230007.Vs6IvFVD-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/ras/amd/fmpm.c: In function 'save_spa':
>> drivers/ras/amd/fmpm.c:336:15: error: implicit declaration of function 'amd_convert_umc_mca_addr_to_sys_addr'; did you mean 'convert_umc_mca_addr_to_sys_addr'? [-Wimplicit-function-declaration]
     336 |         spa = amd_convert_umc_mca_addr_to_sys_addr(&a_err);
         |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         |               convert_umc_mca_addr_to_sys_addr
--
   drivers/ras/amd/atl/umc.c: In function '_retire_row_mi300':
>> drivers/ras/amd/atl/umc.c:321:24: error: implicit declaration of function 'amd_convert_umc_mca_addr_to_sys_addr'; did you mean 'convert_umc_mca_addr_to_sys_addr'? [-Wimplicit-function-declaration]
     321 |                 addr = amd_convert_umc_mca_addr_to_sys_addr(a_err);
         |                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         |                        convert_umc_mca_addr_to_sys_addr


vim +336 drivers/ras/amd/fmpm.c

6f15e617cc9932 Yazen Ghannam 2024-02-13  298  
838850c50884cd Yazen Ghannam 2024-03-01  299  static void save_spa(struct fru_rec *rec, unsigned int entry,
838850c50884cd Yazen Ghannam 2024-03-01  300  		     u64 addr, u64 id, unsigned int cpu)
838850c50884cd Yazen Ghannam 2024-03-01  301  {
838850c50884cd Yazen Ghannam 2024-03-01  302  	unsigned int i, fru_idx, spa_entry;
838850c50884cd Yazen Ghannam 2024-03-01  303  	struct atl_err a_err;
838850c50884cd Yazen Ghannam 2024-03-01  304  	unsigned long spa;
838850c50884cd Yazen Ghannam 2024-03-01  305  
838850c50884cd Yazen Ghannam 2024-03-01  306  	if (entry >= max_nr_entries) {
838850c50884cd Yazen Ghannam 2024-03-01  307  		pr_warn_once("FRU descriptor entry %d out-of-bounds (max: %d)\n",
838850c50884cd Yazen Ghannam 2024-03-01  308  			     entry, max_nr_entries);
838850c50884cd Yazen Ghannam 2024-03-01  309  		return;
838850c50884cd Yazen Ghannam 2024-03-01  310  	}
838850c50884cd Yazen Ghannam 2024-03-01  311  
838850c50884cd Yazen Ghannam 2024-03-01  312  	/* spa_nr_entries is always multiple of max_nr_entries */
838850c50884cd Yazen Ghannam 2024-03-01  313  	for (i = 0; i < spa_nr_entries; i += max_nr_entries) {
838850c50884cd Yazen Ghannam 2024-03-01  314  		fru_idx = i / max_nr_entries;
838850c50884cd Yazen Ghannam 2024-03-01  315  		if (fru_records[fru_idx] == rec)
838850c50884cd Yazen Ghannam 2024-03-01  316  			break;
838850c50884cd Yazen Ghannam 2024-03-01  317  	}
838850c50884cd Yazen Ghannam 2024-03-01  318  
838850c50884cd Yazen Ghannam 2024-03-01  319  	if (i >= spa_nr_entries) {
838850c50884cd Yazen Ghannam 2024-03-01  320  		pr_warn_once("FRU record %d not found\n", i);
838850c50884cd Yazen Ghannam 2024-03-01  321  		return;
838850c50884cd Yazen Ghannam 2024-03-01  322  	}
838850c50884cd Yazen Ghannam 2024-03-01  323  
838850c50884cd Yazen Ghannam 2024-03-01  324  	spa_entry = i + entry;
838850c50884cd Yazen Ghannam 2024-03-01  325  	if (spa_entry >= spa_nr_entries) {
838850c50884cd Yazen Ghannam 2024-03-01  326  		pr_warn_once("spa_entries[] index out-of-bounds\n");
838850c50884cd Yazen Ghannam 2024-03-01  327  		return;
838850c50884cd Yazen Ghannam 2024-03-01  328  	}
838850c50884cd Yazen Ghannam 2024-03-01  329  
838850c50884cd Yazen Ghannam 2024-03-01  330  	memset(&a_err, 0, sizeof(struct atl_err));
838850c50884cd Yazen Ghannam 2024-03-01  331  
838850c50884cd Yazen Ghannam 2024-03-01  332  	a_err.addr = addr;
838850c50884cd Yazen Ghannam 2024-03-01  333  	a_err.ipid = id;
838850c50884cd Yazen Ghannam 2024-03-01  334  	a_err.cpu  = cpu;
838850c50884cd Yazen Ghannam 2024-03-01  335  
838850c50884cd Yazen Ghannam 2024-03-01 @336  	spa = amd_convert_umc_mca_addr_to_sys_addr(&a_err);
838850c50884cd Yazen Ghannam 2024-03-01  337  	if (IS_ERR_VALUE(spa)) {
838850c50884cd Yazen Ghannam 2024-03-01  338  		pr_debug("Failed to get system address\n");
838850c50884cd Yazen Ghannam 2024-03-01  339  		return;
838850c50884cd Yazen Ghannam 2024-03-01  340  	}
838850c50884cd Yazen Ghannam 2024-03-01  341  
838850c50884cd Yazen Ghannam 2024-03-01  342  	spa_entries[spa_entry] = spa;
838850c50884cd Yazen Ghannam 2024-03-01  343  	pr_debug("fru_idx: %u, entry: %u, spa_entry: %u, spa: 0x%016llx\n",
838850c50884cd Yazen Ghannam 2024-03-01  344  		 fru_idx, entry, spa_entry, spa_entries[spa_entry]);
838850c50884cd Yazen Ghannam 2024-03-01  345  }
838850c50884cd Yazen Ghannam 2024-03-01  346  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* Re: [PATCH v4 10/17] ras: AEST: Introduce AEST driver sysfs interface
  2025-12-22  9:43 ` [PATCH v4 10/17] ras: AEST: Introduce AEST driver sysfs interface Ruidong Tian
@ 2025-12-23  0:57   ` kernel test robot
  0 siblings, 0 replies; 22+ messages in thread
From: kernel test robot @ 2025-12-23  0:57 UTC (permalink / raw)
  To: Ruidong Tian, catalin.marinas, will, lpieralisi, guohanjun,
	sudeep.holla, xueshuai, linux-kernel, linux-acpi,
	linux-arm-kernel, rafael, lenb, tony.luck, bp, yazen.ghannam,
	misono.tomohiro
  Cc: oe-kbuild-all, tianruidong

Hi Ruidong,

kernel test robot noticed the following build warnings:

[auto build test WARNING on rafael-pm/linux-next]
[also build test WARNING on rafael-pm/bleeding-edge ras/edac-for-next next-20251219]
[cannot apply to arm64/for-next/core linus/master tip/smp/core v6.16-rc1]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Ruidong-Tian/ACPI-AEST-Parse-the-AEST-table/20251222-175211
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
patch link:    https://lore.kernel.org/r/20251222094351.38792-11-tianruidong%40linux.alibaba.com
patch subject: [PATCH v4 10/17] ras: AEST: Introduce AEST driver sysfs interface
reproduce: (https://download.01.org/0day-ci/archive/20251223/202512230122.CfXZcF76-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202512230122.CfXZcF76-lkp@intel.com/

All warnings (new ones prefixed by >>):

   Using alabaster theme
   ERROR: Cannot find file ./include/linux/pci.h
   WARNING: No kernel-doc for file ./include/linux/pci.h
   ERROR: Cannot find file ./include/linux/mod_devicetable.h
   WARNING: No kernel-doc for file ./include/linux/mod_devicetable.h
>> Documentation/ABI/testing/debugfs-aest:1: WARNING: Block quote ends without a blank line; unexpected unindent. [docutils]
   ERROR: Cannot find file ./include/linux/bootconfig.h
   WARNING: No kernel-doc for file ./include/linux/bootconfig.h
   ERROR: Cannot find file ./include/linux/pstore_zone.h
   ERROR: Cannot find file ./include/linux/pstore_zone.h
   WARNING: No kernel-doc for file ./include/linux/pstore_zone.h


vim +1 Documentation/ABI/testing/debugfs-aest

   > 1	What:		/sys/kernel/debug/aest/<name>.<id>/
     2	Date:		Dec 2025
     3	KernelVersion	6.19
     4	Contact:	Ruidong Tian <tianruidong@linux.alibaba.com>
     5	Description:
     6			Directory represented a AEST device, <name> means device type,
     7			like:
     8	
     9				processor
    10				memory
    11				smmu
    12				...
    13			<id> is the unique ID for this device.
    14	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* Re: [PATCH v4 14/17] ras: ATL: Unify ATL interface for ARM64 and AMD
  2025-12-22  9:43 ` [PATCH v4 14/17] ras: ATL: Unify " Ruidong Tian
  2025-12-22 23:34   ` kernel test robot
@ 2025-12-25  6:58   ` kernel test robot
  1 sibling, 0 replies; 22+ messages in thread
From: kernel test robot @ 2025-12-25  6:58 UTC (permalink / raw)
  To: Ruidong Tian, catalin.marinas, will, lpieralisi, guohanjun,
	sudeep.holla, xueshuai, linux-kernel, linux-acpi,
	linux-arm-kernel, rafael, lenb, tony.luck, bp, yazen.ghannam,
	misono.tomohiro
  Cc: oe-kbuild-all, tianruidong

Hi Ruidong,

kernel test robot noticed the following build errors:

[auto build test ERROR on rafael-pm/linux-next]
[also build test ERROR on rafael-pm/bleeding-edge linus/master v6.19-rc2 next-20251219]
[cannot apply to arm64/for-next/core]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Ruidong-Tian/ACPI-AEST-Parse-the-AEST-table/20251222-215248
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
patch link:    https://lore.kernel.org/r/20251222094351.38792-16-tianruidong%40linux.alibaba.com
patch subject: [PATCH v4 14/17] ras: ATL: Unify ATL interface for ARM64 and AMD
config: x86_64-rhel-9.4 (https://download.01.org/0day-ci/archive/20251225/202512251419.gOeKyBqX-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251225/202512251419.gOeKyBqX-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202512251419.gOeKyBqX-lkp@intel.com/

All errors (new ones prefixed by >>):

   drivers/ras/amd/fmpm.c: In function 'save_spa':
>> drivers/ras/amd/fmpm.c:336:15: error: implicit declaration of function 'amd_convert_umc_mca_addr_to_sys_addr'; did you mean 'convert_umc_mca_addr_to_sys_addr'? [-Wimplicit-function-declaration]
     336 |         spa = amd_convert_umc_mca_addr_to_sys_addr(&a_err);
         |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         |               convert_umc_mca_addr_to_sys_addr
--
   drivers/ras/amd/atl/umc.c: In function '_retire_row_mi300':
>> drivers/ras/amd/atl/umc.c:321:24: error: implicit declaration of function 'amd_convert_umc_mca_addr_to_sys_addr'; did you mean 'convert_umc_mca_addr_to_sys_addr'? [-Wimplicit-function-declaration]
     321 |                 addr = amd_convert_umc_mca_addr_to_sys_addr(a_err);
         |                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         |                        convert_umc_mca_addr_to_sys_addr


vim +336 drivers/ras/amd/fmpm.c

6f15e617cc9932 Yazen Ghannam 2024-02-13  298  
838850c50884cd Yazen Ghannam 2024-03-01  299  static void save_spa(struct fru_rec *rec, unsigned int entry,
838850c50884cd Yazen Ghannam 2024-03-01  300  		     u64 addr, u64 id, unsigned int cpu)
838850c50884cd Yazen Ghannam 2024-03-01  301  {
838850c50884cd Yazen Ghannam 2024-03-01  302  	unsigned int i, fru_idx, spa_entry;
838850c50884cd Yazen Ghannam 2024-03-01  303  	struct atl_err a_err;
838850c50884cd Yazen Ghannam 2024-03-01  304  	unsigned long spa;
838850c50884cd Yazen Ghannam 2024-03-01  305  
838850c50884cd Yazen Ghannam 2024-03-01  306  	if (entry >= max_nr_entries) {
838850c50884cd Yazen Ghannam 2024-03-01  307  		pr_warn_once("FRU descriptor entry %d out-of-bounds (max: %d)\n",
838850c50884cd Yazen Ghannam 2024-03-01  308  			     entry, max_nr_entries);
838850c50884cd Yazen Ghannam 2024-03-01  309  		return;
838850c50884cd Yazen Ghannam 2024-03-01  310  	}
838850c50884cd Yazen Ghannam 2024-03-01  311  
838850c50884cd Yazen Ghannam 2024-03-01  312  	/* spa_nr_entries is always multiple of max_nr_entries */
838850c50884cd Yazen Ghannam 2024-03-01  313  	for (i = 0; i < spa_nr_entries; i += max_nr_entries) {
838850c50884cd Yazen Ghannam 2024-03-01  314  		fru_idx = i / max_nr_entries;
838850c50884cd Yazen Ghannam 2024-03-01  315  		if (fru_records[fru_idx] == rec)
838850c50884cd Yazen Ghannam 2024-03-01  316  			break;
838850c50884cd Yazen Ghannam 2024-03-01  317  	}
838850c50884cd Yazen Ghannam 2024-03-01  318  
838850c50884cd Yazen Ghannam 2024-03-01  319  	if (i >= spa_nr_entries) {
838850c50884cd Yazen Ghannam 2024-03-01  320  		pr_warn_once("FRU record %d not found\n", i);
838850c50884cd Yazen Ghannam 2024-03-01  321  		return;
838850c50884cd Yazen Ghannam 2024-03-01  322  	}
838850c50884cd Yazen Ghannam 2024-03-01  323  
838850c50884cd Yazen Ghannam 2024-03-01  324  	spa_entry = i + entry;
838850c50884cd Yazen Ghannam 2024-03-01  325  	if (spa_entry >= spa_nr_entries) {
838850c50884cd Yazen Ghannam 2024-03-01  326  		pr_warn_once("spa_entries[] index out-of-bounds\n");
838850c50884cd Yazen Ghannam 2024-03-01  327  		return;
838850c50884cd Yazen Ghannam 2024-03-01  328  	}
838850c50884cd Yazen Ghannam 2024-03-01  329  
838850c50884cd Yazen Ghannam 2024-03-01  330  	memset(&a_err, 0, sizeof(struct atl_err));
838850c50884cd Yazen Ghannam 2024-03-01  331  
838850c50884cd Yazen Ghannam 2024-03-01  332  	a_err.addr = addr;
838850c50884cd Yazen Ghannam 2024-03-01  333  	a_err.ipid = id;
838850c50884cd Yazen Ghannam 2024-03-01  334  	a_err.cpu  = cpu;
838850c50884cd Yazen Ghannam 2024-03-01  335  
838850c50884cd Yazen Ghannam 2024-03-01 @336  	spa = amd_convert_umc_mca_addr_to_sys_addr(&a_err);
838850c50884cd Yazen Ghannam 2024-03-01  337  	if (IS_ERR_VALUE(spa)) {
838850c50884cd Yazen Ghannam 2024-03-01  338  		pr_debug("Failed to get system address\n");
838850c50884cd Yazen Ghannam 2024-03-01  339  		return;
838850c50884cd Yazen Ghannam 2024-03-01  340  	}
838850c50884cd Yazen Ghannam 2024-03-01  341  
838850c50884cd Yazen Ghannam 2024-03-01  342  	spa_entries[spa_entry] = spa;
838850c50884cd Yazen Ghannam 2024-03-01  343  	pr_debug("fru_idx: %u, entry: %u, spa_entry: %u, spa: 0x%016llx\n",
838850c50884cd Yazen Ghannam 2024-03-01  344  		 fru_idx, entry, spa_entry, spa_entries[spa_entry]);
838850c50884cd Yazen Ghannam 2024-03-01  345  }
838850c50884cd Yazen Ghannam 2024-03-01  346  

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



end of thread, other threads:[~2025-12-25  7:00 UTC | newest]

Thread overview: 22+ messages
2025-12-22  9:43 [PATCH v4 00/17] ARM Error Source Table V2 Support Ruidong Tian
2025-12-22  9:43 ` [PATCH v4 01/17] ACPI/AEST: Parse the AEST table Ruidong Tian
2025-12-22  9:43 ` [PATCH v4 02/17] ras: AEST: Add probe/remove for AEST driver Ruidong Tian
2025-12-22  9:43 ` [PATCH v4 03/17] ras: AEST: support different group format Ruidong Tian
2025-12-22  9:43 ` [PATCH v4 04/17] ras: AEST: Unify the read/write interface for system and MMIO register Ruidong Tian
2025-12-22  9:43 ` [PATCH v4 05/17] ras: AEST: Probe RAS system architecture version Ruidong Tian
2025-12-22  9:43 ` [PATCH v4 06/17] ras: AEST: Support RAS Common Fault Injection Model Extension Ruidong Tian
2025-12-22  9:43 ` [PATCH v4 07/17] ras: AEST: Support CE threshold of error record Ruidong Tian
2025-12-22  9:43 ` [PATCH v4 08/17] ras: AEST: Enable and register IRQs Ruidong Tian
2025-12-22  9:43 ` [PATCH v4 09/17] ras: AEST: Add cpuhp callback Ruidong Tian
2025-12-22  9:43 ` [PATCH v4 10/17] ras: AEST: Introduce AEST driver sysfs interface Ruidong Tian
2025-12-23  0:57   ` kernel test robot
2025-12-22  9:43 ` [PATCH v4 11/17] ras: AEST: Add error count tracking and debugfs interface Ruidong Tian
2025-12-22  9:43 ` [PATCH v4 12/17] ras: AEST: Allow configuring CE threshold via debugfs Ruidong Tian
2025-12-22  9:43 ` [PATCH v4 13/17] ras: AEST: Introduce AEST inject interface to test AEST driver Ruidong Tian
2025-12-22  9:43 ` [PATCH v4 14/17] ras: ATL: Unified ATL interface for ARM64 and AMD Ruidong Tian
2025-12-22  9:43 ` [PATCH v4 14/17] ras: ATL: Unify " Ruidong Tian
2025-12-22 23:34   ` kernel test robot
2025-12-25  6:58   ` kernel test robot
2025-12-22  9:43 ` [PATCH v4 15/17] ras: AEST: Add framework to process AEST vendor node Ruidong Tian
2025-12-22  9:43 ` [PATCH v4 16/17] ras: AEST: support vendor node CMN700 Ruidong Tian
2025-12-22  9:43 ` [PATCH v4 17/17] trace, ras: add ARM RAS extension trace event Ruidong Tian
