From: Vijay Balakrishna <vijayb@linux.microsoft.com>
To: Borislav Petkov <bp@alien8.de>, Tony Luck <tony.luck@intel.com>,
Rob Herring <robh@kernel.org>,
Krzysztof Kozlowski <krzk+dt@kernel.org>,
Conor Dooley <conor+dt@kernel.org>
Cc: James Morse <james.morse@arm.com>,
Mauro Carvalho Chehab <mchehab@kernel.org>,
Robert Richter <rric@kernel.org>,
linux-edac@vger.kernel.org, linux-kernel@vger.kernel.org,
Tyler Hicks <code@tyhicks.com>, Marc Zyngier <maz@kernel.org>,
Sascha Hauer <s.hauer@pengutronix.de>,
Lorenzo Pieralisi <lpieralisi@kernel.org>,
devicetree@vger.kernel.org,
Vijay Balakrishna <vijayb@linux.microsoft.com>
Subject: [v9 PATCH 0/3] Add L1 and L2 error detection for A72
Date: Thu, 15 May 2025 17:06:10 -0700
Message-ID: <1747353973-4749-1-git-send-email-vijayb@linux.microsoft.com>
This is an attempt to revive the [v5] series. I have attempted to address the
comments and suggestions made by Marc Zyngier since [v5]. Additionally, I have
limited support to A72 processors, per the [v8] discussion. Testing the driver
on a problematic A72 SoC led to the detection of Correctable Errors (CEs).
Below are logs captured from that SoC across several separate boots.
[ 876.896022] EDAC DEVICE0: CE: cortex-arm64-edac instance: cpu2 block: L1 count: 1 'L1-D Data RAM correctable error(s) on CPU 2'
[ 3700.978086] EDAC DEVICE0: CE: cortex-arm64-edac instance: cpu2 block: L1 count: 1 'L1-D Data RAM correctable error(s) on CPU 2'
[ 976.956158] EDAC DEVICE0: CE: cortex-arm64-edac instance: cpu2 block: L1 count: 1 'L1-D Data RAM correctable error(s) on CPU 2'
[ 1427.933606] EDAC DEVICE0: CE: cortex-arm64-edac instance: cpu2 block: L1 count: 1 'L1-D Data RAM correctable error(s) on CPU 2'
[ 192.959911] EDAC DEVICE0: CE: cortex-arm64-edac instance: cpu2 block: L1 count: 1 'L1-D Data RAM correctable error(s) on CPU 2'
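For fleet monitoring, lines like the ones above can be picked apart by a small
userspace helper. The following is only a sketch: the struct and function names
are hypothetical and not part of this series, and it assumes the message format
stays exactly as shown in the logs above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record type for this sketch; not part of the driver. */
struct edac_event {
	int device;          /* N in "EDAC DEVICEN" */
	char severity[4];    /* "CE" or "UE" */
	char instance[16];   /* e.g. "cpu2" */
	char block[8];       /* e.g. "L1" */
	unsigned int count;
};

/*
 * Parse one kernel log line in the format shown above.
 * Returns 0 on success, -1 if the line does not match.
 */
static int parse_edac_line(const char *line, struct edac_event *ev)
{
	/* Skip the "[ timestamp]" prefix by searching for the EDAC tag. */
	const char *p = strstr(line, "EDAC DEVICE");

	if (!p)
		return -1;

	/* %*s skips the controller name ("cortex-arm64-edac" in the logs). */
	if (sscanf(p,
		   "EDAC DEVICE%d: %3[A-Z]: %*s instance: %15s block: %7s count: %u",
		   &ev->device, ev->severity, ev->instance, ev->block,
		   &ev->count) != 5)
		return -1;

	return 0;
}
```

A monitoring agent could feed each line of the kernel log through
parse_edac_line() and aggregate count per instance and block.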
Testing on our product kernel involved adding the 'edac-enabled' property to
the CPU nodes in the DTS. For mainline sanity checks, we tested under QEMU by
extracting the default DTB and modifying the DTS to include the 'edac-enabled'
property. We then verified the presence of the sysfs nodes for CE and UE counts
for the emulated A72 CPUs.
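The sysfs check can also be scripted. The helper below is a minimal sketch that
reads one decimal counter from a file. The function name is hypothetical, and
the path to pass in is an assumption: EDAC device counters usually live under
/sys/devices/system/edac/, but the exact node names for this driver should be
confirmed on a running system.

```c
#include <stdio.h>

/*
 * Read a single unsigned decimal counter from a sysfs-style file.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_counter(const char *path, unsigned long *value)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;

	ok = (fscanf(f, "%lu", value) == 1);
	fclose(f);

	return ok ? 0 : -1;
}
```

Calling read_counter() on the CE and UE count files of each CPU instance and
comparing the values between polls is enough to notice new errors.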
Our primary focus is on A72. We have a significant number of A72-based systems
in our fleet, and monitoring CEs so that failing parts can be replaced in time
will be instrumental in managing them effectively.
I am eager to hear your suggestions and feedback on this series.
Thanks,
Vijay
[v5] https://lore.kernel.org/all/20210401110615.15326-1-s.hauer@pengutronix.de/#t
[v6] https://lore.kernel.org/all/1744241785-20256-1-git-send-email-vijayb@linux.microsoft.com/
[v7] https://lore.kernel.org/all/1744409319-24912-1-git-send-email-vijayb@linux.microsoft.com/#t
[v8] https://lore.kernel.org/all/1746404860-27069-1-git-send-email-vijayb@linux.microsoft.com/
Changes since v8:
- removed support for A53 and A57
- added entry to MAINTAINERS
- added missing module exit point to enable unload
Changes since v7:
- v5 was based on the internal product kernel; review identified the following:
- correct format specifier to print CPUID/WAY
- removal of unused dynamic attributes for edac_device_alloc_ctl_info()
- driver remove callback return type is void
Changes since v6:
- move the clearing of the CPU/L2 syndrome registers back into read_errors(),
restoring the change made in [v5]
- upon detecting a valid error, clear syndrome registers immediately
to avoid clobbering between the read and write (Marc)
- NULL return check for of_get_cpu_node() (Tyler)
- of_node_put() to avoid refcount issue (Tyler)
- quotes are dropped in yaml file (Krzysztof)
Changes since v5:
- rebase on v6.15-rc1
- the syndrome registers for CPU/L2 memory errors are cleared only upon
detecting an error, followed by an isb() for synchronization (Marc)
- "edac-enabled" hunk moved to initial patch to avoid breaking virtual
environments (Marc)
- to ensure compatibility across all three families, we are not reporting
"L1 Dirty RAM," documented only in the A53 TRM
- the above prompted changing the default CPU L1 error message from "unknown"
to "Unspecified"
- capturing CPUID/WAY information in L2 memory error log (Marc)
- module license from "GPL v2" to "GPL" (checkpatch.pl warning)
- extend support for A72
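Several of the items above concern how the syndrome registers are read and
cleared. As an illustration of the "valid error" check only, here is a
userspace sketch that decodes a raw 64-bit syndrome value. The names are
hypothetical, and the field positions (Valid at bit 31, Fatal at bit 63, RAMID
in bits 30:24) are assumptions taken from the CPUMERRSR_EL1 description in the
A72 TRM; they should be checked against the TRM and the driver. The sketch does
not touch any register and does not show the clear or the isb().

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed field layout; verify against the Cortex-A72 TRM. */
#define MERRSR_RAMID_SHIFT	24
#define MERRSR_RAMID_MASK	0x7fULL
#define MERRSR_VALID		(1ULL << 31)
#define MERRSR_FATAL		(1ULL << 63)

/* Hypothetical decoded view of a syndrome value. */
struct merrsr_fields {
	bool valid;		/* an error has been recorded */
	bool fatal;		/* the error was uncorrectable */
	unsigned int ramid;	/* which RAM reported the error */
};

static struct merrsr_fields decode_merrsr(uint64_t val)
{
	struct merrsr_fields f;

	f.valid = (val & MERRSR_VALID) != 0;
	f.fatal = (val & MERRSR_FATAL) != 0;
	f.ramid = (unsigned int)((val >> MERRSR_RAMID_SHIFT) & MERRSR_RAMID_MASK);

	return f;
}
```

Under these assumptions, a value with valid set and fatal clear corresponds to
a correctable error, which is the case the driver reports as a CE.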
Sascha Hauer (2):
drivers/edac: Add L1 and L2 error detection for A72
dt-bindings: arm: cpus: Add edac-enabled property
Vijay Balakrishna (1):
EDAC: Add EDAC driver for Cortex A72
.../devicetree/bindings/arm/cpus.yaml | 6 +
MAINTAINERS | 7 +
drivers/edac/Kconfig | 8 +
drivers/edac/Makefile | 1 +
drivers/edac/edac_a72.c | 233 ++++++++++++++++++
5 files changed, 255 insertions(+)
create mode 100644 drivers/edac/edac_a72.c
base-commit: fee3e843b309444f48157e2188efa6818bae85cf
--
2.49.0