linux-edac.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 0/2] Add L1 and L2 error detection for A53, A57 and A72
@ 2025-04-09 23:36 Vijay Balakrishna
  2025-04-09 23:36 ` [PATCH 1/2] drivers/edac: " Vijay Balakrishna
  2025-04-09 23:36 ` [PATCH 2/2] dt-bindings: arm: cpus: Add edac-enabled property Vijay Balakrishna
  0 siblings, 2 replies; 20+ messages in thread
From: Vijay Balakrishna @ 2025-04-09 23:36 UTC (permalink / raw)
  To: Borislav Petkov, Tony Luck
  Cc: James Morse, Mauro Carvalho Chehab, Robert Richter, linux-edac,
	linux-kernel, Tyler Hicks, Marc Zyngier, Sascha Hauer

Hello,

This is an attempt to revive [v5] series. I have attempted to address comments
and suggestions from Marc Zyngier since [v5]. Additionally, I have extended
support for A72 processors. Testing on a problematic A72 SoC has led to the
detection of Correctable Errors (CEs). I am eager to hear your suggestions and
feedback on this series.

Thanks,
Vijay

[v5] https://lore.kernel.org/all/20210401110615.15326-1-s.hauer@pengutronix.de/#t

Changes since v5:
- rebase on v6.15-rc1
- the syndrome registers for CPU/L2 memory errors are cleared only upon
  detecting an error and an isb() after for synchronization (Marc)
- "edac-enabled" hunk moved to initial patch to avoid breaking virtual
  environments (Marc)
- to ensure compatibility across all three families, we are not reporting
  "L1 Dirty RAM," documented only in the A53 TRM
- above prompted changing default CPU L1 error meesage from "unknown"
  to "Unspecified" 
- capturing CPUID/WAY information in L2 memory error log (Marc)
- module license from "GPL v2" to "GPL" (checkpatch.pl warning)
- extend support for A72

Changes since v4:
- Rebase on v5.12-rc5

Changes since v3:
- Add edac-enabled property to make EDAC 3support optional

Changes since v2:
- drop usage of virtual dt node (Robh)
- use read_sysreg_s instead of open coded variant (James Morse)
- separate error retrieving from error reporting
- use smp_call_function_single rather than smp_call_function_single_async
- make driver single instance and register all 'cpu' hierarchy up front once

Changes since v1:
- Split dt-binding into separate patch
- Sort local function variables in reverse-xmas tree order
- drop unnecessary comparison and make variable bool

Sascha Hauer (2):
  drivers/edac: Add L1 and L2 error detection for A53, A57 and A72
  dt-bindings: arm: cpus: Add edac-enabled property

 .../devicetree/bindings/arm/cpus.yaml         |   6 +
 drivers/edac/Kconfig                          |   9 +
 drivers/edac/Makefile                         |   1 +
 drivers/edac/cortex_arm64_l1_l2.c             | 225 ++++++++++++++++++
 4 files changed, 241 insertions(+)
 create mode 100644 drivers/edac/cortex_arm64_l1_l2.c


base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
-- 
2.49.0


^ permalink raw reply	[flat|nested] 20+ messages in thread
* [v8 PATCH 0/2] Add L1 and L2 error detection for A53, A57 and A72
@ 2025-05-05  0:27 Vijay Balakrishna
  2025-05-05  0:27 ` [PATCH 2/2] dt-bindings: arm: cpus: Add edac-enabled property Vijay Balakrishna
  0 siblings, 1 reply; 20+ messages in thread
From: Vijay Balakrishna @ 2025-05-05  0:27 UTC (permalink / raw)
  To: Borislav Petkov, Tony Luck, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley
  Cc: James Morse, Mauro Carvalho Chehab, Robert Richter, linux-edac,
	linux-kernel, Tyler Hicks, Marc Zyngier, Sascha Hauer,
	Lorenzo Pieralisi, devicetree, Vijay Balakrishna

Hello,

This is an attempt to revive [v5] series. I have attempted to address comments
and suggestions from Marc Zyngier since [v5]. Additionally, I have extended
support for A72 processors. Testing the driver on a problematic A72 SoC
has led to the detection of Correctable Errors (CEs). Below are logs captured
from the problematic SoC during various boot instances.

[  876.896022] EDAC DEVICE0: CE: cortex-arm64-edac instance: cpu2 block: L1 count: 1 'L1-D Data RAM correctable error(s) on CPU 2'

[ 3700.978086] EDAC DEVICE0: CE: cortex-arm64-edac instance: cpu2 block: L1 count: 1 'L1-D Data RAM correctable error(s) on CPU 2'

[  976.956158] EDAC DEVICE0: CE: cortex-arm64-edac instance: cpu2 block: L1 count: 1 'L1-D Data RAM correctable error(s) on CPU 2'

[ 1427.933606] EDAC DEVICE0: CE: cortex-arm64-edac instance: cpu2 block: L1 count: 1 'L1-D Data RAM correctable error(s) on CPU 2'

[  192.959911] EDAC DEVICE0: CE: cortex-arm64-edac instance: cpu2 block: L1 count: 1 'L1-D Data RAM correctable error(s) on CPU 2'

Our primary focus is on A72. We have a significant number of A72-based systems
in our fleet, and timely replacements via monitoring CEs will be instrumental
in managing them effectively.

I am eager to hear your suggestions and feedback on this series.

Thanks,
Vijay

[v5] https://lore.kernel.org/all/20210401110615.15326-1-s.hauer@pengutronix.de/#t
[v6] https://lore.kernel.org/all/1744241785-20256-1-git-send-email-vijayb@linux.microsoft.com/
[v7] https://lore.kernel.org/all/1744409319-24912-1-git-send-email-vijayb@linux.microsoft.com/#t

Changes since v7: 
- v5 was based on the internal product kernel, identified following upon review
- correct format specifier to print CPUID/WAY
- removal of unused dynamic attributes for edac_device_alloc_ctl_info() 
- driver remove callback return type is void

Changes since v6:
- restore the change made in [v5] to clear CPU/L2 syndrome registers
  back to read_errors()
- upon detecting a valid error, clear syndrome registers immediately
  to avoid clobbering between the read and write (Marc)
- NULL return check for of_get_cpu_node() (Tyler)
- of_node_put() to avoid refcount issue (Tyler)
- quotes are dropped in yaml file (Krzysztof)

Changes since v5:
- rebase on v6.15-rc1
- the syndrome registers for CPU/L2 memory errors are cleared only upon
  detecting an error and an isb() after for synchronization (Marc)
- "edac-enabled" hunk moved to initial patch to avoid breaking virtual
  environments (Marc)
- to ensure compatibility across all three families, we are not reporting
  "L1 Dirty RAM," documented only in the A53 TRM
- above prompted changing default CPU L1 error meesage from "unknown"
  to "Unspecified"
- capturing CPUID/WAY information in L2 memory error log (Marc)
- module license from "GPL v2" to "GPL" (checkpatch.pl warning)
- extend support for A72

Changes since v4:
- Rebase on v5.12-rc5

Changes since v3:
- Add edac-enabled property to make EDAC 3support optional

Changes since v2:
- drop usage of virtual dt node (Robh)
- use read_sysreg_s instead of open coded variant (James Morse)
- separate error retrieving from error reporting
- use smp_call_function_single rather than smp_call_function_single_async
- make driver single instance and register all 'cpu' hierarchy up front once

Changes since v1:
- Split dt-binding into separate patch
- Sort local function variables in reverse-xmas tree order
- drop unnecessary comparison and make variable bool

Sascha Hauer (2):
  drivers/edac: Add L1 and L2 error detection for A53, A57 and A72
  dt-bindings: arm: cpus: Add edac-enabled property

 .../devicetree/bindings/arm/cpus.yaml         |   6 +
 drivers/edac/Kconfig                          |   9 +
 drivers/edac/Makefile                         |   1 +
 drivers/edac/cortex_arm64_l1_l2.c             | 229 ++++++++++++++++++
 4 files changed, 245 insertions(+)
 create mode 100644 drivers/edac/cortex_arm64_l1_l2.c


base-commit: 59c9ab3e8cc7f56cd65608f6e938b5ae96eb9cd2
-- 
2.49.0


^ permalink raw reply	[flat|nested] 20+ messages in thread
* [v7 PATCH 0/2] Add L1 and L2 error detection for A53, A57 and A72
@ 2025-04-11 22:08 Vijay Balakrishna
  2025-04-11 22:08 ` [PATCH 2/2] dt-bindings: arm: cpus: Add edac-enabled property Vijay Balakrishna
  0 siblings, 1 reply; 20+ messages in thread
From: Vijay Balakrishna @ 2025-04-11 22:08 UTC (permalink / raw)
  To: Borislav Petkov, Tony Luck, Rob Herring, Krzysztof Kozlowski,
	Conor Dooley
  Cc: James Morse, Mauro Carvalho Chehab, Robert Richter, linux-edac,
	linux-kernel, Tyler Hicks, Marc Zyngier, Sascha Hauer,
	Lorenzo Pieralisi, devicetree, Vijay Balakrishna

Hello,

This is an attempt to revive [v5] series. I have attempted to address comments
and suggestions from Marc Zyngier since [v5]. Additionally, I have extended
support for A72 processors. Testing on a problematic A72 SoC has led to the
detection of Correctable Errors (CEs). I am eager to hear your suggestions and
feedback on this series.

Thanks,
Vijay

[v5] https://lore.kernel.org/all/20210401110615.15326-1-s.hauer@pengutronix.de/#t
[v6] https://lore.kernel.org/all/1744241785-20256-1-git-send-email-vijayb@linux.microsoft.com/

Changes since v6:
- restore the change made in [v5] to clear CPU/L2 syndrome registers
  back to read_errors() (Tyler)
- upon detecting a valid error, clear syndrome registers immediately
  to avoid clobbering between the read and write (Marc)
- NULL return check for of_get_cpu_node() (Tyler)
- of_node_put() to avoid refcount issue (Tyler)
- quotes are dropped in yaml file (Krzysztof)

Changes since v5:
- rebase on v6.15-rc1
- the syndrome registers for CPU/L2 memory errors are cleared only upon
  detecting an error and an isb() after for synchronization (Marc)
- "edac-enabled" hunk moved to initial patch to avoid breaking virtual
  environments (Marc)
- to ensure compatibility across all three families, we are not reporting
  "L1 Dirty RAM," documented only in the A53 TRM
- above prompted changing default CPU L1 error meesage from "unknown"
  to "Unspecified" 
- capturing CPUID/WAY information in L2 memory error log (Marc)
- module license from "GPL v2" to "GPL" (checkpatch.pl warning)
- extend support for A72

Changes since v4:
- Rebase on v5.12-rc5

Changes since v3:
- Add edac-enabled property to make EDAC 3support optional

Changes since v2:
- drop usage of virtual dt node (Robh)
- use read_sysreg_s instead of open coded variant (James Morse)
- separate error retrieving from error reporting
- use smp_call_function_single rather than smp_call_function_single_async
- make driver single instance and register all 'cpu' hierarchy up front once

Changes since v1:
- Split dt-binding into separate patch
- Sort local function variables in reverse-xmas tree order
- drop unnecessary comparison and make variable bool

Sascha Hauer (2):
  drivers/edac: Add L1 and L2 error detection for A53, A57 and A72
  dt-bindings: arm: cpus: Add edac-enabled property

 .../devicetree/bindings/arm/cpus.yaml         |   6 +
 drivers/edac/Kconfig                          |   9 +
 drivers/edac/Makefile                         |   1 +
 drivers/edac/cortex_arm64_l1_l2.c             | 232 ++++++++++++++++++
 4 files changed, 248 insertions(+)
 create mode 100644 drivers/edac/cortex_arm64_l1_l2.c


base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
-- 
2.49.0


^ permalink raw reply	[flat|nested] 20+ messages in thread
* [PATCH v5 0/2] Add L1 and L2 error detection for A53 and A57
@ 2021-04-01 11:06 Sascha Hauer
  2021-04-01 11:06 ` [PATCH 2/2] dt-bindings: arm: cpus: Add edac-enabled property Sascha Hauer
  0 siblings, 1 reply; 20+ messages in thread
From: Sascha Hauer @ 2021-04-01 11:06 UTC (permalink / raw)
  To: linux-edac
  Cc: Borislav Petkov, Mauro Carvalho Chehab, Tony Luck, James Morse,
	Robert Richter, York Sun, kernel, linux-arm-kernel, Rob Herring,
	Mark Rutland, Marc Zyngier, Sascha Hauer

Hi,

Resending this mainly because Marc Zyngier and Mark Rutland raised
concerns about using implementation defined registers and I forgot to Cc
them with the last version. This version, like v4 already, should fix
these concerns. Looking forward to feedback.

Sascha

Changes since v4:
- Rebase on v5.12-rc5

Changes since v3:
- Add edac-enabled property to make EDAC support optional

Changes since v2:
- drop usage of virtual dt node (Robh)
- use read_sysreg_s instead of open coded variant (James Morse)
- separate error retrieving from error reporting
- use smp_call_function_single rather than smp_call_function_single_async
- make driver single instance and register all 'cpu' hierarchy up front once

Changes since v1:
- Split dt-binding into separate patch
- Sort local function variables in reverse-xmas tree order
- drop unnecessary comparison and make variable bool

Sascha Hauer (2):
  drivers/edac: Add L1 and L2 error detection for A53 and A57
  dt-bindings: arm: cpus: Add edac-enabled property

 .../devicetree/bindings/arm/cpus.yaml         |   6 +
 drivers/edac/Kconfig                          |   6 +
 drivers/edac/Makefile                         |   1 +
 drivers/edac/cortex_arm64_l1_l2.c             | 221 ++++++++++++++++++
 4 files changed, 234 insertions(+)
 create mode 100644 drivers/edac/cortex_arm64_l1_l2.c

-- 
2.29.2


^ permalink raw reply	[flat|nested] 20+ messages in thread
* [PATCH iv4 0/2] Add L1 and L2 error detection for A53 and A57
@ 2021-02-01 11:57 Sascha Hauer
  2021-02-01 11:57 ` [PATCH 2/2] dt-bindings: arm: cpus: Add edac-enabled property Sascha Hauer
  0 siblings, 1 reply; 20+ messages in thread
From: Sascha Hauer @ 2021-02-01 11:57 UTC (permalink / raw)
  To: linux-edac
  Cc: Borislav Petkov, Mauro Carvalho Chehab, Tony Luck, James Morse,
	Robert Richter, York Sun, kernel, linux-arm-kernel, Rob Herring,
	Sascha Hauer

Hi All,

As mentioned by Marc and Mark usage of the implementation defined
registers is not generally safe, they can't be used in virtualized
environments or when EL3 already uses the same registers. This is
probably the last attempt to get this upstream, I added an additional
property to the CPU device nodes to be set explicitly when using these
registers is safe and desired.

Sascha

Changes since v3:
- Add edac-enabled property to make EDAC support optional

Changes since v2:
- drop usage of virtual dt node (Robh)
- use read_sysreg_s instead of open coded variant (James Morse)
- separate error retrieving from error reporting
- use smp_call_function_single rather than smp_call_function_single_async
- make driver single instance and register all 'cpu' hierarchy up front once

Changes since v1:
- Split dt-binding into separate patch
- Sort local function variables in reverse-xmas tree order
- drop unnecessary comparison and make variable bool

Sascha Hauer (2):
  drivers/edac: Add L1 and L2 error detection for A53 and A57
  dt-bindings: arm: cpus: Add edac-enabled property

 .../devicetree/bindings/arm/cpus.yaml         |   6 +
 drivers/edac/Kconfig                          |   6 +
 drivers/edac/Makefile                         |   1 +
 drivers/edac/cortex_arm64_l1_l2.c             | 221 ++++++++++++++++++
 4 files changed, 234 insertions(+)
 create mode 100644 drivers/edac/cortex_arm64_l1_l2.c

-- 
2.20.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2025-05-12 19:30 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-09 23:36 [PATCH v6 0/2] Add L1 and L2 error detection for A53, A57 and A72 Vijay Balakrishna
2025-04-09 23:36 ` [PATCH 1/2] drivers/edac: " Vijay Balakrishna
2025-04-10 20:04   ` Tyler Hicks (Microsoft)
2025-04-10 22:27     ` Vijay Balakrishna
2025-04-10 22:29     ` Vijay Balakrishna
2025-04-09 23:36 ` [PATCH 2/2] dt-bindings: arm: cpus: Add edac-enabled property Vijay Balakrishna
2025-04-10  6:00   ` Krzysztof Kozlowski
2025-04-10  7:10     ` Marc Zyngier
2025-04-10 14:30       ` Tyler Hicks (Microsoft)
2025-04-10 16:23         ` Marc Zyngier
2025-04-10 16:42           ` Tyler Hicks (Microsoft)
2025-04-11 20:02           ` Borislav Petkov
2025-04-13 10:38             ` Marc Zyngier
  -- strict thread matches above, loose matches on Subject: below --
2025-05-05  0:27 [v8 PATCH 0/2] Add L1 and L2 error detection for A53, A57 and A72 Vijay Balakrishna
2025-05-05  0:27 ` [PATCH 2/2] dt-bindings: arm: cpus: Add edac-enabled property Vijay Balakrishna
2025-05-12 19:30   ` Rob Herring
2025-04-11 22:08 [v7 PATCH 0/2] Add L1 and L2 error detection for A53, A57 and A72 Vijay Balakrishna
2025-04-11 22:08 ` [PATCH 2/2] dt-bindings: arm: cpus: Add edac-enabled property Vijay Balakrishna
2021-04-01 11:06 [PATCH v5 0/2] Add L1 and L2 error detection for A53 and A57 Sascha Hauer
2021-04-01 11:06 ` [PATCH 2/2] dt-bindings: arm: cpus: Add edac-enabled property Sascha Hauer
2021-04-01 15:37   ` Marc Zyngier
2021-02-01 11:57 [PATCH iv4 0/2] Add L1 and L2 error detection for A53 and A57 Sascha Hauer
2021-02-01 11:57 ` [PATCH 2/2] dt-bindings: arm: cpus: Add edac-enabled property Sascha Hauer
2021-02-01 12:00   ` Sascha Hauer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).