* [PATCH 0/2] perf vendor events arm64: Add A720/A520 events/metrics
@ 2025-02-13 15:11 Yangyu Chen
2025-02-13 15:12 ` [PATCH 1/2] perf vendor events arm64: Add Cortex-A720 events/metrics Yangyu Chen
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Yangyu Chen @ 2025-02-13 15:11 UTC
To: linux-perf-users
Cc: John Garry, Will Deacon, James Clark, Mike Leach, Leo Yan,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Liang Kan, Yoshihiro Furudera,
linux-arm-kernel, linux-kernel, Yangyu Chen
This patchset adds the perf JSON files for the Cortex-A720 and Cortex-A520
processors. Some events have been tested on a Radxa Orion O6 with the Cix P1
SoC (8xA720 + 4xA520) running a mainline kernel in ACPI mode.
Yangyu Chen (2):
perf vendor events arm64: Add Cortex-A720 events/metrics
perf vendor events arm64: Add Cortex-A520 events/metrics
.../arch/arm64/arm/cortex-a520/bus.json | 26 ++
.../arch/arm64/arm/cortex-a520/exception.json | 18 +
.../arm64/arm/cortex-a520/fp_operation.json | 14 +
.../arch/arm64/arm/cortex-a520/general.json | 6 +
.../arch/arm64/arm/cortex-a520/l1d_cache.json | 50 ++
.../arch/arm64/arm/cortex-a520/l1i_cache.json | 14 +
.../arch/arm64/arm/cortex-a520/l2_cache.json | 46 ++
.../arch/arm64/arm/cortex-a520/l3_cache.json | 21 +
.../arch/arm64/arm/cortex-a520/ll_cache.json | 10 +
.../arch/arm64/arm/cortex-a520/memory.json | 58 +++
.../arch/arm64/arm/cortex-a520/metrics.json | 373 +++++++++++++++
.../arch/arm64/arm/cortex-a520/pmu.json | 8 +
.../arch/arm64/arm/cortex-a520/retired.json | 90 ++++
.../arm64/arm/cortex-a520/spec_operation.json | 70 +++
.../arch/arm64/arm/cortex-a520/stall.json | 82 ++++
.../arch/arm64/arm/cortex-a520/sve.json | 22 +
.../arch/arm64/arm/cortex-a520/tlb.json | 78 ++++
.../arch/arm64/arm/cortex-a520/trace.json | 32 ++
.../arch/arm64/arm/cortex-a720/bus.json | 18 +
.../arch/arm64/arm/cortex-a720/exception.json | 62 +++
.../arm64/arm/cortex-a720/fp_operation.json | 22 +
.../arch/arm64/arm/cortex-a720/general.json | 10 +
.../arch/arm64/arm/cortex-a720/l1d_cache.json | 50 ++
.../arch/arm64/arm/cortex-a720/l1i_cache.json | 14 +
.../arch/arm64/arm/cortex-a720/l2_cache.json | 62 +++
.../arch/arm64/arm/cortex-a720/l3_cache.json | 22 +
.../arch/arm64/arm/cortex-a720/ll_cache.json | 10 +
.../arch/arm64/arm/cortex-a720/memory.json | 54 +++
.../arch/arm64/arm/cortex-a720/metrics.json | 436 ++++++++++++++++++
.../arch/arm64/arm/cortex-a720/pmu.json | 8 +
.../arch/arm64/arm/cortex-a720/retired.json | 90 ++++
.../arch/arm64/arm/cortex-a720/spe.json | 42 ++
.../arm64/arm/cortex-a720/spec_operation.json | 90 ++++
.../arch/arm64/arm/cortex-a720/stall.json | 82 ++++
.../arch/arm64/arm/cortex-a720/sve.json | 50 ++
.../arch/arm64/arm/cortex-a720/tlb.json | 74 +++
.../arch/arm64/arm/cortex-a720/trace.json | 32 ++
.../arch/arm64/common-and-microarch.json | 15 +
tools/perf/pmu-events/arch/arm64/mapfile.csv | 2 +
39 files changed, 2263 insertions(+)
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/bus.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/exception.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/fp_operation.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/general.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l1d_cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l1i_cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l2_cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l3_cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/ll_cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/memory.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/metrics.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/pmu.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/retired.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/spec_operation.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/stall.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/sve.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/tlb.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/trace.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/bus.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/exception.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/fp_operation.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/general.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l1d_cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l1i_cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l2_cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l3_cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/ll_cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/memory.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/metrics.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/pmu.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/retired.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/spe.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/spec_operation.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/stall.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/sve.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/tlb.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/trace.json
--
2.47.2
* [PATCH 1/2] perf vendor events arm64: Add Cortex-A720 events/metrics
2025-02-13 15:11 [PATCH 0/2] perf vendor events arm64: Add A720/A520 events/metrics Yangyu Chen
@ 2025-02-13 15:12 ` Yangyu Chen
2025-02-13 16:49 ` Ian Rogers
2025-02-13 15:12 ` [PATCH 2/2] perf vendor events arm64: Add Cortex-A520 events/metrics Yangyu Chen
2025-02-14 1:12 ` [PATCH 0/2] perf vendor events arm64: Add A720/A520 events/metrics Namhyung Kim
2 siblings, 1 reply; 16+ messages in thread
From: Yangyu Chen @ 2025-02-13 15:12 UTC
To: linux-perf-users
Cc: John Garry, Will Deacon, James Clark, Mike Leach, Leo Yan,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Liang Kan, Yoshihiro Furudera,
linux-arm-kernel, linux-kernel, Yangyu Chen
Add JSON files for Cortex-A720 events and metrics. Using the existing
Neoverse N3 JSON files as a template, I manually checked the missing and
extra events/metrics using my script [1] and modified them according to
the Arm Cortex-A720 Core Technical Reference Manual [2].
[1] https://github.com/cyyself/arm-pmu-check/tree/1075bebeb3f1441067448251a387df35af15bf16
[2] https://developer.arm.com/documentation/102530/0002/Performance-Monitors-Extension-support-/Performance-monitors-events
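The missing/extra check described above can be illustrated with a minimal sketch. This is not the script from [1]; the function names, the inline sample JSON, and the short reference list are hypothetical and only show the idea of comparing the "ArchStdEvent" names in the JSON files against an event list taken from a reference manual.

```python
import json


def event_names(json_text):
    """Return the set of "ArchStdEvent" names found in one JSON document."""
    return {
        entry["ArchStdEvent"]
        for entry in json.loads(json_text)
        if "ArchStdEvent" in entry
    }


def compare_events(json_texts, reference):
    """Compare the events found in the JSON documents with a reference list.

    Returns (missing, extra): names only in the reference, and names only
    in the JSON documents, both sorted.
    """
    found = set()
    for text in json_texts:
        found |= event_names(text)
    reference = set(reference)
    return sorted(reference - found), sorted(found - reference)


# Hypothetical sample data: a truncated bus.json and a made-up reference list.
bus_json = """[
  {"ArchStdEvent": "BUS_ACCESS"},
  {"ArchStdEvent": "BUS_CYCLES"},
  {"ArchStdEvent": "BUS_ACCESS_RD"}
]"""
trm_events = ["BUS_ACCESS", "BUS_CYCLES", "BUS_ACCESS_WR"]

missing, extra = compare_events([bus_json], trm_events)
print("missing:", missing)  # missing: ['BUS_ACCESS_WR']
print("extra:", extra)      # extra: ['BUS_ACCESS_RD']
```

In practice the JSON text would be read from the files under the core's directory and the reference list would come from the manual's event table; both steps are left out of the sketch.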
Signed-off-by: Yangyu Chen <cyy@cyyself.name>
---
.../arch/arm64/arm/cortex-a720/bus.json | 18 +
.../arch/arm64/arm/cortex-a720/exception.json | 62 +++
.../arm64/arm/cortex-a720/fp_operation.json | 22 +
.../arch/arm64/arm/cortex-a720/general.json | 10 +
.../arch/arm64/arm/cortex-a720/l1d_cache.json | 50 ++
.../arch/arm64/arm/cortex-a720/l1i_cache.json | 14 +
.../arch/arm64/arm/cortex-a720/l2_cache.json | 62 +++
.../arch/arm64/arm/cortex-a720/l3_cache.json | 22 +
.../arch/arm64/arm/cortex-a720/ll_cache.json | 10 +
.../arch/arm64/arm/cortex-a720/memory.json | 54 +++
.../arch/arm64/arm/cortex-a720/metrics.json | 436 ++++++++++++++++++
.../arch/arm64/arm/cortex-a720/pmu.json | 8 +
.../arch/arm64/arm/cortex-a720/retired.json | 90 ++++
.../arch/arm64/arm/cortex-a720/spe.json | 42 ++
.../arm64/arm/cortex-a720/spec_operation.json | 90 ++++
.../arch/arm64/arm/cortex-a720/stall.json | 82 ++++
.../arch/arm64/arm/cortex-a720/sve.json | 50 ++
.../arch/arm64/arm/cortex-a720/tlb.json | 74 +++
.../arch/arm64/arm/cortex-a720/trace.json | 32 ++
tools/perf/pmu-events/arch/arm64/mapfile.csv | 1 +
20 files changed, 1229 insertions(+)
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/bus.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/exception.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/fp_operation.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/general.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l1d_cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l1i_cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l2_cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l3_cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/ll_cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/memory.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/metrics.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/pmu.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/retired.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/spe.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/spec_operation.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/stall.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/sve.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/tlb.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/trace.json
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/bus.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/bus.json
new file mode 100644
index 000000000000..2e11a8c4a484
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/bus.json
@@ -0,0 +1,18 @@
+[
+ {
+ "ArchStdEvent": "BUS_ACCESS",
+ "PublicDescription": "Counts memory transactions issued by the CPU to the external bus, including snoop requests and snoop responses. Each beat of data is counted individually."
+ },
+ {
+ "ArchStdEvent": "BUS_CYCLES",
+ "PublicDescription": "Counts bus cycles in the CPU. Bus cycles represent a clock cycle in which a transaction could be sent or received on the interface from the CPU to the external bus. Since that interface is driven at the same clock speed as the CPU, this event is a duplicate of CPU_CYCLES."
+ },
+ {
+ "ArchStdEvent": "BUS_ACCESS_RD",
+ "PublicDescription": "Counts memory read transactions seen on the external bus. Each beat of data is counted individually."
+ },
+ {
+ "ArchStdEvent": "BUS_ACCESS_WR",
+ "PublicDescription": "Counts memory write transactions seen on the external bus. Each beat of data is counted individually."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/exception.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/exception.json
new file mode 100644
index 000000000000..7126fbf292e0
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/exception.json
@@ -0,0 +1,62 @@
+[
+ {
+ "ArchStdEvent": "EXC_TAKEN",
+ "PublicDescription": "Counts any taken architecturally visible exceptions such as IRQ, FIQ, SError, and other synchronous exceptions. Exceptions are counted whether or not they are taken locally."
+ },
+ {
+ "ArchStdEvent": "EXC_RETURN",
+ "PublicDescription": "Counts any architecturally executed exception return instructions. For example: AArch64: ERET"
+ },
+ {
+ "ArchStdEvent": "EXC_UNDEF",
+ "PublicDescription": "Counts the number of synchronous exceptions which are taken locally that are due to attempting to execute an instruction that is UNDEFINED. Attempting to execute instruction bit patterns that have not been allocated. Attempting to execute instructions when they are disabled. Attempting to execute instructions at an inappropriate Exception level. Attempting to execute an instruction when the value of PSTATE.IL is 1."
+ },
+ {
+ "ArchStdEvent": "EXC_SVC",
+ "PublicDescription": "Counts SVC exceptions taken locally."
+ },
+ {
+ "ArchStdEvent": "EXC_PABORT",
+ "PublicDescription": "Counts synchronous exceptions that are taken locally and caused by Instruction Aborts."
+ },
+ {
+ "ArchStdEvent": "EXC_DABORT",
+ "PublicDescription": "Counts exceptions that are taken locally and are caused by data aborts or SErrors. Conditions that could cause those exceptions are attempting to read or write memory where the MMU generates a fault, attempting to read or write memory with a misaligned address, interrupts from the nSEI inputs and internally generated SErrors."
+ },
+ {
+ "ArchStdEvent": "EXC_IRQ",
+ "PublicDescription": "Counts IRQ exceptions including the virtual IRQs that are taken locally."
+ },
+ {
+ "ArchStdEvent": "EXC_FIQ",
+ "PublicDescription": "Counts FIQ exceptions including the virtual FIQs that are taken locally."
+ },
+ {
+ "ArchStdEvent": "EXC_SMC",
+ "PublicDescription": "Counts SMC exceptions taken to EL3."
+ },
+ {
+ "ArchStdEvent": "EXC_HVC",
+ "PublicDescription": "Counts HVC exceptions taken to EL2."
+ },
+ {
+ "ArchStdEvent": "EXC_TRAP_PABORT",
+ "PublicDescription": "Counts exceptions which are traps not taken locally and are caused by Instruction Aborts. For example, attempting to execute an instruction with a misaligned PC."
+ },
+ {
+ "ArchStdEvent": "EXC_TRAP_DABORT",
+ "PublicDescription": "Counts exceptions which are traps not taken locally and are caused by Data Aborts or SError interrupts. Conditions that could cause those exceptions are:\n\n1. Attempting to read or write memory where the MMU generates a fault,\n2. Attempting to read or write memory with a misaligned address,\n3. Interrupts from the SEI input,\n4. Internally generated SErrors."
+ },
+ {
+ "ArchStdEvent": "EXC_TRAP_OTHER",
+ "PublicDescription": "Counts the number of synchronous trap exceptions which are not taken locally and are not SVC, SMC, HVC, data aborts, Instruction Aborts, or interrupts."
+ },
+ {
+ "ArchStdEvent": "EXC_TRAP_IRQ",
+ "PublicDescription": "Counts IRQ exceptions including the virtual IRQs that are not taken locally."
+ },
+ {
+ "ArchStdEvent": "EXC_TRAP_FIQ",
+ "PublicDescription": "Counts FIQs which are not taken locally but taken from EL0, EL1, or EL2 to EL3 (which would be the normal behavior for FIQs when not executing in EL3)."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/fp_operation.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/fp_operation.json
new file mode 100644
index 000000000000..cec3435ac766
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/fp_operation.json
@@ -0,0 +1,22 @@
+[
+ {
+ "ArchStdEvent": "FP_HP_SPEC",
+ "PublicDescription": "Counts speculatively executed half precision floating point operations."
+ },
+ {
+ "ArchStdEvent": "FP_SP_SPEC",
+ "PublicDescription": "Counts speculatively executed single precision floating point operations."
+ },
+ {
+ "ArchStdEvent": "FP_DP_SPEC",
+ "PublicDescription": "Counts speculatively executed double precision floating point operations."
+ },
+ {
+ "ArchStdEvent": "FP_SCALE_OPS_SPEC",
+ "PublicDescription": "Counts speculatively executed scalable single precision floating point operations."
+ },
+ {
+ "ArchStdEvent": "FP_FIXED_OPS_SPEC",
+ "PublicDescription": "Counts speculatively executed non-scalable single precision floating point operations."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/general.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/general.json
new file mode 100644
index 000000000000..c5dcdcf43c58
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/general.json
@@ -0,0 +1,10 @@
+[
+ {
+ "ArchStdEvent": "CPU_CYCLES",
+ "PublicDescription": "Counts CPU clock cycles (not timer cycles). The clock measured by this event is defined as the physical clock driving the CPU logic."
+ },
+ {
+ "ArchStdEvent": "CNT_CYCLES",
+ "PublicDescription": "Increments at a constant frequency equal to the rate of increment of the System Counter, CNTPCT_EL0."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l1d_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l1d_cache.json
new file mode 100644
index 000000000000..a6fee569f4c6
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l1d_cache.json
@@ -0,0 +1,50 @@
+[
+ {
+ "ArchStdEvent": "L1D_CACHE_REFILL",
+ "PublicDescription": "Counts level 1 data cache refills caused by speculatively executed load or store operations that missed in the level 1 data cache. This event only counts one event per cache line."
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE",
+ "PublicDescription": "Counts level 1 data cache accesses from any load/store operations. Atomic operations that resolve in the CPU's caches (near atomic operations) count as both a write access and a read access. Each access to a cache line is counted including the multiple accesses caused by single instructions such as LDM or STM. Each access to other level 1 data or unified memory structures, for example refill buffers, write buffers, and write-back buffers, is also counted."
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_WB",
+ "PublicDescription": "Counts write-backs of dirty data from the L1 data cache to the L2 cache. This occurs when either a dirty cache line is evicted from L1 data cache and allocated in the L2 cache or dirty data is written to the L2 and possibly to the next level of cache. This event counts both victim cache line evictions and cache write-backs from snoops or cache maintenance operations. The following cache operations are not counted:\n\n1. Invalidations which do not result in data being transferred out of the L1 (such as evictions of clean data),\n2. Full line writes which write to L2 without writing L1, such as write streaming mode."
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_LMISS_RD",
+ "PublicDescription": "Counts cache line refills into the level 1 data cache from any memory read operations, that incurred additional latency."
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_RD",
+ "PublicDescription": "Counts level 1 data cache accesses from any load operation. Atomic load operations that resolve in the CPU's caches count as both a write access and a read access."
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_WR",
+ "PublicDescription": "Counts level 1 data cache accesses generated by store operations. This event also counts accesses caused by a DC ZVA (data cache zero, specified by virtual address) instruction. Near atomic operations that resolve in the CPU's caches count as both a write access and a read access."
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_REFILL_INNER",
+ "PublicDescription": "Counts level 1 data cache refills where the cache line data came from caches inside the immediate cluster of the core."
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_REFILL_OUTER",
+ "PublicDescription": "Counts level 1 data cache refills for which the cache line data came from outside the immediate cluster of the core, like an SLC in the system interconnect or DRAM."
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_INVAL",
+ "PublicDescription": "Counts each explicit invalidation of a cache line in the level 1 data cache caused by:\n\n- Cache Maintenance Operations (CMO) that operate by a virtual address.\n- Broadcast cache coherency operations from another CPU in the system.\n\nThis event does not count for the following conditions:\n\n1. A cache refill invalidates a cache line.\n2. A CMO which is executed on that CPU and invalidates a cache line specified by set/way.\n\nNote that CMOs that operate by set/way cannot be broadcast from one CPU to another."
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_RW",
+ "PublicDescription": "Counts level 1 data demand cache accesses from any load or store operation. Near atomic operations that resolve in the CPU's caches count as both a write access and a read access."
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_PRF",
+ "BriefDescription": "This event counts fetch counted by either Level 1 data hardware prefetch or Level 1 data software prefetch."
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_REFILL_PRF",
+ "BriefDescription": "This event counts hardware prefetch counted by L1D_CACHE_PRF that causes a refill of the Level 1 data cache from outside of the Level 1 data cache."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l1i_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l1i_cache.json
new file mode 100644
index 000000000000..633f1030359d
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l1i_cache.json
@@ -0,0 +1,14 @@
+[
+ {
+ "ArchStdEvent": "L1I_CACHE_REFILL",
+ "PublicDescription": "Counts cache line refills in the level 1 instruction cache caused by a missed instruction fetch. Instruction fetches may include accessing multiple instructions, but the single cache line allocation is counted once."
+ },
+ {
+ "ArchStdEvent": "L1I_CACHE",
+ "PublicDescription": "Counts instruction fetches which access the level 1 instruction cache. Instruction cache accesses caused by cache maintenance operations are not counted."
+ },
+ {
+ "ArchStdEvent": "L1I_CACHE_LMISS",
+ "PublicDescription": "Counts cache line refills into the level 1 instruction cache, that incurred additional latency."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l2_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l2_cache.json
new file mode 100644
index 000000000000..3806fef42b30
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l2_cache.json
@@ -0,0 +1,62 @@
+[
+ {
+ "ArchStdEvent": "L2D_CACHE",
+ "PublicDescription": "Counts accesses to the level 2 cache due to data accesses. Level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the first level data cache or translation resolutions due to accesses. This event also counts write back of dirty data from level 1 data cache to the L2 cache."
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_REFILL",
+ "PublicDescription": "Counts cache line refills into the level 2 cache. Level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the level 1 data cache or translation resolutions due to accesses."
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_WB",
+ "PublicDescription": "Counts write-backs of data from the L2 cache to outside the CPU. This includes snoops to the L2 (from other CPUs) which return data even if the snoops cause an invalidation. L2 cache line invalidations which do not write data outside the CPU and snoops which return data from an L1 cache are not counted. Data would not be written outside the cache when invalidating a clean cache line."
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_ALLOCATE",
+ "PublicDescription": "Counts level 2 cache line allocates that do not fetch data from outside the level 2 data or unified cache."
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_RD",
+ "PublicDescription": "Counts level 2 data cache accesses due to memory read operations. Level 2 cache is a unified cache for data and instruction accesses, accesses are for misses in the level 1 data cache or translation resolutions due to accesses."
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_WR",
+ "PublicDescription": "Counts level 2 cache accesses due to memory write operations. Level 2 cache is a unified cache for data and instruction accesses, accesses are for misses in the level 1 data cache or translation resolutions due to accesses."
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_REFILL_RD",
+ "PublicDescription": "Counts refills for memory accesses due to memory read operation counted by L2D_CACHE_RD. Level 2 cache is a unified cache for data and instruction accesses, accesses are for misses in the level 1 data cache or translation resolutions due to accesses."
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_REFILL_WR",
+ "PublicDescription": "Counts refills for memory accesses due to memory write operation counted by L2D_CACHE_WR. Level 2 cache is a unified cache for data and instruction accesses, accesses are for misses in the level 1 data cache or translation resolutions due to accesses."
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_WB_VICTIM",
+ "PublicDescription": "Counts evictions from the level 2 cache because of a line being allocated into the L2 cache."
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_WB_CLEAN",
+ "PublicDescription": "Counts write-backs from the level 2 cache that are a result of either:\n\n1. Cache maintenance operations,\n\n2. Snoop responses or,\n\n3. Direct cache transfers to another CPU due to a forwarding snoop request."
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_INVAL",
+ "PublicDescription": "Counts each explicit invalidation of a cache line in the level 2 cache by cache maintenance operations that operate by a virtual address, or by external coherency operations. This event does not count if either:\n\n1. A cache refill invalidates a cache line or,\n2. A Cache Maintenance Operation (CMO), which invalidates a cache line specified by set/way, is executed on that CPU.\n\nCMOs that operate by set/way cannot be broadcast from one CPU to another."
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_LMISS_RD",
+ "PublicDescription": "Counts cache line refills into the level 2 unified cache from any memory read operations that incurred additional latency."
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_RW",
+ "PublicDescription": "Counts level 2 cache demand accesses from any load/store operations. Level 2 cache is a unified cache for data and instruction accesses, accesses are for misses in the level 1 data cache or translation resolutions due to accesses."
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_PRF",
+ "PublicDescription": "Counts level 2 data cache accesses from software preload or prefetch instructions or the hardware prefetcher."
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_REFILL_PRF",
+ "PublicDescription": "Counts refills due to accesses generated as a result of prefetches."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l3_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l3_cache.json
new file mode 100644
index 000000000000..4a2e72fc5ada
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l3_cache.json
@@ -0,0 +1,22 @@
+[
+ {
+ "ArchStdEvent": "L3D_CACHE_ALLOCATE",
+ "PublicDescription": "Counts level 3 cache line allocates that do not fetch data from outside the level 3 data or unified cache. For example, allocates due to streaming stores."
+ },
+ {
+ "ArchStdEvent": "L3D_CACHE_REFILL",
+ "PublicDescription": "Counts level 3 accesses that receive data from outside the L3 cache."
+ },
+ {
+ "ArchStdEvent": "L3D_CACHE",
+ "PublicDescription": "Counts level 3 cache accesses. Level 3 cache is a unified cache for data and instruction accesses. Accesses are for misses in the lower level caches or translation resolutions due to accesses."
+ },
+ {
+ "ArchStdEvent": "L3D_CACHE_RD",
+ "PublicDescription": "Counts level 3 cache accesses caused by any memory read operation. Level 3 cache is a unified cache for data and instruction accesses. Accesses are for misses in the lower level caches or translation resolutions due to accesses."
+ },
+ {
+ "ArchStdEvent": "L3D_CACHE_LMISS_RD",
+ "PublicDescription": "Counts any cache line refill into the level 3 cache from memory read operations that incurred additional latency."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/ll_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/ll_cache.json
new file mode 100644
index 000000000000..fd5a2e0099b8
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/ll_cache.json
@@ -0,0 +1,10 @@
+[
+ {
+ "ArchStdEvent": "LL_CACHE_RD",
+ "PublicDescription": "Counts read transactions that were returned from outside the core cluster. This event counts for external last level cache when the system register CPUECTLR.EXTLLC bit is set, otherwise it counts for the L3 cache. This event counts read transactions returned from outside the core if those transactions are either hit in the system level cache or missed in the SLC and are returned from any other external sources."
+ },
+ {
+ "ArchStdEvent": "LL_CACHE_MISS_RD",
+ "PublicDescription": "Counts read transactions that were returned from outside the core cluster but missed in the system level cache. This event counts for external last level cache when the system register CPUECTLR.EXTLLC bit is set, otherwise it counts for L3 cache. This event counts read transactions returned from outside the core if those transactions are missed in the System level Cache. The data source of the transaction is indicated by a field in the CHI transaction returning to the CPU. This event does not count reads caused by cache maintenance operations."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/memory.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/memory.json
new file mode 100644
index 000000000000..f19204a5faae
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/memory.json
@@ -0,0 +1,54 @@
+[
+ {
+ "ArchStdEvent": "MEM_ACCESS",
+ "PublicDescription": "Counts memory accesses issued by the CPU load store unit, where those accesses are issued due to load or store operations. This event counts memory accesses no matter whether the data is received from any level of cache hierarchy or external memory. If memory accesses are broken up into smaller transactions than what were specified in the load or store instructions, then the event counts those smaller memory transactions."
+ },
+ {
+ "ArchStdEvent": "REMOTE_ACCESS",
+ "PublicDescription": "Counts accesses to another chip, which is implemented as a different CMN mesh in the system. If the CHI bus response back to the core indicates that the data source is from another chip (mesh), then the counter is updated. If no data is returned, even if the system snoops another chip/mesh, then the counter is not updated."
+ },
+ {
+ "ArchStdEvent": "MEM_ACCESS_RD",
+ "PublicDescription": "Counts memory accesses issued by the CPU due to load operations. The event counts any memory load access, no matter whether the data is received from any level of cache hierarchy or external memory. The event also counts atomic load operations. If memory accesses are broken up by the load/store unit into smaller transactions that are issued by the bus interface, then the event counts those smaller transactions."
+ },
+ {
+ "ArchStdEvent": "MEM_ACCESS_WR",
+ "PublicDescription": "Counts memory accesses issued by the CPU due to store operations. The event counts any memory store access, no matter whether the data is located in any level of cache or external memory. The event also counts atomic load and store operations. If memory accesses are broken up by the load/store unit into smaller transactions that are issued by the bus interface, then the event counts those smaller transactions."
+ },
+ {
+ "ArchStdEvent": "LDST_ALIGN_LAT",
+ "PublicDescription": "Counts the number of memory read and write accesses in a cycle that incurred additional latency, due to the alignment of the address and the size of data being accessed, which results in a load or store crossing a single cache line."
+ },
+ {
+ "ArchStdEvent": "LD_ALIGN_LAT",
+ "PublicDescription": "Counts the number of memory read accesses in a cycle that incurred additional latency, due to the alignment of the address and size of data being accessed, which results in a load crossing a single cache line."
+ },
+ {
+ "ArchStdEvent": "ST_ALIGN_LAT",
+ "PublicDescription": "Counts the number of memory write accesses in a cycle that incurred additional latency due to the alignment of the address and size of data being accessed."
+ },
+ {
+ "ArchStdEvent": "MEM_ACCESS_CHECKED",
+ "PublicDescription": "Counts the number of memory read and write accesses counted by MEM_ACCESS that are tag checked by the Memory Tagging Extension (MTE). This event is implemented as the sum of MEM_ACCESS_CHECKED_RD and MEM_ACCESS_CHECKED_WR."
+ },
+ {
+ "ArchStdEvent": "MEM_ACCESS_CHECKED_RD",
+ "PublicDescription": "Counts the number of memory read accesses in a cycle that are tag checked by the Memory Tagging Extension (MTE)."
+ },
+ {
+ "ArchStdEvent": "MEM_ACCESS_CHECKED_WR",
+ "PublicDescription": "Counts the number of memory write accesses in a cycle that are tag checked by the Memory Tagging Extension (MTE)."
+ },
+ {
+ "ArchStdEvent": "INST_FETCH_PERCYC",
+ "PublicDescription": "Counts the number of instruction fetches outstanding per cycle, which provides an average latency of instruction fetch."
+ },
+ {
+ "ArchStdEvent": "MEM_ACCESS_RD_PERCYC",
+ "PublicDescription": "Counts the number of outstanding loads or memory read accesses per cycle."
+ },
+ {
+ "ArchStdEvent": "INST_FETCH",
+ "PublicDescription": "Counts instruction memory accesses that the PE makes."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/metrics.json
new file mode 100644
index 000000000000..d8e8b5155cfa
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/metrics.json
@@ -0,0 +1,436 @@
+[
+ {
+ "ArchStdEvent": "backend_bound"
+ },
+ {
+ "MetricName": "backend_busy_bound",
+ "MetricExpr": "STALL_BACKEND_BUSY / STALL_BACKEND * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to issue queues being too full to accept operations for execution.",
+ "MetricGroup": "Topdown_Backend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "backend_cache_l1d_bound",
+ "MetricExpr": "STALL_BACKEND_L1D / (STALL_BACKEND_L1D + STALL_BACKEND_MEM) * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory access latency issues caused by level 1 data cache misses.",
+ "MetricGroup": "Topdown_Backend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "backend_cache_l2d_bound",
+ "MetricExpr": "STALL_BACKEND_MEM / (STALL_BACKEND_L1D + STALL_BACKEND_MEM) * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory access latency issues caused by level 2 data cache misses.",
+ "MetricGroup": "Topdown_Backend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "backend_core_bound",
+ "MetricExpr": "STALL_BACKEND_CPUBOUND / STALL_BACKEND * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to backend core resource constraints not related to memory access latency issues caused by memory access components.",
+ "MetricGroup": "Topdown_Backend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "backend_core_rename_bound",
+ "MetricExpr": "STALL_BACKEND_RENAME / STALL_BACKEND_CPUBOUND * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the backend as the rename unit registers are unavailable.",
+ "MetricGroup": "Topdown_Backend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "backend_mem_bound",
+ "MetricExpr": "STALL_BACKEND_MEMBOUND / STALL_BACKEND * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to backend core resource constraints related to memory access latency issues caused by memory access components.",
+ "MetricGroup": "Topdown_Backend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "backend_mem_cache_bound",
+ "MetricExpr": "(STALL_BACKEND_L1D + STALL_BACKEND_MEM) / STALL_BACKEND_MEMBOUND * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory latency issues caused by data cache misses.",
+ "MetricGroup": "Topdown_Backend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "backend_mem_store_bound",
+ "MetricExpr": "STALL_BACKEND_ST / STALL_BACKEND_MEMBOUND * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory write pending caused by stores stalled in the pre-commit stage.",
+ "MetricGroup": "Topdown_Backend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "backend_mem_tlb_bound",
+ "MetricExpr": "STALL_BACKEND_TLB / STALL_BACKEND_MEMBOUND * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory access latency issues caused by data TLB misses.",
+ "MetricGroup": "Topdown_Backend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "backend_stalled_cycles",
+ "MetricExpr": "STALL_BACKEND / CPU_CYCLES * 100",
+ "BriefDescription": "This metric is the percentage of cycles that were stalled due to resource constraints in the backend unit of the processor.",
+ "MetricGroup": "Cycle_Accounting",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "ArchStdEvent": "bad_speculation",
+ "MetricExpr": "(1 - STALL_SLOT / (10 * CPU_CYCLES)) * (1 - OP_RETIRED / OP_SPEC) * 100 + STALL_FRONTEND_FLUSH / CPU_CYCLES * 100"
+ },
+ {
+ "MetricName": "barrier_percentage",
+ "MetricExpr": "(ISB_SPEC + DSB_SPEC + DMB_SPEC) / INST_SPEC * 100",
+ "BriefDescription": "This metric measures instruction and data barrier operations as a percentage of operations speculatively executed.",
+ "MetricGroup": "Operation_Mix",
+ "ScaleUnit": "1percent of operations"
+ },
+ {
+ "MetricName": "branch_direct_ratio",
+ "MetricExpr": "BR_IMMED_RETIRED / BR_RETIRED",
+ "BriefDescription": "This metric measures the ratio of direct branches retired to the total number of branches architecturally executed.",
+ "MetricGroup": "Branch_Effectiveness",
+ "ScaleUnit": "1per branch"
+ },
+ {
+ "MetricName": "branch_indirect_ratio",
+ "MetricExpr": "BR_IND_RETIRED / BR_RETIRED",
+ "BriefDescription": "This metric measures the ratio of indirect branches retired, including function returns, to the total number of branches architecturally executed.",
+ "MetricGroup": "Branch_Effectiveness",
+ "ScaleUnit": "1per branch"
+ },
+ {
+ "MetricName": "branch_misprediction_ratio",
+ "MetricExpr": "BR_MIS_PRED_RETIRED / BR_RETIRED",
+ "BriefDescription": "This metric measures the ratio of branches mispredicted to the total number of branches architecturally executed. This gives an indication of the effectiveness of the branch prediction unit.",
+ "MetricGroup": "Miss_Ratio;Branch_Effectiveness",
+ "ScaleUnit": "100percent of branches"
+ },
+ {
+ "MetricName": "branch_mpki",
+ "MetricExpr": "BR_MIS_PRED_RETIRED / INST_RETIRED * 1000",
+ "BriefDescription": "This metric measures the number of branch mispredictions per thousand instructions executed.",
+ "MetricGroup": "MPKI;Branch_Effectiveness",
+ "ScaleUnit": "1MPKI"
+ },
+ {
+ "MetricName": "branch_return_ratio",
+ "MetricExpr": "BR_RETURN_RETIRED / BR_RETIRED",
+ "BriefDescription": "This metric measures the ratio of branches retired that are function returns to the total number of branches architecturally executed.",
+ "MetricGroup": "Branch_Effectiveness",
+ "ScaleUnit": "1per branch"
+ },
+ {
+ "MetricName": "crypto_percentage",
+ "MetricExpr": "CRYPTO_SPEC / INST_SPEC * 100",
+ "BriefDescription": "This metric measures crypto operations as a percentage of operations speculatively executed.",
+ "MetricGroup": "Operation_Mix",
+ "ScaleUnit": "1percent of operations"
+ },
+ {
+ "MetricName": "dtlb_mpki",
+ "MetricExpr": "DTLB_WALK / INST_RETIRED * 1000",
+ "BriefDescription": "This metric measures the number of data TLB Walks per thousand instructions executed.",
+ "MetricGroup": "MPKI;DTLB_Effectiveness",
+ "ScaleUnit": "1MPKI"
+ },
+ {
+ "MetricName": "dtlb_walk_ratio",
+ "MetricExpr": "DTLB_WALK / L1D_TLB",
+ "BriefDescription": "This metric measures the ratio of data TLB Walks to the total number of data TLB accesses. This gives an indication of the effectiveness of the data TLB accesses.",
+ "MetricGroup": "Miss_Ratio;DTLB_Effectiveness",
+ "ScaleUnit": "100percent of TLB accesses"
+ },
+ {
+ "MetricName": "fp16_percentage",
+ "MetricExpr": "FP_HP_SPEC / INST_SPEC * 100",
+ "BriefDescription": "This metric measures half-precision floating point operations as a percentage of operations speculatively executed.",
+ "MetricGroup": "FP_Precision_Mix",
+ "ScaleUnit": "1percent of operations"
+ },
+ {
+ "MetricName": "fp32_percentage",
+ "MetricExpr": "FP_SP_SPEC / INST_SPEC * 100",
+ "BriefDescription": "This metric measures single-precision floating point operations as a percentage of operations speculatively executed.",
+ "MetricGroup": "FP_Precision_Mix",
+ "ScaleUnit": "1percent of operations"
+ },
+ {
+ "MetricName": "fp64_percentage",
+ "MetricExpr": "FP_DP_SPEC / INST_SPEC * 100",
+ "BriefDescription": "This metric measures double-precision floating point operations as a percentage of operations speculatively executed.",
+ "MetricGroup": "FP_Precision_Mix",
+ "ScaleUnit": "1percent of operations"
+ },
+ {
+ "MetricName": "fp_ops_per_cycle",
+ "MetricExpr": "(FP_SCALE_OPS_SPEC + FP_FIXED_OPS_SPEC) / CPU_CYCLES",
+ "BriefDescription": "This metric measures floating point operations per cycle in any precision performed by any instruction. Operations are counted by computation and by vector lanes; for example, fused computations such as multiply-add count as two operations per vector lane.",
+ "MetricGroup": "FP_Arithmetic_Intensity",
+ "ScaleUnit": "1operations per cycle"
+ },
+ {
+ "MetricName": "frontend_cache_l1i_bound",
+ "MetricExpr": "STALL_FRONTEND_L1I / (STALL_FRONTEND_L1I + STALL_FRONTEND_MEM) * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to memory access latency issues caused by level 1 instruction cache misses.",
+ "MetricGroup": "Topdown_Frontend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "frontend_cache_l2i_bound",
+ "MetricExpr": "STALL_FRONTEND_MEM / (STALL_FRONTEND_L1I + STALL_FRONTEND_MEM) * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to memory access latency issues caused by level 2 instruction cache misses.",
+ "MetricGroup": "Topdown_Frontend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "frontend_core_bound",
+ "MetricExpr": "STALL_FRONTEND_CPUBOUND / STALL_FRONTEND * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to frontend core resource constraints not related to instruction fetch latency issues caused by memory access components.",
+ "MetricGroup": "Topdown_Frontend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "frontend_core_flush_bound",
+ "MetricExpr": "STALL_FRONTEND_FLUSH / STALL_FRONTEND_CPUBOUND * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend as the processor is recovering from a pipeline flush caused by bad speculation or other machine resteers.",
+ "MetricGroup": "Topdown_Frontend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "frontend_mem_bound",
+ "MetricExpr": "STALL_FRONTEND_MEMBOUND / STALL_FRONTEND * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to frontend core resource constraints related to the instruction fetch latency issues caused by memory access components.",
+ "MetricGroup": "Topdown_Frontend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "frontend_mem_cache_bound",
+ "MetricExpr": "(STALL_FRONTEND_L1I + STALL_FRONTEND_MEM) / STALL_FRONTEND_MEMBOUND * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to instruction fetch latency issues caused by instruction cache misses.",
+ "MetricGroup": "Topdown_Frontend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "frontend_mem_tlb_bound",
+ "MetricExpr": "STALL_FRONTEND_TLB / STALL_FRONTEND_MEMBOUND * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to instruction fetch latency issues caused by instruction TLB misses.",
+ "MetricGroup": "Topdown_Frontend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "frontend_stalled_cycles",
+ "MetricExpr": "STALL_FRONTEND / CPU_CYCLES * 100",
+ "BriefDescription": "This metric is the percentage of cycles that were stalled due to resource constraints in the frontend unit of the processor.",
+ "MetricGroup": "Cycle_Accounting",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "integer_dp_percentage",
+ "MetricExpr": "DP_SPEC / INST_SPEC * 100",
+ "BriefDescription": "This metric measures scalar integer operations as a percentage of operations speculatively executed.",
+ "MetricGroup": "Operation_Mix",
+ "ScaleUnit": "1percent of operations"
+ },
+ {
+ "MetricName": "ipc",
+ "MetricExpr": "INST_RETIRED / CPU_CYCLES",
+ "BriefDescription": "This metric measures the number of instructions retired per cycle.",
+ "MetricGroup": "General",
+ "ScaleUnit": "1per cycle"
+ },
+ {
+ "MetricName": "itlb_mpki",
+ "MetricExpr": "ITLB_WALK / INST_RETIRED * 1000",
+ "BriefDescription": "This metric measures the number of instruction TLB Walks per thousand instructions executed.",
+ "MetricGroup": "MPKI;ITLB_Effectiveness",
+ "ScaleUnit": "1MPKI"
+ },
+ {
+ "MetricName": "itlb_walk_ratio",
+ "MetricExpr": "ITLB_WALK / L1I_TLB",
+ "BriefDescription": "This metric measures the ratio of instruction TLB Walks to the total number of instruction TLB accesses. This gives an indication of the effectiveness of the instruction TLB accesses.",
+ "MetricGroup": "Miss_Ratio;ITLB_Effectiveness",
+ "ScaleUnit": "100percent of TLB accesses"
+ },
+ {
+ "MetricName": "l1d_cache_miss_ratio",
+ "MetricExpr": "L1D_CACHE_REFILL / L1D_CACHE",
+ "BriefDescription": "This metric measures the ratio of level 1 data cache accesses missed to the total number of level 1 data cache accesses. This gives an indication of the effectiveness of the level 1 data cache.",
+ "MetricGroup": "Miss_Ratio;L1D_Cache_Effectiveness",
+ "ScaleUnit": "100percent of cache accesses"
+ },
+ {
+ "MetricName": "l1d_cache_mpki",
+ "MetricExpr": "L1D_CACHE_REFILL / INST_RETIRED * 1000",
+ "BriefDescription": "This metric measures the number of level 1 data cache accesses missed per thousand instructions executed.",
+ "MetricGroup": "MPKI;L1D_Cache_Effectiveness",
+ "ScaleUnit": "1MPKI"
+ },
+ {
+ "MetricName": "l1d_tlb_miss_ratio",
+ "MetricExpr": "L1D_TLB_REFILL / L1D_TLB",
+ "BriefDescription": "This metric measures the ratio of level 1 data TLB accesses missed to the total number of level 1 data TLB accesses. This gives an indication of the effectiveness of the level 1 data TLB.",
+ "MetricGroup": "Miss_Ratio;DTLB_Effectiveness",
+ "ScaleUnit": "100percent of TLB accesses"
+ },
+ {
+ "MetricName": "l1d_tlb_mpki",
+ "MetricExpr": "L1D_TLB_REFILL / INST_RETIRED * 1000",
+ "BriefDescription": "This metric measures the number of level 1 data TLB accesses missed per thousand instructions executed.",
+ "MetricGroup": "MPKI;DTLB_Effectiveness",
+ "ScaleUnit": "1MPKI"
+ },
+ {
+ "MetricName": "l1i_cache_miss_ratio",
+ "MetricExpr": "L1I_CACHE_REFILL / L1I_CACHE",
+ "BriefDescription": "This metric measures the ratio of level 1 instruction cache accesses missed to the total number of level 1 instruction cache accesses. This gives an indication of the effectiveness of the level 1 instruction cache.",
+ "MetricGroup": "Miss_Ratio;L1I_Cache_Effectiveness",
+ "ScaleUnit": "100percent of cache accesses"
+ },
+ {
+ "MetricName": "l1i_cache_mpki",
+ "MetricExpr": "L1I_CACHE_REFILL / INST_RETIRED * 1000",
+ "BriefDescription": "This metric measures the number of level 1 instruction cache accesses missed per thousand instructions executed.",
+ "MetricGroup": "MPKI;L1I_Cache_Effectiveness",
+ "ScaleUnit": "1MPKI"
+ },
+ {
+ "MetricName": "l1i_tlb_miss_ratio",
+ "MetricExpr": "L1I_TLB_REFILL / L1I_TLB",
+ "BriefDescription": "This metric measures the ratio of level 1 instruction TLB accesses missed to the total number of level 1 instruction TLB accesses. This gives an indication of the effectiveness of the level 1 instruction TLB.",
+ "MetricGroup": "Miss_Ratio;ITLB_Effectiveness",
+ "ScaleUnit": "100percent of TLB accesses"
+ },
+ {
+ "MetricName": "l1i_tlb_mpki",
+ "MetricExpr": "L1I_TLB_REFILL / INST_RETIRED * 1000",
+ "BriefDescription": "This metric measures the number of level 1 instruction TLB accesses missed per thousand instructions executed.",
+ "MetricGroup": "MPKI;ITLB_Effectiveness",
+ "ScaleUnit": "1MPKI"
+ },
+ {
+ "MetricName": "l2_cache_miss_ratio",
+ "MetricExpr": "L2D_CACHE_REFILL / L2D_CACHE",
+ "BriefDescription": "This metric measures the ratio of level 2 cache accesses missed to the total number of level 2 cache accesses. This gives an indication of the effectiveness of the level 2 cache, which is a unified cache that stores both data and instruction. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a unified cache.",
+ "MetricGroup": "Miss_Ratio;L2_Cache_Effectiveness",
+ "ScaleUnit": "100percent of cache accesses"
+ },
+ {
+ "MetricName": "l2_cache_mpki",
+ "MetricExpr": "L2D_CACHE_REFILL / INST_RETIRED * 1000",
+ "BriefDescription": "This metric measures the number of level 2 unified cache accesses missed per thousand instructions executed. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a unified cache.",
+ "MetricGroup": "MPKI;L2_Cache_Effectiveness",
+ "ScaleUnit": "1MPKI"
+ },
+ {
+ "MetricName": "l2_tlb_miss_ratio",
+ "MetricExpr": "L2D_TLB_REFILL / L2D_TLB",
+ "BriefDescription": "This metric measures the ratio of level 2 unified TLB accesses missed to the total number of level 2 unified TLB accesses. This gives an indication of the effectiveness of the level 2 TLB.",
+ "MetricGroup": "Miss_Ratio;ITLB_Effectiveness;DTLB_Effectiveness",
+ "ScaleUnit": "100percent of TLB accesses"
+ },
+ {
+ "MetricName": "l2_tlb_mpki",
+ "MetricExpr": "L2D_TLB_REFILL / INST_RETIRED * 1000",
+ "BriefDescription": "This metric measures the number of level 2 unified TLB accesses missed per thousand instructions executed.",
+ "MetricGroup": "MPKI;ITLB_Effectiveness;DTLB_Effectiveness",
+ "ScaleUnit": "1MPKI"
+ },
+ {
+ "MetricName": "ll_cache_read_hit_ratio",
+ "MetricExpr": "(LL_CACHE_RD - LL_CACHE_MISS_RD) / LL_CACHE_RD",
+ "BriefDescription": "This metric measures the ratio of last level cache read accesses hit in the cache to the total number of last level cache accesses. This gives an indication of the effectiveness of the last level cache for read traffic. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a system level cache.",
+ "MetricGroup": "LL_Cache_Effectiveness",
+ "ScaleUnit": "100percent of cache accesses"
+ },
+ {
+ "MetricName": "ll_cache_read_miss_ratio",
+ "MetricExpr": "LL_CACHE_MISS_RD / LL_CACHE_RD",
+ "BriefDescription": "This metric measures the ratio of last level cache read accesses missed to the total number of last level cache accesses. This gives an indication of the effectiveness of the last level cache for read traffic. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a system level cache.",
+ "MetricGroup": "Miss_Ratio;LL_Cache_Effectiveness",
+ "ScaleUnit": "100percent of cache accesses"
+ },
+ {
+ "MetricName": "ll_cache_read_mpki",
+ "MetricExpr": "LL_CACHE_MISS_RD / INST_RETIRED * 1000",
+ "BriefDescription": "This metric measures the number of last level cache read accesses missed per thousand instructions executed.",
+ "MetricGroup": "MPKI;LL_Cache_Effectiveness",
+ "ScaleUnit": "1MPKI"
+ },
+ {
+ "MetricName": "load_percentage",
+ "MetricExpr": "LD_SPEC / INST_SPEC * 100",
+ "BriefDescription": "This metric measures load operations as a percentage of operations speculatively executed.",
+ "MetricGroup": "Operation_Mix",
+ "ScaleUnit": "1percent of operations"
+ },
+ {
+ "MetricName": "nonsve_fp_ops_per_cycle",
+ "MetricExpr": "FP_FIXED_OPS_SPEC / CPU_CYCLES",
+ "BriefDescription": "This metric measures floating point operations per cycle in any precision performed by an instruction that is not an SVE instruction. Operations are counted by computation and by vector lanes; for example, fused computations such as multiply-add count as two operations per vector lane.",
+ "MetricGroup": "FP_Arithmetic_Intensity",
+ "ScaleUnit": "1operations per cycle"
+ },
+ {
+ "MetricName": "scalar_fp_percentage",
+ "MetricExpr": "VFP_SPEC / INST_SPEC * 100",
+ "BriefDescription": "This metric measures scalar floating point operations as a percentage of operations speculatively executed.",
+ "MetricGroup": "Operation_Mix",
+ "ScaleUnit": "1percent of operations"
+ },
+ {
+ "MetricName": "simd_percentage",
+ "MetricExpr": "ASE_SPEC / INST_SPEC * 100",
+ "BriefDescription": "This metric measures advanced SIMD operations as a percentage of total operations speculatively executed.",
+ "MetricGroup": "Operation_Mix",
+ "ScaleUnit": "1percent of operations"
+ },
+ {
+ "MetricName": "store_percentage",
+ "MetricExpr": "ST_SPEC / INST_SPEC * 100",
+ "BriefDescription": "This metric measures store operations as a percentage of operations speculatively executed.",
+ "MetricGroup": "Operation_Mix",
+ "ScaleUnit": "1percent of operations"
+ },
+ {
+ "MetricName": "sve_all_percentage",
+ "MetricExpr": "SVE_INST_SPEC / INST_SPEC * 100",
+ "BriefDescription": "This metric measures scalable vector operations, including loads and stores, as a percentage of operations speculatively executed.",
+ "MetricGroup": "Operation_Mix",
+ "ScaleUnit": "1percent of operations"
+ },
+ {
+ "MetricName": "sve_fp_ops_per_cycle",
+ "MetricExpr": "FP_SCALE_OPS_SPEC / CPU_CYCLES",
+ "BriefDescription": "This metric measures floating point operations per cycle in any precision performed by SVE instructions. Operations are counted by computation and by vector lanes; for example, fused computations such as multiply-add count as two operations per vector lane.",
+ "MetricGroup": "FP_Arithmetic_Intensity",
+ "ScaleUnit": "1operations per cycle"
+ },
+ {
+ "MetricName": "sve_predicate_empty_percentage",
+ "MetricExpr": "SVE_PRED_EMPTY_SPEC / SVE_PRED_SPEC * 100",
+ "BriefDescription": "This metric measures scalable vector operations with no active predicates as a percentage of SVE predicated operations speculatively executed.",
+ "MetricGroup": "SVE_Effectiveness",
+ "ScaleUnit": "1percent of operations"
+ },
+ {
+ "MetricName": "sve_predicate_full_percentage",
+ "MetricExpr": "SVE_PRED_FULL_SPEC / SVE_PRED_SPEC * 100",
+ "BriefDescription": "This metric measures scalable vector operations with all active predicates as a percentage of SVE predicated operations speculatively executed.",
+ "MetricGroup": "SVE_Effectiveness",
+ "ScaleUnit": "1percent of operations"
+ },
+ {
+ "MetricName": "sve_predicate_partial_percentage",
+ "MetricExpr": "SVE_PRED_PARTIAL_SPEC / SVE_PRED_SPEC * 100",
+ "BriefDescription": "This metric measures scalable vector operations with at least one active predicate as a percentage of SVE predicated operations speculatively executed.",
+ "MetricGroup": "SVE_Effectiveness",
+ "ScaleUnit": "1percent of operations"
+ },
+ {
+ "MetricName": "sve_predicate_percentage",
+ "MetricExpr": "SVE_PRED_SPEC / INST_SPEC * 100",
+ "BriefDescription": "This metric measures scalable vector operations with predicates as a percentage of operations speculatively executed.",
+ "MetricGroup": "SVE_Effectiveness",
+ "ScaleUnit": "1percent of operations"
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/pmu.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/pmu.json
new file mode 100644
index 000000000000..d8b7b9f9e5fa
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/pmu.json
@@ -0,0 +1,8 @@
+[
+ {
+ "ArchStdEvent": "PMU_OVFS"
+ },
+ {
+ "ArchStdEvent": "PMU_HOVFS"
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/retired.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/retired.json
new file mode 100644
index 000000000000..69f9a0b0c7ff
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/retired.json
@@ -0,0 +1,90 @@
+[
+ {
+ "ArchStdEvent": "SW_INCR",
+ "PublicDescription": "Counts software writes to the PMSWINC_EL0 (software PMU increment) register. The PMSWINC_EL0 register is a manually updated counter for use by application software.\n\nThis event could be used to measure any user program event, such as accesses to a particular data structure (by writing to the PMSWINC_EL0 register each time the data structure is accessed).\n\nTo use the PMSWINC_EL0 register and event, developers must insert instructions that write to the PMSWINC_EL0 register into the source code.\n\nSince the SW_INCR event records writes to the PMSWINC_EL0 register, there is no need to do a read/increment/write sequence to the PMSWINC_EL0 register."
+ },
+ {
+ "ArchStdEvent": "INST_RETIRED",
+ "PublicDescription": "Counts instructions that have been architecturally executed."
+ },
+ {
+ "ArchStdEvent": "CID_WRITE_RETIRED",
+ "PublicDescription": "Counts architecturally executed writes to the CONTEXTIDR_EL1 register, which usually contains the kernel PID and can be output with hardware trace."
+ },
+ {
+ "ArchStdEvent": "PC_WRITE_RETIRED",
+ "PublicDescription": "Counts branch instructions that caused a change of Program Counter, which effectively causes a change in the control flow of the program."
+ },
+ {
+ "ArchStdEvent": "BR_IMMED_RETIRED",
+ "PublicDescription": "Counts architecturally executed direct branches."
+ },
+ {
+ "ArchStdEvent": "BR_RETURN_RETIRED",
+ "PublicDescription": "Counts architecturally executed procedure returns."
+ },
+ {
+ "ArchStdEvent": "TTBR_WRITE_RETIRED",
+ "PublicDescription": "Counts architectural writes to TTBR0/1_EL1. If virtualization host extensions are enabled (by setting the HCR_EL2.E2H bit to 1), then accesses to TTBR0/1_EL1 that are redirected to TTBR0/1_EL2, or accesses to TTBR0/1_EL12, are counted. TTBRn registers are typically updated when the kernel is swapping user-space threads or applications."
+ },
+ {
+ "ArchStdEvent": "BR_RETIRED",
+ "PublicDescription": "Counts architecturally executed branches, whether the branch is taken or not. Instructions that explicitly write to the PC are also counted. Note that exception generating instructions, exception return instructions and context synchronization instructions are not counted."
+ },
+ {
+ "ArchStdEvent": "BR_MIS_PRED_RETIRED",
+ "PublicDescription": "Counts branches counted by BR_RETIRED which were mispredicted and caused a pipeline flush."
+ },
+ {
+ "ArchStdEvent": "OP_RETIRED",
+ "PublicDescription": "Counts micro-operations that are architecturally executed. This is a count of number of micro-operations retired from the commit queue in a single cycle."
+ },
+ {
+ "ArchStdEvent": "BR_IMMED_TAKEN_RETIRED",
+ "PublicDescription": "Counts architecturally executed immediate branches that were taken."
+ },
+ {
+ "ArchStdEvent": "BR_INDNR_TAKEN_RETIRED",
+ "PublicDescription": "Counts architecturally executed indirect branches excluding procedure returns that were taken."
+ },
+ {
+ "ArchStdEvent": "BR_IMMED_PRED_RETIRED",
+ "PublicDescription": "Counts architecturally executed direct branches that were correctly predicted."
+ },
+ {
+ "ArchStdEvent": "BR_IMMED_MIS_PRED_RETIRED",
+ "PublicDescription": "Counts architecturally executed direct branches that were mispredicted and caused a pipeline flush."
+ },
+ {
+ "ArchStdEvent": "BR_IND_PRED_RETIRED",
+ "PublicDescription": "Counts architecturally executed indirect branches including procedure returns that were correctly predicted."
+ },
+ {
+ "ArchStdEvent": "BR_IND_MIS_PRED_RETIRED",
+ "PublicDescription": "Counts architecturally executed indirect branches including procedure returns that were mispredicted and caused a pipeline flush."
+ },
+ {
+ "ArchStdEvent": "BR_RETURN_PRED_RETIRED",
+ "PublicDescription": "Counts architecturally executed procedure returns that were correctly predicted."
+ },
+ {
+ "ArchStdEvent": "BR_RETURN_MIS_PRED_RETIRED",
+ "PublicDescription": "Counts architecturally executed procedure returns that were mispredicted and caused a pipeline flush."
+ },
+ {
+ "ArchStdEvent": "BR_INDNR_PRED_RETIRED",
+ "PublicDescription": "Counts architecturally executed indirect branches excluding procedure returns that were correctly predicted."
+ },
+ {
+ "ArchStdEvent": "BR_INDNR_MIS_PRED_RETIRED",
+ "PublicDescription": "Counts architecturally executed indirect branches excluding procedure returns that were mispredicted and caused a pipeline flush."
+ },
+ {
+ "ArchStdEvent": "BR_PRED_RETIRED",
+ "PublicDescription": "Counts branch instructions counted by BR_RETIRED which were correctly predicted."
+ },
+ {
+ "ArchStdEvent": "BR_IND_RETIRED",
+ "PublicDescription": "Counts architecturally executed indirect branches including procedure returns."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/spe.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/spe.json
new file mode 100644
index 000000000000..ca0217fa4681
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/spe.json
@@ -0,0 +1,42 @@
+[
+ {
+ "ArchStdEvent": "SAMPLE_POP",
+ "PublicDescription": "Counts statistical profiling sample population, the count of all operations that could be sampled but may or may not be chosen for sampling."
+ },
+ {
+ "ArchStdEvent": "SAMPLE_FEED",
+ "PublicDescription": "Counts statistical profiling samples taken for sampling."
+ },
+ {
+ "ArchStdEvent": "SAMPLE_FILTRATE",
+ "PublicDescription": "Counts statistical profiling samples taken which are not removed by filtering."
+ },
+ {
+ "ArchStdEvent": "SAMPLE_COLLISION",
+ "PublicDescription": "Counts statistical profiling samples that have collided with a previous sample and were therefore not taken."
+ },
+ {
+ "ArchStdEvent": "SAMPLE_FEED_BR",
+ "PublicDescription": "Counts statistical profiling samples taken which are branches."
+ },
+ {
+ "ArchStdEvent": "SAMPLE_FEED_LD",
+ "PublicDescription": "Counts statistical profiling samples taken which are loads or load atomic operations."
+ },
+ {
+ "ArchStdEvent": "SAMPLE_FEED_ST",
+ "PublicDescription": "Counts statistical profiling samples taken which are stores or store atomic operations."
+ },
+ {
+ "ArchStdEvent": "SAMPLE_FEED_OP",
+ "PublicDescription": "Counts statistical profiling samples taken which match any of the supported operation type filters."
+ },
+ {
+ "ArchStdEvent": "SAMPLE_FEED_EVENT",
+ "PublicDescription": "Counts statistical profiling samples taken which match the event packet filter constraints."
+ },
+ {
+ "ArchStdEvent": "SAMPLE_FEED_LAT",
+ "PublicDescription": "Counts statistical profiling samples taken which exceed the minimum latency set by the operation latency filter constraints."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/spec_operation.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/spec_operation.json
new file mode 100644
index 000000000000..f91eb18d683c
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/spec_operation.json
@@ -0,0 +1,90 @@
+[
+ {
+ "ArchStdEvent": "BR_MIS_PRED",
+ "PublicDescription": "Counts branches which are speculatively executed and mispredicted."
+ },
+ {
+ "ArchStdEvent": "BR_PRED",
+ "PublicDescription": "Counts all speculatively executed branches."
+ },
+ {
+ "ArchStdEvent": "INST_SPEC",
+ "PublicDescription": "Counts operations that have been speculatively executed."
+ },
+ {
+ "ArchStdEvent": "OP_SPEC",
+ "PublicDescription": "Counts micro-operations speculatively executed. This is the count of the number of micro-operations dispatched in a cycle."
+ },
+ {
+ "ArchStdEvent": "STREX_FAIL_SPEC",
+ "PublicDescription": "Counts store-exclusive operations that have been speculatively executed and have not successfully completed the store operation."
+ },
+ {
+ "ArchStdEvent": "STREX_SPEC",
+ "PublicDescription": "Counts store-exclusive operations that have been speculatively executed."
+ },
+ {
+ "ArchStdEvent": "LD_SPEC",
+ "PublicDescription": "Counts speculatively executed load operations including Single Instruction Multiple Data (SIMD) load operations."
+ },
+ {
+ "ArchStdEvent": "ST_SPEC",
+ "PublicDescription": "Counts speculatively executed store operations including Single Instruction Multiple Data (SIMD) store operations."
+ },
+ {
+ "ArchStdEvent": "DP_SPEC",
+ "PublicDescription": "Counts speculatively executed logical or arithmetic instructions such as MOV/MVN operations."
+ },
+ {
+ "ArchStdEvent": "ASE_SPEC",
+ "PublicDescription": "Counts speculatively executed Advanced SIMD operations excluding load, store and move micro-operations that move data to or from SIMD (vector) registers."
+ },
+ {
+ "ArchStdEvent": "VFP_SPEC",
+ "PublicDescription": "Counts speculatively executed floating point operations. This event does not count operations that move data to or from floating point (vector) registers."
+ },
+ {
+ "ArchStdEvent": "PC_WRITE_SPEC",
+ "PublicDescription": "Counts speculatively executed operations which cause software changes of the PC. Those operations include all taken branch operations."
+ },
+ {
+ "ArchStdEvent": "CRYPTO_SPEC",
+ "PublicDescription": "Counts speculatively executed cryptographic operations except for PMULL and VMULL operations."
+ },
+ {
+ "ArchStdEvent": "ISB_SPEC",
+ "PublicDescription": "Counts ISB operations that are executed."
+ },
+ {
+ "ArchStdEvent": "DSB_SPEC",
+ "PublicDescription": "Counts DSB operations that are speculatively issued to the Load/Store unit in the CPU."
+ },
+ {
+ "ArchStdEvent": "DMB_SPEC",
+ "PublicDescription": "Counts DMB operations that are speculatively issued to the Load/Store unit in the CPU. This event does not count implied barriers from load acquire/store release operations."
+ },
+ {
+ "ArchStdEvent": "RC_LD_SPEC",
+ "PublicDescription": "Counts any load acquire operations that are speculatively executed. For example: LDAR, LDARH, LDARB"
+ },
+ {
+ "ArchStdEvent": "RC_ST_SPEC",
+ "PublicDescription": "Counts any store release operations that are speculatively executed. For example: STLR, STLRH, STLRB"
+ },
+ {
+ "ArchStdEvent": "ASE_INST_SPEC",
+ "PublicDescription": "Counts speculatively executed Advanced SIMD operations."
+ },
+ {
+ "ArchStdEvent": "CAS_NEAR_PASS",
+ "PublicDescription": "Counts compare and swap instructions that executed locally to the PE and updated the location accessed."
+ },
+ {
+ "ArchStdEvent": "CAS_NEAR_SPEC",
+ "PublicDescription": "Counts compare and swap instructions that executed locally to the PE."
+ },
+ {
+ "ArchStdEvent": "CAS_FAR_SPEC",
+ "PublicDescription": "Counts compare and swap instructions that did not execute locally to the PE."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/stall.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/stall.json
new file mode 100644
index 000000000000..b1eae21bac07
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/stall.json
@@ -0,0 +1,82 @@
+[
+ {
+ "ArchStdEvent": "STALL_FRONTEND",
+ "PublicDescription": "Counts cycles when the frontend could not send any micro-operations to the rename stage because of frontend resource stalls caused by fetch memory latency or branch prediction flow stalls. STALL_SLOT_FRONTEND counts slots during the cycle when this event counts."
+ },
+ {
+ "ArchStdEvent": "STALL_BACKEND",
+ "PublicDescription": "Counts cycles whenever the rename unit is unable to send any micro-operations to the backend of the pipeline because of backend resource constraints. Backend resource constraints can include issue stage fullness, execution stage fullness, or other internal pipeline resource fullness. All the backend slots were empty during the cycle when this event counts."
+ },
+ {
+ "ArchStdEvent": "STALL",
+ "PublicDescription": "Counts cycles when no operations are sent to the rename unit from the frontend or from the rename unit to the backend for any reason (either frontend or backend stall). This event is the sum of STALL_FRONTEND and STALL_BACKEND."
+ },
+ {
+ "ArchStdEvent": "STALL_SLOT_BACKEND",
+ "PublicDescription": "Counts slots per cycle in which no operations are sent from the rename unit to the backend due to backend resource constraints. STALL_BACKEND counts during the cycle when STALL_SLOT_BACKEND counts at least 1."
+ },
+ {
+ "ArchStdEvent": "STALL_SLOT_FRONTEND",
+ "PublicDescription": "Counts slots per cycle in which no operations are sent to the rename unit from the frontend due to frontend resource constraints."
+ },
+ {
+ "ArchStdEvent": "STALL_SLOT",
+ "PublicDescription": "Counts slots per cycle in which no operations are sent to the rename unit from the frontend or from the rename unit to the backend for any reason (either frontend or backend stall). STALL_SLOT is the sum of STALL_SLOT_FRONTEND and STALL_SLOT_BACKEND."
+ },
+ {
+ "ArchStdEvent": "STALL_BACKEND_MEM",
+ "PublicDescription": "Counts cycles when the backend is stalled because there is a pending demand load request in progress in the last level core cache."
+ },
+ {
+ "ArchStdEvent": "STALL_FRONTEND_MEMBOUND",
+ "PublicDescription": "Counts cycles when the frontend could not send any micro-operations to the rename stage due to resource constraints in the memory resources."
+ },
+ {
+ "ArchStdEvent": "STALL_FRONTEND_L1I",
+ "PublicDescription": "Counts cycles when the frontend is stalled because there is an instruction fetch request pending in the level 1 instruction cache."
+ },
+ {
+ "ArchStdEvent": "STALL_FRONTEND_MEM",
+ "PublicDescription": "Counts cycles when the frontend is stalled because there is an instruction fetch request pending in the last level core cache."
+ },
+ {
+ "ArchStdEvent": "STALL_FRONTEND_TLB",
+ "PublicDescription": "Counts cycles when the frontend is stalled on any TLB misses being handled. This event also counts the TLB accesses made by hardware prefetches."
+ },
+ {
+ "ArchStdEvent": "STALL_FRONTEND_CPUBOUND",
+ "PublicDescription": "Counts cycles when the frontend could not send any micro-operations to the rename stage due to resource constraints in the CPU resources excluding memory resources."
+ },
+ {
+ "ArchStdEvent": "STALL_FRONTEND_FLUSH",
+ "PublicDescription": "Counts cycles when the frontend could not send any micro-operations to the rename stage as the frontend is recovering from a machine flush or resteer. Example scenarios that cause a flush include branch mispredictions, taken exceptions, micro-architectural flush etc."
+ },
+ {
+ "ArchStdEvent": "STALL_BACKEND_MEMBOUND",
+ "PublicDescription": "Counts cycles when the backend could not accept any micro-operations due to resource constraints in the memory resources."
+ },
+ {
+ "ArchStdEvent": "STALL_BACKEND_L1D",
+ "PublicDescription": "Counts cycles when the backend is stalled because there is a pending demand load request in progress in the level 1 data cache."
+ },
+ {
+ "ArchStdEvent": "STALL_BACKEND_TLB",
+ "PublicDescription": "Counts cycles when the backend is stalled on any demand TLB misses being handled."
+ },
+ {
+ "ArchStdEvent": "STALL_BACKEND_ST",
+ "PublicDescription": "Counts cycles when the backend is stalled and there is a store that has not reached the pre-commit stage."
+ },
+ {
+ "ArchStdEvent": "STALL_BACKEND_CPUBOUND",
+ "PublicDescription": "Counts cycles when the backend could not accept any micro-operations due to any resource constraints in the CPU excluding memory resources."
+ },
+ {
+ "ArchStdEvent": "STALL_BACKEND_BUSY",
+ "PublicDescription": "Counts cycles when the backend could not accept any micro-operations because the issue queues are too full to take any operations for execution."
+ },
+ {
+ "ArchStdEvent": "STALL_BACKEND_RENAME",
+ "PublicDescription": "Counts cycles when the backend is stalled even when operations are available from the frontend but at least one is not ready to be sent to the backend because no rename register is available."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/sve.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/sve.json
new file mode 100644
index 000000000000..51dab48cb2ba
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/sve.json
@@ -0,0 +1,50 @@
+[
+ {
+ "ArchStdEvent": "SVE_INST_SPEC",
+ "PublicDescription": "Counts speculatively executed operations that are SVE operations."
+ },
+ {
+ "ArchStdEvent": "SVE_PRED_SPEC",
+ "PublicDescription": "Counts speculatively executed predicated SVE operations."
+ },
+ {
+ "ArchStdEvent": "SVE_PRED_EMPTY_SPEC",
+ "PublicDescription": "Counts speculatively executed predicated SVE operations with no active predicate elements."
+ },
+ {
+ "ArchStdEvent": "SVE_PRED_FULL_SPEC",
+ "PublicDescription": "Counts speculatively executed predicated SVE operations with all predicate elements active."
+ },
+ {
+ "ArchStdEvent": "SVE_PRED_PARTIAL_SPEC",
+ "PublicDescription": "Counts speculatively executed predicated SVE operations with at least one but not all active predicate elements."
+ },
+ {
+ "ArchStdEvent": "SVE_PRED_NOT_FULL_SPEC",
+ "PublicDescription": "Counts speculatively executed predicated SVE operations with at least one inactive predicate element."
+ },
+ {
+ "ArchStdEvent": "SVE_LDFF_SPEC",
+ "PublicDescription": "Counts speculatively executed SVE first fault or non-fault load operations."
+ },
+ {
+ "ArchStdEvent": "SVE_LDFF_FAULT_SPEC",
+ "PublicDescription": "Counts speculatively executed SVE first fault or non-fault load operations that clear at least one bit in the FFR."
+ },
+ {
+ "ArchStdEvent": "ASE_SVE_INT8_SPEC",
+ "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type an 8-bit integer."
+ },
+ {
+ "ArchStdEvent": "ASE_SVE_INT16_SPEC",
+ "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 16-bit integer."
+ },
+ {
+ "ArchStdEvent": "ASE_SVE_INT32_SPEC",
+ "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 32-bit integer."
+ },
+ {
+ "ArchStdEvent": "ASE_SVE_INT64_SPEC",
+ "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 64-bit integer."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/tlb.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/tlb.json
new file mode 100644
index 000000000000..c7aa89c2f19f
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/tlb.json
@@ -0,0 +1,74 @@
+[
+ {
+ "ArchStdEvent": "L1I_TLB_REFILL",
+ "PublicDescription": "Counts level 1 instruction TLB refills from any Instruction fetch. If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event will not count if the translation table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB."
+ },
+ {
+ "ArchStdEvent": "L1D_TLB_REFILL",
+ "PublicDescription": "Counts level 1 data TLB accesses that resulted in TLB refills. If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event counts for refills caused by preload instructions or hardware prefetch accesses. This event counts regardless of whether the miss hits in L2 or results in a translation table walk. This event will not count if the translation table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB. This event will not count on an access from an AT (address translation) instruction."
+ },
+ {
+ "ArchStdEvent": "L1D_TLB",
+ "PublicDescription": "Counts level 1 data TLB accesses caused by any memory load or store operation. Note that load or store instructions can be broken up into multiple memory operations. This event does not count TLB maintenance operations."
+ },
+ {
+ "ArchStdEvent": "L1I_TLB",
+ "PublicDescription": "Counts level 1 instruction TLB accesses, whether the access hits or misses in the TLB. This event counts both demand accesses and prefetch or preload generated accesses."
+ },
+ {
+ "ArchStdEvent": "L2D_TLB_REFILL",
+ "PublicDescription": "Counts level 2 TLB refills caused by memory operations from both data and instruction fetch, except for those caused by TLB maintenance operations and hardware prefetches."
+ },
+ {
+ "ArchStdEvent": "L2D_TLB",
+ "PublicDescription": "Counts level 2 TLB accesses except those caused by TLB maintenance operations."
+ },
+ {
+ "ArchStdEvent": "DTLB_WALK",
+ "PublicDescription": "Counts number of demand data translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that partial translations that cause a translation table walk are also counted. Also note that this event counts walks triggered by software preloads, but not walks triggered by hardware prefetchers, and that this event does not count walks triggered by TLB maintenance operations."
+ },
+ {
+ "ArchStdEvent": "ITLB_WALK",
+ "PublicDescription": "Counts number of instruction translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that partial translations that cause a translation table walk are also counted. Also note that this event does not count walks triggered by TLB maintenance operations."
+ },
+ {
+ "ArchStdEvent": "DTLB_WALK_PERCYC",
+ "PublicDescription": "Counts the number of data translation table walks in progress per cycle."
+ },
+ {
+ "ArchStdEvent": "ITLB_WALK_PERCYC",
+ "PublicDescription": "Counts the number of instruction translation table walks in progress per cycle."
+ },
+ {
+ "ArchStdEvent": "DTLB_HWUPD",
+ "PublicDescription": "Counts number of memory accesses triggered by a data translation table walk and performing an update of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that this event counts accesses triggered by software preloads, but not accesses triggered by hardware prefetchers."
+ },
+ {
+ "ArchStdEvent": "ITLB_HWUPD",
+ "PublicDescription": "Counts number of memory accesses triggered by an instruction translation table walk and performing an update of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD."
+ },
+ {
+ "ArchStdEvent": "DTLB_STEP",
+ "PublicDescription": "Counts number of memory accesses triggered by a demand data translation table walk and performing a read of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that this event counts accesses triggered by software preloads, but not accesses triggered by hardware prefetchers."
+ },
+ {
+ "ArchStdEvent": "ITLB_STEP",
+ "PublicDescription": "Counts number of memory accesses triggered by an instruction translation table walk and performing a read of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD."
+ },
+ {
+ "ArchStdEvent": "DTLB_WALK_LARGE",
+ "PublicDescription": "Counts number of demand data translation table walks caused by a miss in the L2 TLB and yielding a large page. The set of large pages is defined as all pages with a final size higher than or equal to 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. If DTLB_WALK_BLOCK is implemented, then it is an alias for this event in this family. Note that partial translations that cause a translation table walk are also counted. Also note that this event counts walks triggered by software preloads, but not walks triggered by hardware prefetchers, and that this event does not count walks triggered by TLB maintenance operations."
+ },
+ {
+ "ArchStdEvent": "ITLB_WALK_LARGE",
+ "PublicDescription": "Counts number of instruction translation table walks caused by a miss in the L2 TLB and yielding a large page. The set of large pages is defined as all pages with a final size higher than or equal to 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. In this family, this is equal to ITLB_WALK_BLOCK event. Note that partial translations that cause a translation table walk are also counted. Also note that this event does not count walks triggered by TLB maintenance operations."
+ },
+ {
+ "ArchStdEvent": "DTLB_WALK_SMALL",
+ "PublicDescription": "Counts number of data translation table walks caused by a miss in the L2 TLB and yielding a small page. The set of small pages is defined as all pages with a final size lower than 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. If DTLB_WALK_PAGE event is implemented, then it is an alias for this event in this family. Note that partial translations that cause a translation table walk are also counted. Also note that this event counts walks triggered by software preloads, but not walks triggered by hardware prefetchers, and that this event does not count walks triggered by TLB maintenance operations."
+ },
+ {
+ "ArchStdEvent": "ITLB_WALK_SMALL",
+ "PublicDescription": "Counts number of instruction translation table walks caused by a miss in the L2 TLB and yielding a small page. The set of small pages is defined as all pages with a final size lower than 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. In this family, this is equal to ITLB_WALK_PAGE event. Note that partial translations that cause a translation table walk are also counted. Also note that this event does not count walks triggered by TLB maintenance operations."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/trace.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/trace.json
new file mode 100644
index 000000000000..33672a8711d4
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/trace.json
@@ -0,0 +1,32 @@
+[
+ {
+ "ArchStdEvent": "TRB_WRAP"
+ },
+ {
+ "ArchStdEvent": "TRB_TRIG"
+ },
+ {
+ "ArchStdEvent": "TRCEXTOUT0"
+ },
+ {
+ "ArchStdEvent": "TRCEXTOUT1"
+ },
+ {
+ "ArchStdEvent": "TRCEXTOUT2"
+ },
+ {
+ "ArchStdEvent": "TRCEXTOUT3"
+ },
+ {
+ "ArchStdEvent": "CTI_TRIGOUT4"
+ },
+ {
+ "ArchStdEvent": "CTI_TRIGOUT5"
+ },
+ {
+ "ArchStdEvent": "CTI_TRIGOUT6"
+ },
+ {
+ "ArchStdEvent": "CTI_TRIGOUT7"
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/mapfile.csv b/tools/perf/pmu-events/arch/arm64/mapfile.csv
index bb3fa8a33496..ccfcae375750 100644
--- a/tools/perf/pmu-events/arch/arm64/mapfile.csv
+++ b/tools/perf/pmu-events/arch/arm64/mapfile.csv
@@ -33,6 +33,7 @@
0x00000000410fd4c0,v1,arm/cortex-x1,core
0x00000000410fd460,v1,arm/cortex-a510,core
0x00000000410fd470,v1,arm/cortex-a710,core
+0x00000000410fd810,v1,arm/cortex-a720,core
0x00000000410fd480,v1,arm/cortex-x2,core
0x00000000410fd490,v1,arm/neoverse-n2-v2,core
0x00000000410fd4f0,v1,arm/neoverse-n2-v2,core
--
2.47.2
* [PATCH 2/2] perf vendor events arm64: Add Cortex-A520 events/metrics
2025-02-13 15:11 [PATCH 0/2] perf vendor events arm64: Add A720/A520 events/metrics Yangyu Chen
2025-02-13 15:12 ` [PATCH 1/2] perf vendor events arm64: Add Cortex-A720 events/metrics Yangyu Chen
@ 2025-02-13 15:12 ` Yangyu Chen
2025-02-13 16:53 ` Ian Rogers
2025-02-14 1:12 ` [PATCH 0/2] perf vendor events arm64: Add A720/A520 events/metrics Namhyung Kim
2 siblings, 1 reply; 16+ messages in thread
From: Yangyu Chen @ 2025-02-13 15:12 UTC (permalink / raw)
To: linux-perf-users
Cc: John Garry, Will Deacon, James Clark, Mike Leach, Leo Yan,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Liang Kan, Yoshihiro Furudera,
linux-arm-kernel, linux-kernel, Yangyu Chen
Add JSON files for Cortex-A520 events and metrics. Using the existing
Neoverse N3 JSON files as a template, I manually checked the missing and
extra events/metrics using my script [1] and modified them according to
the Arm Cortex-A520 Core Technical Reference Manual [2].
[1] https://github.com/cyyself/arm-pmu-check/tree/1075bebeb3f1441067448251a387df35af15bf16
[2] https://developer.arm.com/documentation/102517/0004/Performance-Monitors-Extension-support-/Performance-monitors-events/Common-event-PMU-events
Signed-off-by: Yangyu Chen <cyy@cyyself.name>
---
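For readers who want to run a similar check, the comparison described in the commit message (finding events that are missing from, or extra in, the JSON files relative to the TRM) can be sketched roughly as below. This is not the script referenced in [1]. The function names `load_event_names` and `compare_events` are hypothetical, and the sketch assumes the reference event list has already been extracted from the TRM by some other means.

```python
import json
from pathlib import Path


def load_event_names(json_dir):
    """Collect event names from every *.json file directly under json_dir.

    Entries without "ArchStdEvent" or "EventName" (for example the metric
    definitions in metrics.json) are skipped.
    """
    names = set()
    for path in sorted(Path(json_dir).glob("*.json")):
        for entry in json.loads(path.read_text()):
            name = entry.get("ArchStdEvent") or entry.get("EventName")
            if name:
                names.add(name)
    return names


def compare_events(listed, reference):
    """Return (missing, extra) relative to a reference set of event names.

    missing: names in the reference that the JSON files do not list.
    extra:   names the JSON files list that the reference does not contain.
    """
    listed, reference = set(listed), set(reference)
    return sorted(reference - listed), sorted(listed - reference)
```

Pointing `load_event_names` at a directory such as tools/perf/pmu-events/arch/arm64/arm/cortex-a520 and passing the result to `compare_events` together with the TRM-derived names gives two sorted lists to review by hand.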
.../arch/arm64/arm/cortex-a520/bus.json | 26 ++
.../arch/arm64/arm/cortex-a520/exception.json | 18 +
.../arm64/arm/cortex-a520/fp_operation.json | 14 +
.../arch/arm64/arm/cortex-a520/general.json | 6 +
.../arch/arm64/arm/cortex-a520/l1d_cache.json | 50 +++
.../arch/arm64/arm/cortex-a520/l1i_cache.json | 14 +
.../arch/arm64/arm/cortex-a520/l2_cache.json | 46 +++
.../arch/arm64/arm/cortex-a520/l3_cache.json | 21 +
.../arch/arm64/arm/cortex-a520/ll_cache.json | 10 +
.../arch/arm64/arm/cortex-a520/memory.json | 58 +++
.../arch/arm64/arm/cortex-a520/metrics.json | 373 ++++++++++++++++++
.../arch/arm64/arm/cortex-a520/pmu.json | 8 +
.../arch/arm64/arm/cortex-a520/retired.json | 90 +++++
.../arm64/arm/cortex-a520/spec_operation.json | 70 ++++
.../arch/arm64/arm/cortex-a520/stall.json | 82 ++++
.../arch/arm64/arm/cortex-a520/sve.json | 22 ++
.../arch/arm64/arm/cortex-a520/tlb.json | 78 ++++
.../arch/arm64/arm/cortex-a520/trace.json | 32 ++
.../arch/arm64/common-and-microarch.json | 15 +
tools/perf/pmu-events/arch/arm64/mapfile.csv | 1 +
20 files changed, 1034 insertions(+)
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/bus.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/exception.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/fp_operation.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/general.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l1d_cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l1i_cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l2_cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l3_cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/ll_cache.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/memory.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/metrics.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/pmu.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/retired.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/spec_operation.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/stall.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/sve.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/tlb.json
create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/trace.json
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/bus.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/bus.json
new file mode 100644
index 000000000000..884e42ab6a49
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/bus.json
@@ -0,0 +1,26 @@
+[
+ {
+ "ArchStdEvent": "BUS_ACCESS",
+ "PublicDescription": "Counts memory transactions issued by the CPU to the external bus, including snoop requests and snoop responses. Each beat of data is counted individually."
+ },
+ {
+ "ArchStdEvent": "BUS_CYCLES",
+ "PublicDescription": "Counts bus cycles in the CPU. Bus cycles represent a clock cycle in which a transaction could be sent or received on the interface from the CPU to the external bus. Since that interface is driven at the same clock speed as the CPU, this event is a duplicate of CPU_CYCLES."
+ },
+ {
+ "ArchStdEvent": "BUS_ACCESS_RD",
+ "PublicDescription": "Counts memory read transactions seen on the external bus. Each beat of data is counted individually."
+ },
+ {
+ "ArchStdEvent": "BUS_ACCESS_WR",
+ "PublicDescription": "Counts memory write transactions seen on the external bus. Each beat of data is counted individually."
+ },
+ {
+ "ArchStdEvent": "BUS_REQ_RD_PERCYC",
+ "PublicDescription": "Bus read transactions in progress."
+ },
+ {
+ "ArchStdEvent": "BUS_REQ_RD",
+ "BriefDescription": "Bus request, read"
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/exception.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/exception.json
new file mode 100644
index 000000000000..fbe580e15c2e
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/exception.json
@@ -0,0 +1,18 @@
+[
+ {
+ "ArchStdEvent": "EXC_TAKEN",
+ "PublicDescription": "Counts any taken architecturally visible exceptions such as IRQ, FIQ, SError, and other synchronous exceptions. Exceptions are counted whether or not they are taken locally."
+ },
+ {
+ "ArchStdEvent": "EXC_RETURN",
+ "PublicDescription": "Counts any architecturally executed exception return instructions. For example: AArch64: ERET"
+ },
+ {
+ "ArchStdEvent": "EXC_IRQ",
+ "PublicDescription": "Counts IRQ exceptions including the virtual IRQs that are taken locally."
+ },
+ {
+ "ArchStdEvent": "EXC_FIQ",
+ "PublicDescription": "Counts FIQ exceptions including the virtual FIQs that are taken locally."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/fp_operation.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/fp_operation.json
new file mode 100644
index 000000000000..da0c4b05ad5b
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/fp_operation.json
@@ -0,0 +1,14 @@
+[
+ {
+ "ArchStdEvent": "FP_HP_SPEC",
+ "PublicDescription": "Counts speculatively executed half precision floating point operations."
+ },
+ {
+ "ArchStdEvent": "FP_SP_SPEC",
+ "PublicDescription": "Counts speculatively executed single precision floating point operations."
+ },
+ {
+ "ArchStdEvent": "FP_DP_SPEC",
+ "PublicDescription": "Counts speculatively executed double precision floating point operations."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/general.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/general.json
new file mode 100644
index 000000000000..20fada95ef97
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/general.json
@@ -0,0 +1,6 @@
+[
+ {
+ "ArchStdEvent": "CPU_CYCLES",
+ "PublicDescription": "Counts CPU clock cycles (not timer cycles). The clock measured by this event is defined as the physical clock driving the CPU logic."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l1d_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l1d_cache.json
new file mode 100644
index 000000000000..90e871c8986a
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l1d_cache.json
@@ -0,0 +1,50 @@
+[
+ {
+ "ArchStdEvent": "L1D_CACHE_REFILL",
+ "PublicDescription": "Counts level 1 data cache refills caused by speculatively executed load or store operations that missed in the level 1 data cache. This event only counts one event per cache line."
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE",
+ "PublicDescription": "Counts level 1 data cache accesses from any load/store operations. Atomic operations that resolve in the CPU's caches (near atomic operations) count as both a write access and read access. Each access to a cache line is counted including the multiple accesses caused by single instructions such as LDM or STM. Each access to other level 1 data or unified memory structures, for example refill buffers, write buffers, and write-back buffers, is also counted."
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_WB",
+ "PublicDescription": "Counts write-backs of dirty data from the L1 data cache to the L2 cache. This occurs when either a dirty cache line is evicted from L1 data cache and allocated in the L2 cache or dirty data is written to the L2 and possibly to the next level of cache. This event counts both victim cache line evictions and cache write-backs from snoops or cache maintenance operations. The following cache operations are not counted:\n\n1. Invalidations which do not result in data being transferred out of the L1 (such as evictions of clean data),\n2. Full line writes which write to L2 without writing L1, such as write streaming mode."
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_LMISS_RD",
+ "PublicDescription": "Counts cache line refills into the level 1 data cache from any memory read operations, that incurred additional latency."
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_RD",
+ "PublicDescription": "Counts level 1 data cache accesses from any load operation. Atomic load operations that resolve in the CPU's caches count as both a write access and read access."
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_WR",
+ "PublicDescription": "Counts level 1 data cache accesses generated by store operations. This event also counts accesses caused by a DC ZVA (data cache zero, specified by virtual address) instruction. Near atomic operations that resolve in the CPU's caches count as a write access and read access."
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_REFILL_RD",
+ "PublicDescription": "Counts level 1 data cache refills caused by speculatively executed load instructions where the memory read operation misses in the level 1 data cache. This event only counts one event per cache line."
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_REFILL_WR",
+ "PublicDescription": "Counts level 1 data cache refills caused by speculatively executed store instructions where the memory write operation misses in the level 1 data cache. This event only counts one event per cache line."
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_REFILL_INNER",
+ "PublicDescription": "Counts level 1 data cache refills where the cache line data came from caches inside the immediate cluster of the core."
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_REFILL_OUTER",
+ "PublicDescription": "Counts level 1 data cache refills for which the cache line data came from outside the immediate cluster of the core, like an SLC in the system interconnect or DRAM."
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_HWPRF",
+ "PublicDescription": "Counts level 1 data cache accesses from any load/store operations generated by the hardware prefetcher."
+ },
+ {
+ "ArchStdEvent": "L1D_CACHE_REFILL_HWPRF",
+ "PublicDescription": "Counts level 1 data cache refills where the cache line is requested by a hardware prefetcher."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l1i_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l1i_cache.json
new file mode 100644
index 000000000000..633f1030359d
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l1i_cache.json
@@ -0,0 +1,14 @@
+[
+ {
+ "ArchStdEvent": "L1I_CACHE_REFILL",
+ "PublicDescription": "Counts cache line refills in the level 1 instruction cache caused by a missed instruction fetch. Instruction fetches may include accessing multiple instructions, but the single cache line allocation is counted once."
+ },
+ {
+ "ArchStdEvent": "L1I_CACHE",
+ "PublicDescription": "Counts instruction fetches which access the level 1 instruction cache. Instruction cache accesses caused by cache maintenance operations are not counted."
+ },
+ {
+ "ArchStdEvent": "L1I_CACHE_LMISS",
+ "PublicDescription": "Counts cache line refills into the level 1 instruction cache that incurred additional latency."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l2_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l2_cache.json
new file mode 100644
index 000000000000..9874b1a7c94b
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l2_cache.json
@@ -0,0 +1,46 @@
+[
+ {
+ "ArchStdEvent": "L2D_CACHE",
+ "PublicDescription": "Counts accesses to the level 2 cache due to data accesses. Level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the first level data cache or translation resolutions due to accesses. This event also counts write back of dirty data from level 1 data cache to the L2 cache."
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_REFILL",
+ "PublicDescription": "Counts cache line refills into the level 2 cache. Level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the level 1 data cache or translation resolutions due to accesses."
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_WB",
+ "PublicDescription": "Counts write-backs of data from the L2 cache to outside the CPU. This includes snoops to the L2 (from other CPUs) which return data even if the snoops cause an invalidation. L2 cache line invalidations which do not write data outside the CPU and snoops which return data from an L1 cache are not counted. Data would not be written outside the cache when invalidating a clean cache line."
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_ALLOCATE",
+ "PublicDescription": "Counts level 2 cache line allocates that do not fetch data from outside the level 2 data or unified cache."
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_RD",
+ "PublicDescription": "Counts level 2 data cache accesses due to memory read operations. Level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the level 1 data cache or translation resolutions due to accesses."
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_WR",
+ "PublicDescription": "Counts level 2 cache accesses due to memory write operations. Level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the level 1 data cache or translation resolutions due to accesses."
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_REFILL_RD",
+ "PublicDescription": "Counts refills for memory accesses due to memory read operations counted by L2D_CACHE_RD. Level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the level 1 data cache or translation resolutions due to accesses."
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_REFILL_WR",
+ "PublicDescription": "Counts refills for memory accesses due to memory write operations counted by L2D_CACHE_WR. Level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the level 1 data cache or translation resolutions due to accesses."
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_LMISS_RD",
+ "PublicDescription": "Counts cache line refills into the level 2 unified cache from any memory read operations that incurred additional latency."
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_HWPRF",
+ "PublicDescription": "Counts level 2 data cache accesses generated by L2D hardware prefetchers."
+ },
+ {
+ "ArchStdEvent": "L2D_CACHE_REFILL_HWPRF",
+ "BriefDescription": "Counts hardware prefetches counted by L2D_CACHE_HWPRF that cause a refill of the level 2 cache, or of any level 1 data or instruction cache of this PE, from outside of those caches."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l3_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l3_cache.json
new file mode 100644
index 000000000000..d5485d71babb
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l3_cache.json
@@ -0,0 +1,21 @@
+[
+ {
+ "ArchStdEvent": "L3D_CACHE",
+ "PublicDescription": "Counts level 3 cache accesses. Level 3 cache is a unified cache for data and instruction accesses. Accesses are for misses in the lower level caches or translation resolutions due to accesses."
+ },
+ {
+ "ArchStdEvent": "L3D_CACHE_RD",
+ "PublicDescription": "Counts level 3 cache accesses caused by any memory read operation. Level 3 cache is a unified cache for data and instruction accesses. Accesses are for misses in the lower level caches or translation resolutions due to accesses."
+ },
+ {
+ "ArchStdEvent": "L3D_CACHE_REFILL_RD"
+ },
+ {
+ "ArchStdEvent": "L3D_CACHE_LMISS_RD",
+ "PublicDescription": "Counts any cache line refill into the level 3 cache from memory read operations that incurred additional latency."
+ },
+ {
+ "ArchStdEvent": "L3D_CACHE_HWPRF",
+ "PublicDescription": "Counts level 3 data cache accesses generated by hardware prefetchers."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/ll_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/ll_cache.json
new file mode 100644
index 000000000000..fd5a2e0099b8
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/ll_cache.json
@@ -0,0 +1,10 @@
+[
+ {
+ "ArchStdEvent": "LL_CACHE_RD",
+ "PublicDescription": "Counts read transactions that were returned from outside the core cluster. This event counts for external last level cache when the system register CPUECTLR.EXTLLC bit is set, otherwise it counts for the L3 cache. This event counts read transactions returned from outside the core if those transactions are either hit in the system level cache or missed in the SLC and are returned from any other external sources."
+ },
+ {
+ "ArchStdEvent": "LL_CACHE_MISS_RD",
+ "PublicDescription": "Counts read transactions that were returned from outside the core cluster but missed in the system level cache. This event counts for external last level cache when the system register CPUECTLR.EXTLLC bit is set, otherwise it counts for L3 cache. This event counts read transactions returned from outside the core if those transactions are missed in the system level cache. The data source of the transaction is indicated by a field in the CHI transaction returning to the CPU. This event does not count reads caused by cache maintenance operations."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/memory.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/memory.json
new file mode 100644
index 000000000000..e7f7914ecd2b
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/memory.json
@@ -0,0 +1,58 @@
+[
+ {
+ "ArchStdEvent": "MEM_ACCESS",
+ "PublicDescription": "Counts memory accesses issued by the CPU load store unit, where those accesses are issued due to load or store operations. This event counts memory accesses no matter whether the data is received from any level of cache hierarchy or external memory. If memory accesses are broken up into smaller transactions than what were specified in the load or store instructions, then the event counts those smaller memory transactions."
+ },
+ {
+ "ArchStdEvent": "MEMORY_ERROR",
+ "PublicDescription": "Counts any detected correctable or uncorrectable physical memory errors (ECC or parity) in protected CPU RAMs. On the core, this event counts errors in the caches (including data and tag RAMs). Any detected memory error (from either a speculative and abandoned access, or an architecturally executed access) is counted. Note that errors are only detected when the actual protected memory is accessed by an operation."
+ },
+ {
+ "ArchStdEvent": "REMOTE_ACCESS_RD",
+ "PublicDescription": "Counts memory read accesses to another socket in a multi-socket system."
+ },
+ {
+ "ArchStdEvent": "MEM_ACCESS_RD",
+ "PublicDescription": "Counts memory accesses issued by the CPU due to load operations. The event counts any memory load access, no matter whether the data is received from any level of cache hierarchy or external memory. The event also counts atomic load operations. If memory accesses are broken up by the load/store unit into smaller transactions that are issued by the bus interface, then the event counts those smaller transactions."
+ },
+ {
+ "ArchStdEvent": "MEM_ACCESS_WR",
+ "PublicDescription": "Counts memory accesses issued by the CPU due to store operations. The event counts any memory store access, no matter whether the data is located in any level of cache or external memory. The event also counts atomic load and store operations. If memory accesses are broken up by the load/store unit into smaller transactions that are issued by the bus interface, then the event counts those smaller transactions."
+ },
+ {
+ "ArchStdEvent": "LDST_ALIGN_LAT",
+ "PublicDescription": "Counts the number of memory read and write accesses in a cycle that incurred additional latency due to the alignment of the address and the size of data being accessed, which results in a load or store crossing a single cache line."
+ },
+ {
+ "ArchStdEvent": "LD_ALIGN_LAT",
+ "PublicDescription": "Counts the number of memory read accesses in a cycle that incurred additional latency due to the alignment of the address and the size of data being accessed, which results in a load crossing a single cache line."
+ },
+ {
+ "ArchStdEvent": "ST_ALIGN_LAT",
+ "PublicDescription": "Counts the number of memory write accesses in a cycle that incurred additional latency due to the alignment of the address and the size of data being accessed."
+ },
+ {
+ "ArchStdEvent": "MEM_ACCESS_CHECKED",
+ "PublicDescription": "Counts the number of memory read and write accesses counted by MEM_ACCESS that are tag checked by the Memory Tagging Extension (MTE). This event is implemented as the sum of MEM_ACCESS_CHECKED_RD and MEM_ACCESS_CHECKED_WR."
+ },
+ {
+ "ArchStdEvent": "MEM_ACCESS_CHECKED_RD",
+ "PublicDescription": "Counts the number of memory read accesses in a cycle that are tag checked by the Memory Tagging Extension (MTE)."
+ },
+ {
+ "ArchStdEvent": "MEM_ACCESS_CHECKED_WR",
+ "PublicDescription": "Counts the number of memory write accesses in a cycle that are tag checked by the Memory Tagging Extension (MTE)."
+ },
+ {
+ "ArchStdEvent": "INST_FETCH_PERCYC",
+ "PublicDescription": "Counts the number of instruction fetches outstanding per cycle, which can be used to derive the average latency of an instruction fetch."
+ },
+ {
+ "ArchStdEvent": "MEM_ACCESS_RD_PERCYC",
+ "PublicDescription": "Counts the number of outstanding loads or memory read accesses per cycle."
+ },
+ {
+ "ArchStdEvent": "INST_FETCH",
+ "PublicDescription": "Counts instruction memory accesses that the PE makes."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/metrics.json
new file mode 100644
index 000000000000..62cb910c8945
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/metrics.json
@@ -0,0 +1,373 @@
+[
+ {
+ "ArchStdEvent": "backend_bound"
+ },
+ {
+ "MetricName": "backend_busy_bound",
+ "MetricExpr": "STALL_BACKEND_BUSY / STALL_BACKEND * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to issue queues being too full to accept operations for execution.",
+ "MetricGroup": "Topdown_Backend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "backend_cache_l1d_bound",
+ "MetricExpr": "STALL_BACKEND_L1D / (STALL_BACKEND_L1D + STALL_BACKEND_MEM) * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory access latency issues caused by level 1 data cache misses.",
+ "MetricGroup": "Topdown_Backend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "backend_cache_l2d_bound",
+ "MetricExpr": "STALL_BACKEND_MEM / (STALL_BACKEND_L1D + STALL_BACKEND_MEM) * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory access latency issues caused by level 2 data cache misses.",
+ "MetricGroup": "Topdown_Backend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "backend_mem_bound",
+ "MetricExpr": "STALL_BACKEND_MEMBOUND / STALL_BACKEND * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to backend core resource constraints related to memory access latency issues caused by memory access components.",
+ "MetricGroup": "Topdown_Backend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "backend_mem_cache_bound",
+ "MetricExpr": "(STALL_BACKEND_L1D + STALL_BACKEND_MEM) / STALL_BACKEND_MEMBOUND * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory latency issues caused by data cache misses.",
+ "MetricGroup": "Topdown_Backend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "backend_mem_store_bound",
+ "MetricExpr": "STALL_BACKEND_ST / STALL_BACKEND_MEMBOUND * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory write pending caused by stores stalled in the pre-commit stage.",
+ "MetricGroup": "Topdown_Backend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "backend_mem_tlb_bound",
+ "MetricExpr": "STALL_BACKEND_TLB / STALL_BACKEND_MEMBOUND * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory access latency issues caused by data TLB misses.",
+ "MetricGroup": "Topdown_Backend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "backend_stalled_cycles",
+ "MetricExpr": "STALL_BACKEND / CPU_CYCLES * 100",
+ "BriefDescription": "This metric is the percentage of cycles that were stalled due to resource constraints in the backend unit of the processor.",
+ "MetricGroup": "Cycle_Accounting",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "ArchStdEvent": "bad_speculation",
+ "MetricExpr": "(1 - STALL_SLOT / (10 * CPU_CYCLES)) * (1 - OP_RETIRED / OP_SPEC) * 100 + STALL_FRONTEND_FLUSH / CPU_CYCLES * 100"
+ },
+ {
+ "MetricName": "branch_direct_ratio",
+ "MetricExpr": "BR_IMMED_RETIRED / BR_RETIRED",
+ "BriefDescription": "This metric measures the ratio of direct branches retired to the total number of branches architecturally executed.",
+ "MetricGroup": "Branch_Effectiveness",
+ "ScaleUnit": "1per branch"
+ },
+ {
+ "MetricName": "branch_indirect_ratio",
+ "MetricExpr": "BR_IND_RETIRED / BR_RETIRED",
+ "BriefDescription": "This metric measures the ratio of indirect branches retired, including function returns, to the total number of branches architecturally executed.",
+ "MetricGroup": "Branch_Effectiveness",
+ "ScaleUnit": "1per branch"
+ },
+ {
+ "MetricName": "branch_misprediction_ratio",
+ "MetricExpr": "BR_MIS_PRED_RETIRED / BR_RETIRED",
+ "BriefDescription": "This metric measures the ratio of branches mispredicted to the total number of branches architecturally executed. This gives an indication of the effectiveness of the branch prediction unit.",
+ "MetricGroup": "Miss_Ratio;Branch_Effectiveness",
+ "ScaleUnit": "100percent of branches"
+ },
+ {
+ "MetricName": "branch_mpki",
+ "MetricExpr": "BR_MIS_PRED_RETIRED / INST_RETIRED * 1000",
+ "BriefDescription": "This metric measures the number of branch mispredictions per thousand instructions executed.",
+ "MetricGroup": "MPKI;Branch_Effectiveness",
+ "ScaleUnit": "1MPKI"
+ },
+ {
+ "MetricName": "branch_percentage",
+ "MetricExpr": "PC_WRITE_SPEC / INST_SPEC * 100",
+ "BriefDescription": "This metric measures branch operations as a percentage of operations speculatively executed.",
+ "MetricGroup": "Operation_Mix",
+ "ScaleUnit": "1percent of operations"
+ },
+ {
+ "MetricName": "branch_return_ratio",
+ "MetricExpr": "BR_RETURN_RETIRED / BR_RETIRED",
+ "BriefDescription": "This metric measures the ratio of branches retired that are function returns to the total number of branches architecturally executed.",
+ "MetricGroup": "Branch_Effectiveness",
+ "ScaleUnit": "1per branch"
+ },
+ {
+ "MetricName": "crypto_percentage",
+ "MetricExpr": "CRYPTO_SPEC / INST_SPEC * 100",
+ "BriefDescription": "This metric measures crypto operations as a percentage of operations speculatively executed.",
+ "MetricGroup": "Operation_Mix",
+ "ScaleUnit": "1percent of operations"
+ },
+ {
+ "MetricName": "dtlb_mpki",
+ "MetricExpr": "DTLB_WALK / INST_RETIRED * 1000",
+ "BriefDescription": "This metric measures the number of data TLB Walks per thousand instructions executed.",
+ "MetricGroup": "MPKI;DTLB_Effectiveness",
+ "ScaleUnit": "1MPKI"
+ },
+ {
+ "MetricName": "dtlb_walk_ratio",
+ "MetricExpr": "DTLB_WALK / L1D_TLB",
+ "BriefDescription": "This metric measures the ratio of data TLB Walks to the total number of data TLB accesses. This gives an indication of the effectiveness of the data TLB accesses.",
+ "MetricGroup": "Miss_Ratio;DTLB_Effectiveness",
+ "ScaleUnit": "100percent of TLB accesses"
+ },
+ {
+ "MetricName": "fp16_percentage",
+ "MetricExpr": "FP_HP_SPEC / INST_SPEC * 100",
+ "BriefDescription": "This metric measures half-precision floating point operations as a percentage of operations speculatively executed.",
+ "MetricGroup": "FP_Precision_Mix",
+ "ScaleUnit": "1percent of operations"
+ },
+ {
+ "MetricName": "fp32_percentage",
+ "MetricExpr": "FP_SP_SPEC / INST_SPEC * 100",
+ "BriefDescription": "This metric measures single-precision floating point operations as a percentage of operations speculatively executed.",
+ "MetricGroup": "FP_Precision_Mix",
+ "ScaleUnit": "1percent of operations"
+ },
+ {
+ "MetricName": "fp64_percentage",
+ "MetricExpr": "FP_DP_SPEC / INST_SPEC * 100",
+ "BriefDescription": "This metric measures double-precision floating point operations as a percentage of operations speculatively executed.",
+ "MetricGroup": "FP_Precision_Mix",
+ "ScaleUnit": "1percent of operations"
+ },
+ {
+ "MetricName": "frontend_cache_l1i_bound",
+ "MetricExpr": "STALL_FRONTEND_L1I / (STALL_FRONTEND_L1I + STALL_FRONTEND_MEM) * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to memory access latency issues caused by level 1 instruction cache misses.",
+ "MetricGroup": "Topdown_Frontend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "frontend_cache_l2i_bound",
+ "MetricExpr": "STALL_FRONTEND_MEM / (STALL_FRONTEND_L1I + STALL_FRONTEND_MEM) * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to memory access latency issues caused by level 2 instruction cache misses.",
+ "MetricGroup": "Topdown_Frontend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "frontend_core_bound",
+ "MetricExpr": "STALL_FRONTEND_CPUBOUND / STALL_FRONTEND * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to frontend core resource constraints not related to instruction fetch latency issues caused by memory access components.",
+ "MetricGroup": "Topdown_Frontend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "frontend_core_flush_bound",
+ "MetricExpr": "STALL_FRONTEND_FLUSH / STALL_FRONTEND_CPUBOUND * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend as the processor is recovering from a pipeline flush caused by bad speculation or other machine resteers.",
+ "MetricGroup": "Topdown_Frontend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "frontend_mem_bound",
+ "MetricExpr": "STALL_FRONTEND_MEMBOUND / STALL_FRONTEND * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to frontend core resource constraints related to the instruction fetch latency issues caused by memory access components.",
+ "MetricGroup": "Topdown_Frontend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "frontend_mem_cache_bound",
+ "MetricExpr": "(STALL_FRONTEND_L1I + STALL_FRONTEND_MEM) / STALL_FRONTEND_MEMBOUND * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to instruction fetch latency issues caused by instruction cache misses.",
+ "MetricGroup": "Topdown_Frontend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "frontend_mem_tlb_bound",
+ "MetricExpr": "STALL_FRONTEND_TLB / STALL_FRONTEND_MEMBOUND * 100",
+ "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to instruction fetch latency issues caused by instruction TLB misses.",
+ "MetricGroup": "Topdown_Frontend",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "frontend_stalled_cycles",
+ "MetricExpr": "STALL_FRONTEND / CPU_CYCLES * 100",
+ "BriefDescription": "This metric is the percentage of cycles that were stalled due to resource constraints in the frontend unit of the processor.",
+ "MetricGroup": "Cycle_Accounting",
+ "ScaleUnit": "1percent of cycles"
+ },
+ {
+ "MetricName": "integer_dp_percentage",
+ "MetricExpr": "DP_SPEC / INST_SPEC * 100",
+ "BriefDescription": "This metric measures scalar integer operations as a percentage of operations speculatively executed.",
+ "MetricGroup": "Operation_Mix",
+ "ScaleUnit": "1percent of operations"
+ },
+ {
+ "MetricName": "ipc",
+ "MetricExpr": "INST_RETIRED / CPU_CYCLES",
+ "BriefDescription": "This metric measures the number of instructions retired per cycle.",
+ "MetricGroup": "General",
+ "ScaleUnit": "1per cycle"
+ },
+ {
+ "MetricName": "itlb_mpki",
+ "MetricExpr": "ITLB_WALK / INST_RETIRED * 1000",
+ "BriefDescription": "This metric measures the number of instruction TLB Walks per thousand instructions executed.",
+ "MetricGroup": "MPKI;ITLB_Effectiveness",
+ "ScaleUnit": "1MPKI"
+ },
+ {
+ "MetricName": "itlb_walk_ratio",
+ "MetricExpr": "ITLB_WALK / L1I_TLB",
+ "BriefDescription": "This metric measures the ratio of instruction TLB Walks to the total number of instruction TLB accesses. This gives an indication of the effectiveness of the instruction TLB accesses.",
+ "MetricGroup": "Miss_Ratio;ITLB_Effectiveness",
+ "ScaleUnit": "100percent of TLB accesses"
+ },
+ {
+ "MetricName": "l1d_cache_miss_ratio",
+ "MetricExpr": "L1D_CACHE_REFILL / L1D_CACHE",
+ "BriefDescription": "This metric measures the ratio of level 1 data cache accesses missed to the total number of level 1 data cache accesses. This gives an indication of the effectiveness of the level 1 data cache.",
+ "MetricGroup": "Miss_Ratio;L1D_Cache_Effectiveness",
+ "ScaleUnit": "100percent of cache accesses"
+ },
+ {
+ "MetricName": "l1d_cache_mpki",
+ "MetricExpr": "L1D_CACHE_REFILL / INST_RETIRED * 1000",
+ "BriefDescription": "This metric measures the number of level 1 data cache accesses missed per thousand instructions executed.",
+ "MetricGroup": "MPKI;L1D_Cache_Effectiveness",
+ "ScaleUnit": "1MPKI"
+ },
+ {
+ "MetricName": "l1d_tlb_miss_ratio",
+ "MetricExpr": "L1D_TLB_REFILL / L1D_TLB",
+ "BriefDescription": "This metric measures the ratio of level 1 data TLB accesses missed to the total number of level 1 data TLB accesses. This gives an indication of the effectiveness of the level 1 data TLB.",
+ "MetricGroup": "Miss_Ratio;DTLB_Effectiveness",
+ "ScaleUnit": "100percent of TLB accesses"
+ },
+ {
+ "MetricName": "l1d_tlb_mpki",
+ "MetricExpr": "L1D_TLB_REFILL / INST_RETIRED * 1000",
+ "BriefDescription": "This metric measures the number of level 1 data TLB accesses missed per thousand instructions executed.",
+ "MetricGroup": "MPKI;DTLB_Effectiveness",
+ "ScaleUnit": "1MPKI"
+ },
+ {
+ "MetricName": "l1i_cache_miss_ratio",
+ "MetricExpr": "L1I_CACHE_REFILL / L1I_CACHE",
+ "BriefDescription": "This metric measures the ratio of level 1 instruction cache accesses missed to the total number of level 1 instruction cache accesses. This gives an indication of the effectiveness of the level 1 instruction cache.",
+ "MetricGroup": "Miss_Ratio;L1I_Cache_Effectiveness",
+ "ScaleUnit": "100percent of cache accesses"
+ },
+ {
+ "MetricName": "l1i_cache_mpki",
+ "MetricExpr": "L1I_CACHE_REFILL / INST_RETIRED * 1000",
+ "BriefDescription": "This metric measures the number of level 1 instruction cache accesses missed per thousand instructions executed.",
+ "MetricGroup": "MPKI;L1I_Cache_Effectiveness",
+ "ScaleUnit": "1MPKI"
+ },
+ {
+ "MetricName": "l1i_tlb_miss_ratio",
+ "MetricExpr": "L1I_TLB_REFILL / L1I_TLB",
+ "BriefDescription": "This metric measures the ratio of level 1 instruction TLB accesses missed to the total number of level 1 instruction TLB accesses. This gives an indication of the effectiveness of the level 1 instruction TLB.",
+ "MetricGroup": "Miss_Ratio;ITLB_Effectiveness",
+ "ScaleUnit": "100percent of TLB accesses"
+ },
+ {
+ "MetricName": "l1i_tlb_mpki",
+ "MetricExpr": "L1I_TLB_REFILL / INST_RETIRED * 1000",
+ "BriefDescription": "This metric measures the number of level 1 instruction TLB accesses missed per thousand instructions executed.",
+ "MetricGroup": "MPKI;ITLB_Effectiveness",
+ "ScaleUnit": "1MPKI"
+ },
+ {
+ "MetricName": "l2_cache_miss_ratio",
+ "MetricExpr": "L2D_CACHE_REFILL / L2D_CACHE",
+ "BriefDescription": "This metric measures the ratio of level 2 cache accesses missed to the total number of level 2 cache accesses. This gives an indication of the effectiveness of the level 2 cache, which is a unified cache that stores both data and instruction. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a unified cache.",
+ "MetricGroup": "Miss_Ratio;L2_Cache_Effectiveness",
+ "ScaleUnit": "100percent of cache accesses"
+ },
+ {
+ "MetricName": "l2_cache_mpki",
+ "MetricExpr": "L2D_CACHE_REFILL / INST_RETIRED * 1000",
+ "BriefDescription": "This metric measures the number of level 2 unified cache accesses missed per thousand instructions executed. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a unified cache.",
+ "MetricGroup": "MPKI;L2_Cache_Effectiveness",
+ "ScaleUnit": "1MPKI"
+ },
+ {
+ "MetricName": "l2_tlb_miss_ratio",
+ "MetricExpr": "L2D_TLB_REFILL / L2D_TLB",
+ "BriefDescription": "This metric measures the ratio of level 2 unified TLB accesses missed to the total number of level 2 unified TLB accesses. This gives an indication of the effectiveness of the level 2 TLB.",
+ "MetricGroup": "Miss_Ratio;ITLB_Effectiveness;DTLB_Effectiveness",
+ "ScaleUnit": "100percent of TLB accesses"
+ },
+ {
+ "MetricName": "l2_tlb_mpki",
+ "MetricExpr": "L2D_TLB_REFILL / INST_RETIRED * 1000",
+ "BriefDescription": "This metric measures the number of level 2 unified TLB accesses missed per thousand instructions executed.",
+ "MetricGroup": "MPKI;ITLB_Effectiveness;DTLB_Effectiveness",
+ "ScaleUnit": "1MPKI"
+ },
+ {
+ "MetricName": "ll_cache_read_hit_ratio",
+ "MetricExpr": "(LL_CACHE_RD - LL_CACHE_MISS_RD) / LL_CACHE_RD",
+ "BriefDescription": "This metric measures the ratio of last level cache read accesses hit in the cache to the total number of last level cache accesses. This gives an indication of the effectiveness of the last level cache for read traffic. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a system level cache.",
+ "MetricGroup": "LL_Cache_Effectiveness",
+ "ScaleUnit": "100percent of cache accesses"
+ },
+ {
+ "MetricName": "ll_cache_read_miss_ratio",
+ "MetricExpr": "LL_CACHE_MISS_RD / LL_CACHE_RD",
+ "BriefDescription": "This metric measures the ratio of last level cache read accesses missed to the total number of last level cache accesses. This gives an indication of the effectiveness of the last level cache for read traffic. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a system level cache.",
+ "MetricGroup": "Miss_Ratio;LL_Cache_Effectiveness",
+ "ScaleUnit": "100percent of cache accesses"
+ },
+ {
+ "MetricName": "ll_cache_read_mpki",
+ "MetricExpr": "LL_CACHE_MISS_RD / INST_RETIRED * 1000",
+ "BriefDescription": "This metric measures the number of last level cache read accesses missed per thousand instructions executed.",
+ "MetricGroup": "MPKI;LL_Cache_Effectiveness",
+ "ScaleUnit": "1MPKI"
+ },
+ {
+ "MetricName": "load_percentage",
+ "MetricExpr": "LD_SPEC / INST_SPEC * 100",
+ "BriefDescription": "This metric measures load operations as a percentage of operations speculatively executed.",
+ "MetricGroup": "Operation_Mix",
+ "ScaleUnit": "1percent of operations"
+ },
+ {
+ "MetricName": "scalar_fp_percentage",
+ "MetricExpr": "VFP_SPEC / INST_SPEC * 100",
+ "BriefDescription": "This metric measures scalar floating point operations as a percentage of operations speculatively executed.",
+ "MetricGroup": "Operation_Mix",
+ "ScaleUnit": "1percent of operations"
+ },
+ {
+ "MetricName": "simd_percentage",
+ "MetricExpr": "ASE_SPEC / INST_SPEC * 100",
+ "BriefDescription": "This metric measures advanced SIMD operations as a percentage of total operations speculatively executed.",
+ "MetricGroup": "Operation_Mix",
+ "ScaleUnit": "1percent of operations"
+ },
+ {
+ "MetricName": "store_percentage",
+ "MetricExpr": "ST_SPEC / INST_SPEC * 100",
+ "BriefDescription": "This metric measures store operations as a percentage of operations speculatively executed.",
+ "MetricGroup": "Operation_Mix",
+ "ScaleUnit": "1percent of operations"
+ },
+ {
+ "MetricName": "sve_all_percentage",
+ "MetricExpr": "SVE_INST_SPEC / INST_SPEC * 100",
+ "BriefDescription": "This metric measures scalable vector operations, including loads and stores, as a percentage of operations speculatively executed.",
+ "MetricGroup": "Operation_Mix",
+ "ScaleUnit": "1percent of operations"
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/pmu.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/pmu.json
new file mode 100644
index 000000000000..d8b7b9f9e5fa
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/pmu.json
@@ -0,0 +1,8 @@
+[
+ {
+ "ArchStdEvent": "PMU_OVFS"
+ },
+ {
+ "ArchStdEvent": "PMU_HOVFS"
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/retired.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/retired.json
new file mode 100644
index 000000000000..152f15c1253c
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/retired.json
@@ -0,0 +1,90 @@
+[
+ {
+ "ArchStdEvent": "SW_INCR",
+ "PublicDescription": "Counts software writes to the PMSWINC_EL0 (software PMU increment) register. The PMSWINC_EL0 register is a manually updated counter for use by application software.\n\nThis event could be used to measure any user program event, such as accesses to a particular data structure (by writing to the PMSWINC_EL0 register each time the data structure is accessed).\n\nTo use the PMSWINC_EL0 register and event, developers must insert instructions that write to the PMSWINC_EL0 register into the source code.\n\nSince the SW_INCR event records writes to the PMSWINC_EL0 register, there is no need to do a read/increment/write sequence to the PMSWINC_EL0 register."
+ },
+ {
+ "ArchStdEvent": "LD_RETIRED",
+ "PublicDescription": "Counts architecturally executed load instructions that pass the condition code check."
+ },
+ {
+ "ArchStdEvent": "ST_RETIRED",
+ "PublicDescription": "Counts architecturally executed store instructions that pass the condition code check."
+ },
+ {
+ "ArchStdEvent": "INST_RETIRED",
+ "PublicDescription": "Counts instructions that have been architecturally executed."
+ },
+ {
+ "ArchStdEvent": "CID_WRITE_RETIRED",
+ "PublicDescription": "Counts architecturally executed writes to the CONTEXTIDR_EL1 register, which usually contains the kernel PID and can be output with hardware trace."
+ },
+ {
+ "ArchStdEvent": "PC_WRITE_RETIRED",
+ "PublicDescription": "Counts branch instructions that caused a change of Program Counter, which effectively causes a change in the control flow of the program."
+ },
+ {
+ "ArchStdEvent": "BR_IMMED_RETIRED",
+ "PublicDescription": "Counts architecturally executed direct branches."
+ },
+ {
+ "ArchStdEvent": "BR_RETURN_RETIRED",
+ "PublicDescription": "Counts architecturally executed procedure returns."
+ },
+ {
+ "ArchStdEvent": "TTBR_WRITE_RETIRED",
+ "PublicDescription": "Counts architectural writes to TTBR0/1_EL1. If virtualization host extensions are enabled (by setting the HCR_EL2.E2H bit to 1), then accesses to TTBR0/1_EL1 that are redirected to TTBR0/1_EL2, or accesses to TTBR0/1_EL12, are counted. TTBRn registers are typically updated when the kernel is swapping user-space threads or applications."
+ },
+ {
+ "ArchStdEvent": "BR_RETIRED",
+ "PublicDescription": "Counts architecturally executed branches, whether the branch is taken or not. Instructions that explicitly write to the PC are also counted. Note that exception generating instructions, exception return instructions and context synchronization instructions are not counted."
+ },
+ {
+ "ArchStdEvent": "BR_MIS_PRED_RETIRED",
+ "PublicDescription": "Counts branches counted by BR_RETIRED which were mispredicted and caused a pipeline flush."
+ },
+ {
+ "ArchStdEvent": "OP_RETIRED",
+ "PublicDescription": "Counts micro-operations that are architecturally executed. This is a count of the number of micro-operations retired from the commit queue in a single cycle."
+ },
+ {
+ "ArchStdEvent": "SVE_INST_RETIRED",
+ "PublicDescription": "Counts architecturally executed SVE instructions."
+ },
+ {
+ "ArchStdEvent": "BR_INDNR_TAKEN_RETIRED",
+ "PublicDescription": "Counts architecturally executed indirect branches excluding procedure returns that were taken."
+ },
+ {
+ "ArchStdEvent": "BR_IMMED_PRED_RETIRED",
+ "PublicDescription": "Counts architecturally executed direct branches that were correctly predicted."
+ },
+ {
+ "ArchStdEvent": "BR_IMMED_MIS_PRED_RETIRED",
+ "PublicDescription": "Counts architecturally executed direct branches that were mispredicted and caused a pipeline flush."
+ },
+ {
+ "ArchStdEvent": "BR_RETURN_PRED_RETIRED",
+ "PublicDescription": "Counts architecturally executed procedure returns that were correctly predicted."
+ },
+ {
+ "ArchStdEvent": "BR_RETURN_MIS_PRED_RETIRED",
+ "PublicDescription": "Counts architecturally executed procedure returns that were mispredicted and caused a pipeline flush."
+ },
+ {
+ "ArchStdEvent": "BR_INDNR_PRED_RETIRED",
+ "PublicDescription": "Counts architecturally executed indirect branches excluding procedure returns that were correctly predicted."
+ },
+ {
+ "ArchStdEvent": "BR_INDNR_MIS_PRED_RETIRED",
+ "PublicDescription": "Counts architecturally executed indirect branches excluding procedure returns that were mispredicted and caused a pipeline flush."
+ },
+ {
+ "ArchStdEvent": "BR_PRED_RETIRED",
+ "PublicDescription": "Counts branch instructions counted by BR_RETIRED which were correctly predicted."
+ },
+ {
+ "ArchStdEvent": "BR_IND_RETIRED",
+ "PublicDescription": "Counts architecturally executed indirect branches including procedure returns."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/spec_operation.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/spec_operation.json
new file mode 100644
index 000000000000..40c29be53cc0
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/spec_operation.json
@@ -0,0 +1,70 @@
+[
+ {
+ "ArchStdEvent": "BR_MIS_PRED",
+ "PublicDescription": "Counts branches which are speculatively executed and mispredicted."
+ },
+ {
+ "ArchStdEvent": "BR_PRED",
+ "PublicDescription": "Counts all speculatively executed branches."
+ },
+ {
+ "ArchStdEvent": "INST_SPEC",
+ "PublicDescription": "Counts operations that have been speculatively executed."
+ },
+ {
+ "ArchStdEvent": "OP_SPEC",
+ "PublicDescription": "Counts micro-operations speculatively executed. This is the count of the number of micro-operations dispatched in a cycle."
+ },
+ {
+ "ArchStdEvent": "STREX_FAIL_SPEC",
+ "PublicDescription": "Counts store-exclusive operations that have been speculatively executed and have not successfully completed the store operation."
+ },
+ {
+ "ArchStdEvent": "STREX_SPEC",
+ "PublicDescription": "Counts store-exclusive operations that have been speculatively executed."
+ },
+ {
+ "ArchStdEvent": "LD_SPEC",
+ "PublicDescription": "Counts speculatively executed load operations including Single Instruction Multiple Data (SIMD) load operations."
+ },
+ {
+ "ArchStdEvent": "ST_SPEC",
+ "PublicDescription": "Counts speculatively executed store operations including Single Instruction Multiple Data (SIMD) store operations."
+ },
+ {
+ "ArchStdEvent": "LDST_SPEC",
+ "PublicDescription": "Counts speculatively executed load and store operations."
+ },
+ {
+ "ArchStdEvent": "DP_SPEC",
+ "PublicDescription": "Counts speculatively executed logical or arithmetic instructions such as MOV/MVN operations."
+ },
+ {
+ "ArchStdEvent": "ASE_SPEC",
+ "PublicDescription": "Counts speculatively executed Advanced SIMD operations excluding load, store and move micro-operations that move data to or from SIMD (vector) registers."
+ },
+ {
+ "ArchStdEvent": "VFP_SPEC",
+ "PublicDescription": "Counts speculatively executed floating point operations. This event does not count operations that move data to or from floating point (vector) registers."
+ },
+ {
+ "ArchStdEvent": "PC_WRITE_SPEC",
+ "PublicDescription": "Counts speculatively executed operations which cause software changes of the PC. Those operations include all taken branch operations."
+ },
+ {
+ "ArchStdEvent": "CRYPTO_SPEC",
+ "PublicDescription": "Counts speculatively executed cryptographic operations except for PMULL and VMULL operations."
+ },
+ {
+ "ArchStdEvent": "BR_IMMED_SPEC",
+ "PublicDescription": "Counts direct branch operations which are speculatively executed."
+ },
+ {
+ "ArchStdEvent": "BR_RETURN_SPEC",
+ "PublicDescription": "Counts procedure return operations (RET, RETAA and RETAB) which are speculatively executed."
+ },
+ {
+ "ArchStdEvent": "BR_INDIRECT_SPEC",
+ "PublicDescription": "Counts indirect branch operations including procedure returns, which are speculatively executed. This includes operations that force a software change of the PC, other than exception-generating operations and direct branch instructions. Some examples of the instructions counted by this event include BR Xn and RET."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/stall.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/stall.json
new file mode 100644
index 000000000000..d65aeb4b8808
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/stall.json
@@ -0,0 +1,82 @@
+[
+ {
+ "ArchStdEvent": "STALL_FRONTEND",
+ "PublicDescription": "Counts cycles when the frontend could not send any micro-operations to the rename stage because of frontend resource stalls caused by fetch memory latency or branch prediction flow stalls. STALL_SLOT_FRONTEND counts SLOTS during the cycle when this event counts."
+ },
+ {
+ "ArchStdEvent": "STALL_BACKEND",
+ "PublicDescription": "Counts cycles whenever the rename unit is unable to send any micro-operations to the backend of the pipeline because of backend resource constraints. Backend resource constraints can include issue stage fullness, execution stage fullness, or other internal pipeline resource fullness. All the backend slots were empty during the cycle when this event counts."
+ },
+ {
+ "ArchStdEvent": "STALL",
+ "PublicDescription": "Counts cycles when no operations are sent to the rename unit from the frontend or from the rename unit to the backend for any reason (either frontend or backend stall). This event is the sum of STALL_FRONTEND and STALL_BACKEND."
+ },
+ {
+ "ArchStdEvent": "STALL_SLOT_BACKEND",
+ "PublicDescription": "Counts slots per cycle in which no operations are sent from the rename unit to the backend due to backend resource constraints. STALL_BACKEND counts during the cycle when STALL_SLOT_BACKEND counts at least 1."
+ },
+ {
+ "ArchStdEvent": "STALL_SLOT_FRONTEND",
+ "PublicDescription": "Counts slots per cycle in which no operations are sent to the rename unit from the frontend due to frontend resource constraints."
+ },
+ {
+ "ArchStdEvent": "STALL_SLOT",
+ "PublicDescription": "Counts slots per cycle in which no operations are sent to the rename unit from the frontend or from the rename unit to the backend for any reason (either frontend or backend stall). STALL_SLOT is the sum of STALL_SLOT_FRONTEND and STALL_SLOT_BACKEND."
+ },
+ {
+ "ArchStdEvent": "STALL_BACKEND_MEM",
+ "PublicDescription": "Counts cycles when the backend is stalled because there is a pending demand load request in progress in the last level core cache."
+ },
+ {
+ "ArchStdEvent": "STALL_FRONTEND_MEMBOUND",
+ "PublicDescription": "Counts cycles when the frontend could not send any micro-operations to the rename stage due to resource constraints in the memory resources."
+ },
+ {
+ "ArchStdEvent": "STALL_FRONTEND_L1I",
+ "PublicDescription": "Counts cycles when the frontend is stalled because there is an instruction fetch request pending in the level 1 instruction cache."
+ },
+ {
+ "ArchStdEvent": "STALL_FRONTEND_MEM",
+ "PublicDescription": "Counts cycles when the frontend is stalled because there is an instruction fetch request pending in the last level core cache."
+ },
+ {
+ "ArchStdEvent": "STALL_FRONTEND_TLB",
+ "PublicDescription": "Counts when the frontend is stalled on any TLB misses being handled. This event also counts the TLB accesses made by hardware prefetches."
+ },
+ {
+ "ArchStdEvent": "STALL_FRONTEND_CPUBOUND",
+ "PublicDescription": "Counts cycles when the frontend could not send any micro-operations to the rename stage due to resource constraints in the CPU resources excluding memory resources."
+ },
+ {
+ "ArchStdEvent": "STALL_FRONTEND_FLOW",
+ "PublicDescription": "Counts cycles when the frontend could not send any micro-operations to the rename stage due to resource constraints in the branch prediction unit."
+ },
+ {
+ "ArchStdEvent": "STALL_FRONTEND_FLUSH",
+ "PublicDescription": "Counts cycles when the frontend could not send any micro-operations to the rename stage as the frontend is recovering from a machine flush or resteer. Example scenarios that cause a flush include branch mispredictions, taken exceptions, micro-architectural flush etc."
+ },
+ {
+ "ArchStdEvent": "STALL_BACKEND_MEMBOUND",
+ "PublicDescription": "Counts cycles when the backend could not accept any micro-operations due to resource constraints in the memory resources."
+ },
+ {
+ "ArchStdEvent": "STALL_BACKEND_L1D",
+ "PublicDescription": "Counts cycles when the backend is stalled because there is a pending demand load request in progress in the level 1 data cache."
+ },
+ {
+ "ArchStdEvent": "STALL_BACKEND_TLB",
+ "PublicDescription": "Counts cycles when the backend is stalled on any demand TLB misses being handled."
+ },
+ {
+ "ArchStdEvent": "STALL_BACKEND_ST",
+ "PublicDescription": "Counts cycles when the backend is stalled and there is a store that has not reached the pre-commit stage."
+ },
+ {
+ "ArchStdEvent": "STALL_BACKEND_BUSY",
+ "PublicDescription": "Counts cycles when the backend could not accept any micro-operations because the issue queues are too full to take any operations for execution."
+ },
+ {
+ "ArchStdEvent": "STALL_BACKEND_ILOCK",
+ "PublicDescription": "Counts cycles when the backend could not accept any micro-operations due to resource constraints imposed by input dependency."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/sve.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/sve.json
new file mode 100644
index 000000000000..21810ce5de8d
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/sve.json
@@ -0,0 +1,22 @@
+[
+ {
+ "ArchStdEvent": "SVE_INST_SPEC",
+ "PublicDescription": "Counts speculatively executed operations that are SVE operations."
+ },
+ {
+ "ArchStdEvent": "ASE_SVE_INT8_SPEC",
+ "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type an 8-bit integer."
+ },
+ {
+ "ArchStdEvent": "ASE_SVE_INT16_SPEC",
+ "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 16-bit integer."
+ },
+ {
+ "ArchStdEvent": "ASE_SVE_INT32_SPEC",
+ "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 32-bit integer."
+ },
+ {
+ "ArchStdEvent": "ASE_SVE_INT64_SPEC",
+ "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 64-bit integer."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/tlb.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/tlb.json
new file mode 100644
index 000000000000..1de56300e581
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/tlb.json
@@ -0,0 +1,78 @@
+[
+ {
+ "ArchStdEvent": "L1I_TLB_REFILL",
+ "PublicDescription": "Counts level 1 instruction TLB refills from any Instruction fetch. If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event will not count if the translation table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB."
+ },
+ {
+ "ArchStdEvent": "L1D_TLB_REFILL",
+ "PublicDescription": "Counts level 1 data TLB accesses that resulted in TLB refills. If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event counts for refills caused by preload instructions or hardware prefetch accesses. This event counts regardless of whether the miss hits in L2 or results in a translation table walk. This event will not count if the translation table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB. This event will not count on an access from an AT (address translation) instruction."
+ },
+ {
+ "ArchStdEvent": "L1D_TLB",
+ "PublicDescription": "Counts level 1 data TLB accesses caused by any memory load or store operation. Note that load or store instructions can be broken up into multiple memory operations. This event does not count TLB maintenance operations."
+ },
+ {
+ "ArchStdEvent": "L1I_TLB",
+ "PublicDescription": "Counts level 1 instruction TLB accesses, whether the access hits or misses in the TLB. This event counts both demand accesses and prefetch or preload generated accesses."
+ },
+ {
+ "ArchStdEvent": "L2D_TLB_REFILL",
+ "PublicDescription": "Counts level 2 TLB refills caused by memory operations from both data and instruction fetch, except for those caused by TLB maintenance operations and hardware prefetches."
+ },
+ {
+ "ArchStdEvent": "L2D_TLB",
+ "PublicDescription": "Counts level 2 TLB accesses except those caused by TLB maintenance operations."
+ },
+ {
+ "ArchStdEvent": "DTLB_WALK",
+ "PublicDescription": "Counts number of demand data translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that partial translations that cause a translation table walk are also counted. Also note that this event counts walks triggered by software preloads, but not walks triggered by hardware prefetchers, and that this event does not count walks triggered by TLB maintenance operations."
+ },
+ {
+ "ArchStdEvent": "ITLB_WALK",
+ "PublicDescription": "Counts number of instruction translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that partial translations that cause a translation table walk are also counted. Also note that this event does not count walks triggered by TLB maintenance operations."
+ },
+ {
+ "ArchStdEvent": "DTLB_WALK_PERCYC",
+ "PublicDescription": "Counts the number of data translation table walks in progress per cycle."
+ },
+ {
+ "ArchStdEvent": "ITLB_WALK_PERCYC",
+ "PublicDescription": "Counts the number of instruction translation table walks in progress per cycle."
+ },
+ {
+ "ArchStdEvent": "DTLB_HWUPD",
+ "PublicDescription": "Counts number of memory accesses triggered by a data translation table walk and performing an update of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that this event counts accesses triggered by software preloads, but not accesses triggered by hardware prefetchers."
+ },
+ {
+ "ArchStdEvent": "ITLB_HWUPD",
+ "PublicDescription": "Counts number of memory accesses triggered by an instruction translation table walk and performing an update of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD."
+ },
+ {
+ "ArchStdEvent": "DTLB_STEP",
+ "PublicDescription": "Counts number of memory accesses triggered by a demand data translation table walk and performing a read of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that this event counts accesses triggered by software preloads, but not accesses triggered by hardware prefetchers."
+ },
+ {
+ "ArchStdEvent": "ITLB_STEP",
+ "PublicDescription": "Counts number of memory accesses triggered by an instruction translation table walk and performing a read of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD."
+ },
+ {
+ "ArchStdEvent": "DTLB_WALK_LARGE",
+ "PublicDescription": "Counts number of demand data translation table walks caused by a miss in the L2 TLB and yielding a large page. The set of large pages is defined as all pages with a final size higher than or equal to 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. If DTLB_WALK_BLOCK is implemented, then it is an alias for this event in this family. Note that partial translations that cause a translation table walk are also counted. Also note that this event counts walks triggered by software preloads, but not walks triggered by hardware prefetchers, and that this event does not count walks triggered by TLB maintenance operations."
+ },
+ {
+ "ArchStdEvent": "ITLB_WALK_LARGE",
+ "PublicDescription": "Counts number of instruction translation table walks caused by a miss in the L2 TLB and yielding a large page. The set of large pages is defined as all pages with a final size higher than or equal to 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. In this family, this is equal to ITLB_WALK_BLOCK event. Note that partial translations that cause a translation table walk are also counted. Also note that this event does not count walks triggered by TLB maintenance operations."
+ },
+ {
+ "ArchStdEvent": "DTLB_WALK_SMALL",
+ "PublicDescription": "Counts number of data translation table walks caused by a miss in the L2 TLB and yielding a small page. The set of small pages is defined as all pages with a final size lower than 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. If DTLB_WALK_PAGE event is implemented, then it is an alias for this event in this family. Note that partial translations that cause a translation table walk are also counted. Also note that this event counts walks triggered by software preloads, but not walks triggered by hardware prefetchers, and that this event does not count walks triggered by TLB maintenance operations."
+ },
+ {
+ "ArchStdEvent": "ITLB_WALK_SMALL",
+ "PublicDescription": "Counts number of instruction translation table walks caused by a miss in the L2 TLB and yielding a small page. The set of small pages is defined as all pages with a final size lower than 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. In this family, this is equal to ITLB_WALK_PAGE event. Note that partial translations that cause a translation table walk are also counted. Also note that this event does not count walks triggered by TLB maintenance operations."
+ },
+ {
+ "ArchStdEvent": "DTLB_WALK_RW",
+ "PublicDescription": "Counts number of demand data translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that partial translations that cause a translation table walk are also counted. Also note that this event does not count walks triggered by TLB maintenance operations."
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/trace.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/trace.json
new file mode 100644
index 000000000000..33672a8711d4
--- /dev/null
+++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/trace.json
@@ -0,0 +1,32 @@
+[
+ {
+ "ArchStdEvent": "TRB_WRAP"
+ },
+ {
+ "ArchStdEvent": "TRB_TRIG"
+ },
+ {
+ "ArchStdEvent": "TRCEXTOUT0"
+ },
+ {
+ "ArchStdEvent": "TRCEXTOUT1"
+ },
+ {
+ "ArchStdEvent": "TRCEXTOUT2"
+ },
+ {
+ "ArchStdEvent": "TRCEXTOUT3"
+ },
+ {
+ "ArchStdEvent": "CTI_TRIGOUT4"
+ },
+ {
+ "ArchStdEvent": "CTI_TRIGOUT5"
+ },
+ {
+ "ArchStdEvent": "CTI_TRIGOUT6"
+ },
+ {
+ "ArchStdEvent": "CTI_TRIGOUT7"
+ }
+]
diff --git a/tools/perf/pmu-events/arch/arm64/common-and-microarch.json b/tools/perf/pmu-events/arch/arm64/common-and-microarch.json
index e40be37addf8..3e774c1e1413 100644
--- a/tools/perf/pmu-events/arch/arm64/common-and-microarch.json
+++ b/tools/perf/pmu-events/arch/arm64/common-and-microarch.json
@@ -1339,6 +1339,11 @@
"EventName": "INST_FETCH",
"BriefDescription": "Instruction memory access"
},
+ {
+ "EventCode": "0x8125",
+ "EventName": "BUS_REQ_RD_PERCYC",
+ "BriefDescription": "Bus read transactions in progress"
+ },
{
"EventCode": "0x8128",
"EventName": "DTLB_WALK_PERCYC",
@@ -1539,6 +1544,11 @@
"EventName": "L2D_CACHE_HWPRF",
"BriefDescription": "Level 2 data cache hardware prefetch."
},
+ {
+ "EventCode": "0x8156",
+ "EventName": "L3D_CACHE_HWPRF",
+ "BriefDescription": "Level 3 data cache hardware prefetch."
+ },
{
"EventCode": "0x8158",
"EventName": "STALL_FRONTEND_MEMBOUND",
@@ -1674,6 +1684,11 @@
"EventName": "DTLB_WALK_PAGE",
"BriefDescription": "Data TLB page translation table walk."
},
+ {
+ "EventCode": "0x818D",
+ "EventName": "BUS_REQ_RD",
+ "BriefDescription": "Bus request, read"
+ },
{
"EventCode": "0x818B",
"EventName": "ITLB_WALK_PAGE",
diff --git a/tools/perf/pmu-events/arch/arm64/mapfile.csv b/tools/perf/pmu-events/arch/arm64/mapfile.csv
index ccfcae375750..6b98632636e1 100644
--- a/tools/perf/pmu-events/arch/arm64/mapfile.csv
+++ b/tools/perf/pmu-events/arch/arm64/mapfile.csv
@@ -32,6 +32,7 @@
0x00000000410fd440,v1,arm/cortex-x1,core
0x00000000410fd4c0,v1,arm/cortex-x1,core
0x00000000410fd460,v1,arm/cortex-a510,core
+0x00000000410fd800,v1,arm/cortex-a520,core
0x00000000410fd470,v1,arm/cortex-a710,core
0x00000000410fd810,v1,arm/cortex-a720,core
0x00000000410fd480,v1,arm/cortex-x2,core
--
2.47.2
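
Editorial note on the mapfile.csv hunk above: the first column of each row is
a MIDR_EL1 value, so the new Cortex-A520 row can be sanity-checked by splitting
it into its architectural fields. The following is only a minimal sketch in
Python; the names Midr, decode_midr and parse_mapfile_line are hypothetical
helpers written for this illustration and are not part of perf. It assumes the
four-column row layout visible in the hunk (cpuid, version, directory, type).

```python
from typing import NamedTuple


class Midr(NamedTuple):
    """Fields of MIDR_EL1 (hypothetical container for this sketch)."""
    implementer: int   # bits [31:24], 0x41 is Arm Limited
    variant: int       # bits [23:20]
    architecture: int  # bits [19:16]
    partnum: int       # bits [15:4]
    revision: int      # bits [3:0]


def decode_midr(midr: int) -> Midr:
    """Split a MIDR_EL1 value into its architectural fields."""
    return Midr(
        implementer=(midr >> 24) & 0xFF,
        variant=(midr >> 20) & 0xF,
        architecture=(midr >> 16) & 0xF,
        partnum=(midr >> 4) & 0xFFF,
        revision=midr & 0xF,
    )


def parse_mapfile_line(line: str) -> tuple[Midr, str]:
    """Parse one 'cpuid,version,directory,type' row as shown in the hunk."""
    cpuid, _version, path, _type = line.strip().split(",")
    return decode_midr(int(cpuid, 16)), path


if __name__ == "__main__":
    midr, path = parse_mapfile_line("0x00000000410fd800,v1,arm/cortex-a520,core")
    print(f"{path}: implementer={midr.implementer:#x} partnum={midr.partnum:#x}")
```

Decoded this way, the A520 row and the existing A720 row differ only in the
part number field, and both rows carry zero in the variant and revision fields.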
* Re: [PATCH 1/2] perf vendor events arm64: Add Cortex-A720 events/metrics
2025-02-13 15:12 ` [PATCH 1/2] perf vendor events arm64: Add Cortex-A720 events/metrics Yangyu Chen
@ 2025-02-13 16:49 ` Ian Rogers
0 siblings, 0 replies; 16+ messages in thread
From: Ian Rogers @ 2025-02-13 16:49 UTC (permalink / raw)
To: Yangyu Chen
Cc: linux-perf-users, John Garry, Will Deacon, James Clark,
Mike Leach, Leo Yan, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, Liang Kan,
Yoshihiro Furudera, linux-arm-kernel, linux-kernel
On Thu, Feb 13, 2025 at 7:12 AM Yangyu Chen <cyy@cyyself.name> wrote:
>
> Add JSON files for Cortex-A720 events and metrics. Using the existing
> Neoverse N3 JSON files as a template, I manually checked the missing and
> extra events/metrics using my script [1] and modified them according to
> the Arm Cortex-A720 Core Technical Reference Manual [2].
>
> [1] https://github.com/cyyself/arm-pmu-check/tree/1075bebeb3f1441067448251a387df35af15bf16
> [2] https://developer.arm.com/documentation/102530/0002/Performance-Monitors-Extension-support-/Performance-monitors-events
>
> Signed-off-by: Yangyu Chen <cyy@cyyself.name>
> ---
> .../arch/arm64/arm/cortex-a720/bus.json | 18 +
> .../arch/arm64/arm/cortex-a720/exception.json | 62 +++
> .../arm64/arm/cortex-a720/fp_operation.json | 22 +
> .../arch/arm64/arm/cortex-a720/general.json | 10 +
> .../arch/arm64/arm/cortex-a720/l1d_cache.json | 50 ++
> .../arch/arm64/arm/cortex-a720/l1i_cache.json | 14 +
> .../arch/arm64/arm/cortex-a720/l2_cache.json | 62 +++
> .../arch/arm64/arm/cortex-a720/l3_cache.json | 22 +
> .../arch/arm64/arm/cortex-a720/ll_cache.json | 10 +
> .../arch/arm64/arm/cortex-a720/memory.json | 54 +++
> .../arch/arm64/arm/cortex-a720/metrics.json | 436 ++++++++++++++++++
> .../arch/arm64/arm/cortex-a720/pmu.json | 8 +
> .../arch/arm64/arm/cortex-a720/retired.json | 90 ++++
> .../arch/arm64/arm/cortex-a720/spe.json | 42 ++
> .../arm64/arm/cortex-a720/spec_operation.json | 90 ++++
> .../arch/arm64/arm/cortex-a720/stall.json | 82 ++++
> .../arch/arm64/arm/cortex-a720/sve.json | 50 ++
> .../arch/arm64/arm/cortex-a720/tlb.json | 74 +++
> .../arch/arm64/arm/cortex-a720/trace.json | 32 ++
> tools/perf/pmu-events/arch/arm64/mapfile.csv | 1 +
> 20 files changed, 1229 insertions(+)
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/bus.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/exception.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/fp_operation.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/general.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l1d_cache.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l1i_cache.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l2_cache.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l3_cache.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/ll_cache.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/memory.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/metrics.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/pmu.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/retired.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/spe.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/spec_operation.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/stall.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/sve.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/tlb.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/trace.json
>
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/bus.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/bus.json
> new file mode 100644
> index 000000000000..2e11a8c4a484
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/bus.json
> @@ -0,0 +1,18 @@
> +[
> + {
> + "ArchStdEvent": "BUS_ACCESS",
> + "PublicDescription": "Counts memory transactions issued by the CPU to the external bus, including snoop requests and snoop responses. Each beat of data is counted individually."
> + },
> + {
> + "ArchStdEvent": "BUS_CYCLES",
> + "PublicDescription": "Counts bus cycles in the CPU. Bus cycles represent a clock cycle in which a transaction could be sent or received on the interface from the CPU to the external bus. Since that interface is driven at the same clock speed as the CPU, this event is a duplicate of CPU_CYCLES."
> + },
> + {
> + "ArchStdEvent": "BUS_ACCESS_RD",
> + "PublicDescription": "Counts memory read transactions seen on the external bus. Each beat of data is counted individually."
> + },
> + {
> + "ArchStdEvent": "BUS_ACCESS_WR",
> + "PublicDescription": "Counts memory write transactions seen on the external bus. Each beat of data is counted individually."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/exception.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/exception.json
> new file mode 100644
> index 000000000000..7126fbf292e0
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/exception.json
> @@ -0,0 +1,62 @@
> +[
> + {
> + "ArchStdEvent": "EXC_TAKEN",
> + "PublicDescription": "Counts any taken architecturally visible exceptions such as IRQ, FIQ, SError, and other synchronous exceptions. Exceptions are counted whether or not they are taken locally."
> + },
> + {
> + "ArchStdEvent": "EXC_RETURN",
> + "PublicDescription": "Counts any architecturally executed exception return instructions. For example: AArch64: ERET"
> + },
> + {
> + "ArchStdEvent": "EXC_UNDEF",
> + "PublicDescription": "Counts the number of synchronous exceptions which are taken locally that are due to attempting to execute an instruction that is UNDEFINED. Attempting to execute instruction bit patterns that have not been allocated. Attempting to execute instructions when they are disabled. Attempting to execute instructions at an inappropriate Exception level. Attempting to execute an instruction when the value of PSTATE.IL is 1."
> + },
> + {
> + "ArchStdEvent": "EXC_SVC",
> + "PublicDescription": "Counts SVC exceptions taken locally."
> + },
> + {
> + "ArchStdEvent": "EXC_PABORT",
> + "PublicDescription": "Counts synchronous exceptions that are taken locally and caused by Instruction Aborts."
> + },
> + {
> + "ArchStdEvent": "EXC_DABORT",
> + "PublicDescription": "Counts exceptions that are taken locally and are caused by data aborts or SErrors. Conditions that could cause those exceptions are attempting to read or write memory where the MMU generates a fault, attempting to read or write memory with a misaligned address, interrupts from the nSEI inputs and internally generated SErrors."
> + },
> + {
> + "ArchStdEvent": "EXC_IRQ",
> + "PublicDescription": "Counts IRQ exceptions including the virtual IRQs that are taken locally."
> + },
> + {
> + "ArchStdEvent": "EXC_FIQ",
> + "PublicDescription": "Counts FIQ exceptions including the virtual FIQs that are taken locally."
> + },
> + {
> + "ArchStdEvent": "EXC_SMC",
> + "PublicDescription": "Counts SMC exceptions taken to EL3."
> + },
> + {
> + "ArchStdEvent": "EXC_HVC",
> + "PublicDescription": "Counts HVC exceptions taken to EL2."
> + },
> + {
> + "ArchStdEvent": "EXC_TRAP_PABORT",
> + "PublicDescription": "Counts exceptions which are traps not taken locally and are caused by Instruction Aborts. For example, attempting to execute an instruction with a misaligned PC."
> + },
> + {
> + "ArchStdEvent": "EXC_TRAP_DABORT",
> + "PublicDescription": "Counts exceptions which are traps not taken locally and are caused by Data Aborts or SError interrupts. Conditions that could cause those exceptions are:\n\n1. Attempting to read or write memory where the MMU generates a fault,\n2. Attempting to read or write memory with a misaligned address,\n3. Interrupts from the SEI input,\n4. Internally generated SErrors."
> + },
> + {
> + "ArchStdEvent": "EXC_TRAP_OTHER",
> + "PublicDescription": "Counts the number of synchronous trap exceptions which are not taken locally and are not SVC, SMC, HVC, data aborts, Instruction Aborts, or interrupts."
> + },
> + {
> + "ArchStdEvent": "EXC_TRAP_IRQ",
> + "PublicDescription": "Counts IRQ exceptions including the virtual IRQs that are not taken locally."
> + },
> + {
> + "ArchStdEvent": "EXC_TRAP_FIQ",
> + "PublicDescription": "Counts FIQs which are not taken locally but taken from EL0, EL1,\n or EL2 to EL3 (which would be the normal behavior for FIQs when not executing\n in EL3)."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/fp_operation.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/fp_operation.json
> new file mode 100644
> index 000000000000..cec3435ac766
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/fp_operation.json
> @@ -0,0 +1,22 @@
> +[
> + {
> + "ArchStdEvent": "FP_HP_SPEC",
> + "PublicDescription": "Counts speculatively executed half precision floating point operations."
> + },
> + {
> + "ArchStdEvent": "FP_SP_SPEC",
> + "PublicDescription": "Counts speculatively executed single precision floating point operations."
> + },
> + {
> + "ArchStdEvent": "FP_DP_SPEC",
> + "PublicDescription": "Counts speculatively executed double precision floating point operations."
> + },
> + {
> + "ArchStdEvent": "FP_SCALE_OPS_SPEC",
> + "PublicDescription": "Counts speculatively executed scalable single precision floating point operations."
> + },
> + {
> + "ArchStdEvent": "FP_FIXED_OPS_SPEC",
> + "PublicDescription": "Counts speculatively executed non-scalable single precision floating point operations."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/general.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/general.json
> new file mode 100644
> index 000000000000..c5dcdcf43c58
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/general.json
> @@ -0,0 +1,10 @@
> +[
> + {
> + "ArchStdEvent": "CPU_CYCLES",
> + "PublicDescription": "Counts CPU clock cycles (not timer cycles). The clock measured by this event is defined as the physical clock driving the CPU logic."
> + },
> + {
> + "ArchStdEvent": "CNT_CYCLES",
> + "PublicDescription": "Increments at a constant frequency equal to the rate of increment of the System Counter, CNTPCT_EL0."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l1d_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l1d_cache.json
> new file mode 100644
> index 000000000000..a6fee569f4c6
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l1d_cache.json
> @@ -0,0 +1,50 @@
> +[
> + {
> + "ArchStdEvent": "L1D_CACHE_REFILL",
> + "PublicDescription": "Counts level 1 data cache refills caused by speculatively executed load or store operations that missed in the level 1 data cache. This event only counts one event per cache line."
> + },
> + {
> + "ArchStdEvent": "L1D_CACHE",
> + "PublicDescription": "Counts level 1 data cache accesses from any load/store operations. Atomic operations that resolve in the CPUs caches (near atomic operations) counts as both a write access and read access. Each access to a cache line is counted including the multiple accesses caused by single instructions such as LDM or STM. Each access to other level 1 data or unified memory structures, for example refill buffers, write buffers, and write-back buffers, are also counted."
> + },
> + {
> + "ArchStdEvent": "L1D_CACHE_WB",
> + "PublicDescription": "Counts write-backs of dirty data from the L1 data cache to the L2 cache. This occurs when either a dirty cache line is evicted from L1 data cache and allocated in the L2 cache or dirty data is written to the L2 and possibly to the next level of cache. This event counts both victim cache line evictions and cache write-backs from snoops or cache maintenance operations. The following cache operations are not counted:\n\n1. Invalidations which do not result in data being transferred out of the L1 (such as evictions of clean data),\n2. Full line writes which write to L2 without writing L1, such as write streaming mode."
> + },
> + {
> + "ArchStdEvent": "L1D_CACHE_LMISS_RD",
> + "PublicDescription": "Counts cache line refills into the level 1 data cache from any memory read operations, that incurred additional latency."
> + },
> + {
> + "ArchStdEvent": "L1D_CACHE_RD",
> + "PublicDescription": "Counts level 1 data cache accesses from any load operation. Atomic load operations that resolve in the CPUs caches counts as both a write access and read access."
> + },
> + {
> + "ArchStdEvent": "L1D_CACHE_WR",
> + "PublicDescription": "Counts level 1 data cache accesses generated by store operations. This event also counts accesses caused by a DC ZVA (data cache zero, specified by virtual address) instruction. Near atomic operations that resolve in the CPUs caches count as a write access and read access."
> + },
> + {
> + "ArchStdEvent": "L1D_CACHE_REFILL_INNER",
> + "PublicDescription": "Counts level 1 data cache refills where the cache line data came from caches inside the immediate cluster of the core."
> + },
> + {
> + "ArchStdEvent": "L1D_CACHE_REFILL_OUTER",
> + "PublicDescription": "Counts level 1 data cache refills for which the cache line data came from outside the immediate cluster of the core, like an SLC in the system interconnect or DRAM."
> + },
> + {
> + "ArchStdEvent": "L1D_CACHE_INVAL",
> + "PublicDescription": "Counts each explicit invalidation of a cache line in the level 1 data cache caused by:\n\n- Cache Maintenance Operations (CMO) that operate by a virtual address.\n- Broadcast cache coherency operations from another CPU in the system.\n\nThis event does not count for the following conditions:\n\n1. A cache refill invalidates a cache line.\n2. A CMO which is executed on that CPU and invalidates a cache line specified by set/way.\n\nNote that CMOs that operate by set/way cannot be broadcast from one CPU to another."
> + },
> + {
> + "ArchStdEvent": "L1D_CACHE_RW",
> + "PublicDescription": "Counts level 1 data demand cache accesses from any load or store operation. Near atomic operations that resolve in the CPUs caches counts as both a write access and read access."
> + },
> + {
> + "ArchStdEvent": "L1D_CACHE_PRF",
> + "BriefDescription": "This event counts fetch counted by either Level 1 data hardware prefetch or Level 1 data software prefetch."
> + },
> + {
> + "ArchStdEvent": "L1D_CACHE_REFILL_PRF",
> + "BriefDescription": "This event counts hardware prefetch counted by L1D_CACHE_PRF that causes a refill of the Level 1 data cache from outside of the Level 1 data cache."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l1i_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l1i_cache.json
> new file mode 100644
> index 000000000000..633f1030359d
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l1i_cache.json
> @@ -0,0 +1,14 @@
> +[
> + {
> + "ArchStdEvent": "L1I_CACHE_REFILL",
> + "PublicDescription": "Counts cache line refills in the level 1 instruction cache caused by a missed instruction fetch. Instruction fetches may include accessing multiple instructions, but the single cache line allocation is counted once."
> + },
> + {
> + "ArchStdEvent": "L1I_CACHE",
> + "PublicDescription": "Counts instruction fetches which access the level 1 instruction cache. Instruction cache accesses caused by cache maintenance operations are not counted."
> + },
> + {
> + "ArchStdEvent": "L1I_CACHE_LMISS",
> + "PublicDescription": "Counts cache line refills into the level 1 instruction cache, that incurred additional latency."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l2_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l2_cache.json
> new file mode 100644
> index 000000000000..3806fef42b30
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l2_cache.json
> @@ -0,0 +1,62 @@
> +[
> + {
> + "ArchStdEvent": "L2D_CACHE",
> + "PublicDescription": "Counts accesses to the level 2 cache due to data accesses. Level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the first level data cache or translation resolutions due to accesses. This event also counts write back of dirty data from level 1 data cache to the L2 cache."
> + },
> + {
> + "ArchStdEvent": "L2D_CACHE_REFILL",
> + "PublicDescription": "Counts cache line refills into the level 2 cache. Level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the level 1 data cache or translation resolutions due to accesses."
> + },
> + {
> + "ArchStdEvent": "L2D_CACHE_WB",
> + "PublicDescription": "Counts write-backs of data from the L2 cache to outside the CPU. This includes snoops to the L2 (from other CPUs) which return data even if the snoops cause an invalidation. L2 cache line invalidations which do not write data outside the CPU and snoops which return data from an L1 cache are not counted. Data would not be written outside the cache when invalidating a clean cache line."
> + },
> + {
> + "ArchStdEvent": "L2D_CACHE_ALLOCATE",
> + "PublicDescription": "Counts level 2 cache line allocates that do not fetch data from outside the level 2 data or unified cache."
> + },
> + {
> + "ArchStdEvent": "L2D_CACHE_RD",
> + "PublicDescription": "Counts level 2 data cache accesses due to memory read operations. Level 2 cache is a unified cache for data and instruction accesses, accesses are for misses in the level 1 data cache or translation resolutions due to accesses."
> + },
> + {
> + "ArchStdEvent": "L2D_CACHE_WR",
> + "PublicDescription": "Counts level 2 cache accesses due to memory write operations. Level 2 cache is a unified cache for data and instruction accesses, accesses are for misses in the level 1 data cache or translation resolutions due to accesses."
> + },
> + {
> + "ArchStdEvent": "L2D_CACHE_REFILL_RD",
> + "PublicDescription": "Counts refills for memory accesses due to memory read operation counted by L2D_CACHE_RD. Level 2 cache is a unified cache for data and instruction accesses, accesses are for misses in the level 1 data cache or translation resolutions due to accesses."
> + },
> + {
> + "ArchStdEvent": "L2D_CACHE_REFILL_WR",
> + "PublicDescription": "Counts refills for memory accesses due to memory write operation counted by L2D_CACHE_WR. Level 2 cache is a unified cache for data and instruction accesses, accesses are for misses in the level 1 data cache or translation resolutions due to accesses."
> + },
> + {
> + "ArchStdEvent": "L2D_CACHE_WB_VICTIM",
> + "PublicDescription": "Counts evictions from the level 2 cache because of a line being allocated into the L2 cache."
> + },
> + {
> + "ArchStdEvent": "L2D_CACHE_WB_CLEAN",
> + "PublicDescription": "Counts write-backs from the level 2 cache that are a result of either:\n\n1. Cache maintenance operations,\n\n2. Snoop responses or,\n\n3. Direct cache transfers to another CPU due to a forwarding snoop request."
> + },
> + {
> + "ArchStdEvent": "L2D_CACHE_INVAL",
> + "PublicDescription": "Counts each explicit invalidation of a cache line in the level 2 cache by cache maintenance operations that operate by a virtual address, or by external coherency operations. This event does not count if either:\n\n1. A cache refill invalidates a cache line or,\n2. A Cache Maintenance Operation (CMO), which invalidates a cache line specified by set/way, is executed on that CPU.\n\nCMOs that operate by set/way cannot be broadcast from one CPU to another."
> + },
> + {
> + "ArchStdEvent": "L2D_CACHE_LMISS_RD",
> + "PublicDescription": "Counts cache line refills into the level 2 unified cache from any memory read operations that incurred additional latency."
> + },
> + {
> + "ArchStdEvent": "L2D_CACHE_RW",
> + "PublicDescription": "Counts level 2 cache demand accesses from any load/store operations. Level 2 cache is a unified cache for data and instruction accesses, accesses are for misses in the level 1 data cache or translation resolutions due to accesses."
> + },
> + {
> + "ArchStdEvent": "L2D_CACHE_PRF",
> + "PublicDescription": "Counts level 2 data cache accesses from software preload or prefetch instructions or hardware prefetcher."
> + },
> + {
> + "ArchStdEvent": "L2D_CACHE_REFILL_PRF",
> + "PublicDescription": "Counts refills due to accesses generated as a result of prefetches."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l3_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l3_cache.json
> new file mode 100644
> index 000000000000..4a2e72fc5ada
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l3_cache.json
> @@ -0,0 +1,22 @@
> +[
> + {
> + "ArchStdEvent": "L3D_CACHE_ALLOCATE",
> + "PublicDescription": "Counts level 3 cache line allocates that do not fetch data from outside the level 3 data or unified cache. For example, allocates due to streaming stores."
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE_REFILL",
> + "PublicDescription": "Counts level 3 accesses that receive data from outside the L3 cache."
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE",
> + "PublicDescription": "Counts level 3 cache accesses. Level 3 cache is a unified cache for data and instruction accesses. Accesses are for misses in the lower level caches or translation resolutions due to accesses."
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE_RD",
> + "PublicDescription": "Counts level 3 cache accesses caused by any memory read operation. Level 3 cache is a unified cache for data and instruction accesses. Accesses are for misses in the lower level caches or translation resolutions due to accesses."
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE_LMISS_RD",
> + "PublicDescription": "Counts any cache line refill into the level 3 cache from memory read operations that incurred additional latency."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/ll_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/ll_cache.json
> new file mode 100644
> index 000000000000..fd5a2e0099b8
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/ll_cache.json
> @@ -0,0 +1,10 @@
> +[
> + {
> + "ArchStdEvent": "LL_CACHE_RD",
> + "PublicDescription": "Counts read transactions that were returned from outside the core cluster. This event counts for external last level cache when the system register CPUECTLR.EXTLLC bit is set, otherwise it counts for the L3 cache. This event counts read transactions returned from outside the core if those transactions are either hit in the system level cache or missed in the SLC and are returned from any other external sources."
> + },
> + {
> + "ArchStdEvent": "LL_CACHE_MISS_RD",
> + "PublicDescription": "Counts read transactions that were returned from outside the core cluster but missed in the system level cache. This event counts for external last level cache when the system register CPUECTLR.EXTLLC bit is set, otherwise it counts for L3 cache. This event counts read transactions returned from outside the core if those transactions are missed in the System level Cache. The data source of the transaction is indicated by a field in the CHI transaction returning to the CPU. This event does not count reads caused by cache maintenance operations."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/memory.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/memory.json
> new file mode 100644
> index 000000000000..f19204a5faae
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/memory.json
> @@ -0,0 +1,54 @@
> +[
> + {
> + "ArchStdEvent": "MEM_ACCESS",
> + "PublicDescription": "Counts memory accesses issued by the CPU load store unit, where those accesses are issued due to load or store operations. This event counts memory accesses no matter whether the data is received from any level of cache hierarchy or external memory. If memory accesses are broken up into smaller transactions than what were specified in the load or store instructions, then the event counts those smaller memory transactions."
> + },
> + {
> + "ArchStdEvent": "REMOTE_ACCESS",
> + "PublicDescription": "Counts accesses to another chip, which is implemented as a different CMN mesh in the system. If the CHI bus response back to the core indicates that the data source is from another chip (mesh), then the counter is updated. If no data is returned, even if the system snoops another chip/mesh, then the counter is not updated."
> + },
> + {
> + "ArchStdEvent": "MEM_ACCESS_RD",
> + "PublicDescription": "Counts memory accesses issued by the CPU due to load operations. The event counts any memory load access, no matter whether the data is received from any level of cache hierarchy or external memory. The event also counts atomic load operations. If memory accesses are broken up by the load/store unit into smaller transactions that are issued by the bus interface, then the event counts those smaller transactions."
> + },
> + {
> + "ArchStdEvent": "MEM_ACCESS_WR",
> + "PublicDescription": "Counts memory accesses issued by the CPU due to store operations. The event counts any memory store access, no matter whether the data is located in any level of cache or external memory. The event also counts atomic load and store operations. If memory accesses are broken up by the load/store unit into smaller transactions that are issued by the bus interface, then the event counts those smaller transactions."
> + },
> + {
> + "ArchStdEvent": "LDST_ALIGN_LAT",
> + "PublicDescription": "Counts the number of memory read and write accesses in a cycle that incurred additional latency, due to the alignment of the address and the size of data being accessed, which results in store crossing a single cache line."
> + },
> + {
> + "ArchStdEvent": "LD_ALIGN_LAT",
> + "PublicDescription": "Counts the number of memory read accesses in a cycle that incurred additional latency, due to the alignment of the address and size of data being accessed, which results in load crossing a single cache line."
> + },
> + {
> + "ArchStdEvent": "ST_ALIGN_LAT",
> + "PublicDescription": "Counts the number of memory write accesses in a cycle that incurred additional latency, due to the alignment of the address and size of data being accessed."
> + },
> + {
> + "ArchStdEvent": "MEM_ACCESS_CHECKED",
> + "PublicDescription": "Counts the number of memory read and write accesses counted by MEM_ACCESS that are tag checked by the Memory Tagging Extension (MTE). This event is implemented as the sum of MEM_ACCESS_CHECKED_RD and MEM_ACCESS_CHECKED_WR."
> + },
> + {
> + "ArchStdEvent": "MEM_ACCESS_CHECKED_RD",
> + "PublicDescription": "Counts the number of memory read accesses in a cycle that are tag checked by the Memory Tagging Extension (MTE)."
> + },
> + {
> + "ArchStdEvent": "MEM_ACCESS_CHECKED_WR",
> + "PublicDescription": "Counts the number of memory write accesses in a cycle that are tag checked by the Memory Tagging Extension (MTE)."
> + },
> + {
> + "ArchStdEvent": "INST_FETCH_PERCYC",
> + "PublicDescription": "Counts number of instruction fetches outstanding per cycle, which will provide an average latency of instruction fetch."
> + },
> + {
> + "ArchStdEvent": "MEM_ACCESS_RD_PERCYC",
> + "PublicDescription": "Counts the number of outstanding loads or memory read accesses per cycle."
> + },
> + {
> + "ArchStdEvent": "INST_FETCH",
> + "PublicDescription": "Counts Instruction memory accesses that the PE makes."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/metrics.json
> new file mode 100644
> index 000000000000..d8e8b5155cfa
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/metrics.json
> @@ -0,0 +1,436 @@
> +[
> + {
> + "ArchStdEvent": "backend_bound"
> + },
> + {
> + "MetricName": "backend_busy_bound",
> + "MetricExpr": "STALL_BACKEND_BUSY / STALL_BACKEND * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to issue queues being full to accept operations for execution.",
> + "MetricGroup": "Topdown_Backend",
On Intel, topdown metrics are shown by default when no options are
passed to `perf stat`. This is achieved by putting "Default" into the
MetricGroup. I wanted to raise this in case leaving this metric out of
the Default metric group was an oversight; adding it would look like
"Topdown_Backend;Default".
> + "ScaleUnit": "1percent of cycles"
Alternatively, you can remove the "* 100" from the MetricExpr and set
the ScaleUnit to "100percent of cycles".
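
To show why the two spellings should print the same number, here is a
minimal Python sketch. It assumes a ScaleUnit is a leading numeric
factor followed by the unit text; `split_scale_unit` and
`displayed_value` are hypothetical helpers written for illustration,
not perf code, and the counter values are made up.

```python
import re


def split_scale_unit(scale_unit):
    """Split e.g. '100percent of cycles' into (100.0, 'percent of cycles')."""
    # Assumption: the string starts with a plain decimal factor.
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s*(.*)", scale_unit)
    if match is None:
        raise ValueError(f"no leading scale factor in {scale_unit!r}")
    return float(match.group(1)), match.group(2)


def displayed_value(raw, scale_unit):
    """Apply the scale factor to the raw MetricExpr result."""
    scale, unit = split_scale_unit(scale_unit)
    return raw * scale, unit


# Made-up counter readings, for illustration only.
stall_backend_busy = 250.0
stall_backend = 1000.0

# As posted: "* 100" in MetricExpr, ScaleUnit "1percent of cycles".
current = displayed_value(stall_backend_busy / stall_backend * 100,
                          "1percent of cycles")
# Suggested: no "* 100" in MetricExpr, ScaleUnit "100percent of cycles".
suggested = displayed_value(stall_backend_busy / stall_backend,
                            "100percent of cycles")

print(current, suggested)
```

Either form is fine as long as it is applied consistently; the second
keeps the MetricExpr a plain ratio.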
> + },
> + {
> + "MetricName": "backend_cache_l1d_bound",
> + "MetricExpr": "STALL_BACKEND_L1D / (STALL_BACKEND_L1D + STALL_BACKEND_MEM) * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory access latency issues caused by level 1 data cache misses.",
> + "MetricGroup": "Topdown_Backend",
> + "ScaleUnit": "1percent of cycles"
> + },
As above, I'll not repeat for each metric.
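
For the MetricGroup point, a small Python sketch of what I mean. It
only assumes that MetricGroup is a ';'-separated list of group names,
as in "Miss_Ratio;Branch_Effectiveness" elsewhere in this file;
`metric_groups` and `with_default` are hypothetical helpers for
illustration, not perf code.

```python
def metric_groups(metric_group):
    """Split a MetricGroup string into its individual group names."""
    return [name for name in metric_group.split(";") if name]


def with_default(metric_group):
    """Return the MetricGroup string with 'Default' appended if missing."""
    groups = metric_groups(metric_group)
    if "Default" not in groups:
        groups.append("Default")
    return ";".join(groups)


print(with_default("Topdown_Backend"))
```

Applying the helper to a string that already contains "Default" leaves
it unchanged.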
> + {
> + "MetricName": "backend_cache_l2d_bound",
> + "MetricExpr": "STALL_BACKEND_MEM / (STALL_BACKEND_L1D + STALL_BACKEND_MEM) * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory access latency issues caused by level 2 data cache misses.",
> + "MetricGroup": "Topdown_Backend",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "backend_core_bound",
> + "MetricExpr": "STALL_BACKEND_CPUBOUND / STALL_BACKEND * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to backend core resource constraints not related to instruction fetch latency issues caused by memory access components.",
> + "MetricGroup": "Topdown_Backend",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "backend_core_rename_bound",
> + "MetricExpr": "STALL_BACKEND_RENAME / STALL_BACKEND_CPUBOUND * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend as the rename unit registers are unavailable.",
> + "MetricGroup": "Topdown_Backend",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "backend_mem_bound",
> + "MetricExpr": "STALL_BACKEND_MEMBOUND / STALL_BACKEND * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to backend core resource constraints related to memory access latency issues caused by memory access components.",
> + "MetricGroup": "Topdown_Backend",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "backend_mem_cache_bound",
> + "MetricExpr": "(STALL_BACKEND_L1D + STALL_BACKEND_MEM) / STALL_BACKEND_MEMBOUND * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory latency issues caused by data cache misses.",
> + "MetricGroup": "Topdown_Backend",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "backend_mem_store_bound",
> + "MetricExpr": "STALL_BACKEND_ST / STALL_BACKEND_MEMBOUND * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory write pending caused by stores stalled in the pre-commit stage.",
> + "MetricGroup": "Topdown_Backend",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "backend_mem_tlb_bound",
> + "MetricExpr": "STALL_BACKEND_TLB / STALL_BACKEND_MEMBOUND * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory access latency issues caused by data TLB misses.",
> + "MetricGroup": "Topdown_Backend",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "backend_stalled_cycles",
> + "MetricExpr": "STALL_BACKEND / CPU_CYCLES * 100",
> + "BriefDescription": "This metric is the percentage of cycles that were stalled due to resource constraints in the backend unit of the processor.",
> + "MetricGroup": "Cycle_Accounting",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "ArchStdEvent": "bad_speculation",
> + "MetricExpr": "(1 - STALL_SLOT / (10 * CPU_CYCLES)) * (1 - OP_RETIRED / OP_SPEC) * 100 + STALL_FRONTEND_FLUSH / CPU_CYCLES * 100"
> + },
> + {
> + "MetricName": "barrier_percentage",
> + "MetricExpr": "(ISB_SPEC + DSB_SPEC + DMB_SPEC) / INST_SPEC * 100",
> + "BriefDescription": "This metric measures instruction and data barrier operations as a percentage of operations speculatively executed.",
> + "MetricGroup": "Operation_Mix",
> + "ScaleUnit": "1percent of operations"
> + },
> + {
> + "MetricName": "branch_direct_ratio",
> + "MetricExpr": "BR_IMMED_RETIRED / BR_RETIRED",
> + "BriefDescription": "This metric measures the ratio of direct branches retired to the total number of branches architecturally executed.",
> + "MetricGroup": "Branch_Effectiveness",
> + "ScaleUnit": "1per branch"
> + },
> + {
> + "MetricName": "branch_indirect_ratio",
> + "MetricExpr": "BR_IND_RETIRED / BR_RETIRED",
> + "BriefDescription": "This metric measures the ratio of indirect branches retired, including function returns, to the total number of branches architecturally executed.",
> + "MetricGroup": "Branch_Effectiveness",
> + "ScaleUnit": "1per branch"
> + },
> + {
> + "MetricName": "branch_misprediction_ratio",
> + "MetricExpr": "BR_MIS_PRED_RETIRED / BR_RETIRED",
> + "BriefDescription": "This metric measures the ratio of branches mispredicted to the total number of branches architecturally executed. This gives an indication of the effectiveness of the branch prediction unit.",
> + "MetricGroup": "Miss_Ratio;Branch_Effectiveness",
> + "ScaleUnit": "100percent of branches"
> + },
> + {
> + "MetricName": "branch_mpki",
> + "MetricExpr": "BR_MIS_PRED_RETIRED / INST_RETIRED * 1000",
> + "BriefDescription": "This metric measures the number of branch mispredictions per thousand instructions executed.",
> + "MetricGroup": "MPKI;Branch_Effectiveness",
> + "ScaleUnit": "1MPKI"
> + },
> + {
> + "MetricName": "branch_return_ratio",
> + "MetricExpr": "BR_RETURN_RETIRED / BR_RETIRED",
> + "BriefDescription": "This metric measures the ratio of branches retired that are function returns to the total number of branches architecturally executed.",
> + "MetricGroup": "Branch_Effectiveness",
> + "ScaleUnit": "1per branch"
> + },
> + {
> + "MetricName": "crypto_percentage",
> + "MetricExpr": "CRYPTO_SPEC / INST_SPEC * 100",
> + "BriefDescription": "This metric measures crypto operations as a percentage of operations speculatively executed.",
> + "MetricGroup": "Operation_Mix",
> + "ScaleUnit": "1percent of operations"
> + },
> + {
> + "MetricName": "dtlb_mpki",
> + "MetricExpr": "DTLB_WALK / INST_RETIRED * 1000",
> + "BriefDescription": "This metric measures the number of data TLB Walks per thousand instructions executed.",
> + "MetricGroup": "MPKI;DTLB_Effectiveness",
> + "ScaleUnit": "1MPKI"
> + },
> + {
> + "MetricName": "dtlb_walk_ratio",
> + "MetricExpr": "DTLB_WALK / L1D_TLB",
> + "BriefDescription": "This metric measures the ratio of data TLB Walks to the total number of data TLB accesses. This gives an indication of the effectiveness of the data TLB accesses.",
> + "MetricGroup": "Miss_Ratio;DTLB_Effectiveness",
> + "ScaleUnit": "100percent of TLB accesses"
> + },
> + {
> + "MetricName": "fp16_percentage",
> + "MetricExpr": "FP_HP_SPEC / INST_SPEC * 100",
> + "BriefDescription": "This metric measures half-precision floating point operations as a percentage of operations speculatively executed.",
> + "MetricGroup": "FP_Precision_Mix",
> + "ScaleUnit": "1percent of operations"
> + },
> + {
> + "MetricName": "fp32_percentage",
> + "MetricExpr": "FP_SP_SPEC / INST_SPEC * 100",
> + "BriefDescription": "This metric measures single-precision floating point operations as a percentage of operations speculatively executed.",
> + "MetricGroup": "FP_Precision_Mix",
> + "ScaleUnit": "1percent of operations"
> + },
> + {
> + "MetricName": "fp64_percentage",
> + "MetricExpr": "FP_DP_SPEC / INST_SPEC * 100",
> + "BriefDescription": "This metric measures double-precision floating point operations as a percentage of operations speculatively executed.",
> + "MetricGroup": "FP_Precision_Mix",
> + "ScaleUnit": "1percent of operations"
> + },
> + {
> + "MetricName": "fp_ops_per_cycle",
> + "MetricExpr": "(FP_SCALE_OPS_SPEC + FP_FIXED_OPS_SPEC) / CPU_CYCLES",
> + "BriefDescription": "This metric measures floating point operations per cycle in any precision performed by any instruction. Operations are counted by computation and by vector lanes, fused computations such as multiply-add count as twice per vector lane for example.",
> + "MetricGroup": "FP_Arithmetic_Intensity",
> + "ScaleUnit": "1operations per cycle"
> + },
> + {
> + "MetricName": "frontend_cache_l1i_bound",
> + "MetricExpr": "STALL_FRONTEND_L1I / (STALL_FRONTEND_L1I + STALL_FRONTEND_MEM) * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to memory access latency issues caused by level 1 instruction cache misses.",
> + "MetricGroup": "Topdown_Frontend",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "frontend_cache_l2i_bound",
> + "MetricExpr": "STALL_FRONTEND_MEM / (STALL_FRONTEND_L1I + STALL_FRONTEND_MEM) * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to memory access latency issues caused by level 2 instruction cache misses.",
> + "MetricGroup": "Topdown_Frontend",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "frontend_core_bound",
> + "MetricExpr": "STALL_FRONTEND_CPUBOUND / STALL_FRONTEND * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to frontend core resource constraints not related to instruction fetch latency issues caused by memory access components.",
> + "MetricGroup": "Topdown_Frontend",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "frontend_core_flush_bound",
> + "MetricExpr": "STALL_FRONTEND_FLUSH / STALL_FRONTEND_CPUBOUND * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend as the processor is recovering from a pipeline flush caused by bad speculation or other machine resteers.",
> + "MetricGroup": "Topdown_Frontend",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "frontend_mem_bound",
> + "MetricExpr": "STALL_FRONTEND_MEMBOUND / STALL_FRONTEND * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to frontend core resource constraints related to the instruction fetch latency issues caused by memory access components.",
> + "MetricGroup": "Topdown_Frontend",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "frontend_mem_cache_bound",
> + "MetricExpr": "(STALL_FRONTEND_L1I + STALL_FRONTEND_MEM) / STALL_FRONTEND_MEMBOUND * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to instruction fetch latency issues caused by instruction cache misses.",
> + "MetricGroup": "Topdown_Frontend",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "frontend_mem_tlb_bound",
> + "MetricExpr": "STALL_FRONTEND_TLB / STALL_FRONTEND_MEMBOUND * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to instruction fetch latency issues caused by instruction TLB misses.",
> + "MetricGroup": "Topdown_Frontend",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "frontend_stalled_cycles",
> + "MetricExpr": "STALL_FRONTEND / CPU_CYCLES * 100",
> + "BriefDescription": "This metric is the percentage of cycles that were stalled due to resource constraints in the frontend unit of the processor.",
> + "MetricGroup": "Cycle_Accounting",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "integer_dp_percentage",
> + "MetricExpr": "DP_SPEC / INST_SPEC * 100",
> + "BriefDescription": "This metric measures scalar integer operations as a percentage of operations speculatively executed.",
> + "MetricGroup": "Operation_Mix",
> + "ScaleUnit": "1percent of operations"
> + },
> + {
> + "MetricName": "ipc",
> + "MetricExpr": "INST_RETIRED / CPU_CYCLES",
> + "BriefDescription": "This metric measures the number of instructions retired per cycle.",
> + "MetricGroup": "General",
> + "ScaleUnit": "1per cycle"
> + },
> + {
> + "MetricName": "itlb_mpki",
> + "MetricExpr": "ITLB_WALK / INST_RETIRED * 1000",
> + "BriefDescription": "This metric measures the number of instruction TLB Walks per thousand instructions executed.",
> + "MetricGroup": "MPKI;ITLB_Effectiveness",
> + "ScaleUnit": "1MPKI"
> + },
> + {
> + "MetricName": "itlb_walk_ratio",
> + "MetricExpr": "ITLB_WALK / L1I_TLB",
> + "BriefDescription": "This metric measures the ratio of instruction TLB Walks to the total number of instruction TLB accesses. This gives an indication of the effectiveness of the instruction TLB accesses.",
> + "MetricGroup": "Miss_Ratio;ITLB_Effectiveness",
> + "ScaleUnit": "100percent of TLB accesses"
> + },
> + {
> + "MetricName": "l1d_cache_miss_ratio",
> + "MetricExpr": "L1D_CACHE_REFILL / L1D_CACHE",
> + "BriefDescription": "This metric measures the ratio of level 1 data cache accesses missed to the total number of level 1 data cache accesses. This gives an indication of the effectiveness of the level 1 data cache.",
> + "MetricGroup": "Miss_Ratio;L1D_Cache_Effectiveness",
> + "ScaleUnit": "100percent of cache accesses"
> + },
> + {
> + "MetricName": "l1d_cache_mpki",
> + "MetricExpr": "L1D_CACHE_REFILL / INST_RETIRED * 1000",
> + "BriefDescription": "This metric measures the number of level 1 data cache accesses missed per thousand instructions executed.",
> + "MetricGroup": "MPKI;L1D_Cache_Effectiveness",
> + "ScaleUnit": "1MPKI"
> + },
> + {
> + "MetricName": "l1d_tlb_miss_ratio",
> + "MetricExpr": "L1D_TLB_REFILL / L1D_TLB",
> + "BriefDescription": "This metric measures the ratio of level 1 data TLB accesses missed to the total number of level 1 data TLB accesses. This gives an indication of the effectiveness of the level 1 data TLB.",
> + "MetricGroup": "Miss_Ratio;DTLB_Effectiveness",
> + "ScaleUnit": "100percent of TLB accesses"
> + },
> + {
> + "MetricName": "l1d_tlb_mpki",
> + "MetricExpr": "L1D_TLB_REFILL / INST_RETIRED * 1000",
> + "BriefDescription": "This metric measures the number of level 1 data TLB accesses missed per thousand instructions executed.",
> + "MetricGroup": "MPKI;DTLB_Effectiveness",
> + "ScaleUnit": "1MPKI"
> + },
> + {
> + "MetricName": "l1i_cache_miss_ratio",
> + "MetricExpr": "L1I_CACHE_REFILL / L1I_CACHE",
> + "BriefDescription": "This metric measures the ratio of level 1 instruction cache accesses missed to the total number of level 1 instruction cache accesses. This gives an indication of the effectiveness of the level 1 instruction cache.",
> + "MetricGroup": "Miss_Ratio;L1I_Cache_Effectiveness",
> + "ScaleUnit": "100percent of cache accesses"
> + },
> + {
> + "MetricName": "l1i_cache_mpki",
> + "MetricExpr": "L1I_CACHE_REFILL / INST_RETIRED * 1000",
> + "BriefDescription": "This metric measures the number of level 1 instruction cache accesses missed per thousand instructions executed.",
> + "MetricGroup": "MPKI;L1I_Cache_Effectiveness",
> + "ScaleUnit": "1MPKI"
> + },
> + {
> + "MetricName": "l1i_tlb_miss_ratio",
> + "MetricExpr": "L1I_TLB_REFILL / L1I_TLB",
> + "BriefDescription": "This metric measures the ratio of level 1 instruction TLB accesses missed to the total number of level 1 instruction TLB accesses. This gives an indication of the effectiveness of the level 1 instruction TLB.",
> + "MetricGroup": "Miss_Ratio;ITLB_Effectiveness",
> + "ScaleUnit": "100percent of TLB accesses"
> + },
> + {
> + "MetricName": "l1i_tlb_mpki",
> + "MetricExpr": "L1I_TLB_REFILL / INST_RETIRED * 1000",
> + "BriefDescription": "This metric measures the number of level 1 instruction TLB accesses missed per thousand instructions executed.",
> + "MetricGroup": "MPKI;ITLB_Effectiveness",
> + "ScaleUnit": "1MPKI"
> + },
> + {
> + "MetricName": "l2_cache_miss_ratio",
> + "MetricExpr": "L2D_CACHE_REFILL / L2D_CACHE",
> + "BriefDescription": "This metric measures the ratio of level 2 cache accesses missed to the total number of level 2 cache accesses. This gives an indication of the effectiveness of the level 2 cache, which is a unified cache that stores both data and instruction. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a unified cache.",
> + "MetricGroup": "Miss_Ratio;L2_Cache_Effectiveness",
> + "ScaleUnit": "100percent of cache accesses"
> + },
> + {
> + "MetricName": "l2_cache_mpki",
> + "MetricExpr": "L2D_CACHE_REFILL / INST_RETIRED * 1000",
> + "BriefDescription": "This metric measures the number of level 2 unified cache accesses missed per thousand instructions executed. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a unified cache.",
> + "MetricGroup": "MPKI;L2_Cache_Effectiveness",
> + "ScaleUnit": "1MPKI"
> + },
> + {
> + "MetricName": "l2_tlb_miss_ratio",
> + "MetricExpr": "L2D_TLB_REFILL / L2D_TLB",
> + "BriefDescription": "This metric measures the ratio of level 2 unified TLB accesses missed to the total number of level 2 unified TLB accesses. This gives an indication of the effectiveness of the level 2 TLB.",
> + "MetricGroup": "Miss_Ratio;ITLB_Effectiveness;DTLB_Effectiveness",
> + "ScaleUnit": "100percent of TLB accesses"
> + },
> + {
> + "MetricName": "l2_tlb_mpki",
> + "MetricExpr": "L2D_TLB_REFILL / INST_RETIRED * 1000",
> + "BriefDescription": "This metric measures the number of level 2 unified TLB accesses missed per thousand instructions executed.",
> + "MetricGroup": "MPKI;ITLB_Effectiveness;DTLB_Effectiveness",
> + "ScaleUnit": "1MPKI"
> + },
> + {
> + "MetricName": "ll_cache_read_hit_ratio",
> + "MetricExpr": "(LL_CACHE_RD - LL_CACHE_MISS_RD) / LL_CACHE_RD",
> + "BriefDescription": "This metric measures the ratio of last level cache read accesses hit in the cache to the total number of last level cache accesses. This gives an indication of the effectiveness of the last level cache for read traffic. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a system level cache.",
> + "MetricGroup": "LL_Cache_Effectiveness",
> + "ScaleUnit": "100percent of cache accesses"
> + },
> + {
> + "MetricName": "ll_cache_read_miss_ratio",
> + "MetricExpr": "LL_CACHE_MISS_RD / LL_CACHE_RD",
> + "BriefDescription": "This metric measures the ratio of last level cache read accesses missed to the total number of last level cache accesses. This gives an indication of the effectiveness of the last level cache for read traffic. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a system level cache.",
> + "MetricGroup": "Miss_Ratio;LL_Cache_Effectiveness",
> + "ScaleUnit": "100percent of cache accesses"
> + },
> + {
> + "MetricName": "ll_cache_read_mpki",
> + "MetricExpr": "LL_CACHE_MISS_RD / INST_RETIRED * 1000",
> + "BriefDescription": "This metric measures the number of last level cache read accesses missed per thousand instructions executed.",
> + "MetricGroup": "MPKI;LL_Cache_Effectiveness",
> + "ScaleUnit": "1MPKI"
> + },
> + {
> + "MetricName": "load_percentage",
> + "MetricExpr": "LD_SPEC / INST_SPEC * 100",
> + "BriefDescription": "This metric measures load operations as a percentage of operations speculatively executed.",
> + "MetricGroup": "Operation_Mix",
> + "ScaleUnit": "1percent of operations"
> + },
> + {
> + "MetricName": "nonsve_fp_ops_per_cycle",
> + "MetricExpr": "FP_FIXED_OPS_SPEC / CPU_CYCLES",
> + "BriefDescription": "This metric measures floating point operations per cycle in any precision performed by an instruction that is not an SVE instruction. Operations are counted by computation and by vector lanes; for example, fused computations such as multiply-add count twice per vector lane.",
> + "MetricGroup": "FP_Arithmetic_Intensity",
> + "ScaleUnit": "1operations per cycle"
> + },
> + {
> + "MetricName": "scalar_fp_percentage",
> + "MetricExpr": "VFP_SPEC / INST_SPEC * 100",
> + "BriefDescription": "This metric measures scalar floating point operations as a percentage of operations speculatively executed.",
> + "MetricGroup": "Operation_Mix",
> + "ScaleUnit": "1percent of operations"
> + },
> + {
> + "MetricName": "simd_percentage",
> + "MetricExpr": "ASE_SPEC / INST_SPEC * 100",
> + "BriefDescription": "This metric measures advanced SIMD operations as a percentage of total operations speculatively executed.",
> + "MetricGroup": "Operation_Mix",
> + "ScaleUnit": "1percent of operations"
> + },
> + {
> + "MetricName": "store_percentage",
> + "MetricExpr": "ST_SPEC / INST_SPEC * 100",
> + "BriefDescription": "This metric measures store operations as a percentage of operations speculatively executed.",
> + "MetricGroup": "Operation_Mix",
> + "ScaleUnit": "1percent of operations"
> + },
> + {
> + "MetricName": "sve_all_percentage",
> + "MetricExpr": "SVE_INST_SPEC / INST_SPEC * 100",
> + "BriefDescription": "This metric measures scalable vector operations, including loads and stores, as a percentage of operations speculatively executed.",
> + "MetricGroup": "Operation_Mix",
> + "ScaleUnit": "1percent of operations"
> + },
> + {
> + "MetricName": "sve_fp_ops_per_cycle",
> + "MetricExpr": "FP_SCALE_OPS_SPEC / CPU_CYCLES",
> + "BriefDescription": "This metric measures floating point operations per cycle in any precision performed by SVE instructions. Operations are counted by computation and by vector lanes; for example, fused computations such as multiply-add count twice per vector lane.",
> + "MetricGroup": "FP_Arithmetic_Intensity",
> + "ScaleUnit": "1operations per cycle"
> + },
> + {
> + "MetricName": "sve_predicate_empty_percentage",
> + "MetricExpr": "SVE_PRED_EMPTY_SPEC / SVE_PRED_SPEC * 100",
> + "BriefDescription": "This metric measures scalable vector operations with no active predicates as a percentage of SVE predicated operations speculatively executed.",
> + "MetricGroup": "SVE_Effectiveness",
> + "ScaleUnit": "1percent of operations"
> + },
> + {
> + "MetricName": "sve_predicate_full_percentage",
> + "MetricExpr": "SVE_PRED_FULL_SPEC / SVE_PRED_SPEC * 100",
> + "BriefDescription": "This metric measures scalable vector operations with all active predicates as a percentage of SVE predicated operations speculatively executed.",
> + "MetricGroup": "SVE_Effectiveness",
> + "ScaleUnit": "1percent of operations"
> + },
> + {
> + "MetricName": "sve_predicate_partial_percentage",
> + "MetricExpr": "SVE_PRED_PARTIAL_SPEC / SVE_PRED_SPEC * 100",
> + "BriefDescription": "This metric measures scalable vector operations with at least one but not all active predicates as a percentage of SVE predicated operations speculatively executed.",
> + "MetricGroup": "SVE_Effectiveness",
> + "ScaleUnit": "1percent of operations"
> + },
> + {
> + "MetricName": "sve_predicate_percentage",
> + "MetricExpr": "SVE_PRED_SPEC / INST_SPEC * 100",
> + "BriefDescription": "This metric measures scalable vector operations with predicates as a percentage of operations speculatively executed.",
> + "MetricGroup": "SVE_Effectiveness",
> + "ScaleUnit": "1percent of operations"
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/pmu.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/pmu.json
> new file mode 100644
> index 000000000000..d8b7b9f9e5fa
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/pmu.json
> @@ -0,0 +1,8 @@
> +[
> + {
> + "ArchStdEvent": "PMU_OVFS"
> + },
> + {
> + "ArchStdEvent": "PMU_HOVFS"
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/retired.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/retired.json
> new file mode 100644
> index 000000000000..69f9a0b0c7ff
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/retired.json
> @@ -0,0 +1,90 @@
> +[
> + {
> + "ArchStdEvent": "SW_INCR",
> + "PublicDescription": "Counts software writes to the PMSWINC_EL0 (software PMU increment) register. The PMSWINC_EL0 register is a manually updated counter for use by application software.\n\nThis event could be used to measure any user program event, such as accesses to a particular data structure (by writing to the PMSWINC_EL0 register each time the data structure is accessed).\n\nTo use the PMSWINC_EL0 register and event, developers must insert instructions that write to the PMSWINC_EL0 register into the source code.\n\nSince the SW_INCR event records writes to the PMSWINC_EL0 register, there is no need to do a read/increment/write sequence to the PMSWINC_EL0 register."
> + },
> + {
> + "ArchStdEvent": "INST_RETIRED",
> + "PublicDescription": "Counts instructions that have been architecturally executed."
> + },
> + {
> + "ArchStdEvent": "CID_WRITE_RETIRED",
> + "PublicDescription": "Counts architecturally executed writes to the CONTEXTIDR_EL1 register, which usually contains the kernel PID and can be output with hardware trace."
> + },
> + {
> + "ArchStdEvent": "PC_WRITE_RETIRED",
> + "PublicDescription": "Counts branch instructions that caused a change of Program Counter, which effectively causes a change in the control flow of the program."
> + },
> + {
> + "ArchStdEvent": "BR_IMMED_RETIRED",
> + "PublicDescription": "Counts architecturally executed direct branches."
> + },
> + {
> + "ArchStdEvent": "BR_RETURN_RETIRED",
> + "PublicDescription": "Counts architecturally executed procedure returns."
> + },
> + {
> + "ArchStdEvent": "TTBR_WRITE_RETIRED",
> + "PublicDescription": "Counts architectural writes to TTBR0/1_EL1. If virtualization host extensions are enabled (by setting the HCR_EL2.E2H bit to 1), then accesses to TTBR0/1_EL1 that are redirected to TTBR0/1_EL2, or accesses to TTBR0/1_EL12, are counted. TTBRn registers are typically updated when the kernel is swapping user-space threads or applications."
> + },
> + {
> + "ArchStdEvent": "BR_RETIRED",
> + "PublicDescription": "Counts architecturally executed branches, whether the branch is taken or not. Instructions that explicitly write to the PC are also counted. Note that exception generating instructions, exception return instructions and context synchronization instructions are not counted."
> + },
> + {
> + "ArchStdEvent": "BR_MIS_PRED_RETIRED",
> + "PublicDescription": "Counts branches counted by BR_RETIRED which were mispredicted and caused a pipeline flush."
> + },
> + {
> + "ArchStdEvent": "OP_RETIRED",
> + "PublicDescription": "Counts micro-operations that are architecturally executed. This is a count of the number of micro-operations retired from the commit queue in a single cycle."
> + },
> + {
> + "ArchStdEvent": "BR_IMMED_TAKEN_RETIRED",
> + "PublicDescription": "Counts architecturally executed immediate branches that were taken."
> + },
> + {
> + "ArchStdEvent": "BR_INDNR_TAKEN_RETIRED",
> + "PublicDescription": "Counts architecturally executed indirect branches excluding procedure returns that were taken."
> + },
> + {
> + "ArchStdEvent": "BR_IMMED_PRED_RETIRED",
> + "PublicDescription": "Counts architecturally executed direct branches that were correctly predicted."
> + },
> + {
> + "ArchStdEvent": "BR_IMMED_MIS_PRED_RETIRED",
> + "PublicDescription": "Counts architecturally executed direct branches that were mispredicted and caused a pipeline flush."
> + },
> + {
> + "ArchStdEvent": "BR_IND_PRED_RETIRED",
> + "PublicDescription": "Counts architecturally executed indirect branches including procedure returns that were correctly predicted."
> + },
> + {
> + "ArchStdEvent": "BR_IND_MIS_PRED_RETIRED",
> + "PublicDescription": "Counts architecturally executed indirect branches including procedure returns that were mispredicted and caused a pipeline flush."
> + },
> + {
> + "ArchStdEvent": "BR_RETURN_PRED_RETIRED",
> + "PublicDescription": "Counts architecturally executed procedure returns that were correctly predicted."
> + },
> + {
> + "ArchStdEvent": "BR_RETURN_MIS_PRED_RETIRED",
> + "PublicDescription": "Counts architecturally executed procedure returns that were mispredicted and caused a pipeline flush."
> + },
> + {
> + "ArchStdEvent": "BR_INDNR_PRED_RETIRED",
> + "PublicDescription": "Counts architecturally executed indirect branches excluding procedure returns that were correctly predicted."
> + },
> + {
> + "ArchStdEvent": "BR_INDNR_MIS_PRED_RETIRED",
> + "PublicDescription": "Counts architecturally executed indirect branches excluding procedure returns that were mispredicted and caused a pipeline flush."
> + },
> + {
> + "ArchStdEvent": "BR_PRED_RETIRED",
> + "PublicDescription": "Counts branch instructions counted by BR_RETIRED which were correctly predicted."
> + },
> + {
> + "ArchStdEvent": "BR_IND_RETIRED",
> + "PublicDescription": "Counts architecturally executed indirect branches including procedure returns."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/spe.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/spe.json
> new file mode 100644
> index 000000000000..ca0217fa4681
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/spe.json
> @@ -0,0 +1,42 @@
> +[
> + {
> + "ArchStdEvent": "SAMPLE_POP",
> + "PublicDescription": "Counts statistical profiling sample population, the count of all operations that could be sampled but may or may not be chosen for sampling."
> + },
> + {
> + "ArchStdEvent": "SAMPLE_FEED",
> + "PublicDescription": "Counts statistical profiling samples taken for sampling."
> + },
> + {
> + "ArchStdEvent": "SAMPLE_FILTRATE",
> + "PublicDescription": "Counts statistical profiling samples taken which are not removed by filtering."
> + },
> + {
> + "ArchStdEvent": "SAMPLE_COLLISION",
> + "PublicDescription": "Counts statistical profiling samples that have collided with a previous sample and were therefore not taken."
> + },
> + {
> + "ArchStdEvent": "SAMPLE_FEED_BR",
> + "PublicDescription": "Counts statistical profiling samples taken which are branches."
> + },
> + {
> + "ArchStdEvent": "SAMPLE_FEED_LD",
> + "PublicDescription": "Counts statistical profiling samples taken which are loads or load atomic operations."
> + },
> + {
> + "ArchStdEvent": "SAMPLE_FEED_ST",
> + "PublicDescription": "Counts statistical profiling samples taken which are stores or store atomic operations."
> + },
> + {
> + "ArchStdEvent": "SAMPLE_FEED_OP",
> + "PublicDescription": "Counts statistical profiling samples taken which are matching any operation type filters supported."
> + },
> + {
> + "ArchStdEvent": "SAMPLE_FEED_EVENT",
> + "PublicDescription": "Counts statistical profiling samples taken which are matching event packet filter constraints."
> + },
> + {
> + "ArchStdEvent": "SAMPLE_FEED_LAT",
> + "PublicDescription": "Counts statistical profiling samples taken which are exceeding minimum latency set by operation latency filter constraints."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/spec_operation.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/spec_operation.json
> new file mode 100644
> index 000000000000..f91eb18d683c
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/spec_operation.json
> @@ -0,0 +1,90 @@
> +[
> + {
> + "ArchStdEvent": "BR_MIS_PRED",
> + "PublicDescription": "Counts branches which are speculatively executed and mispredicted."
> + },
> + {
> + "ArchStdEvent": "BR_PRED",
> + "PublicDescription": "Counts all speculatively executed branches."
> + },
> + {
> + "ArchStdEvent": "INST_SPEC",
> + "PublicDescription": "Counts operations that have been speculatively executed."
> + },
> + {
> + "ArchStdEvent": "OP_SPEC",
> + "PublicDescription": "Counts micro-operations speculatively executed. This is the count of the number of micro-operations dispatched in a cycle."
> + },
> + {
> + "ArchStdEvent": "STREX_FAIL_SPEC",
> + "PublicDescription": "Counts store-exclusive operations that have been speculatively executed and have not successfully completed the store operation."
> + },
> + {
> + "ArchStdEvent": "STREX_SPEC",
> + "PublicDescription": "Counts store-exclusive operations that have been speculatively executed."
> + },
> + {
> + "ArchStdEvent": "LD_SPEC",
> + "PublicDescription": "Counts speculatively executed load operations including Single Instruction Multiple Data (SIMD) load operations."
> + },
> + {
> + "ArchStdEvent": "ST_SPEC",
> + "PublicDescription": "Counts speculatively executed store operations including Single Instruction Multiple Data (SIMD) store operations."
> + },
> + {
> + "ArchStdEvent": "DP_SPEC",
> + "PublicDescription": "Counts speculatively executed logical or arithmetic instructions such as MOV/MVN operations."
> + },
> + {
> + "ArchStdEvent": "ASE_SPEC",
> + "PublicDescription": "Counts speculatively executed Advanced SIMD operations excluding load, store and move micro-operations that move data to or from SIMD (vector) registers."
> + },
> + {
> + "ArchStdEvent": "VFP_SPEC",
> + "PublicDescription": "Counts speculatively executed floating point operations. This event does not count operations that move data to or from floating point (vector) registers."
> + },
> + {
> + "ArchStdEvent": "PC_WRITE_SPEC",
> + "PublicDescription": "Counts speculatively executed operations which cause software changes of the PC. Those operations include all taken branch operations."
> + },
> + {
> + "ArchStdEvent": "CRYPTO_SPEC",
> + "PublicDescription": "Counts speculatively executed cryptographic operations except for PMULL and VMULL operations."
> + },
> + {
> + "ArchStdEvent": "ISB_SPEC",
> + "PublicDescription": "Counts ISB operations that are executed."
> + },
> + {
> + "ArchStdEvent": "DSB_SPEC",
> + "PublicDescription": "Counts DSB operations that are speculatively issued to the Load/Store unit in the CPU."
> + },
> + {
> + "ArchStdEvent": "DMB_SPEC",
> + "PublicDescription": "Counts DMB operations that are speculatively issued to the Load/Store unit in the CPU. This event does not count implied barriers from load acquire/store release operations."
> + },
> + {
> + "ArchStdEvent": "RC_LD_SPEC",
> + "PublicDescription": "Counts any load acquire operations that are speculatively executed. For example: LDAR, LDARH, LDARB"
> + },
> + {
> + "ArchStdEvent": "RC_ST_SPEC",
> + "PublicDescription": "Counts any store release operations that are speculatively executed. For example: STLR, STLRH, STLRB"
> + },
> + {
> + "ArchStdEvent": "ASE_INST_SPEC",
> + "PublicDescription": "Counts speculatively executed Advanced SIMD operations."
> + },
> + {
> + "ArchStdEvent": "CAS_NEAR_PASS",
> + "PublicDescription": "Counts compare and swap instructions that executed locally to the PE and updated the location accessed."
> + },
> + {
> + "ArchStdEvent": "CAS_NEAR_SPEC",
> + "PublicDescription": "Counts compare and swap instructions that executed locally to the PE."
> + },
> + {
> + "ArchStdEvent": "CAS_FAR_SPEC",
> + "PublicDescription": "Counts compare and swap instructions that did not execute locally to the PE."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/stall.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/stall.json
> new file mode 100644
> index 000000000000..b1eae21bac07
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/stall.json
> @@ -0,0 +1,82 @@
> +[
> + {
> + "ArchStdEvent": "STALL_FRONTEND",
> + "PublicDescription": "Counts cycles when frontend could not send any micro-operations to the rename stage because of frontend resource stalls caused by fetch memory latency or branch prediction flow stalls. STALL_SLOT_FRONTEND counts slots during the cycle when this event counts."
> + },
> + {
> + "ArchStdEvent": "STALL_BACKEND",
> + "PublicDescription": "Counts cycles whenever the rename unit is unable to send any micro-operations to the backend of the pipeline because of backend resource constraints. Backend resource constraints can include issue stage fullness, execution stage fullness, or other internal pipeline resource fullness. All the backend slots were empty during the cycle when this event counts."
> + },
> + {
> + "ArchStdEvent": "STALL",
> + "PublicDescription": "Counts cycles when no operations are sent to the rename unit from the frontend or from the rename unit to the backend for any reason (either frontend or backend stall). This event is the sum of STALL_FRONTEND and STALL_BACKEND."
> + },
> + {
> + "ArchStdEvent": "STALL_SLOT_BACKEND",
> + "PublicDescription": "Counts slots per cycle in which no operations are sent from the rename unit to the backend due to backend resource constraints. STALL_BACKEND counts during the cycle when STALL_SLOT_BACKEND counts at least 1."
> + },
> + {
> + "ArchStdEvent": "STALL_SLOT_FRONTEND",
> + "PublicDescription": "Counts slots per cycle in which no operations are sent to the rename unit from the frontend due to frontend resource constraints."
> + },
> + {
> + "ArchStdEvent": "STALL_SLOT",
> + "PublicDescription": "Counts slots per cycle in which no operations are sent to the rename unit from the frontend or from the rename unit to the backend for any reason (either frontend or backend stall). STALL_SLOT is the sum of STALL_SLOT_FRONTEND and STALL_SLOT_BACKEND."
> + },
> + {
> + "ArchStdEvent": "STALL_BACKEND_MEM",
> + "PublicDescription": "Counts cycles when the backend is stalled because there is a pending demand load request in progress in the last level core cache."
> + },
> + {
> + "ArchStdEvent": "STALL_FRONTEND_MEMBOUND",
> + "PublicDescription": "Counts cycles when the frontend could not send any micro-operations to the rename stage due to resource constraints in the memory resources."
> + },
> + {
> + "ArchStdEvent": "STALL_FRONTEND_L1I",
> + "PublicDescription": "Counts cycles when the frontend is stalled because there is an instruction fetch request pending in the level 1 instruction cache."
> + },
> + {
> + "ArchStdEvent": "STALL_FRONTEND_MEM",
> + "PublicDescription": "Counts cycles when the frontend is stalled because there is an instruction fetch request pending in the last level core cache."
> + },
> + {
> + "ArchStdEvent": "STALL_FRONTEND_TLB",
> + "PublicDescription": "Counts when the frontend is stalled on any TLB misses being handled. This event also counts the TLB accesses made by hardware prefetches."
> + },
> + {
> + "ArchStdEvent": "STALL_FRONTEND_CPUBOUND",
> + "PublicDescription": "Counts cycles when the frontend could not send any micro-operations to the rename stage due to resource constraints in the CPU resources excluding memory resources."
> + },
> + {
> + "ArchStdEvent": "STALL_FRONTEND_FLUSH",
> + "PublicDescription": "Counts cycles when the frontend could not send any micro-operations to the rename stage as the frontend is recovering from a machine flush or resteer. Example scenarios that cause a flush include branch mispredictions, taken exceptions, micro-architectural flush etc."
> + },
> + {
> + "ArchStdEvent": "STALL_BACKEND_MEMBOUND",
> + "PublicDescription": "Counts cycles when the backend could not accept any micro-operations due to resource constraints in the memory resources."
> + },
> + {
> + "ArchStdEvent": "STALL_BACKEND_L1D",
> + "PublicDescription": "Counts cycles when the backend is stalled because there is a pending demand load request in progress in the level 1 data cache."
> + },
> + {
> + "ArchStdEvent": "STALL_BACKEND_TLB",
> + "PublicDescription": "Counts cycles when the backend is stalled on any demand TLB misses being handled."
> + },
> + {
> + "ArchStdEvent": "STALL_BACKEND_ST",
> + "PublicDescription": "Counts cycles when the backend is stalled and there is a store that has not reached the pre-commit stage."
> + },
> + {
> + "ArchStdEvent": "STALL_BACKEND_CPUBOUND",
> + "PublicDescription": "Counts cycles when the backend could not accept any micro-operations due to any resource constraints in the CPU excluding memory resources."
> + },
> + {
> + "ArchStdEvent": "STALL_BACKEND_BUSY",
> + "PublicDescription": "Counts cycles when the backend could not accept any micro-operations because the issue queues are too full to take any operations for execution."
> + },
> + {
> + "ArchStdEvent": "STALL_BACKEND_RENAME",
> + "PublicDescription": "Counts cycles when backend is stalled even when operations are available from the frontend but at least one is not ready to be sent to the backend because no rename register is available."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/sve.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/sve.json
> new file mode 100644
> index 000000000000..51dab48cb2ba
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/sve.json
> @@ -0,0 +1,50 @@
> +[
> + {
> + "ArchStdEvent": "SVE_INST_SPEC",
> + "PublicDescription": "Counts speculatively executed operations that are SVE operations."
> + },
> + {
> + "ArchStdEvent": "SVE_PRED_SPEC",
> + "PublicDescription": "Counts speculatively executed predicated SVE operations."
> + },
> + {
> + "ArchStdEvent": "SVE_PRED_EMPTY_SPEC",
> + "PublicDescription": "Counts speculatively executed predicated SVE operations with no active predicate elements."
> + },
> + {
> + "ArchStdEvent": "SVE_PRED_FULL_SPEC",
> + "PublicDescription": "Counts speculatively executed predicated SVE operations with all predicate elements active."
> + },
> + {
> + "ArchStdEvent": "SVE_PRED_PARTIAL_SPEC",
> + "PublicDescription": "Counts speculatively executed predicated SVE operations with at least one but not all active predicate elements."
> + },
> + {
> + "ArchStdEvent": "SVE_PRED_NOT_FULL_SPEC",
> + "PublicDescription": "Counts speculatively executed predicated SVE operations with at least one non-active predicate element."
> + },
> + {
> + "ArchStdEvent": "SVE_LDFF_SPEC",
> + "PublicDescription": "Counts speculatively executed SVE first fault or non-fault load operations."
> + },
> + {
> + "ArchStdEvent": "SVE_LDFF_FAULT_SPEC",
> + "PublicDescription": "Counts speculatively executed SVE first fault or non-fault load operations that clear at least one bit in the FFR."
> + },
> + {
> + "ArchStdEvent": "ASE_SVE_INT8_SPEC",
> + "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type an 8-bit integer."
> + },
> + {
> + "ArchStdEvent": "ASE_SVE_INT16_SPEC",
> + "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 16-bit integer."
> + },
> + {
> + "ArchStdEvent": "ASE_SVE_INT32_SPEC",
> + "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 32-bit integer."
> + },
> + {
> + "ArchStdEvent": "ASE_SVE_INT64_SPEC",
> + "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 64-bit integer."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/tlb.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/tlb.json
> new file mode 100644
> index 000000000000..c7aa89c2f19f
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/tlb.json
> @@ -0,0 +1,74 @@
> +[
> + {
> + "ArchStdEvent": "L1I_TLB_REFILL",
> + "PublicDescription": "Counts level 1 instruction TLB refills from any Instruction fetch. If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event will not count if the translation table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB."
> + },
> + {
> + "ArchStdEvent": "L1D_TLB_REFILL",
> + "PublicDescription": "Counts level 1 data TLB accesses that resulted in TLB refills. If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event counts for refills caused by preload instructions or hardware prefetch accesses. This event counts regardless of whether the miss hits in L2 or results in a translation table walk. This event will not count if the translation table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB. This event will not count on an access from an AT (address translation) instruction."
> + },
> + {
> + "ArchStdEvent": "L1D_TLB",
> + "PublicDescription": "Counts level 1 data TLB accesses caused by any memory load or store operation. Note that load or store instructions can be broken up into multiple memory operations. This event does not count TLB maintenance operations."
> + },
> + {
> + "ArchStdEvent": "L1I_TLB",
> + "PublicDescription": "Counts level 1 instruction TLB accesses, whether the access hits or misses in the TLB. This event counts both demand accesses and prefetch or preload generated accesses."
> + },
> + {
> + "ArchStdEvent": "L2D_TLB_REFILL",
> + "PublicDescription": "Counts level 2 TLB refills caused by memory operations from both data and instruction fetch, except for those caused by TLB maintenance operations and hardware prefetches."
> + },
> + {
> + "ArchStdEvent": "L2D_TLB",
> + "PublicDescription": "Counts level 2 TLB accesses except those caused by TLB maintenance operations."
> + },
> + {
> + "ArchStdEvent": "DTLB_WALK",
> + "PublicDescription": "Counts number of demand data translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that partial translations that cause a translation table walk are also counted. Also note that this event counts walks triggered by software preloads, but not walks triggered by hardware prefetchers, and that this event does not count walks triggered by TLB maintenance operations."
> + },
> + {
> + "ArchStdEvent": "ITLB_WALK",
> + "PublicDescription": "Counts number of instruction translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that partial translations that cause a translation table walk are also counted. Also note that this event does not count walks triggered by TLB maintenance operations."
> + },
> + {
> + "ArchStdEvent": "DTLB_WALK_PERCYC",
> + "PublicDescription": "Counts the number of data translation table walks in progress per cycle."
> + },
> + {
> + "ArchStdEvent": "ITLB_WALK_PERCYC",
> + "PublicDescription": "Counts the number of instruction translation table walks in progress per cycle."
> + },
> + {
> + "ArchStdEvent": "DTLB_HWUPD",
> + "PublicDescription": "Counts number of memory accesses triggered by a data translation table walk and performing an update of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that this event counts accesses triggered by software preloads, but not accesses triggered by hardware prefetchers."
> + },
> + {
> + "ArchStdEvent": "ITLB_HWUPD",
> + "PublicDescription": "Counts number of memory accesses triggered by an instruction translation table walk and performing an update of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD."
> + },
> + {
> + "ArchStdEvent": "DTLB_STEP",
> + "PublicDescription": "Counts number of memory accesses triggered by a demand data translation table walk and performing a read of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that this event counts accesses triggered by software preloads, but not accesses triggered by hardware prefetchers."
> + },
> + {
> + "ArchStdEvent": "ITLB_STEP",
> + "PublicDescription": "Counts number of memory accesses triggered by an instruction translation table walk and performing a read of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD."
> + },
> + {
> + "ArchStdEvent": "DTLB_WALK_LARGE",
> + "PublicDescription": "Counts number of demand data translation table walks caused by a miss in the L2 TLB and yielding a large page. The set of large pages is defined as all pages with a final size higher than or equal to 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. If DTLB_WALK_BLOCK is implemented, then it is an alias for this event in this family. Note that partial translations that cause a translation table walk are also counted. Also note that this event counts walks triggered by software preloads, but not walks triggered by hardware prefetchers, and that this event does not count walks triggered by TLB maintenance operations."
> + },
> + {
> + "ArchStdEvent": "ITLB_WALK_LARGE",
> + "PublicDescription": "Counts number of instruction translation table walks caused by a miss in the L2 TLB and yielding a large page. The set of large pages is defined as all pages with a final size higher than or equal to 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. In this family, this is equal to ITLB_WALK_BLOCK event. Note that partial translations that cause a translation table walk are also counted. Also note that this event does not count walks triggered by TLB maintenance operations."
> + },
> + {
> + "ArchStdEvent": "DTLB_WALK_SMALL",
> + "PublicDescription": "Counts number of data translation table walks caused by a miss in the L2 TLB and yielding a small page. The set of small pages is defined as all pages with a final size lower than 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. If DTLB_WALK_PAGE event is implemented, then it is an alias for this event in this family. Note that partial translations that cause a translation table walk are also counted. Also note that this event counts walks triggered by software preloads, but not walks triggered by hardware prefetchers, and that this event does not count walks triggered by TLB maintenance operations."
> + },
> + {
> + "ArchStdEvent": "ITLB_WALK_SMALL",
> + "PublicDescription": "Counts number of instruction translation table walks caused by a miss in the L2 TLB and yielding a small page. The set of small pages is defined as all pages with a final size lower than 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. In this family, this is equal to ITLB_WALK_PAGE event. Note that partial translations that cause a translation table walk are also counted. Also note that this event does not count walks triggered by TLB maintenance operations."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/trace.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/trace.json
> new file mode 100644
> index 000000000000..33672a8711d4
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a720/trace.json
> @@ -0,0 +1,32 @@
> +[
> + {
> + "ArchStdEvent": "TRB_WRAP"
> + },
> + {
> + "ArchStdEvent": "TRB_TRIG"
> + },
> + {
> + "ArchStdEvent": "TRCEXTOUT0"
> + },
> + {
> + "ArchStdEvent": "TRCEXTOUT1"
> + },
> + {
> + "ArchStdEvent": "TRCEXTOUT2"
> + },
> + {
> + "ArchStdEvent": "TRCEXTOUT3"
> + },
> + {
> + "ArchStdEvent": "CTI_TRIGOUT4"
> + },
> + {
> + "ArchStdEvent": "CTI_TRIGOUT5"
> + },
> + {
> + "ArchStdEvent": "CTI_TRIGOUT6"
> + },
> + {
> + "ArchStdEvent": "CTI_TRIGOUT7"
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/mapfile.csv b/tools/perf/pmu-events/arch/arm64/mapfile.csv
> index bb3fa8a33496..ccfcae375750 100644
> --- a/tools/perf/pmu-events/arch/arm64/mapfile.csv
> +++ b/tools/perf/pmu-events/arch/arm64/mapfile.csv
> @@ -33,6 +33,7 @@
> 0x00000000410fd4c0,v1,arm/cortex-x1,core
> 0x00000000410fd460,v1,arm/cortex-a510,core
> 0x00000000410fd470,v1,arm/cortex-a710,core
> +0x00000000410fd810,v1,arm/cortex-a720,core
> 0x00000000410fd480,v1,arm/cortex-x2,core
> 0x00000000410fd490,v1,arm/neoverse-n2-v2,core
> 0x00000000410fd4f0,v1,arm/neoverse-n2-v2,core
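A small sketch of how the new mapfile key can be read, assuming the first CSV column is a MIDR_EL1 value and using the architectural MIDR field layout. The helper name `decode_midr` is hypothetical and is not part of perf:

```python
# Hypothetical helper: split a MIDR_EL1 value into its architectural fields.
# Bit positions follow the Arm architecture definition of MIDR_EL1.
def decode_midr(midr):
    return {
        "implementer": (midr >> 24) & 0xFF,   # bits [31:24], 0x41 is Arm Ltd.
        "variant": (midr >> 20) & 0xF,        # bits [23:20]
        "architecture": (midr >> 16) & 0xF,   # bits [19:16]
        "partnum": (midr >> 4) & 0xFFF,       # bits [15:4]
        "revision": midr & 0xF,               # bits [3:0]
    }


# The key added to mapfile.csv for arm/cortex-a720 in the hunk above.
fields = decode_midr(0x00000000410FD810)
print({name: hex(value) for name, value in fields.items()})
```

Under this reading, the added entry has part number 0xd81 with the variant and revision fields left at zero, in line with the neighbouring entries.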
Aside from the notes on the metrics, this all looks good. If those weren't
an oversight then please add my reviewed-by tag.
Thanks,
Ian
* Re: [PATCH 2/2] perf vendor events arm64: Add Cortex-A520 events/metrics
2025-02-13 15:12 ` [PATCH 2/2] perf vendor events arm64: Add Cortex-A520 events/metrics Yangyu Chen
@ 2025-02-13 16:53 ` Ian Rogers
0 siblings, 0 replies; 16+ messages in thread
From: Ian Rogers @ 2025-02-13 16:53 UTC (permalink / raw)
To: Yangyu Chen
Cc: linux-perf-users, John Garry, Will Deacon, James Clark,
Mike Leach, Leo Yan, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, Liang Kan,
Yoshihiro Furudera, linux-arm-kernel, linux-kernel
On Thu, Feb 13, 2025 at 7:19 AM Yangyu Chen <cyy@cyyself.name> wrote:
>
> Add JSON files for Cortex-A520 events and metrics. Using the existing
> Neoverse N3 JSON files as a template, I manually checked the missing and
> extra events/metrics using my script [1] and modified them according to
> the Arm Cortex-A520 Core Technical Reference Manual [2].
Thanks for this! Similar notes to the other patch. On the testing
front, if automation would be possible then new tests would be great.
For example, making sure the sum of topdown metrics is 100%. There are
similar tests for Intel here:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/shell/stat_metrics_values.sh?h=perf-tools-next
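As a rough illustration of such a check (a sketch only, not the linked Intel test), the snippet below evaluates the two cache-bound expressions from the quoted metrics.json on made-up counter values and verifies that the pair sums to about 100%. The function names, the sample counts, and the 2% tolerance are all assumptions:

```python
# Sketch: evaluate backend_cache_l1d_bound and backend_cache_l2d_bound using
# the MetricExpr strings from the quoted metrics.json, then check the sum.
def cache_bound_split(stall_backend_l1d, stall_backend_mem):
    total = stall_backend_l1d + stall_backend_mem
    if total == 0:
        # No stall cycles recorded; report both shares as zero.
        return 0.0, 0.0
    l1d_bound = stall_backend_l1d / total * 100
    l2d_bound = stall_backend_mem / total * 100
    return l1d_bound, l2d_bound


def sums_to_100(values, tolerance=2.0):
    # The tolerance is an arbitrary choice for this sketch; real counter
    # readings may be multiplexed and therefore slightly inconsistent.
    return abs(sum(values) - 100.0) <= tolerance


# Made-up counts standing in for STALL_BACKEND_L1D and STALL_BACKEND_MEM.
l1d, l2d = cache_bound_split(300, 100)
print(l1d, l2d, sums_to_100([l1d, l2d]))
```

A real test would take the metric values from perf stat output on the target machine rather than from fixed numbers.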
Thanks,
Ian
> [1] https://github.com/cyyself/arm-pmu-check/tree/1075bebeb3f1441067448251a387df35af15bf16
> [2] https://developer.arm.com/documentation/102517/0004/Performance-Monitors-Extension-support-/Performance-monitors-events/Common-event-PMU-events
>
> Signed-off-by: Yangyu Chen <cyy@cyyself.name>
> ---
> .../arch/arm64/arm/cortex-a520/bus.json | 26 ++
> .../arch/arm64/arm/cortex-a520/exception.json | 18 +
> .../arm64/arm/cortex-a520/fp_operation.json | 14 +
> .../arch/arm64/arm/cortex-a520/general.json | 6 +
> .../arch/arm64/arm/cortex-a520/l1d_cache.json | 50 +++
> .../arch/arm64/arm/cortex-a520/l1i_cache.json | 14 +
> .../arch/arm64/arm/cortex-a520/l2_cache.json | 46 +++
> .../arch/arm64/arm/cortex-a520/l3_cache.json | 21 +
> .../arch/arm64/arm/cortex-a520/ll_cache.json | 10 +
> .../arch/arm64/arm/cortex-a520/memory.json | 58 +++
> .../arch/arm64/arm/cortex-a520/metrics.json | 373 ++++++++++++++++++
> .../arch/arm64/arm/cortex-a520/pmu.json | 8 +
> .../arch/arm64/arm/cortex-a520/retired.json | 90 +++++
> .../arm64/arm/cortex-a520/spec_operation.json | 70 ++++
> .../arch/arm64/arm/cortex-a520/stall.json | 82 ++++
> .../arch/arm64/arm/cortex-a520/sve.json | 22 ++
> .../arch/arm64/arm/cortex-a520/tlb.json | 78 ++++
> .../arch/arm64/arm/cortex-a520/trace.json | 32 ++
> .../arch/arm64/common-and-microarch.json | 15 +
> tools/perf/pmu-events/arch/arm64/mapfile.csv | 1 +
> 20 files changed, 1034 insertions(+)
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/bus.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/exception.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/fp_operation.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/general.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l1d_cache.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l1i_cache.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l2_cache.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l3_cache.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/ll_cache.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/memory.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/metrics.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/pmu.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/retired.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/spec_operation.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/stall.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/sve.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/tlb.json
> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/trace.json
>
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/bus.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/bus.json
> new file mode 100644
> index 000000000000..884e42ab6a49
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/bus.json
> @@ -0,0 +1,26 @@
> +[
> + {
> + "ArchStdEvent": "BUS_ACCESS",
> + "PublicDescription": "Counts memory transactions issued by the CPU to the external bus, including snoop requests and snoop responses. Each beat of data is counted individually."
> + },
> + {
> + "ArchStdEvent": "BUS_CYCLES",
> + "PublicDescription": "Counts bus cycles in the CPU. Bus cycles represent a clock cycle in which a transaction could be sent or received on the interface from the CPU to the external bus. Since that interface is driven at the same clock speed as the CPU, this event is a duplicate of CPU_CYCLES."
> + },
> + {
> + "ArchStdEvent": "BUS_ACCESS_RD",
> + "PublicDescription": "Counts memory read transactions seen on the external bus. Each beat of data is counted individually."
> + },
> + {
> + "ArchStdEvent": "BUS_ACCESS_WR",
> + "PublicDescription": "Counts memory write transactions seen on the external bus. Each beat of data is counted individually."
> + },
> + {
> + "ArchStdEvent": "BUS_REQ_RD_PERCYC",
> + "PublicDescription": "Bus read transactions in progress."
> + },
> + {
> + "ArchStdEvent": "BUS_REQ_RD",
> + "BriefDescription": "Bus request, read"
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/exception.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/exception.json
> new file mode 100644
> index 000000000000..fbe580e15c2e
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/exception.json
> @@ -0,0 +1,18 @@
> +[
> + {
> + "ArchStdEvent": "EXC_TAKEN",
> + "PublicDescription": "Counts any taken architecturally visible exceptions such as IRQ, FIQ, SError, and other synchronous exceptions. Exceptions are counted whether or not they are taken locally."
> + },
> + {
> + "ArchStdEvent": "EXC_RETURN",
> + "PublicDescription": "Counts any architecturally executed exception return instructions. For example: AArch64: ERET"
> + },
> + {
> + "ArchStdEvent": "EXC_IRQ",
> + "PublicDescription": "Counts IRQ exceptions including the virtual IRQs that are taken locally."
> + },
> + {
> + "ArchStdEvent": "EXC_FIQ",
> + "PublicDescription": "Counts FIQ exceptions including the virtual FIQs that are taken locally."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/fp_operation.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/fp_operation.json
> new file mode 100644
> index 000000000000..da0c4b05ad5b
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/fp_operation.json
> @@ -0,0 +1,14 @@
> +[
> + {
> + "ArchStdEvent": "FP_HP_SPEC",
> + "PublicDescription": "Counts speculatively executed half precision floating point operations."
> + },
> + {
> + "ArchStdEvent": "FP_SP_SPEC",
> + "PublicDescription": "Counts speculatively executed single precision floating point operations."
> + },
> + {
> + "ArchStdEvent": "FP_DP_SPEC",
> + "PublicDescription": "Counts speculatively executed double precision floating point operations."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/general.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/general.json
> new file mode 100644
> index 000000000000..20fada95ef97
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/general.json
> @@ -0,0 +1,6 @@
> +[
> + {
> + "ArchStdEvent": "CPU_CYCLES",
> + "PublicDescription": "Counts CPU clock cycles (not timer cycles). The clock measured by this event is defined as the physical clock driving the CPU logic."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l1d_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l1d_cache.json
> new file mode 100644
> index 000000000000..90e871c8986a
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l1d_cache.json
> @@ -0,0 +1,50 @@
> +[
> + {
> + "ArchStdEvent": "L1D_CACHE_REFILL",
> + "PublicDescription": "Counts level 1 data cache refills caused by speculatively executed load or store operations that missed in the level 1 data cache. This event only counts one event per cache line."
> + },
> + {
> + "ArchStdEvent": "L1D_CACHE",
> + "PublicDescription": "Counts level 1 data cache accesses from any load/store operations. Atomic operations that resolve in the CPUs caches (near atomic operations) counts as both a write access and read access. Each access to a cache line is counted including the multiple accesses caused by single instructions such as LDM or STM. Each access to other level 1 data or unified memory structures, for example refill buffers, write buffers, and write-back buffers, are also counted."
> + },
> + {
> + "ArchStdEvent": "L1D_CACHE_WB",
> + "PublicDescription": "Counts write-backs of dirty data from the L1 data cache to the L2 cache. This occurs when either a dirty cache line is evicted from L1 data cache and allocated in the L2 cache or dirty data is written to the L2 and possibly to the next level of cache. This event counts both victim cache line evictions and cache write-backs from snoops or cache maintenance operations. The following cache operations are not counted:\n\n1. Invalidations which do not result in data being transferred out of the L1 (such as evictions of clean data),\n2. Full line writes which write to L2 without writing L1, such as write streaming mode."
> + },
> + {
> + "ArchStdEvent": "L1D_CACHE_LMISS_RD",
> + "PublicDescription": "Counts cache line refills into the level 1 data cache from any memory read operations, that incurred additional latency."
> + },
> + {
> + "ArchStdEvent": "L1D_CACHE_RD",
> + "PublicDescription": "Counts level 1 data cache accesses from any load operation. Atomic load operations that resolve in the CPUs caches counts as both a write access and read access."
> + },
> + {
> + "ArchStdEvent": "L1D_CACHE_WR",
> + "PublicDescription": "Counts level 1 data cache accesses generated by store operations. This event also counts accesses caused by a DC ZVA (data cache zero, specified by virtual address) instruction. Near atomic operations that resolve in the CPUs caches count as a write access and read access."
> + },
> + {
> + "ArchStdEvent": "L1D_CACHE_REFILL_RD",
> + "PublicDescription": "Counts level 1 data cache refills caused by speculatively executed load instructions where the memory read operation misses in the level 1 data cache. This event only counts one event per cache line."
> + },
> + {
> + "ArchStdEvent": "L1D_CACHE_REFILL_WR",
> + "PublicDescription": "Counts level 1 data cache refills caused by speculatively executed store instructions where the memory write operation misses in the level 1 data cache. This event only counts one event per cache line."
> + },
> + {
> + "ArchStdEvent": "L1D_CACHE_REFILL_INNER",
> + "PublicDescription": "Counts level 1 data cache refills where the cache line data came from caches inside the immediate cluster of the core."
> + },
> + {
> + "ArchStdEvent": "L1D_CACHE_REFILL_OUTER",
> + "PublicDescription": "Counts level 1 data cache refills for which the cache line data came from outside the immediate cluster of the core, like an SLC in the system interconnect or DRAM."
> + },
> + {
> + "ArchStdEvent": "L1D_CACHE_HWPRF",
> + "PublicDescription": "Counts level 1 data cache accesses from any load/store operations generated by the hardware prefetcher."
> + },
> + {
> + "ArchStdEvent": "L1D_CACHE_REFILL_HWPRF",
> + "PublicDescription": "Counts level 1 data cache refills where the cache line is requested by a hardware prefetcher."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l1i_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l1i_cache.json
> new file mode 100644
> index 000000000000..633f1030359d
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l1i_cache.json
> @@ -0,0 +1,14 @@
> +[
> + {
> + "ArchStdEvent": "L1I_CACHE_REFILL",
> + "PublicDescription": "Counts cache line refills in the level 1 instruction cache caused by a missed instruction fetch. Instruction fetches may include accessing multiple instructions, but the single cache line allocation is counted once."
> + },
> + {
> + "ArchStdEvent": "L1I_CACHE",
> + "PublicDescription": "Counts instruction fetches which access the level 1 instruction cache. Instruction cache accesses caused by cache maintenance operations are not counted."
> + },
> + {
> + "ArchStdEvent": "L1I_CACHE_LMISS",
> + "PublicDescription": "Counts cache line refills into the level 1 instruction cache, that incurred additional latency."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l2_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l2_cache.json
> new file mode 100644
> index 000000000000..9874b1a7c94b
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l2_cache.json
> @@ -0,0 +1,46 @@
> +[
> + {
> + "ArchStdEvent": "L2D_CACHE",
> + "PublicDescription": "Counts accesses to the level 2 cache due to data accesses. Level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the first level data cache or translation resolutions due to accesses. This event also counts write back of dirty data from level 1 data cache to the L2 cache."
> + },
> + {
> + "ArchStdEvent": "L2D_CACHE_REFILL",
> + "PublicDescription": "Counts cache line refills into the level 2 cache. Level 2 cache is a unified cache for data and instruction accesses. Accesses are for misses in the level 1 data cache or translation resolutions due to accesses."
> + },
> + {
> + "ArchStdEvent": "L2D_CACHE_WB",
> + "PublicDescription": "Counts write-backs of data from the L2 cache to outside the CPU. This includes snoops to the L2 (from other CPUs) which return data even if the snoops cause an invalidation. L2 cache line invalidations which do not write data outside the CPU and snoops which return data from an L1 cache are not counted. Data would not be written outside the cache when invalidating a clean cache line."
> + },
> + {
> + "ArchStdEvent": "L2D_CACHE_ALLOCATE",
> + "PublicDescription": "Counts level 2 cache line allocates that do not fetch data from outside the level 2 data or unified cache."
> + },
> + {
> + "ArchStdEvent": "L2D_CACHE_RD",
> + "PublicDescription": "Counts level 2 data cache accesses due to memory read operations. Level 2 cache is a unified cache for data and instruction accesses, accesses are for misses in the level 1 data cache or translation resolutions due to accesses."
> + },
> + {
> + "ArchStdEvent": "L2D_CACHE_WR",
> + "PublicDescription": "Counts level 2 cache accesses due to memory write operations. Level 2 cache is a unified cache for data and instruction accesses, accesses are for misses in the level 1 data cache or translation resolutions due to accesses."
> + },
> + {
> + "ArchStdEvent": "L2D_CACHE_REFILL_RD",
> + "PublicDescription": "Counts refills for memory accesses due to memory read operation counted by L2D_CACHE_RD. Level 2 cache is a unified cache for data and instruction accesses, accesses are for misses in the level 1 data cache or translation resolutions due to accesses."
> + },
> + {
> + "ArchStdEvent": "L2D_CACHE_REFILL_WR",
> + "PublicDescription": "Counts refills for memory accesses due to memory write operation counted by L2D_CACHE_WR. Level 2 cache is a unified cache for data and instruction accesses, accesses are for misses in the level 1 data cache or translation resolutions due to accesses."
> + },
> + {
> + "ArchStdEvent": "L2D_CACHE_LMISS_RD",
> + "PublicDescription": "Counts cache line refills into the level 2 unified cache from any memory read operations that incurred additional latency."
> + },
> + {
> + "ArchStdEvent": "L2D_CACHE_HWPRF",
> + "PublicDescription": "Counts level 2 data cache accesses generated by L2D hardware prefetchers."
> + },
> + {
> + "ArchStdEvent": "L2D_CACHE_REFILL_HWPRF",
> + "BriefDescription": "This event counts hardware prefetch counted by L2D_CACHE_HWPRF that causes a refill of the Level 2 cache, or any Level 1 data and instruction cache of this PE, from outside of those caches."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l3_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l3_cache.json
> new file mode 100644
> index 000000000000..d5485d71babb
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l3_cache.json
> @@ -0,0 +1,21 @@
> +[
> + {
> + "ArchStdEvent": "L3D_CACHE",
> + "PublicDescription": "Counts level 3 cache accesses. Level 3 cache is a unified cache for data and instruction accesses. Accesses are for misses in the lower level caches or translation resolutions due to accesses."
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE_RD",
> + "PublicDescription": "Counts level 3 cache accesses caused by any memory read operation. Level 3 cache is a unified cache for data and instruction accesses. Accesses are for misses in the lower level caches or translation resolutions due to accesses."
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE_REFILL_RD"
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE_LMISS_RD",
> + "PublicDescription": "Counts any cache line refill into the level 3 cache from memory read operations that incurred additional latency."
> + },
> + {
> + "ArchStdEvent": "L3D_CACHE_HWPRF",
> + "PublicDescription": "Level 3 data cache hardware prefetch."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/ll_cache.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/ll_cache.json
> new file mode 100644
> index 000000000000..fd5a2e0099b8
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/ll_cache.json
> @@ -0,0 +1,10 @@
> +[
> + {
> + "ArchStdEvent": "LL_CACHE_RD",
> + "PublicDescription": "Counts read transactions that were returned from outside the core cluster. This event counts for external last level cache when the system register CPUECTLR.EXTLLC bit is set, otherwise it counts for the L3 cache. This event counts read transactions returned from outside the core if those transactions are either hit in the system level cache or missed in the SLC and are returned from any other external sources."
> + },
> + {
> + "ArchStdEvent": "LL_CACHE_MISS_RD",
> + "PublicDescription": "Counts read transactions that were returned from outside the core cluster but missed in the system level cache. This event counts for external last level cache when the system register CPUECTLR.EXTLLC bit is set, otherwise it counts for L3 cache. This event counts read transactions returned from outside the core if those transactions are missed in the System level Cache. The data source of the transaction is indicated by a field in the CHI transaction returning to the CPU. This event does not count reads caused by cache maintenance operations."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/memory.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/memory.json
> new file mode 100644
> index 000000000000..e7f7914ecd2b
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/memory.json
> @@ -0,0 +1,58 @@
> +[
> + {
> + "ArchStdEvent": "MEM_ACCESS",
> + "PublicDescription": "Counts memory accesses issued by the CPU load store unit, where those accesses are issued due to load or store operations. This event counts memory accesses no matter whether the data is received from any level of cache hierarchy or external memory. If memory accesses are broken up into smaller transactions than what were specified in the load or store instructions, then the event counts those smaller memory transactions."
> + },
> + {
> + "ArchStdEvent": "MEMORY_ERROR",
> + "PublicDescription": "Counts any detected correctable or uncorrectable physical memory errors (ECC or parity) in protected CPUs RAMs. On the core, this event counts errors in the caches (including data and tag rams). Any detected memory error (from either a speculative and abandoned access, or an architecturally executed access) is counted. Note that errors are only detected when the actual protected memory is accessed by an operation."
> + },
> + {
> + "ArchStdEvent": "REMOTE_ACCESS_RD",
> + "PublicDescription": "Counts memory access to another socket in a multi-socket system, read."
> + },
> + {
> + "ArchStdEvent": "MEM_ACCESS_RD",
> + "PublicDescription": "Counts memory accesses issued by the CPU due to load operations. The event counts any memory load access, no matter whether the data is received from any level of cache hierarchy or external memory. The event also counts atomic load operations. If memory accesses are broken up by the load/store unit into smaller transactions that are issued by the bus interface, then the event counts those smaller transactions."
> + },
> + {
> + "ArchStdEvent": "MEM_ACCESS_WR",
> + "PublicDescription": "Counts memory accesses issued by the CPU due to store operations. The event counts any memory store access, no matter whether the data is located in any level of cache or external memory. The event also counts atomic load and store operations. If memory accesses are broken up by the load/store unit into smaller transactions that are issued by the bus interface, then the event counts those smaller transactions."
> + },
> + {
> + "ArchStdEvent": "LDST_ALIGN_LAT",
> + "PublicDescription": "Counts the number of memory read and write accesses in a cycle that incurred additional latency, due to the alignment of the address and the size of data being accessed, which results in store crossing a single cache line."
> + },
> + {
> + "ArchStdEvent": "LD_ALIGN_LAT",
> + "PublicDescription": "Counts the number of memory read accesses in a cycle that incurred additional latency, due to the alignment of the address and size of data being accessed, which results in load crossing a single cache line."
> + },
> + {
> + "ArchStdEvent": "ST_ALIGN_LAT",
> + "PublicDescription": "Counts the number of memory write access in a cycle that incurred additional latency, due to the alignment of the address and size of data being accessed incurred additional latency."
> + },
> + {
> + "ArchStdEvent": "MEM_ACCESS_CHECKED",
> + "PublicDescription": "Counts the number of memory read and write accesses counted by MEM_ACCESS that are tag checked by the Memory Tagging Extension (MTE). This event is implemented as the sum of MEM_ACCESS_CHECKED_RD and MEM_ACCESS_CHECKED_WR"
> + },
> + {
> + "ArchStdEvent": "MEM_ACCESS_CHECKED_RD",
> + "PublicDescription": "Counts the number of memory read accesses in a cycle that are tag checked by the Memory Tagging Extension (MTE)."
> + },
> + {
> + "ArchStdEvent": "MEM_ACCESS_CHECKED_WR",
> + "PublicDescription": "Counts the number of memory write accesses in a cycle that is tag checked by the Memory Tagging Extension (MTE)."
> + },
> + {
> + "ArchStdEvent": "INST_FETCH_PERCYC",
> + "PublicDescription": "Counts number of instruction fetches outstanding per cycle, which will provide an average latency of instruction fetch."
> + },
> + {
> + "ArchStdEvent": "MEM_ACCESS_RD_PERCYC",
> + "PublicDescription": "Counts the number of outstanding loads or memory read accesses per cycle."
> + },
> + {
> + "ArchStdEvent": "INST_FETCH",
> + "PublicDescription": "Counts Instruction memory accesses that the PE makes."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/metrics.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/metrics.json
> new file mode 100644
> index 000000000000..62cb910c8945
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/metrics.json
> @@ -0,0 +1,373 @@
> +[
> + {
> + "ArchStdEvent": "backend_bound"
> + },
> + {
> + "MetricName": "backend_busy_bound",
> + "MetricExpr": "STALL_BACKEND_BUSY / STALL_BACKEND * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to issue queues being too full to accept operations for execution.",
> + "MetricGroup": "Topdown_Backend",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "backend_cache_l1d_bound",
> + "MetricExpr": "STALL_BACKEND_L1D / (STALL_BACKEND_L1D + STALL_BACKEND_MEM) * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory access latency issues caused by level 1 data cache misses.",
> + "MetricGroup": "Topdown_Backend",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "backend_cache_l2d_bound",
> + "MetricExpr": "STALL_BACKEND_MEM / (STALL_BACKEND_L1D + STALL_BACKEND_MEM) * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory access latency issues caused by level 2 data cache misses.",
> + "MetricGroup": "Topdown_Backend",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "backend_mem_bound",
> + "MetricExpr": "STALL_BACKEND_MEMBOUND / STALL_BACKEND * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to backend core resource constraints related to memory access latency issues caused by memory access components.",
> + "MetricGroup": "Topdown_Backend",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "backend_mem_cache_bound",
> + "MetricExpr": "(STALL_BACKEND_L1D + STALL_BACKEND_MEM) / STALL_BACKEND_MEMBOUND * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory latency issues caused by data cache misses.",
> + "MetricGroup": "Topdown_Backend",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "backend_mem_store_bound",
> + "MetricExpr": "STALL_BACKEND_ST / STALL_BACKEND_MEMBOUND * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory write pending caused by stores stalled in the pre-commit stage.",
> + "MetricGroup": "Topdown_Backend",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "backend_mem_tlb_bound",
> + "MetricExpr": "STALL_BACKEND_TLB / STALL_BACKEND_MEMBOUND * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the backend due to memory access latency issues caused by data TLB misses.",
> + "MetricGroup": "Topdown_Backend",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "backend_stalled_cycles",
> + "MetricExpr": "STALL_BACKEND / CPU_CYCLES * 100",
> + "BriefDescription": "This metric is the percentage of cycles that were stalled due to resource constraints in the backend unit of the processor.",
> + "MetricGroup": "Cycle_Accounting",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "ArchStdEvent": "bad_speculation",
> + "MetricExpr": "(1 - STALL_SLOT / (10 * CPU_CYCLES)) * (1 - OP_RETIRED / OP_SPEC) * 100 + STALL_FRONTEND_FLUSH / CPU_CYCLES * 100"
> + },
> + {
> + "MetricName": "branch_direct_ratio",
> + "MetricExpr": "BR_IMMED_RETIRED / BR_RETIRED",
> + "BriefDescription": "This metric measures the ratio of direct branches retired to the total number of branches architecturally executed.",
> + "MetricGroup": "Branch_Effectiveness",
> + "ScaleUnit": "1per branch"
> + },
> + {
> + "MetricName": "branch_indirect_ratio",
> + "MetricExpr": "BR_IND_RETIRED / BR_RETIRED",
> + "BriefDescription": "This metric measures the ratio of indirect branches retired, including function returns, to the total number of branches architecturally executed.",
> + "MetricGroup": "Branch_Effectiveness",
> + "ScaleUnit": "1per branch"
> + },
> + {
> + "MetricName": "branch_misprediction_ratio",
> + "MetricExpr": "BR_MIS_PRED_RETIRED / BR_RETIRED",
> + "BriefDescription": "This metric measures the ratio of branches mispredicted to the total number of branches architecturally executed. This gives an indication of the effectiveness of the branch prediction unit.",
> + "MetricGroup": "Miss_Ratio;Branch_Effectiveness",
> + "ScaleUnit": "100percent of branches"
> + },
> + {
> + "MetricName": "branch_mpki",
> + "MetricExpr": "BR_MIS_PRED_RETIRED / INST_RETIRED * 1000",
> + "BriefDescription": "This metric measures the number of branch mispredictions per thousand instructions executed.",
> + "MetricGroup": "MPKI;Branch_Effectiveness",
> + "ScaleUnit": "1MPKI"
> + },
> + {
> + "MetricName": "branch_percentage",
> + "MetricExpr": "PC_WRITE_SPEC / INST_SPEC * 100",
> + "BriefDescription": "This metric measures branch operations as a percentage of operations speculatively executed.",
> + "MetricGroup": "Operation_Mix",
> + "ScaleUnit": "1percent of operations"
> + },
> + {
> + "MetricName": "branch_return_ratio",
> + "MetricExpr": "BR_RETURN_RETIRED / BR_RETIRED",
> + "BriefDescription": "This metric measures the ratio of branches retired that are function returns to the total number of branches architecturally executed.",
> + "MetricGroup": "Branch_Effectiveness",
> + "ScaleUnit": "1per branch"
> + },
> + {
> + "MetricName": "crypto_percentage",
> + "MetricExpr": "CRYPTO_SPEC / INST_SPEC * 100",
> + "BriefDescription": "This metric measures crypto operations as a percentage of operations speculatively executed.",
> + "MetricGroup": "Operation_Mix",
> + "ScaleUnit": "1percent of operations"
> + },
> + {
> + "MetricName": "dtlb_mpki",
> + "MetricExpr": "DTLB_WALK / INST_RETIRED * 1000",
> + "BriefDescription": "This metric measures the number of data TLB Walks per thousand instructions executed.",
> + "MetricGroup": "MPKI;DTLB_Effectiveness",
> + "ScaleUnit": "1MPKI"
> + },
> + {
> + "MetricName": "dtlb_walk_ratio",
> + "MetricExpr": "DTLB_WALK / L1D_TLB",
> + "BriefDescription": "This metric measures the ratio of data TLB Walks to the total number of data TLB accesses. This gives an indication of the effectiveness of the data TLB accesses.",
> + "MetricGroup": "Miss_Ratio;DTLB_Effectiveness",
> + "ScaleUnit": "100percent of TLB accesses"
> + },
> + {
> + "MetricName": "fp16_percentage",
> + "MetricExpr": "FP_HP_SPEC / INST_SPEC * 100",
> + "BriefDescription": "This metric measures half-precision floating point operations as a percentage of operations speculatively executed.",
> + "MetricGroup": "FP_Precision_Mix",
> + "ScaleUnit": "1percent of operations"
> + },
> + {
> + "MetricName": "fp32_percentage",
> + "MetricExpr": "FP_SP_SPEC / INST_SPEC * 100",
> + "BriefDescription": "This metric measures single-precision floating point operations as a percentage of operations speculatively executed.",
> + "MetricGroup": "FP_Precision_Mix",
> + "ScaleUnit": "1percent of operations"
> + },
> + {
> + "MetricName": "fp64_percentage",
> + "MetricExpr": "FP_DP_SPEC / INST_SPEC * 100",
> + "BriefDescription": "This metric measures double-precision floating point operations as a percentage of operations speculatively executed.",
> + "MetricGroup": "FP_Precision_Mix",
> + "ScaleUnit": "1percent of operations"
> + },
> + {
> + "MetricName": "frontend_cache_l1i_bound",
> + "MetricExpr": "STALL_FRONTEND_L1I / (STALL_FRONTEND_L1I + STALL_FRONTEND_MEM) * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to memory access latency issues caused by level 1 instruction cache misses.",
> + "MetricGroup": "Topdown_Frontend",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "frontend_cache_l2i_bound",
> + "MetricExpr": "STALL_FRONTEND_MEM / (STALL_FRONTEND_L1I + STALL_FRONTEND_MEM) * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to memory access latency issues caused by level 2 instruction cache misses.",
> + "MetricGroup": "Topdown_Frontend",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "frontend_core_bound",
> + "MetricExpr": "STALL_FRONTEND_CPUBOUND / STALL_FRONTEND * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to frontend core resource constraints not related to instruction fetch latency issues caused by memory access components.",
> + "MetricGroup": "Topdown_Frontend",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "frontend_core_flush_bound",
> + "MetricExpr": "STALL_FRONTEND_FLUSH / STALL_FRONTEND_CPUBOUND * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend as the processor is recovering from a pipeline flush caused by bad speculation or other machine resteers.",
> + "MetricGroup": "Topdown_Frontend",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "frontend_mem_bound",
> + "MetricExpr": "STALL_FRONTEND_MEMBOUND / STALL_FRONTEND * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to frontend core resource constraints related to the instruction fetch latency issues caused by memory access components.",
> + "MetricGroup": "Topdown_Frontend",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "frontend_mem_cache_bound",
> + "MetricExpr": "(STALL_FRONTEND_L1I + STALL_FRONTEND_MEM) / STALL_FRONTEND_MEMBOUND * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to instruction fetch latency issues caused by instruction cache misses.",
> + "MetricGroup": "Topdown_Frontend",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "frontend_mem_tlb_bound",
> + "MetricExpr": "STALL_FRONTEND_TLB / STALL_FRONTEND_MEMBOUND * 100",
> + "BriefDescription": "This metric is the percentage of total cycles stalled in the frontend due to instruction fetch latency issues caused by instruction TLB misses.",
> + "MetricGroup": "Topdown_Frontend",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "frontend_stalled_cycles",
> + "MetricExpr": "STALL_FRONTEND / CPU_CYCLES * 100",
> + "BriefDescription": "This metric is the percentage of cycles that were stalled due to resource constraints in the frontend unit of the processor.",
> + "MetricGroup": "Cycle_Accounting",
> + "ScaleUnit": "1percent of cycles"
> + },
> + {
> + "MetricName": "integer_dp_percentage",
> + "MetricExpr": "DP_SPEC / INST_SPEC * 100",
> + "BriefDescription": "This metric measures scalar integer operations as a percentage of operations speculatively executed.",
> + "MetricGroup": "Operation_Mix",
> + "ScaleUnit": "1percent of operations"
> + },
> + {
> + "MetricName": "ipc",
> + "MetricExpr": "INST_RETIRED / CPU_CYCLES",
> + "BriefDescription": "This metric measures the number of instructions retired per cycle.",
> + "MetricGroup": "General",
> + "ScaleUnit": "1per cycle"
> + },
> + {
> + "MetricName": "itlb_mpki",
> + "MetricExpr": "ITLB_WALK / INST_RETIRED * 1000",
> + "BriefDescription": "This metric measures the number of instruction TLB Walks per thousand instructions executed.",
> + "MetricGroup": "MPKI;ITLB_Effectiveness",
> + "ScaleUnit": "1MPKI"
> + },
> + {
> + "MetricName": "itlb_walk_ratio",
> + "MetricExpr": "ITLB_WALK / L1I_TLB",
> + "BriefDescription": "This metric measures the ratio of instruction TLB Walks to the total number of instruction TLB accesses. This gives an indication of the effectiveness of the instruction TLB accesses.",
> + "MetricGroup": "Miss_Ratio;ITLB_Effectiveness",
> + "ScaleUnit": "100percent of TLB accesses"
> + },
> + {
> + "MetricName": "l1d_cache_miss_ratio",
> + "MetricExpr": "L1D_CACHE_REFILL / L1D_CACHE",
> + "BriefDescription": "This metric measures the ratio of level 1 data cache accesses missed to the total number of level 1 data cache accesses. This gives an indication of the effectiveness of the level 1 data cache.",
> + "MetricGroup": "Miss_Ratio;L1D_Cache_Effectiveness",
> + "ScaleUnit": "100percent of cache accesses"
> + },
> + {
> + "MetricName": "l1d_cache_mpki",
> + "MetricExpr": "L1D_CACHE_REFILL / INST_RETIRED * 1000",
> + "BriefDescription": "This metric measures the number of level 1 data cache accesses missed per thousand instructions executed.",
> + "MetricGroup": "MPKI;L1D_Cache_Effectiveness",
> + "ScaleUnit": "1MPKI"
> + },
> + {
> + "MetricName": "l1d_tlb_miss_ratio",
> + "MetricExpr": "L1D_TLB_REFILL / L1D_TLB",
> + "BriefDescription": "This metric measures the ratio of level 1 data TLB accesses missed to the total number of level 1 data TLB accesses. This gives an indication of the effectiveness of the level 1 data TLB.",
> + "MetricGroup": "Miss_Ratio;DTLB_Effectiveness",
> + "ScaleUnit": "100percent of TLB accesses"
> + },
> + {
> + "MetricName": "l1d_tlb_mpki",
> + "MetricExpr": "L1D_TLB_REFILL / INST_RETIRED * 1000",
> + "BriefDescription": "This metric measures the number of level 1 data TLB accesses missed per thousand instructions executed.",
> + "MetricGroup": "MPKI;DTLB_Effectiveness",
> + "ScaleUnit": "1MPKI"
> + },
> + {
> + "MetricName": "l1i_cache_miss_ratio",
> + "MetricExpr": "L1I_CACHE_REFILL / L1I_CACHE",
> + "BriefDescription": "This metric measures the ratio of level 1 instruction cache accesses missed to the total number of level 1 instruction cache accesses. This gives an indication of the effectiveness of the level 1 instruction cache.",
> + "MetricGroup": "Miss_Ratio;L1I_Cache_Effectiveness",
> + "ScaleUnit": "100percent of cache accesses"
> + },
> + {
> + "MetricName": "l1i_cache_mpki",
> + "MetricExpr": "L1I_CACHE_REFILL / INST_RETIRED * 1000",
> + "BriefDescription": "This metric measures the number of level 1 instruction cache accesses missed per thousand instructions executed.",
> + "MetricGroup": "MPKI;L1I_Cache_Effectiveness",
> + "ScaleUnit": "1MPKI"
> + },
> + {
> + "MetricName": "l1i_tlb_miss_ratio",
> + "MetricExpr": "L1I_TLB_REFILL / L1I_TLB",
> + "BriefDescription": "This metric measures the ratio of level 1 instruction TLB accesses missed to the total number of level 1 instruction TLB accesses. This gives an indication of the effectiveness of the level 1 instruction TLB.",
> + "MetricGroup": "Miss_Ratio;ITLB_Effectiveness",
> + "ScaleUnit": "100percent of TLB accesses"
> + },
> + {
> + "MetricName": "l1i_tlb_mpki",
> + "MetricExpr": "L1I_TLB_REFILL / INST_RETIRED * 1000",
> + "BriefDescription": "This metric measures the number of level 1 instruction TLB accesses missed per thousand instructions executed.",
> + "MetricGroup": "MPKI;ITLB_Effectiveness",
> + "ScaleUnit": "1MPKI"
> + },
> + {
> + "MetricName": "l2_cache_miss_ratio",
> + "MetricExpr": "L2D_CACHE_REFILL / L2D_CACHE",
> + "BriefDescription": "This metric measures the ratio of level 2 cache accesses missed to the total number of level 2 cache accesses. This gives an indication of the effectiveness of the level 2 cache, which is a unified cache that stores both data and instruction. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a unified cache.",
> + "MetricGroup": "Miss_Ratio;L2_Cache_Effectiveness",
> + "ScaleUnit": "100percent of cache accesses"
> + },
> + {
> + "MetricName": "l2_cache_mpki",
> + "MetricExpr": "L2D_CACHE_REFILL / INST_RETIRED * 1000",
> + "BriefDescription": "This metric measures the number of level 2 unified cache accesses missed per thousand instructions executed. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a unified cache.",
> + "MetricGroup": "MPKI;L2_Cache_Effectiveness",
> + "ScaleUnit": "1MPKI"
> + },
> + {
> + "MetricName": "l2_tlb_miss_ratio",
> + "MetricExpr": "L2D_TLB_REFILL / L2D_TLB",
> + "BriefDescription": "This metric measures the ratio of level 2 unified TLB accesses missed to the total number of level 2 unified TLB accesses. This gives an indication of the effectiveness of the level 2 TLB.",
> + "MetricGroup": "Miss_Ratio;ITLB_Effectiveness;DTLB_Effectiveness",
> + "ScaleUnit": "100percent of TLB accesses"
> + },
> + {
> + "MetricName": "l2_tlb_mpki",
> + "MetricExpr": "L2D_TLB_REFILL / INST_RETIRED * 1000",
> + "BriefDescription": "This metric measures the number of level 2 unified TLB accesses missed per thousand instructions executed.",
> + "MetricGroup": "MPKI;ITLB_Effectiveness;DTLB_Effectiveness",
> + "ScaleUnit": "1MPKI"
> + },
> + {
> + "MetricName": "ll_cache_read_hit_ratio",
> + "MetricExpr": "(LL_CACHE_RD - LL_CACHE_MISS_RD) / LL_CACHE_RD",
> + "BriefDescription": "This metric measures the ratio of last level cache read accesses hit in the cache to the total number of last level cache accesses. This gives an indication of the effectiveness of the last level cache for read traffic. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a system level cache.",
> + "MetricGroup": "LL_Cache_Effectiveness",
> + "ScaleUnit": "100percent of cache accesses"
> + },
> + {
> + "MetricName": "ll_cache_read_miss_ratio",
> + "MetricExpr": "LL_CACHE_MISS_RD / LL_CACHE_RD",
> + "BriefDescription": "This metric measures the ratio of last level cache read accesses missed to the total number of last level cache accesses. This gives an indication of the effectiveness of the last level cache for read traffic. Note that cache accesses in this cache are either data memory access or instruction fetch as this is a system level cache.",
> + "MetricGroup": "Miss_Ratio;LL_Cache_Effectiveness",
> + "ScaleUnit": "100percent of cache accesses"
> + },
> + {
> + "MetricName": "ll_cache_read_mpki",
> + "MetricExpr": "LL_CACHE_MISS_RD / INST_RETIRED * 1000",
> + "BriefDescription": "This metric measures the number of last level cache read accesses missed per thousand instructions executed.",
> + "MetricGroup": "MPKI;LL_Cache_Effectiveness",
> + "ScaleUnit": "1MPKI"
> + },
> + {
> + "MetricName": "load_percentage",
> + "MetricExpr": "LD_SPEC / INST_SPEC * 100",
> + "BriefDescription": "This metric measures load operations as a percentage of operations speculatively executed.",
> + "MetricGroup": "Operation_Mix",
> + "ScaleUnit": "1percent of operations"
> + },
> + {
> + "MetricName": "scalar_fp_percentage",
> + "MetricExpr": "VFP_SPEC / INST_SPEC * 100",
> + "BriefDescription": "This metric measures scalar floating point operations as a percentage of operations speculatively executed.",
> + "MetricGroup": "Operation_Mix",
> + "ScaleUnit": "1percent of operations"
> + },
> + {
> + "MetricName": "simd_percentage",
> + "MetricExpr": "ASE_SPEC / INST_SPEC * 100",
> + "BriefDescription": "This metric measures advanced SIMD operations as a percentage of total operations speculatively executed.",
> + "MetricGroup": "Operation_Mix",
> + "ScaleUnit": "1percent of operations"
> + },
> + {
> + "MetricName": "store_percentage",
> + "MetricExpr": "ST_SPEC / INST_SPEC * 100",
> + "BriefDescription": "This metric measures store operations as a percentage of operations speculatively executed.",
> + "MetricGroup": "Operation_Mix",
> + "ScaleUnit": "1percent of operations"
> + },
> + {
> + "MetricName": "sve_all_percentage",
> + "MetricExpr": "SVE_INST_SPEC / INST_SPEC * 100",
> + "BriefDescription": "This metric measures scalable vector operations, including loads and stores, as a percentage of operations speculatively executed.",
> + "MetricGroup": "Operation_Mix",
> + "ScaleUnit": "1percent of operations"
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/pmu.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/pmu.json
> new file mode 100644
> index 000000000000..d8b7b9f9e5fa
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/pmu.json
> @@ -0,0 +1,8 @@
> +[
> + {
> + "ArchStdEvent": "PMU_OVFS"
> + },
> + {
> + "ArchStdEvent": "PMU_HOVFS"
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/retired.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/retired.json
> new file mode 100644
> index 000000000000..152f15c1253c
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/retired.json
> @@ -0,0 +1,90 @@
> +[
> + {
> + "ArchStdEvent": "SW_INCR",
> + "PublicDescription": "Counts software writes to the PMSWINC_EL0 (software PMU increment) register. The PMSWINC_EL0 register is a manually updated counter for use by application software.\n\nThis event could be used to measure any user program event, such as accesses to a particular data structure (by writing to the PMSWINC_EL0 register each time the data structure is accessed).\n\nTo use the PMSWINC_EL0 register and event, developers must insert instructions that write to the PMSWINC_EL0 register into the source code.\n\nSince the SW_INCR event records writes to the PMSWINC_EL0 register, there is no need to do a read/increment/write sequence to the PMSWINC_EL0 register."
> + },
> + {
> + "ArchStdEvent": "LD_RETIRED",
> + "PublicDescription": "Counts instruction architecturally executed, Condition code check pass, load."
> + },
> + {
> + "ArchStdEvent": "ST_RETIRED",
> + "PublicDescription": "Counts instruction architecturally executed, Condition code check pass, store."
> + },
> + {
> + "ArchStdEvent": "INST_RETIRED",
> + "PublicDescription": "Counts instructions that have been architecturally executed."
> + },
> + {
> + "ArchStdEvent": "CID_WRITE_RETIRED",
> + "PublicDescription": "Counts architecturally executed writes to the CONTEXTIDR_EL1 register, which usually contains the kernel PID and can be output with hardware trace."
> + },
> + {
> + "ArchStdEvent": "PC_WRITE_RETIRED",
> + "PublicDescription": "Counts branch instructions that caused a change of Program Counter, which effectively causes a change in the control flow of the program."
> + },
> + {
> + "ArchStdEvent": "BR_IMMED_RETIRED",
> + "PublicDescription": "Counts architecturally executed direct branches."
> + },
> + {
> + "ArchStdEvent": "BR_RETURN_RETIRED",
> + "PublicDescription": "Counts architecturally executed procedure returns."
> + },
> + {
> + "ArchStdEvent": "TTBR_WRITE_RETIRED",
> + "PublicDescription": "Counts architectural writes to TTBR0/1_EL1. If virtualization host extensions are enabled (by setting the HCR_EL2.E2H bit to 1), then accesses to TTBR0/1_EL1 that are redirected to TTBR0/1_EL2, or accesses to TTBR0/1_EL12, are counted. TTBRn registers are typically updated when the kernel is swapping user-space threads or applications."
> + },
> + {
> + "ArchStdEvent": "BR_RETIRED",
> + "PublicDescription": "Counts architecturally executed branches, whether the branch is taken or not. Instructions that explicitly write to the PC are also counted. Note that exception generating instructions, exception return instructions and context synchronization instructions are not counted."
> + },
> + {
> + "ArchStdEvent": "BR_MIS_PRED_RETIRED",
> + "PublicDescription": "Counts branches counted by BR_RETIRED which were mispredicted and caused a pipeline flush."
> + },
> + {
> + "ArchStdEvent": "OP_RETIRED",
> + "PublicDescription": "Counts micro-operations that are architecturally executed. This is a count of the number of micro-operations retired from the commit queue in a single cycle."
> + },
> + {
> + "ArchStdEvent": "SVE_INST_RETIRED",
> + "PublicDescription": "Counts architecturally executed SVE instructions."
> + },
> + {
> + "ArchStdEvent": "BR_INDNR_TAKEN_RETIRED",
> + "PublicDescription": "Counts architecturally executed indirect branches excluding procedure returns that were taken."
> + },
> + {
> + "ArchStdEvent": "BR_IMMED_PRED_RETIRED",
> + "PublicDescription": "Counts architecturally executed direct branches that were correctly predicted."
> + },
> + {
> + "ArchStdEvent": "BR_IMMED_MIS_PRED_RETIRED",
> + "PublicDescription": "Counts architecturally executed direct branches that were mispredicted and caused a pipeline flush."
> + },
> + {
> + "ArchStdEvent": "BR_RETURN_PRED_RETIRED",
> + "PublicDescription": "Counts architecturally executed procedure returns that were correctly predicted."
> + },
> + {
> + "ArchStdEvent": "BR_RETURN_MIS_PRED_RETIRED",
> + "PublicDescription": "Counts architecturally executed procedure returns that were mispredicted and caused a pipeline flush."
> + },
> + {
> + "ArchStdEvent": "BR_INDNR_PRED_RETIRED",
> + "PublicDescription": "Counts architecturally executed indirect branches excluding procedure returns that were correctly predicted."
> + },
> + {
> + "ArchStdEvent": "BR_INDNR_MIS_PRED_RETIRED",
> + "PublicDescription": "Counts architecturally executed indirect branches excluding procedure returns that were mispredicted and caused a pipeline flush."
> + },
> + {
> + "ArchStdEvent": "BR_PRED_RETIRED",
> + "PublicDescription": "Counts branch instructions counted by BR_RETIRED which were correctly predicted."
> + },
> + {
> + "ArchStdEvent": "BR_IND_RETIRED",
> + "PublicDescription": "Counts architecturally executed indirect branches including procedure returns."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/spec_operation.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/spec_operation.json
> new file mode 100644
> index 000000000000..40c29be53cc0
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/spec_operation.json
> @@ -0,0 +1,70 @@
> +[
> + {
> + "ArchStdEvent": "BR_MIS_PRED",
> + "PublicDescription": "Counts branches which are speculatively executed and mispredicted."
> + },
> + {
> + "ArchStdEvent": "BR_PRED",
> + "PublicDescription": "Counts all speculatively executed branches."
> + },
> + {
> + "ArchStdEvent": "INST_SPEC",
> + "PublicDescription": "Counts operations that have been speculatively executed."
> + },
> + {
> + "ArchStdEvent": "OP_SPEC",
> + "PublicDescription": "Counts micro-operations speculatively executed. This is the count of the number of micro-operations dispatched in a cycle."
> + },
> + {
> + "ArchStdEvent": "STREX_FAIL_SPEC",
> + "PublicDescription": "Counts store-exclusive operations that have been speculatively executed and have not successfully completed the store operation."
> + },
> + {
> + "ArchStdEvent": "STREX_SPEC",
> + "PublicDescription": "Counts store-exclusive operations that have been speculatively executed."
> + },
> + {
> + "ArchStdEvent": "LD_SPEC",
> + "PublicDescription": "Counts speculatively executed load operations including Single Instruction Multiple Data (SIMD) load operations."
> + },
> + {
> + "ArchStdEvent": "ST_SPEC",
> + "PublicDescription": "Counts speculatively executed store operations including Single Instruction Multiple Data (SIMD) store operations."
> + },
> + {
> + "ArchStdEvent": "LDST_SPEC",
> + "PublicDescription": "Counts speculatively executed load and store operations."
> + },
> + {
> + "ArchStdEvent": "DP_SPEC",
> + "PublicDescription": "Counts speculatively executed logical or arithmetic instructions such as MOV/MVN operations."
> + },
> + {
> + "ArchStdEvent": "ASE_SPEC",
> + "PublicDescription": "Counts speculatively executed Advanced SIMD operations excluding load, store and move micro-operations that move data to or from SIMD (vector) registers."
> + },
> + {
> + "ArchStdEvent": "VFP_SPEC",
> + "PublicDescription": "Counts speculatively executed floating point operations. This event does not count operations that move data to or from floating point (vector) registers."
> + },
> + {
> + "ArchStdEvent": "PC_WRITE_SPEC",
> + "PublicDescription": "Counts speculatively executed operations which cause software changes of the PC. Those operations include all taken branch operations."
> + },
> + {
> + "ArchStdEvent": "CRYPTO_SPEC",
> + "PublicDescription": "Counts speculatively executed cryptographic operations except for PMULL and VMULL operations."
> + },
> + {
> + "ArchStdEvent": "BR_IMMED_SPEC",
> + "PublicDescription": "Counts direct branch operations which are speculatively executed."
> + },
> + {
> + "ArchStdEvent": "BR_RETURN_SPEC",
> + "PublicDescription": "Counts procedure return operations (RET, RETAA and RETAB) which are speculatively executed."
> + },
> + {
> + "ArchStdEvent": "BR_INDIRECT_SPEC",
> + "PublicDescription": "Counts indirect branch operations including procedure returns, which are speculatively executed. This includes operations that force a software change of the PC, other than exception-generating operations and direct branch instructions. Examples of the instructions counted by this event include BR Xn and RET."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/stall.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/stall.json
> new file mode 100644
> index 000000000000..d65aeb4b8808
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/stall.json
> @@ -0,0 +1,82 @@
> +[
> + {
> + "ArchStdEvent": "STALL_FRONTEND",
> + "PublicDescription": "Counts cycles when frontend could not send any micro-operations to the rename stage because of frontend resource stalls caused by fetch memory latency or branch prediction flow stalls. STALL_SLOT_FRONTEND counts SLOTS during the cycle when this event counts."
> + },
> + {
> + "ArchStdEvent": "STALL_BACKEND",
> + "PublicDescription": "Counts cycles whenever the rename unit is unable to send any micro-operations to the backend of the pipeline because of backend resource constraints. Backend resource constraints can include issue stage fullness, execution stage fullness, or other internal pipeline resource fullness. All the backend slots were empty during the cycle when this event counts."
> + },
> + {
> + "ArchStdEvent": "STALL",
> + "PublicDescription": "Counts cycles when no operations are sent to the rename unit from the frontend or from the rename unit to the backend for any reason (either frontend or backend stall). This event is the sum of STALL_FRONTEND and STALL_BACKEND."
> + },
> + {
> + "ArchStdEvent": "STALL_SLOT_BACKEND",
> + "PublicDescription": "Counts slots per cycle in which no operations are sent from the rename unit to the backend due to backend resource constraints. STALL_BACKEND counts during the cycle when STALL_SLOT_BACKEND counts at least 1."
> + },
> + {
> + "ArchStdEvent": "STALL_SLOT_FRONTEND",
> + "PublicDescription": "Counts slots per cycle in which no operations are sent to the rename unit from the frontend due to frontend resource constraints."
> + },
> + {
> + "ArchStdEvent": "STALL_SLOT",
> + "PublicDescription": "Counts slots per cycle in which no operations are sent to the rename unit from the frontend or from the rename unit to the backend for any reason (either frontend or backend stall). STALL_SLOT is the sum of STALL_SLOT_FRONTEND and STALL_SLOT_BACKEND."
> + },
> + {
> + "ArchStdEvent": "STALL_BACKEND_MEM",
> + "PublicDescription": "Counts cycles when the backend is stalled because there is a pending demand load request in progress in the last level core cache."
> + },
> + {
> + "ArchStdEvent": "STALL_FRONTEND_MEMBOUND",
> + "PublicDescription": "Counts cycles when the frontend could not send any micro-operations to the rename stage due to resource constraints in the memory resources."
> + },
> + {
> + "ArchStdEvent": "STALL_FRONTEND_L1I",
> + "PublicDescription": "Counts cycles when the frontend is stalled because there is an instruction fetch request pending in the level 1 instruction cache."
> + },
> + {
> + "ArchStdEvent": "STALL_FRONTEND_MEM",
> + "PublicDescription": "Counts cycles when the frontend is stalled because there is an instruction fetch request pending in the last level core cache."
> + },
> + {
> + "ArchStdEvent": "STALL_FRONTEND_TLB",
> + "PublicDescription": "Counts cycles when the frontend is stalled on any TLB misses being handled. This event also counts the TLB accesses made by hardware prefetches."
> + },
> + {
> + "ArchStdEvent": "STALL_FRONTEND_CPUBOUND",
> + "PublicDescription": "Counts cycles when the frontend could not send any micro-operations to the rename stage due to resource constraints in the CPU resources excluding memory resources."
> + },
> + {
> + "ArchStdEvent": "STALL_FRONTEND_FLOW",
> + "PublicDescription": "Counts cycles when the frontend could not send any micro-operations to the rename stage due to resource constraints in the branch prediction unit."
> + },
> + {
> + "ArchStdEvent": "STALL_FRONTEND_FLUSH",
> + "PublicDescription": "Counts cycles when the frontend could not send any micro-operations to the rename stage as the frontend is recovering from a machine flush or resteer. Example scenarios that cause a flush include branch mispredictions, taken exceptions, micro-architectural flush etc."
> + },
> + {
> + "ArchStdEvent": "STALL_BACKEND_MEMBOUND",
> + "PublicDescription": "Counts cycles when the backend could not accept any micro-operations due to resource constraints in the memory resources."
> + },
> + {
> + "ArchStdEvent": "STALL_BACKEND_L1D",
> + "PublicDescription": "Counts cycles when the backend is stalled because there is a pending demand load request in progress in the level 1 data cache."
> + },
> + {
> + "ArchStdEvent": "STALL_BACKEND_TLB",
> + "PublicDescription": "Counts cycles when the backend is stalled on any demand TLB misses being handled."
> + },
> + {
> + "ArchStdEvent": "STALL_BACKEND_ST",
> + "PublicDescription": "Counts cycles when the backend is stalled and there is a store that has not reached the pre-commit stage."
> + },
> + {
> + "ArchStdEvent": "STALL_BACKEND_BUSY",
> + "PublicDescription": "Counts cycles when the backend could not accept any micro-operations because the issue queues are too full to take any operations for execution."
> + },
> + {
> + "ArchStdEvent": "STALL_BACKEND_ILOCK",
> + "PublicDescription": "Counts cycles when the backend could not accept any micro-operations due to resource constraints imposed by input dependency."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/sve.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/sve.json
> new file mode 100644
> index 000000000000..21810ce5de8d
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/sve.json
> @@ -0,0 +1,22 @@
> +[
> + {
> + "ArchStdEvent": "SVE_INST_SPEC",
> + "PublicDescription": "Counts speculatively executed operations that are SVE operations."
> + },
> + {
> + "ArchStdEvent": "ASE_SVE_INT8_SPEC",
> + "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type an 8-bit integer."
> + },
> + {
> + "ArchStdEvent": "ASE_SVE_INT16_SPEC",
> + "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 16-bit integer."
> + },
> + {
> + "ArchStdEvent": "ASE_SVE_INT32_SPEC",
> + "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 32-bit integer."
> + },
> + {
> + "ArchStdEvent": "ASE_SVE_INT64_SPEC",
> + "PublicDescription": "Counts speculatively executed Advanced SIMD or SVE integer operations with the largest data type a 64-bit integer."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/tlb.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/tlb.json
> new file mode 100644
> index 000000000000..1de56300e581
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/tlb.json
> @@ -0,0 +1,78 @@
> +[
> + {
> + "ArchStdEvent": "L1I_TLB_REFILL",
> + "PublicDescription": "Counts level 1 instruction TLB refills from any Instruction fetch. If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event will not count if the translation table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB."
> + },
> + {
> + "ArchStdEvent": "L1D_TLB_REFILL",
> + "PublicDescription": "Counts level 1 data TLB accesses that resulted in TLB refills. If there are multiple misses in the TLB that are resolved by the refill, then this event only counts once. This event counts for refills caused by preload instructions or hardware prefetch accesses. This event counts regardless of whether the miss hits in L2 or results in a translation table walk. This event will not count if the translation table walk results in a fault (such as a translation or access fault), since there is no new translation created for the TLB. This event will not count on an access from an AT (address translation) instruction."
> + },
> + {
> + "ArchStdEvent": "L1D_TLB",
> + "PublicDescription": "Counts level 1 data TLB accesses caused by any memory load or store operation. Note that load or store instructions can be broken up into multiple memory operations. This event does not count TLB maintenance operations."
> + },
> + {
> + "ArchStdEvent": "L1I_TLB",
> + "PublicDescription": "Counts level 1 instruction TLB accesses, whether the access hits or misses in the TLB. This event counts both demand accesses and prefetch or preload generated accesses."
> + },
> + {
> + "ArchStdEvent": "L2D_TLB_REFILL",
> + "PublicDescription": "Counts level 2 TLB refills caused by memory operations from both data and instruction fetch, except for those caused by TLB maintenance operations and hardware prefetches."
> + },
> + {
> + "ArchStdEvent": "L2D_TLB",
> + "PublicDescription": "Counts level 2 TLB accesses except those caused by TLB maintenance operations."
> + },
> + {
> + "ArchStdEvent": "DTLB_WALK",
> + "PublicDescription": "Counts number of demand data translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that partial translations that cause a translation table walk are also counted. Also note that this event counts walks triggered by software preloads, but not walks triggered by hardware prefetchers, and that this event does not count walks triggered by TLB maintenance operations."
> + },
> + {
> + "ArchStdEvent": "ITLB_WALK",
> + "PublicDescription": "Counts number of instruction translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that partial translations that cause a translation table walk are also counted. Also note that this event does not count walks triggered by TLB maintenance operations."
> + },
> + {
> + "ArchStdEvent": "DTLB_WALK_PERCYC",
> + "PublicDescription": "Counts the number of data translation table walks in progress per cycle."
> + },
> + {
> + "ArchStdEvent": "ITLB_WALK_PERCYC",
> + "PublicDescription": "Counts the number of instruction translation table walks in progress per cycle."
> + },
> + {
> + "ArchStdEvent": "DTLB_HWUPD",
> + "PublicDescription": "Counts number of memory accesses triggered by a data translation table walk and performing an update of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that this event counts accesses triggered by software preloads, but not accesses triggered by hardware prefetchers."
> + },
> + {
> + "ArchStdEvent": "ITLB_HWUPD",
> + "PublicDescription": "Counts number of memory accesses triggered by an instruction translation table walk and performing an update of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD."
> + },
> + {
> + "ArchStdEvent": "DTLB_STEP",
> + "PublicDescription": "Counts number of memory accesses triggered by a demand data translation table walk and performing a read of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that this event counts accesses triggered by software preloads, but not accesses triggered by hardware prefetchers."
> + },
> + {
> + "ArchStdEvent": "ITLB_STEP",
> + "PublicDescription": "Counts number of memory accesses triggered by an instruction translation table walk and performing a read of a translation table entry. Memory accesses are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD."
> + },
> + {
> + "ArchStdEvent": "DTLB_WALK_LARGE",
> + "PublicDescription": "Counts number of demand data translation table walks caused by a miss in the L2 TLB and yielding a large page. The set of large pages is defined as all pages with a final size higher than or equal to 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. If DTLB_WALK_BLOCK is implemented, then it is an alias for this event in this family. Note that partial translations that cause a translation table walk are also counted. Also note that this event counts walks triggered by software preloads, but not walks triggered by hardware prefetchers, and that this event does not count walks triggered by TLB maintenance operations."
> + },
> + {
> + "ArchStdEvent": "ITLB_WALK_LARGE",
> + "PublicDescription": "Counts number of instruction translation table walks caused by a miss in the L2 TLB and yielding a large page. The set of large pages is defined as all pages with a final size higher than or equal to 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. In this family, this is equal to ITLB_WALK_BLOCK event. Note that partial translations that cause a translation table walk are also counted. Also note that this event does not count walks triggered by TLB maintenance operations."
> + },
> + {
> + "ArchStdEvent": "DTLB_WALK_SMALL",
> + "PublicDescription": "Counts number of data translation table walks caused by a miss in the L2 TLB and yielding a small page. The set of small pages is defined as all pages with a final size lower than 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. If DTLB_WALK_PAGE event is implemented, then it is an alias for this event in this family. Note that partial translations that cause a translation table walk are also counted. Also note that this event counts walks triggered by software preloads, but not walks triggered by hardware prefetchers, and that this event does not count walks triggered by TLB maintenance operations."
> + },
> + {
> + "ArchStdEvent": "ITLB_WALK_SMALL",
> + "PublicDescription": "Counts number of instruction translation table walks caused by a miss in the L2 TLB and yielding a small page. The set of small pages is defined as all pages with a final size lower than 2MB. Translation table walks that end up taking a translation fault are not counted, as the page size would be undefined in that case. In this family, this is equal to ITLB_WALK_PAGE event. Note that partial translations that cause a translation table walk are also counted. Also note that this event does not count walks triggered by TLB maintenance operations."
> + },
> + {
> + "ArchStdEvent": "DTLB_WALK_RW",
> + "PublicDescription": "Counts number of demand data translation table walks caused by a miss in the L2 TLB and performing at least one memory access. Translation table walks are counted even if the translation ended up taking a translation fault for reasons different than EPD, E0PD and NFD. Note that partial translations that cause a translation table walk are also counted. Also note that this event does not count walks triggered by TLB maintenance operations."
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/trace.json b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/trace.json
> new file mode 100644
> index 000000000000..33672a8711d4
> --- /dev/null
> +++ b/tools/perf/pmu-events/arch/arm64/arm/cortex-a520/trace.json
> @@ -0,0 +1,32 @@
> +[
> + {
> + "ArchStdEvent": "TRB_WRAP"
> + },
> + {
> + "ArchStdEvent": "TRB_TRIG"
> + },
> + {
> + "ArchStdEvent": "TRCEXTOUT0"
> + },
> + {
> + "ArchStdEvent": "TRCEXTOUT1"
> + },
> + {
> + "ArchStdEvent": "TRCEXTOUT2"
> + },
> + {
> + "ArchStdEvent": "TRCEXTOUT3"
> + },
> + {
> + "ArchStdEvent": "CTI_TRIGOUT4"
> + },
> + {
> + "ArchStdEvent": "CTI_TRIGOUT5"
> + },
> + {
> + "ArchStdEvent": "CTI_TRIGOUT6"
> + },
> + {
> + "ArchStdEvent": "CTI_TRIGOUT7"
> + }
> +]
> diff --git a/tools/perf/pmu-events/arch/arm64/common-and-microarch.json b/tools/perf/pmu-events/arch/arm64/common-and-microarch.json
> index e40be37addf8..3e774c1e1413 100644
> --- a/tools/perf/pmu-events/arch/arm64/common-and-microarch.json
> +++ b/tools/perf/pmu-events/arch/arm64/common-and-microarch.json
> @@ -1339,6 +1339,11 @@
> "EventName": "INST_FETCH",
> "BriefDescription": "Instruction memory access"
> },
> + {
> + "EventCode": "0x8125",
> + "EventName": "BUS_REQ_RD_PERCYC",
> + "BriefDescription": "Bus read transactions in progress"
> + },
> {
> "EventCode": "0x8128",
> "EventName": "DTLB_WALK_PERCYC",
> @@ -1539,6 +1544,11 @@
> "EventName": "L2D_CACHE_HWPRF",
> "BriefDescription": "Level 2 data cache hardware prefetch."
> },
> + {
> + "EventCode": "0x8156",
> + "EventName": "L3D_CACHE_HWPRF",
> + "BriefDescription": "Level 3 data cache hardware prefetch."
> + },
> {
> "EventCode": "0x8158",
> "EventName": "STALL_FRONTEND_MEMBOUND",
> @@ -1674,6 +1684,11 @@
> "EventName": "DTLB_WALK_PAGE",
> "BriefDescription": "Data TLB page translation table walk."
> },
> + {
> + "EventCode": "0x818D",
> + "EventName": "BUS_REQ_RD",
> + "BriefDescription": "Bus request, read"
> + },
> {
> "EventCode": "0x818B",
> "EventName": "ITLB_WALK_PAGE",
> diff --git a/tools/perf/pmu-events/arch/arm64/mapfile.csv b/tools/perf/pmu-events/arch/arm64/mapfile.csv
> index ccfcae375750..6b98632636e1 100644
> --- a/tools/perf/pmu-events/arch/arm64/mapfile.csv
> +++ b/tools/perf/pmu-events/arch/arm64/mapfile.csv
> @@ -32,6 +32,7 @@
> 0x00000000410fd440,v1,arm/cortex-x1,core
> 0x00000000410fd4c0,v1,arm/cortex-x1,core
> 0x00000000410fd460,v1,arm/cortex-a510,core
> +0x00000000410fd800,v1,arm/cortex-a520,core
> 0x00000000410fd470,v1,arm/cortex-a710,core
> 0x00000000410fd810,v1,arm/cortex-a720,core
> 0x00000000410fd480,v1,arm/cortex-x2,core
> --
> 2.47.2
>
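The first column of the mapfile.csv entries above is a MIDR_EL1 value. A minimal sketch of how such a value splits into fields, assuming the standard Armv8 MIDR_EL1 layout; the helper name decode_midr is made up for illustration and is not part of perf:

```python
def decode_midr(midr: int) -> dict:
    """Split a MIDR_EL1 value into its architectural fields."""
    return {
        "implementer": (midr >> 24) & 0xFF,   # bits [31:24], 0x41 is Arm
        "variant": (midr >> 20) & 0xF,        # bits [23:20]
        "architecture": (midr >> 16) & 0xF,   # bits [19:16]
        "partnum": (midr >> 4) & 0xFFF,       # bits [15:4]
        "revision": midr & 0xF,               # bits [3:0]
    }


# The two values come from the mapfile.csv hunk quoted above.
for name, midr in [("cortex-a520", 0x410FD800), ("cortex-a720", 0x410FD810)]:
    fields = decode_midr(midr)
    print(name, hex(fields["implementer"]), hex(fields["partnum"]))
```

In both entries the variant and revision fields are zero, so the two cores differ only in the part number field (0xd80 for the A520 line, 0xd81 for the A720 line).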
* Re: [PATCH 0/2] perf vendor events arm64: Add A720/A520 events/metrics
2025-02-13 15:11 [PATCH 0/2] perf vendor events arm64: Add A720/A520 events/metrics Yangyu Chen
2025-02-13 15:12 ` [PATCH 1/2] perf vendor events arm64: Add Cortex-A720 events/metrics Yangyu Chen
2025-02-13 15:12 ` [PATCH 2/2] perf vendor events arm64: Add Cortex-A520 events/metrics Yangyu Chen
@ 2025-02-14 1:12 ` Namhyung Kim
2025-02-14 5:49 ` Yangyu Chen
2 siblings, 1 reply; 16+ messages in thread
From: Namhyung Kim @ 2025-02-14 1:12 UTC (permalink / raw)
To: Yangyu Chen, Ian Rogers
Cc: linux-perf-users, John Garry, Will Deacon, James Clark,
Mike Leach, Leo Yan, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Jiri Olsa, Adrian Hunter, Liang Kan, Yoshihiro Furudera,
linux-arm-kernel, linux-kernel
Hello,
On Thu, Feb 13, 2025 at 11:11:01PM +0800, Yangyu Chen wrote:
> This patchset adds the perf JSON files for the Cortex-A720 and Cortex-A520
> processors. Some events have been tested on Radxa Orion 6 with Cix P1 SoC
> (8xA720 + 4xA520) running a mainline kernel in ACPI mode.
I'm curious what the PMU names look like. Is it cortex_a720 (or a520)?
I remember there's logic that checks the length of the hex digits at the end.
Ian, are you ok with this?
Thanks,
Namhyung
* Re: [PATCH 0/2] perf vendor events arm64: Add A720/A520 events/metrics
2025-02-14 1:12 ` [PATCH 0/2] perf vendor events arm64: Add A720/A520 events/metrics Namhyung Kim
@ 2025-02-14 5:49 ` Yangyu Chen
2025-02-14 10:02 ` James Clark
0 siblings, 1 reply; 16+ messages in thread
From: Yangyu Chen @ 2025-02-14 5:49 UTC (permalink / raw)
To: Namhyung Kim
Cc: Ian Rogers, linux-perf-users, John Garry, Will Deacon,
James Clark, Mike Leach, Leo Yan, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Jiri Olsa, Adrian Hunter, Liang Kan, Yoshihiro Furudera,
linux-arm-kernel, linux-kernel
> On 14 Feb 2025, at 09:12, Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hello,
>
> On Thu, Feb 13, 2025 at 11:11:01PM +0800, Yangyu Chen wrote:
>> This patchset adds the perf JSON files for the Cortex-A720 and Cortex-A520
>> processors. Some events have been tested on Radxa Orion 6 with Cix P1 SoC
>> (8xA720 + 4xA520) running a mainline kernel in ACPI mode.
>
> I'm curious what the PMU names look like. Is it cortex_a720 (or a520)?
The PMU names come from Arm's documentation. I have included the
links in each patch.
> I remember there's logic that checks the length of the hex digits at the end.
>
Could you provide more details about this?
> Ian, are you ok with this?
>
> Thanks,
> Namhyung
* Re: [PATCH 0/2] perf vendor events arm64: Add A720/A520 events/metrics
2025-02-14 5:49 ` Yangyu Chen
@ 2025-02-14 10:02 ` James Clark
2025-02-18 0:41 ` Ian Rogers
0 siblings, 1 reply; 16+ messages in thread
From: James Clark @ 2025-02-14 10:02 UTC (permalink / raw)
To: Yangyu Chen, Namhyung Kim, Ian Rogers
Cc: linux-perf-users, John Garry, Will Deacon, Mike Leach, Leo Yan,
Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
Liang Kan, Yoshihiro Furudera, linux-arm-kernel, linux-kernel
On 14/02/2025 5:49 am, Yangyu Chen wrote:
>
>
>> On 14 Feb 2025, at 09:12, Namhyung Kim <namhyung@kernel.org> wrote:
>>
>> Hello,
>>
>> On Thu, Feb 13, 2025 at 11:11:01PM +0800, Yangyu Chen wrote:
>>> This patchset adds the perf JSON files for the Cortex-A720 and Cortex-A520
>>> processors. Some events have been tested on Radxa Orion 6 with Cix P1 SoC
>>> (8xA720 + 4xA520) running a mainline kernel in ACPI mode.
>>
>> I'm curious what the PMU names look like. Is it cortex_a720 (or a520)?
>
> The PMU names come from Arm's documentation. I have included the
> links in each patch.
>
>> I remember there's logic that checks the length of the hex digits at the end.
>>
>
> Could you provide more details about this?
>
>> Ian, are you ok with this?
>>
I think they wouldn't be merged because they're core PMUs, so it should
be fine? Otherwise they would be merged, because the suffix is more
than 3 hex digits.
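To make the concern concrete, here is a simplified sketch of the kind of suffix rule being discussed. It is not perf's actual code: the function name name_without_suffix and the min_hex_digits parameter are hypothetical, and the threshold is taken only from the "more than 3 hex digits" remark above, so check the perf source for the real value.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def name_without_suffix(name: str, min_hex_digits: int = 4) -> str:
    """Strip a trailing '_<number>' suffix, if the name has one.

    min_hex_digits is an assumption based on this thread, not on
    perf's source.
    """
    end = len(name)
    # Walk backwards over trailing hexadecimal characters.
    while end > 0 and name[end - 1] in HEX_DIGITS:
        end -= 1
    suffix = name[end:]
    if not suffix or end == 0 or name[end - 1] != "_":
        return name  # no '_<number>' suffix at all
    # Purely decimal suffixes always count; hex ones need enough digits.
    if suffix.isdigit() or len(suffix) >= min_hex_digits:
        return name[: end - 1]
    return name


for pmu in ["cortex_a720", "cortex_a520", "cpum_cf"]:
    print(pmu, "->", name_without_suffix(pmu))
```

Under this rule "a720" and "a520" both look like hex suffixes, so both names would reduce to "cortex" if the rule were applied to them. The point above is that core PMUs are not expected to go through this merging.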
>> Thanks,
>> Namhyung
>>
>>>
>>> Yangyu Chen (2):
>>> perf vendor events arm64: Add Cortex-A720 events/metrics
>>> perf vendor events arm64: Add Cortex-A520 events/metrics
>>>
>>> .../arch/arm64/arm/cortex-a520/bus.json | 26 ++
>>> .../arch/arm64/arm/cortex-a520/exception.json | 18 +
>>> .../arm64/arm/cortex-a520/fp_operation.json | 14 +
>>> .../arch/arm64/arm/cortex-a520/general.json | 6 +
>>> .../arch/arm64/arm/cortex-a520/l1d_cache.json | 50 ++
>>> .../arch/arm64/arm/cortex-a520/l1i_cache.json | 14 +
>>> .../arch/arm64/arm/cortex-a520/l2_cache.json | 46 ++
>>> .../arch/arm64/arm/cortex-a520/l3_cache.json | 21 +
>>> .../arch/arm64/arm/cortex-a520/ll_cache.json | 10 +
>>> .../arch/arm64/arm/cortex-a520/memory.json | 58 +++
>>> .../arch/arm64/arm/cortex-a520/metrics.json | 373 +++++++++++++++
>>> .../arch/arm64/arm/cortex-a520/pmu.json | 8 +
>>> .../arch/arm64/arm/cortex-a520/retired.json | 90 ++++
>>> .../arm64/arm/cortex-a520/spec_operation.json | 70 +++
>>> .../arch/arm64/arm/cortex-a520/stall.json | 82 ++++
>>> .../arch/arm64/arm/cortex-a520/sve.json | 22 +
>>> .../arch/arm64/arm/cortex-a520/tlb.json | 78 ++++
>>> .../arch/arm64/arm/cortex-a520/trace.json | 32 ++
>>> .../arch/arm64/arm/cortex-a720/bus.json | 18 +
>>> .../arch/arm64/arm/cortex-a720/exception.json | 62 +++
>>> .../arm64/arm/cortex-a720/fp_operation.json | 22 +
>>> .../arch/arm64/arm/cortex-a720/general.json | 10 +
>>> .../arch/arm64/arm/cortex-a720/l1d_cache.json | 50 ++
>>> .../arch/arm64/arm/cortex-a720/l1i_cache.json | 14 +
>>> .../arch/arm64/arm/cortex-a720/l2_cache.json | 62 +++
>>> .../arch/arm64/arm/cortex-a720/l3_cache.json | 22 +
>>> .../arch/arm64/arm/cortex-a720/ll_cache.json | 10 +
>>> .../arch/arm64/arm/cortex-a720/memory.json | 54 +++
>>> .../arch/arm64/arm/cortex-a720/metrics.json | 436 ++++++++++++++++++
>>> .../arch/arm64/arm/cortex-a720/pmu.json | 8 +
>>> .../arch/arm64/arm/cortex-a720/retired.json | 90 ++++
>>> .../arch/arm64/arm/cortex-a720/spe.json | 42 ++
>>> .../arm64/arm/cortex-a720/spec_operation.json | 90 ++++
>>> .../arch/arm64/arm/cortex-a720/stall.json | 82 ++++
>>> .../arch/arm64/arm/cortex-a720/sve.json | 50 ++
>>> .../arch/arm64/arm/cortex-a720/tlb.json | 74 +++
>>> .../arch/arm64/arm/cortex-a720/trace.json | 32 ++
>>> .../arch/arm64/common-and-microarch.json | 15 +
>>> tools/perf/pmu-events/arch/arm64/mapfile.csv | 2 +
>>> 39 files changed, 2263 insertions(+)
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/bus.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/exception.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/fp_operation.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/general.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l1d_cache.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l1i_cache.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l2_cache.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/l3_cache.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/ll_cache.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/memory.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/metrics.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/pmu.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/retired.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/spec_operation.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/stall.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/sve.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/tlb.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a520/trace.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/bus.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/exception.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/fp_operation.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/general.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l1d_cache.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l1i_cache.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l2_cache.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/l3_cache.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/ll_cache.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/memory.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/metrics.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/pmu.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/retired.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/spe.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/spec_operation.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/stall.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/sve.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/tlb.json
>>> create mode 100644 tools/perf/pmu-events/arch/arm64/arm/cortex-a720/trace.json
>>>
>>> --
>>> 2.47.2
>>>
>
* Re: [PATCH 0/2] perf vendor events arm64: Add A720/A520 events/metrics
2025-02-14 10:02 ` James Clark
@ 2025-02-18 0:41 ` Ian Rogers
2025-02-18 9:30 ` James Clark
0 siblings, 1 reply; 16+ messages in thread
From: Ian Rogers @ 2025-02-18 0:41 UTC (permalink / raw)
To: James Clark
Cc: Yangyu Chen, Namhyung Kim, linux-perf-users, John Garry,
Will Deacon, Mike Leach, Leo Yan, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Jiri Olsa, Adrian Hunter, Liang Kan, Yoshihiro Furudera,
linux-arm-kernel, linux-kernel
On Fri, Feb 14, 2025 at 2:02 AM James Clark <james.clark@linaro.org> wrote:
>
>
>
> On 14/02/2025 5:49 am, Yangyu Chen wrote:
> >
> >
> >> On 14 Feb 2025, at 09:12, Namhyung Kim <namhyung@kernel.org> wrote:
> >>
> >> Hello,
> >>
> >> On Thu, Feb 13, 2025 at 11:11:01PM +0800, Yangyu Chen wrote:
> >>> This patchset adds the perf JSON files for the Cortex-A720 and Cortex-A520
> >>> processors. Some events have been tested on Raxda Orion 6 with Cix P1 SoC
> >>> (8xA720 + 4xA520) running mainline Kernel with ACPI mode.
> >>
> >> I'm curious how the name of PMUs look like. It is cortex_a720 (or a520)?
> >
> > The name of PMUs comes from Arm's documentation. I have included these
> > links in each patch.
> >
> >> I remember there's a logic to check the length of hex digits at the end.
> >>
> >
> > Could you provide more details about this?
> >
> >> Ian, are you ok with this?
> >>
>
> I think they wouldn't be merged because they're core PMUs, so should be
> fine? Even though they would otherwise be merged because they're more
> than 3 hex digits.
Do we know the PMU names? If they are cortex_a520 and cortex_a720 then
this comment at least reads a little stale:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmus.c?h=perf-tools-next#n76
```
/*
* There is a '_{num}' suffix. For decimal suffixes any length
* will do, for hexadecimal ensure more than 2 hex digits so
* that S390's cpum_cf PMU doesn't match.
*/
```
James is right that core PMUs aren't put on the same list as uncore/other PMUs.
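As a rough illustration of the rule that comment describes, here is a
minimal sketch in C. It is not the pmus.c code itself: the function name
name_len_no_suffix_sketch is made up for this example, and it only
mirrors what the comment says (decimal suffixes of any length, hex
suffixes only when longer than 2 digits).

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not the real perf function): return the length of
 * a PMU name with any trailing "_{num}" suffix removed.
 */
size_t name_len_no_suffix_sketch(const char *name)
{
	size_t orig_len, len;
	bool has_hex_letter = false;

	assert(name != NULL);
	orig_len = len = strlen(name);

	/* Walk backwards over trailing hex digits, noting any a-f/A-F. */
	while (len > 0 && isxdigit((unsigned char)name[len - 1])) {
		if (!isdigit((unsigned char)name[len - 1]))
			has_hex_letter = true;
		len--;
	}

	/* A suffix needs at least one digit and a '_' right before it. */
	if (len > 0 && len != orig_len && name[len - 1] == '_') {
		/* Decimal: any length. Hex: more than 2 digits. */
		if (!has_hex_letter || orig_len - len > 2)
			return len - 1;
	}

	/* No suffix recognised, so the whole name is the stem. */
	return orig_len;
}
```

Under this sketch "armv9_cortex_a720" and "armv9_cortex_a520" both reduce
to the 12-character stem "armv9_cortex", while "cpum_cf" keeps its full
length because its hex-looking tail is only 2 characters long.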
Thanks,
Ian
> >> Thanks,
> >> Namhyung
> >>
> >
>
* Re: [PATCH 0/2] perf vendor events arm64: Add A720/A520 events/metrics
2025-02-18 0:41 ` Ian Rogers
@ 2025-02-18 9:30 ` James Clark
2025-02-18 22:19 ` Namhyung Kim
0 siblings, 1 reply; 16+ messages in thread
From: James Clark @ 2025-02-18 9:30 UTC (permalink / raw)
To: Ian Rogers
Cc: Yangyu Chen, Namhyung Kim, linux-perf-users, John Garry,
Will Deacon, Mike Leach, Leo Yan, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Jiri Olsa, Adrian Hunter, Liang Kan, Yoshihiro Furudera,
linux-arm-kernel, linux-kernel
On 18/02/2025 12:41 am, Ian Rogers wrote:
> On Fri, Feb 14, 2025 at 2:02 AM James Clark <james.clark@linaro.org> wrote:
>>
>>
>>
>> On 14/02/2025 5:49 am, Yangyu Chen wrote:
>>>
>>>
>>>> On 14 Feb 2025, at 09:12, Namhyung Kim <namhyung@kernel.org> wrote:
>>>>
>>>> Hello,
>>>>
>>>> On Thu, Feb 13, 2025 at 11:11:01PM +0800, Yangyu Chen wrote:
>>>>> This patchset adds the perf JSON files for the Cortex-A720 and Cortex-A520
>>>>> processors. Some events have been tested on Raxda Orion 6 with Cix P1 SoC
>>>>> (8xA720 + 4xA520) running mainline Kernel with ACPI mode.
>>>>
>>>> I'm curious how the name of PMUs look like. It is cortex_a720 (or a520)?
>>>
>>> The name of PMUs comes from Arm's documentation. I have included these
>>> links in each patch.
>>>
>>>> I remember there's a logic to check the length of hex digits at the end.
>>>>
>>>
>>> Could you provide more details about this?
>>>
>>>> Ian, are you ok with this?
>>>>
>>
>> I think they wouldn't be merged because they're core PMUs, so should be
>> fine? Even though they would otherwise be merged because they're more
>> than 3 hex digits.
>
> Do we know the PMU names? If they are cortex_a520 and cortex_a720 then
It will be "armv9_cortex_a720" from this line:
PMUV3_INIT_SIMPLE(armv9_cortex_a720)
> this comment at least reads a little stale:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmus.c?h=perf-tools-next#n76
> ```
> /*
> * There is a '_{num}' suffix. For decimal suffixes any length
> * will do, for hexadecimal ensure more than 2 hex digits so
> * that S390's cpum_cf PMU doesn't match.
> */
> ```
> James is right that core PMUs aren't put on the same list as uncore/other PMUs.
>
> Thanks,
> Ian
>
>>>> Thanks,
>>>> Namhyung
>>>>
>>>
>>
* Re: [PATCH 0/2] perf vendor events arm64: Add A720/A520 events/metrics
2025-02-18 9:30 ` James Clark
@ 2025-02-18 22:19 ` Namhyung Kim
2025-02-18 22:33 ` Ian Rogers
0 siblings, 1 reply; 16+ messages in thread
From: Namhyung Kim @ 2025-02-18 22:19 UTC (permalink / raw)
To: James Clark
Cc: Ian Rogers, Yangyu Chen, linux-perf-users, John Garry,
Will Deacon, Mike Leach, Leo Yan, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Jiri Olsa, Adrian Hunter, Liang Kan, Yoshihiro Furudera,
linux-arm-kernel, linux-kernel
On Tue, Feb 18, 2025 at 09:30:23AM +0000, James Clark wrote:
>
>
> On 18/02/2025 12:41 am, Ian Rogers wrote:
> > On Fri, Feb 14, 2025 at 2:02 AM James Clark <james.clark@linaro.org> wrote:
> > >
> > >
> > >
> > > On 14/02/2025 5:49 am, Yangyu Chen wrote:
> > > >
> > > >
> > > > > On 14 Feb 2025, at 09:12, Namhyung Kim <namhyung@kernel.org> wrote:
> > > > >
> > > > > Hello,
> > > > >
> > > > > On Thu, Feb 13, 2025 at 11:11:01PM +0800, Yangyu Chen wrote:
> > > > > > This patchset adds the perf JSON files for the Cortex-A720 and Cortex-A520
> > > > > > processors. Some events have been tested on Raxda Orion 6 with Cix P1 SoC
> > > > > > (8xA720 + 4xA520) running mainline Kernel with ACPI mode.
> > > > >
> > > > > I'm curious how the name of PMUs look like. It is cortex_a720 (or a520)?
> > > >
> > > > The name of PMUs comes from Arm's documentation. I have included these
> > > > links in each patch.
> > > >
> > > > > I remember there's a logic to check the length of hex digits at the end.
> > > > >
> > > >
> > > > Could you provide more details about this?
> > > >
> > > > > Ian, are you ok with this?
> > > > >
> > >
> > > I think they wouldn't be merged because they're core PMUs, so should be
> > > fine? Even though they would otherwise be merged because they're more
> > > than 3 hex digits.
> >
> > Do we know the PMU names? If they are cortex_a520 and cortex_a720 then
>
> It will be "armv9_cortex_a720" from this line:
>
> PMUV3_INIT_SIMPLE(armv9_cortex_a720)
I see, thanks!
>
> > this comment at least reads a little stale:
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmus.c?h=perf-tools-next#n76
> > ```
> > /*
> > * There is a '_{num}' suffix. For decimal suffixes any length
> > * will do, for hexadecimal ensure more than 2 hex digits so
> > * that S390's cpum_cf PMU doesn't match.
> > */
> > ```
> > James is right that core PMUs aren't put on the same list as uncore/other PMUs.
Ok, then I guess we're good.
Thanks,
Namhyung
* Re: [PATCH 0/2] perf vendor events arm64: Add A720/A520 events/metrics
2025-02-18 22:19 ` Namhyung Kim
@ 2025-02-18 22:33 ` Ian Rogers
2025-02-19 15:25 ` James Clark
` (2 more replies)
0 siblings, 3 replies; 16+ messages in thread
From: Ian Rogers @ 2025-02-18 22:33 UTC (permalink / raw)
To: Namhyung Kim
Cc: James Clark, Yangyu Chen, linux-perf-users, John Garry,
Will Deacon, Mike Leach, Leo Yan, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Jiri Olsa, Adrian Hunter, Liang Kan, Yoshihiro Furudera,
linux-arm-kernel, linux-kernel
On Tue, Feb 18, 2025 at 2:19 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Tue, Feb 18, 2025 at 09:30:23AM +0000, James Clark wrote:
> >
> >
> > On 18/02/2025 12:41 am, Ian Rogers wrote:
> > > On Fri, Feb 14, 2025 at 2:02 AM James Clark <james.clark@linaro.org> wrote:
> > > >
> > > >
> > > >
> > > > On 14/02/2025 5:49 am, Yangyu Chen wrote:
> > > > >
> > > > >
> > > > > > On 14 Feb 2025, at 09:12, Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > >
> > > > > > Hello,
> > > > > >
> > > > > > On Thu, Feb 13, 2025 at 11:11:01PM +0800, Yangyu Chen wrote:
> > > > > > > This patchset adds the perf JSON files for the Cortex-A720 and Cortex-A520
> > > > > > > processors. Some events have been tested on Raxda Orion 6 with Cix P1 SoC
> > > > > > > (8xA720 + 4xA520) running mainline Kernel with ACPI mode.
> > > > > >
> > > > > > I'm curious how the name of PMUs look like. It is cortex_a720 (or a520)?
> > > > >
> > > > > The name of PMUs comes from Arm's documentation. I have included these
> > > > > links in each patch.
> > > > >
> > > > > > I remember there's a logic to check the length of hex digits at the end.
> > > > > >
> > > > >
> > > > > Could you provide more details about this?
> > > > >
> > > > > > Ian, are you ok with this?
> > > > > >
> > > >
> > > > I think they wouldn't be merged because they're core PMUs, so should be
> > > > fine? Even though they would otherwise be merged because they're more
> > > > than 3 hex digits.
> > >
> > > Do we know the PMU names? If they are cortex_a520 and cortex_a720 then
> >
> > It will be "armv9_cortex_a720" from this line:
> >
> > PMUV3_INIT_SIMPLE(armv9_cortex_a720)
>
> I see, thanks!
>
> >
> > > this comment at least reads a little stale:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmus.c?h=perf-tools-next#n76
> > > ```
> > > /*
> > > * There is a '_{num}' suffix. For decimal suffixes any length
> > > * will do, for hexadecimal ensure more than 2 hex digits so
> > > * that S390's cpum_cf PMU doesn't match.
> > > */
> > > ```
> > > James is right that core PMUs aren't put on the same list as uncore/other PMUs.
>
> Ok, then I guess we're good.
I think you may be able to do things that look odd. For example, today
the "i915" PMU can be called just "i". I think the a520/a720 naming will
allow "armv9_cortex/cycles/" as an event name, which would then be
opened on both PMUs if they are present. We may only show one PMU in
perf list, as I think that code assumes they're the same PMU because
they differ only by suffix:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmus.c?h=perf-tools-next#n384
I can imagine aggregation possibly being broken, but I think that works
off the number of PMUs, not their names, so it should be okay. Probably
the only thing broken that matters is perf list when you have a
big.LITTLE system with a520 and a720. This may already be broken with,
say, an a53 and an a72 today, as both of those suffix lengths are >2,
but maybe they use the "armv8_pmuv3_0", "armv8_pmuv3_1", etc. naming
convention. I suspect the >2 here:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmus.c?h=perf-tools-next#n80
would still work and be correct if it were >4. If that changes then
this will also need to change:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/Documentation/ABI/testing/sysfs-bus-event_source-devices?h=perf-tools-next#n12
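To illustrate the >2 versus >4 point, here is a small sketch in C. It
is an assumption-laden toy rather than the perf implementation: both
function names and the hex_threshold parameter are invented for this
example, and the "smmuv3_pmcg_..." names are only stand-ins for an
uncore PMU with a long address-style hex suffix.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical: length of the name without a "_{num}" suffix. A hex
 * suffix is only stripped when it has more than hex_threshold digits.
 */
size_t stem_len_sketch(const char *name, size_t hex_threshold)
{
	size_t orig_len, len;
	bool has_hex_letter = false;

	assert(name != NULL);
	orig_len = len = strlen(name);

	/* Skip trailing hex digits, remembering whether any was a letter. */
	while (len > 0 && isxdigit((unsigned char)name[len - 1])) {
		if (!isdigit((unsigned char)name[len - 1]))
			has_hex_letter = true;
		len--;
	}

	if (len > 0 && len != orig_len && name[len - 1] == '_' &&
	    (!has_hex_letter || orig_len - len > hex_threshold))
		return len - 1;

	return orig_len;
}

/* Hypothetical: true if two PMU names match once suffixes are ignored. */
bool same_pmu_ignoring_suffix_sketch(const char *a, const char *b,
				     size_t hex_threshold)
{
	size_t a_len = stem_len_sketch(a, hex_threshold);
	size_t b_len = stem_len_sketch(b, hex_threshold);

	return a_len == b_len && strncmp(a, b, a_len) == 0;
}
```

With a threshold of 2, "armv9_cortex_a520" and "armv9_cortex_a720"
compare as the same PMU in this sketch. With a threshold of 4 they stay
distinct, while decimal suffixes such as "armv8_pmuv3_0" and
"armv8_pmuv3_1", and hex suffixes longer than 4 digits, still collapse.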
Thanks,
Ian
>
> Thanks,
> Namhyung
>
* Re: [PATCH 0/2] perf vendor events arm64: Add A720/A520 events/metrics
2025-02-18 22:33 ` Ian Rogers
@ 2025-02-19 15:25 ` James Clark
2025-02-19 18:37 ` Ian Rogers
2025-02-20 3:37 ` Yangyu Chen
[not found] ` <tencent_EDA4AFD185EF51104EDBCEB109D720862B05@qq.com>
2 siblings, 1 reply; 16+ messages in thread
From: James Clark @ 2025-02-19 15:25 UTC (permalink / raw)
To: Ian Rogers, Namhyung Kim
Cc: Yangyu Chen, linux-perf-users, John Garry, Will Deacon,
Mike Leach, Leo Yan, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Jiri Olsa, Adrian Hunter, Liang Kan, Yoshihiro Furudera,
linux-arm-kernel, linux-kernel
On 18/02/2025 10:33 pm, Ian Rogers wrote:
> On Tue, Feb 18, 2025 at 2:19 PM Namhyung Kim <namhyung@kernel.org> wrote:
>>
>> On Tue, Feb 18, 2025 at 09:30:23AM +0000, James Clark wrote:
>>>
>>>
>>> On 18/02/2025 12:41 am, Ian Rogers wrote:
>>>> On Fri, Feb 14, 2025 at 2:02 AM James Clark <james.clark@linaro.org> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 14/02/2025 5:49 am, Yangyu Chen wrote:
>>>>>>
>>>>>>
>>>>>>> On 14 Feb 2025, at 09:12, Namhyung Kim <namhyung@kernel.org> wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> On Thu, Feb 13, 2025 at 11:11:01PM +0800, Yangyu Chen wrote:
>>>>>>>> This patchset adds the perf JSON files for the Cortex-A720 and Cortex-A520
>>>>>>>> processors. Some events have been tested on Raxda Orion 6 with Cix P1 SoC
>>>>>>>> (8xA720 + 4xA520) running mainline Kernel with ACPI mode.
>>>>>>>
>>>>>>> I'm curious how the name of PMUs look like. It is cortex_a720 (or a520)?
>>>>>>
>>>>>> The name of PMUs comes from Arm's documentation. I have included these
>>>>>> links in each patch.
>>>>>>
>>>>>>> I remember there's a logic to check the length of hex digits at the end.
>>>>>>>
>>>>>>
>>>>>> Could you provide more details about this?
>>>>>>
>>>>>>> Ian, are you ok with this?
>>>>>>>
>>>>>
>>>>> I think they wouldn't be merged because they're core PMUs, so should be
>>>>> fine? Even though they would otherwise be merged because they're more
>>>>> than 3 hex digits.
>>>>
>>>> Do we know the PMU names? If they are cortex_a520 and cortex_a720 then
>>>
>>> It will be "armv9_cortex_a720" from this line:
>>>
>>> PMUV3_INIT_SIMPLE(armv9_cortex_a720)
>>
>> I see, thanks!
>>
>>>
>>>> this comment at least reads a little stale:
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmus.c?h=perf-tools-next#n76
>>>> ```
>>>> /*
>>>> * There is a '_{num}' suffix. For decimal suffixes any length
>>>> * will do, for hexadecimal ensure more than 2 hex digits so
>>>> * that S390's cpum_cf PMU doesn't match.
>>>> */
>>>> ```
>>>> James is right that core PMUs aren't put on the same list as uncore/other PMUs.
>>
>> Ok, then I guess we're good.
>
> I think you may be able to do things that look odd, like today the
> "i915" PMU can be called just "i", I think the a520/a720 naming will
> allow "armv9_cortex/cycles/" as an event name, then open it on two
> PMUs if they are present.
I assumed that was the intended behavior. It seems fairly useful to be
able to open an event on all PMUs that share a common prefix.
> We may only show one PMU in perf list as
> that code I think assumes they're the same PMU as they only differ by
> suffix:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmus.c?h=perf-tools-next#n384
Yeah that is the case. I didn't realise it when looking at the previous
fixes to keep the suffixes in perf stat output.
> I can imagine aggregation possibly being broken, but I think that
> works off the number of PMUs not the names of the PMUs, so it should
> be okay. Probably the only thing broken that matters is perf list when
> you have a big.LITTLE system with a520 and a720. This may be broken
> with, say, an a53 and an a72 today as both of those suffix lengths are >2,
> but maybe they use the "armv8_pmuv3_0", "armv8_pmuv3_1", etc. naming
> convention. I suspect the >2 here:
That is also the case for a53 and a72 right now. Even "perf list --unit
armv8_cortex_a57" doesn't work because we deduplicate before filtering.
Adding -v fixes it, though, because that disables deduplication. Perhaps
we can change perf list to also disable deduplication when --unit is given?
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmus.c?h=perf-tools-next#n80
> would still work and be correct if it were >4. If that changes then
> this will also need to change:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/Documentation/ABI/testing/sysfs-bus-event_source-devices?h=perf-tools-next#n12
That could be an easy fix. If >4 is enough to still get rid of all the
uncore duplicates I can make the change?
>
> Thanks,
> Ian
>
>>
>> Thanks,
>> Namhyung
>>
* Re: [PATCH 0/2] perf vendor events arm64: Add A720/A520 events/metrics
2025-02-19 15:25 ` James Clark
@ 2025-02-19 18:37 ` Ian Rogers
0 siblings, 0 replies; 16+ messages in thread
From: Ian Rogers @ 2025-02-19 18:37 UTC (permalink / raw)
To: James Clark
Cc: Namhyung Kim, Yangyu Chen, linux-perf-users, John Garry,
Will Deacon, Mike Leach, Leo Yan, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Jiri Olsa, Adrian Hunter, Liang Kan, Yoshihiro Furudera,
linux-arm-kernel, linux-kernel
On Wed, Feb 19, 2025 at 7:25 AM James Clark <james.clark@linaro.org> wrote:
>
>
>
> On 18/02/2025 10:33 pm, Ian Rogers wrote:
> > On Tue, Feb 18, 2025 at 2:19 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >>
> >> On Tue, Feb 18, 2025 at 09:30:23AM +0000, James Clark wrote:
> >>>
> >>>
> >>> On 18/02/2025 12:41 am, Ian Rogers wrote:
> >>>> On Fri, Feb 14, 2025 at 2:02 AM James Clark <james.clark@linaro.org> wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>> On 14/02/2025 5:49 am, Yangyu Chen wrote:
> >>>>>>
> >>>>>>
> >>>>>>> On 14 Feb 2025, at 09:12, Namhyung Kim <namhyung@kernel.org> wrote:
> >>>>>>>
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> On Thu, Feb 13, 2025 at 11:11:01PM +0800, Yangyu Chen wrote:
> >>>>>>>> This patchset adds the perf JSON files for the Cortex-A720 and Cortex-A520
> >>>>>>>> processors. Some events have been tested on Raxda Orion 6 with Cix P1 SoC
> >>>>>>>> (8xA720 + 4xA520) running mainline Kernel with ACPI mode.
> >>>>>>>
> >>>>>>> I'm curious how the name of PMUs look like. It is cortex_a720 (or a520)?
> >>>>>>
> >>>>>> The name of PMUs comes from Arm's documentation. I have included these
> >>>>>> links in each patch.
> >>>>>>
> >>>>>>> I remember there's a logic to check the length of hex digits at the end.
> >>>>>>>
> >>>>>>
> >>>>>> Could you provide more details about this?
> >>>>>>
> >>>>>>> Ian, are you ok with this?
> >>>>>>>
> >>>>>
> >>>>> I think they wouldn't be merged because they're core PMUs, so should be
> >>>>> fine? Even though they would otherwise be merged because they're more
> >>>>> than 3 hex digits.
> >>>>
> >>>> Do we know the PMU names? If they are cortex_a520 and cortex_a720 then
> >>>
> >>> It will be "armv9_cortex_a720" from this line:
> >>>
> >>> PMUV3_INIT_SIMPLE(armv9_cortex_a720)
> >>
> >> I see, thanks!
> >>
> >>>
> >>>> this comment at least reads a little stale:
> >>>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmus.c?h=perf-tools-next#n76
> >>>> ```
> >>>> /*
> >>>> * There is a '_{num}' suffix. For decimal suffixes any length
> >>>> * will do, for hexadecimal ensure more than 2 hex digits so
> >>>> * that S390's cpum_cf PMU doesn't match.
> >>>> */
> >>>> ```
> >>>> James is right that core PMUs aren't put on the same list as uncore/other PMUs.
> >>
> >> Ok, then I guess we're good.
> >
> > I think you may be able to do things that look odd, like today the
> > "i915" PMU can be called just "i", I think the a520/a720 naming will
> > allow "armv9_cortex/cycles/" as an event name, then open it on two
> > PMUs if they are present.
>
> I assumed that was the intended behavior. It seems fairly useful to be
> able to open on ones with common prefixes.
>
> > We may only show one PMU in perf list as
> > that code I think assumes they're the same PMU as they only differ by
> > suffix:
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmus.c?h=perf-tools-next#n384
>
> Yeah that is the case. I didn't realise it when looking at the previous
> fixes to keep the suffixes in perf stat output.
>
> > I can imagine aggregation possibly being broken, but I think that
> > works off the number of PMUs not the names of the PMUs, so it should
> > be okay. Probably the only thing broken that matter is perf list when
> > you have a BIG.little system with a520 and a720, this may be broken
> > with say a a53 and a72 today as both of those suffix lengths are >2,
> > but maybe they use the "armv8._pmuv3_0", "armv8._pmuv3_1", etc. naming
> > convention. I suspect the >2 here:
>
> Also the case for a53 and a72 right now. Even "perf list --unit
> armv8_cortex_a57" doesn't work because we deduplicate before filtering.
> Adding -v fixes it though because that disables deduplication. Perhaps
> we can change it to disable it with the --unit argument?
>
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmus.c?h=perf-tools-next#n80
> > would still work and be correct if it were >4. If that changes then
> > this will also need to change:
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/Documentation/ABI/testing/sysfs-bus-event_source-devices?h=perf-tools-next#n12
>
> That could be an easy fix. If >4 is enough to still get rid of all the
> uncore duplicates I can make the change?
The change would be great. I think >4 is sufficient and doesn't break
suffix matching for these drivers:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/drivers/perf/arm_dmc620_pmu.c?h=perf-tools-next#n710
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/drivers/perf/arm_smmuv3_pmu.c?h=perf-tools-next#n921
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/drivers/perf/marvell_cn10k_ddr_pmu.c?h=perf-tools-next#n1062
but it is hard to tell, as it depends on the memory addresses placed
in the PMU names. Perhaps you can clear this up and add zero padding
in the drivers if the suffix is <=4 hex digits?
There is no documentation of this here:
https://www.kernel.org/doc/Documentation/admin-guide/perf/mrvl-odyssey-ddr-pmu.rst
On a test machine I see in /sys/devices:
..
arm_dmc620_10008c400
..
and on a different one I see:
..
smmuv3_pmcg_20528a2
..
so both suffixes are >4 hex digits, but this is an ARM-specific issue
as far as I can tell, so you'd be better placed to judge correctness
than me.
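As a rough way to check names against the rule, here is a sketch in Python that follows the quoted pmus.c comment (decimal suffixes of any length, hexadecimal suffixes only above a threshold). It is not the perf implementation, and the `min_hex_len` parameter is a name introduced here for illustration:

```python
import string

HEX_CHARS = set(string.hexdigits)


def name_without_suffix(name, min_hex_len=2):
    """Strip a trailing '_{num}' suffix, following the rule in the quoted comment.

    Decimal suffixes of any length are stripped; suffixes containing hex
    letters are stripped only if they are longer than min_hex_len.
    """
    end = len(name)
    has_hex_letter = False
    # Walk backwards over trailing hex digits.
    while end > 0 and name[end - 1] in HEX_CHARS:
        if not name[end - 1].isdigit():
            has_hex_letter = True
        end -= 1
    suffix_len = len(name) - end
    if end > 0 and suffix_len > 0 and name[end - 1] == "_":
        if not has_hex_letter or suffix_len > min_hex_len:
            return name[: end - 1]
    # No recognised suffix: keep the full name.
    return name


names = (
    "armv9_cortex_a720",
    "arm_dmc620_10008c400",
    "smmuv3_pmcg_20528a2",
    "cpum_cf",
    "armv8_pmuv3_0",
)
for name in names:
    # Compare the current threshold (>2) with the proposed one (>4).
    print(name, name_without_suffix(name, 2), name_without_suffix(name, 4))
```

Under these assumptions "a720" counts as a suffix at >2 but not at >4, while the two address-based names above are still stripped with either threshold.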
Thanks,
Ian
> >
> > Thanks,
> > Ian
> >
> >>
> >> Thanks,
> >> Namhyung
> >>
>
* Re: [PATCH 0/2] perf vendor events arm64: Add A720/A520 events/metrics
2025-02-18 22:33 ` Ian Rogers
2025-02-19 15:25 ` James Clark
@ 2025-02-20 3:37 ` Yangyu Chen
[not found] ` <tencent_EDA4AFD185EF51104EDBCEB109D720862B05@qq.com>
2 siblings, 0 replies; 16+ messages in thread
From: Yangyu Chen @ 2025-02-20 3:37 UTC (permalink / raw)
To: Ian Rogers
Cc: Namhyung Kim, James Clark, linux-perf-users, John Garry,
Will Deacon, Mike Leach, Leo Yan, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Jiri Olsa, Adrian Hunter, Liang Kan, Yoshihiro Furudera,
linux-arm-kernel, linux-kernel
> On 19 Feb 2025, at 06:33, Ian Rogers <irogers@google.com> wrote:
>
> On Tue, Feb 18, 2025 at 2:19 PM Namhyung Kim <namhyung@kernel.org> wrote:
>>
>> On Tue, Feb 18, 2025 at 09:30:23AM +0000, James Clark wrote:
>>>
>>>
>>> On 18/02/2025 12:41 am, Ian Rogers wrote:
>>>> On Fri, Feb 14, 2025 at 2:02 AM James Clark <james.clark@linaro.org> wrote:
>>>>>
>>>>>
>>>>>
>>>>> On 14/02/2025 5:49 am, Yangyu Chen wrote:
>>>>>>
>>>>>>
>>>>>>> On 14 Feb 2025, at 09:12, Namhyung Kim <namhyung@kernel.org> wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> On Thu, Feb 13, 2025 at 11:11:01PM +0800, Yangyu Chen wrote:
>>>>>>>> This patchset adds the perf JSON files for the Cortex-A720 and Cortex-A520
>>>>>>>> processors. Some events have been tested on Raxda Orion 6 with Cix P1 SoC
>>>>>>>> (8xA720 + 4xA520) running mainline Kernel with ACPI mode.
>>>>>>>
>>>>>>> I'm curious how the name of PMUs look like. It is cortex_a720 (or a520)?
>>>>>>
>>>>>> The name of PMUs comes from Arm's documentation. I have included these
>>>>>> links in each patch.
>>>>>>
>>>>>>> I remember there's a logic to check the length of hex digits at the end.
>>>>>>>
>>>>>>
>>>>>> Could you provide more details about this?
>>>>>>
>>>>>>> Ian, are you ok with this?
>>>>>>>
>>>>>
>>>>> I think they wouldn't be merged because they're core PMUs, so should be
>>>>> fine? Even though they would otherwise be merged because they're more
>>>>> than 3 hex digits.
>>>>
>>>> Do we know the PMU names? If they are cortex_a520 and cortex_a720 then
>>>
>>> It will be "armv9_cortex_a720" from this line:
>>>
>>> PMUV3_INIT_SIMPLE(armv9_cortex_a720)
>>
>> I see, thanks!
>>
>>>
>>>> this comment at least reads a little stale:
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmus.c?h=perf-tools-next#n76
>>>> ```
>>>> /*
>>>> * There is a '_{num}' suffix. For decimal suffixes any length
>>>> * will do, for hexadecimal ensure more than 2 hex digits so
>>>> * that S390's cpum_cf PMU doesn't match.
>>>> */
>>>> ```
>>>> James is right that core PMUs aren't put on the same list as uncore/other PMUs.
>>
>> Ok, then I guess we're good.
>
> I think you may be able to do things that look odd, like today the
> "i915" PMU can be called just "i", I think the a520/a720 naming will
> allow "armv9_cortex/cycles/" as an event name, then open it on two
> PMUs if they are present. We may only show one PMU in perf list as
> that code I think assumes they're the same PMU as they only differ by
> suffix:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmus.c?h=perf-tools-next#n384
> I can imagine aggregation possibly being broken, but I think that
> works off the number of PMUs not the names of the PMUs, so it should
> be okay. Probably the only thing broken that matter is perf list when
> you have a BIG.little system with a520 and a720, this may be broken
> with say a a53 and a72 today as both of those suffix lengths are >2,
> but maybe they use the "armv8._pmuv3_0", "armv8._pmuv3_1", etc. naming
> convention. I suspect the >2 here:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmus.c?h=perf-tools-next#n80
> would still work and be correct if it were >4. If that changes then
> this will also need to change:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/Documentation/ABI/testing/sysfs-bus-event_source-devices?h=perf-tools-next#n12
>
> Thanks,
> Ian
>
On my system, the names of PMUs are `armv8_pmuv3_0` and
`armv8_pmuv3_1`:
```
$ ls /sys/bus/event_source/devices/
armv8_pmuv3_0 armv8_pmuv3_1 breakpoint kprobe software tracepoint uprobe
```
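For anyone who wants to compare names across machines, the same listing can be scripted. This is only a sketch: the `list_pmus` helper is hypothetical, and reading a per-PMU `cpus` file to spot core PMUs is my assumption about the sysfs layout rather than something checked on this board:

```python
import os

SYSFS_PMU_DIR = "/sys/bus/event_source/devices"


def list_pmus(devices_dir=SYSFS_PMU_DIR):
    """Map each PMU name to the contents of its 'cpus' file, or None if absent."""
    if not os.path.isdir(devices_dir):
        return {}
    pmus = {}
    for name in sorted(os.listdir(devices_dir)):
        cpus_path = os.path.join(devices_dir, name, "cpus")
        try:
            with open(cpus_path) as f:
                pmus[name] = f.read().strip()
        except OSError:
            # No readable 'cpus' file: assume this is not a core PMU.
            pmus[name] = None
    return pmus


if __name__ == "__main__":
    for name, cpus in list_pmus().items():
        kind = "core PMU on CPUs " + cpus if cpus is not None else "other PMU"
        print(name, "->", kind)
```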
I searched the ACPI DSDT on my platform, but there's no mention of
a720 or a520. I haven't delved into the PMU kernel driver yet.
Additionally, there's a more significant problem on aarch64
big.LITTLE platforms when two or more types of cores don't have the
same PMUs: perf list can only display the core PMUs of core0
unless we set the PERF_CPUID environment variable to override it.
This is because perf only probes the first MIDR here:
https://github.com/torvalds/linux/blob/87a132e73910e8689902aed7f2fc229d6908383b/tools/perf/arch/arm64/util/header.c#L60
However, I think this shouldn't block this patch, which only adds events and metrics?
Thanks,
Yangyu Chen
>>
>> Thanks,
>> Namhyung
* Re: [PATCH 0/2] perf vendor events arm64: Add A720/A520 events/metrics
[not found] ` <tencent_EDA4AFD185EF51104EDBCEB109D720862B05@qq.com>
@ 2025-02-20 14:37 ` James Clark
0 siblings, 0 replies; 16+ messages in thread
From: James Clark @ 2025-02-20 14:37 UTC (permalink / raw)
To: Yangyu Chen, Ian Rogers
Cc: Namhyung Kim, linux-perf-users, John Garry, Will Deacon,
Mike Leach, Leo Yan, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Jiri Olsa, Adrian Hunter, Liang Kan, Yoshihiro Furudera,
linux-arm-kernel, linux-kernel
On 20/02/2025 3:33 am, Yangyu Chen wrote:
>
>
>> On 19 Feb 2025, at 06:33, Ian Rogers <irogers@google.com> wrote:
>>
>> On Tue, Feb 18, 2025 at 2:19 PM Namhyung Kim <namhyung@kernel.org> wrote:
>>>
>>> On Tue, Feb 18, 2025 at 09:30:23AM +0000, James Clark wrote:
>>>>
>>>>
>>>> On 18/02/2025 12:41 am, Ian Rogers wrote:
>>>>> On Fri, Feb 14, 2025 at 2:02 AM James Clark <james.clark@linaro.org> wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 14/02/2025 5:49 am, Yangyu Chen wrote:
>>>>>>>
>>>>>>>
>>>>>>>> On 14 Feb 2025, at 09:12, Namhyung Kim <namhyung@kernel.org> wrote:
>>>>>>>>
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> On Thu, Feb 13, 2025 at 11:11:01PM +0800, Yangyu Chen wrote:
>>>>>>>>> This patchset adds the perf JSON files for the Cortex-A720 and Cortex-A520
>>>>>>>>> processors. Some events have been tested on Raxda Orion 6 with Cix P1 SoC
>>>>>>>>> (8xA720 + 4xA520) running mainline Kernel with ACPI mode.
>>>>>>>>
>>>>>>>> I'm curious how the name of PMUs look like. It is cortex_a720 (or a520)?
>>>>>>>
>>>>>>> The name of PMUs comes from Arm's documentation. I have included these
>>>>>>> links in each patch.
>>>>>>>
>>>>>>>> I remember there's a logic to check the length of hex digits at the end.
>>>>>>>>
>>>>>>>
>>>>>>> Could you provide more details about this?
>>>>>>>
>>>>>>>> Ian, are you ok with this?
>>>>>>>>
>>>>>>
>>>>>> I think they wouldn't be merged because they're core PMUs, so should be
>>>>>> fine? Even though they would otherwise be merged because they're more
>>>>>> than 3 hex digits.
>>>>>
>>>>> Do we know the PMU names? If they are cortex_a520 and cortex_a720 then
>>>>
>>>> It will be "armv9_cortex_a720" from this line:
>>>>
>>>> PMUV3_INIT_SIMPLE(armv9_cortex_a720)
>>>
>>> I see, thanks!
>>>
>>>>
>>>>> this comment at least reads a little stale:
>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmus.c?h=perf-tools-next#n76
>>>>> ```
>>>>> /*
>>>>> * There is a '_{num}' suffix. For decimal suffixes any length
>>>>> * will do, for hexadecimal ensure more than 2 hex digits so
>>>>> * that S390's cpum_cf PMU doesn't match.
>>>>> */
>>>>> ```
>>>>> James is right that core PMUs aren't put on the same list as uncore/other PMUs.
>>>
>>> Ok, then I guess we're good.
>>
>> I think you may be able to do things that look odd, like today the
>> "i915" PMU can be called just "i", I think the a520/a720 naming will
>> allow "armv9_cortex/cycles/" as an event name, then open it on two
>> PMUs if they are present. We may only show one PMU in perf list as
>> that code I think assumes they're the same PMU as they only differ by
>> suffix:
>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmus.c?h=perf-tools-next#n384
>> I can imagine aggregation possibly being broken, but I think that
>> works off the number of PMUs not the names of the PMUs, so it should
>> be okay. Probably the only thing broken that matter is perf list when
>> you have a BIG.little system with a520 and a720, this may be broken
>> with say a a53 and a72 today as both of those suffix lengths are >2,
>> but maybe they use the "armv8._pmuv3_0", "armv8._pmuv3_1", etc. naming
>> convention. I suspect the >2 here:
>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmus.c?h=perf-tools-next#n80
>> would still work and be correct if it were >4. If that changes then
>> this will also need to change:
>> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/Documentation/ABI/testing/sysfs-bus-event_source-devices?h=perf-tools-next#n12
>>
>> Thanks,
>> Ian
>>
>
> On my system, the names of PMUs are `armv8_pmuv3_0` and
> `armv8_pmuv3_1`:
>
> ```
> $ ls /sys/bus/event_source/devices/
> armv8_pmuv3_0 armv8_pmuv3_1 breakpoint kprobe software tracepoint uprobe
> ```
>
> I searched the ACPI DSDT on my platform, but there's no mention of
> a720 or a520. I haven't delved into the PMU kernel driver yet.
Ah yeah, with ACPI you get those names instead.
>
> Additionally, there's a more significant problem on aarch64
> big.LITTLE platforms when two or more types of cores don't have the
> same PMUs: perf list can only display the core PMUs of core0
> unless we set the PERF_CPUID environment variable to override it.
> This is because perf only probes the first MIDR here:
> https://github.com/torvalds/linux/blob/87a132e73910e8689902aed7f2fc229d6908383b/tools/perf/arch/arm64/util/header.c#L60
>
> However, I think this shouldn't block this patch, which only adds events and metrics?
>
>
> Thanks,
> Yangyu Chen
>
I don't think that's an issue because events are listed per PMU rather
than per CPU and that MIDR function does take a CPU struct. From my
testing the only thing stopping all PMUs from being listed was the
numeric suffix de-duplication.
Either way, no, it shouldn't affect your patch. But I'm also looking
into Ian's suggestion to improve it anyway.
>>>
>>> Thanks,
>>> Namhyung
>
>
End of thread (newest message: 2025-02-20 14:37 UTC)
Thread overview: 16+ messages
2025-02-13 15:11 [PATCH 0/2] perf vendor events arm64: Add A720/A520 events/metrics Yangyu Chen
2025-02-13 15:12 ` [PATCH 1/2] perf vendor events arm64: Add Cortex-A720 events/metrics Yangyu Chen
2025-02-13 16:49 ` Ian Rogers
2025-02-13 15:12 ` [PATCH 2/2] perf vendor events arm64: Add Cortex-A520 events/metrics Yangyu Chen
2025-02-13 16:53 ` Ian Rogers
2025-02-14 1:12 ` [PATCH 0/2] perf vendor events arm64: Add A720/A520 events/metrics Namhyung Kim
2025-02-14 5:49 ` Yangyu Chen
2025-02-14 10:02 ` James Clark
2025-02-18 0:41 ` Ian Rogers
2025-02-18 9:30 ` James Clark
2025-02-18 22:19 ` Namhyung Kim
2025-02-18 22:33 ` Ian Rogers
2025-02-19 15:25 ` James Clark
2025-02-19 18:37 ` Ian Rogers
2025-02-20 3:37 ` Yangyu Chen
[not found] ` <tencent_EDA4AFD185EF51104EDBCEB109D720862B05@qq.com>
2025-02-20 14:37 ` James Clark