* [PATCH 0/3] perf vendor events amd: Address event errata
From: Sandipan Das @ 2025-05-07 14:28 UTC
To: linux-perf-users, linux-kernel
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, Stephane Eranian,
Ravi Bangoria, Ananth Narayan, Sandipan Das
Remove unreliable Zen 5 events and metrics. The following errata from
the Revision Guide for AMD Family 1Ah Models 00h-0Fh Processors have
been addressed.
#1569 PMCx078 Counts Incorrectly in Unpredictable Ways
#1583 PMCx18E May Overcount Instruction Cache Accesses
#1587 PMCx188 May Undercount IBS (Instruction Based Sampling) Fetch Events
The document can be downloaded from
https://bugzilla.kernel.org/attachment.cgi?id=308095
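For a quick cross-check, the encodings this series drops can be summarized in a small lookup. The Python below is a hypothetical sketch and not part of perf or jevents.py. `affecting_erratum` and `ERRATA_BY_EVENT` are illustrative names, and the input is assumed to be an event entry shaped like the amdzen5 JSON (hex strings for "EventCode" and "UMask"). The table mirrors what the three patches remove rather than the exact erratum wording: PATCH 1/3 drops every PMCx18E encoding, including the 0x07 hit mask.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# Hypothetical table: event code -> (erratum number, affected unit masks).
# None means every unit mask of that event code is treated as affected.
ERRATA_BY_EVENT: Dict[int, Tuple[int, Optional[FrozenSet[int]]]] = {
    0x078: (1569, None),               # TLB flushes, counts unpredictably
    0x18E: (1583, None),               # IC accesses; series drops all umasks
    0x188: (1587, frozenset({0x10})),  # IBS fetch "sample_valid" only
}


def affecting_erratum(entry: dict) -> Optional[int]:
    """Return the erratum number that applies to a JSON event entry, or None."""
    code = int(entry["EventCode"], 16)
    umask = int(entry.get("UMask", "0x0"), 16)
    match = ERRATA_BY_EVENT.get(code)
    if match is None:
        return None
    erratum, umasks = match
    if umasks is None or umask in umasks:
        return erratum
    return None


if __name__ == "__main__":
    samples = [
        {"EventCode": "0x78", "UMask": "0xff"},
        {"EventCode": "0x188", "UMask": "0x08"},
        {"EventCode": "0x188", "UMask": "0x10"},
    ]
    for sample in samples:
        print(sample, affecting_erratum(sample))
```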
Sandipan Das (3):
perf vendor events amd: Remove Zen 5 instruction cache events
perf vendor events amd: Remove Zen 5 TLB flush event
perf vendor events amd: Remove Zen 5 IBS fetch event
.../arch/x86/amdzen5/inst-cache.json | 24 -------------------
.../arch/x86/amdzen5/load-store.json | 6 -----
.../arch/x86/amdzen5/recommended.json | 13 ----------
3 files changed, 43 deletions(-)
--
2.43.0
* [PATCH 1/3] perf vendor events amd: Remove Zen 5 instruction cache events
From: Sandipan Das @ 2025-05-07 14:28 UTC
To: linux-perf-users, linux-kernel
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, Stephane Eranian,
Ravi Bangoria, Ananth Narayan, Sandipan Das, stable
As mentioned in Erratum 1583 from the Revision Guide for AMD Family 1Ah
Models 00h-0Fh Processors available at the link below, PMCx18E may
overcount instruction cache accesses on Zen 5 processors when programmed
with unit mask 0x18 (misses) or 0x1F (all accesses). Remove the PMCx18E
events and the metric derived from them.
Link: https://bugzilla.kernel.org/attachment.cgi?id=308095
Fixes: 45c072f2537a ("perf vendor events amd: Add Zen 5 core events")
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Cc: stable@vger.kernel.org
---
.../arch/x86/amdzen5/inst-cache.json | 18 ------------------
.../arch/x86/amdzen5/recommended.json | 6 ------
2 files changed, 24 deletions(-)
diff --git a/tools/perf/pmu-events/arch/x86/amdzen5/inst-cache.json b/tools/perf/pmu-events/arch/x86/amdzen5/inst-cache.json
index ad75e5bf9513..4fd5e2c5432f 100644
--- a/tools/perf/pmu-events/arch/x86/amdzen5/inst-cache.json
+++ b/tools/perf/pmu-events/arch/x86/amdzen5/inst-cache.json
@@ -33,24 +33,6 @@
"BriefDescription": "Fetches tagged by Fetch IBS that result in a valid sample and an IBS interrupt.",
"UMask": "0x10"
},
- {
- "EventName": "ic_tag_hit_miss.instruction_cache_hit",
- "EventCode": "0x18e",
- "BriefDescription": "Instruction cache hits.",
- "UMask": "0x07"
- },
- {
- "EventName": "ic_tag_hit_miss.instruction_cache_miss",
- "EventCode": "0x18e",
- "BriefDescription": "Instruction cache misses.",
- "UMask": "0x18"
- },
- {
- "EventName": "ic_tag_hit_miss.all_instruction_cache_accesses",
- "EventCode": "0x18e",
- "BriefDescription": "Instruction cache accesses of all types.",
- "UMask": "0x1f"
- },
{
"EventName": "op_cache_hit_miss.op_cache_hit",
"EventCode": "0x28f",
diff --git a/tools/perf/pmu-events/arch/x86/amdzen5/recommended.json b/tools/perf/pmu-events/arch/x86/amdzen5/recommended.json
index 635d57e3bc15..863f4b5dfc14 100644
--- a/tools/perf/pmu-events/arch/x86/amdzen5/recommended.json
+++ b/tools/perf/pmu-events/arch/x86/amdzen5/recommended.json
@@ -136,12 +136,6 @@
"MetricExpr": "d_ratio(op_cache_hit_miss.op_cache_miss, op_cache_hit_miss.all_op_cache_accesses)",
"ScaleUnit": "100%"
},
- {
- "MetricName": "ic_fetch_miss_ratio",
- "BriefDescription": "Instruction cache miss ratio for all fetches. An instruction cache miss will not be counted by this metric if it is an OC hit.",
- "MetricExpr": "d_ratio(ic_tag_hit_miss.instruction_cache_miss, ic_tag_hit_miss.all_instruction_cache_accesses)",
- "ScaleUnit": "100%"
- },
{
"MetricName": "l1_data_cache_fills_from_memory_pti",
"BriefDescription": "L1 data cache fills from DRAM or MMIO in any NUMA node per thousand instructions.",
--
2.43.0
* [PATCH 2/3] perf vendor events amd: Remove Zen 5 TLB flush event
From: Sandipan Das @ 2025-05-07 14:28 UTC
To: linux-perf-users, linux-kernel
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, Stephane Eranian,
Ravi Bangoria, Ananth Narayan, Sandipan Das, stable
As mentioned in Erratum 1569 from the Revision Guide for AMD Family 1Ah
Models 00h-0Fh Processors available at the link below, PMCx078 counts
TLB flushes incorrectly, in unpredictable ways, on Zen 5 processors.
Remove the affected event and metric.
Link: https://bugzilla.kernel.org/attachment.cgi?id=308095
Fixes: 45c072f2537a ("perf vendor events amd: Add Zen 5 core events")
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Cc: stable@vger.kernel.org
---
tools/perf/pmu-events/arch/x86/amdzen5/load-store.json | 6 ------
tools/perf/pmu-events/arch/x86/amdzen5/recommended.json | 7 -------
2 files changed, 13 deletions(-)
diff --git a/tools/perf/pmu-events/arch/x86/amdzen5/load-store.json b/tools/perf/pmu-events/arch/x86/amdzen5/load-store.json
index ff6627a77805..f23a92bf55ac 100644
--- a/tools/perf/pmu-events/arch/x86/amdzen5/load-store.json
+++ b/tools/perf/pmu-events/arch/x86/amdzen5/load-store.json
@@ -502,12 +502,6 @@
"EventCode": "0x76",
"BriefDescription": "Core cycles not in halt."
},
- {
- "EventName": "ls_tlb_flush.all",
- "EventCode": "0x78",
- "BriefDescription": "All TLB Flushes.",
- "UMask": "0xff"
- },
{
"EventName": "ls_not_halted_p0_cyc.p0_freq_cyc",
"EventCode": "0x120",
diff --git a/tools/perf/pmu-events/arch/x86/amdzen5/recommended.json b/tools/perf/pmu-events/arch/x86/amdzen5/recommended.json
index 863f4b5dfc14..6b32308b1c3a 100644
--- a/tools/perf/pmu-events/arch/x86/amdzen5/recommended.json
+++ b/tools/perf/pmu-events/arch/x86/amdzen5/recommended.json
@@ -241,13 +241,6 @@
"MetricGroup": "tlb",
"ScaleUnit": "1e3per_1k_instr"
},
- {
- "MetricName": "all_tlbs_flushed_pti",
- "BriefDescription": "All TLBs flushed per thousand instructions.",
- "MetricExpr": "ls_tlb_flush.all / instructions",
- "MetricGroup": "tlb",
- "ScaleUnit": "1e3per_1k_instr"
- },
{
"MetricName": "macro_ops_dispatched",
"BriefDescription": "Macro-ops dispatched.",
--
2.43.0
* [PATCH 3/3] perf vendor events amd: Remove Zen 5 IBS fetch event
From: Sandipan Das @ 2025-05-07 14:28 UTC
To: linux-perf-users, linux-kernel
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, Kan Liang, Stephane Eranian,
Ravi Bangoria, Ananth Narayan, Sandipan Das, stable
As mentioned in Erratum 1587 from the Revision Guide for AMD Family 1Ah
Models 00h-0Fh Processors available at the link below, PMCx188 may
undercount valid IBS fetch samples when used with unit mask 0x10 on
Zen 5 processors. Remove the affected event.
Link: https://bugzilla.kernel.org/attachment.cgi?id=308095
Fixes: 45c072f2537a ("perf vendor events amd: Add Zen 5 core events")
Signed-off-by: Sandipan Das <sandipan.das@amd.com>
Cc: stable@vger.kernel.org
---
tools/perf/pmu-events/arch/x86/amdzen5/inst-cache.json | 6 ------
1 file changed, 6 deletions(-)
diff --git a/tools/perf/pmu-events/arch/x86/amdzen5/inst-cache.json b/tools/perf/pmu-events/arch/x86/amdzen5/inst-cache.json
index 4fd5e2c5432f..3b61cf8a04da 100644
--- a/tools/perf/pmu-events/arch/x86/amdzen5/inst-cache.json
+++ b/tools/perf/pmu-events/arch/x86/amdzen5/inst-cache.json
@@ -27,12 +27,6 @@
"BriefDescription": "Fetches discarded after being tagged by Fetch IBS due to IBS filtering.",
"UMask": "0x08"
},
- {
- "EventName": "ic_fetch_ibs_events.sample_valid",
- "EventCode": "0x188",
- "BriefDescription": "Fetches tagged by Fetch IBS that result in a valid sample and an IBS interrupt.",
- "UMask": "0x10"
- },
{
"EventName": "op_cache_hit_miss.op_cache_hit",
"EventCode": "0x28f",
--
2.43.0
* Re: [PATCH 0/3] perf vendor events amd: Address event errata
From: Ian Rogers @ 2025-05-07 15:56 UTC
To: Sandipan Das
Cc: linux-perf-users, linux-kernel, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang,
Stephane Eranian, Ravi Bangoria, Ananth Narayan
On Wed, May 7, 2025 at 7:28 AM Sandipan Das <sandipan.das@amd.com> wrote:
>
> Remove unreliable Zen 5 events and metrics. The following errata from
> the Revision Guide for AMD Family 1Ah Models 00h-0Fh Processors have
> been addressed.
> #1569 PMCx078 Counts Incorrectly in Unpredictable Ways
> #1583 PMCx18E May Overcount Instruction Cache Accesses
> #1587 PMCx188 May Undercount IBS (Instruction Based Sampling) Fetch Events
>
> The document can be downloaded from
> https://bugzilla.kernel.org/attachment.cgi?id=308095
Hi Sandipan,
the document is somewhat brief, for example:
```
1583 PMCx18E May Overcount Instruction Cache Accesses
Description
If PMCx18E[IcAccessTypes] is programmed to 18x (Instruction Cache
Miss) or 1Fx (All Instruction Cache Accesses) then the performance
counter may overcount.
Potential Effect on System
Inaccuracies in performance monitoring software may be experienced.
Suggested Workaround
None
Fix Planned
No fix planned
```
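For reference, the PMCx18E unit masks line up with the erratum text as follows. This is a small Python sketch, not perf code. The mask values are copied from the removed amdzen5 entries, `may_overcount` and `ic_fetch_miss_ratio` are hypothetical helpers, and `d_ratio` is a local stand-in for perf's metric function of the same name, assumed here to return 0 for a zero denominator. It shows that the hit encoding (0x07) is not one of the two encodings the erratum names, while the removed ic_fetch_miss_ratio metric depends on both of them.

```python
# Unit masks of PMCx18E as used by the removed amdzen5 entries.
IC_HIT = 0x07   # ic_tag_hit_miss.instruction_cache_hit
IC_MISS = 0x18  # ic_tag_hit_miss.instruction_cache_miss
IC_ALL = 0x1F   # ic_tag_hit_miss.all_instruction_cache_accesses

# Encodings named in erratum 1583 as possibly overcounting.
ERRATUM_1583_UMASKS = frozenset({0x18, 0x1F})


def may_overcount(umask: int) -> bool:
    """True if this PMCx18E unit mask is one the erratum text names."""
    return umask in ERRATUM_1583_UMASKS


def d_ratio(num: float, den: float) -> float:
    """Local stand-in for perf's d_ratio(): 0 when the denominator is 0."""
    return num / den if den else 0.0


def ic_fetch_miss_ratio(miss_count: float, all_count: float) -> float:
    """Same formula as the removed metric; both inputs use affected umasks."""
    return d_ratio(miss_count, all_count)


if __name__ == "__main__":
    # Hit and miss bits are disjoint and together form the "all" mask.
    print(IC_HIT & IC_MISS == 0, IC_HIT | IC_MISS == IC_ALL)
    print([hex(m) for m in (IC_HIT, IC_MISS, IC_ALL) if may_overcount(m)])
```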
Given that being able to count instruction cache accesses (for example)
is a useful feature, would it be possible to change:
```
- {
- "EventName": "ic_tag_hit_miss.instruction_cache_hit",
- "EventCode": "0x18e",
- "BriefDescription": "Instruction cache hits.",
- "UMask": "0x07"
- },
...
```
to be say:
```
{
    "EventName": "ic_tag_hit_miss.instruction_cache_hit",
    "EventCode": "0x18e",
    "BriefDescription": "Instruction cache hits. Note, this counter is affected by erratum 1583.",
    "UMask": "0x07",
    "Experimental": "1"
},
```
That is, rather than removing the event, tag it as experimental (taken
to mean accuracy isn't guaranteed) and note the erratum explicitly in
the description. Currently the Experimental tag has no impact on what
the perf tool does; by contrast, the "Deprecated" tag hides events in
the `perf list` command and is commonly used when an event is renamed.
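To make the difference between the two tags concrete, here is a rough Python approximation of the behaviour described above. It is not perf's implementation (that lives in jevents.py and perf's C event-listing code), and `listed_event_names` is a hypothetical name. An entry with "Deprecated": "1" is hidden from the listing, while "Experimental" does not filter anything.

```python
from typing import Dict, List


def listed_event_names(entries: List[Dict[str, str]]) -> List[str]:
    """Approximate which JSON events a `perf list`-style listing shows.

    Hypothetical sketch: "Deprecated": "1" hides an event, while
    "Experimental" is currently ignored.
    """
    names = []
    for entry in entries:
        if entry.get("Deprecated") == "1":
            continue  # hidden, e.g. an old name kept after a rename
        names.append(entry["EventName"])  # Experimental does not filter
    return names


if __name__ == "__main__":
    sample = [
        {"EventName": "ic_tag_hit_miss.instruction_cache_hit", "Experimental": "1"},
        {"EventName": "hypothetical_old_name", "Deprecated": "1"},
    ]
    print(listed_event_names(sample))
```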
Thanks,
Ian
* Re: [PATCH 0/3] perf vendor events amd: Address event errata
From: Sandipan Das @ 2025-05-08 10:56 UTC
To: Ian Rogers
Cc: linux-perf-users, linux-kernel, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang,
Stephane Eranian, Ravi Bangoria, Ananth Narayan
On 5/7/2025 9:26 PM, Ian Rogers wrote:
> That is rather than remove the event, the event is tagged as
> experimental (taken to mean accuracy isn't guaranteed) and the errata
> is explicitly noted in the description. Currently the Experimental tag
> has no impact on what happens in the perf tool, for example, the
> "Deprecated" tag hides events in the `perf list` command and is
> commonly used when an event is renamed.
>
I agree that events like IC hits and misses are generally useful and am
fine with the idea of keeping them, but my concern is that unless users
read the event description, there is no way for them to know that the
perf output they are seeing may be unreliable. There is also no
guarantee that such events will be fixed in a future uarch. From a
quick glance, I couldn't find a mechanism that makes perf stat/report
show a warning for named events with known issues.
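One shape such a mechanism could take is sketched below. Everything here is hypothetical: perf has no such hook today, and `KNOWN_ERRATA`, `errata_warnings` and the message format are illustrative assumptions. The idea is that once requested event names are resolved, a known-issue lookup produces warnings that can be printed to stderr, so the caveat reaches users who never read the event description.

```python
from typing import Dict, Iterable, List

# Hypothetical known-issue table: event name -> erratum reference.
KNOWN_ERRATA: Dict[str, str] = {
    "ic_tag_hit_miss.instruction_cache_miss": "AMD erratum 1583",
    "ic_tag_hit_miss.all_instruction_cache_accesses": "AMD erratum 1583",
    "ls_tlb_flush.all": "AMD erratum 1569",
}


def errata_warnings(event_names: Iterable[str]) -> List[str]:
    """Build one warning line per requested event with a known erratum."""
    warnings = []
    for name in event_names:
        ref = KNOWN_ERRATA.get(name)
        if ref is not None:
            warnings.append(
                f"Warning: {name} is affected by {ref}; counts may be inaccurate."
            )
    return warnings


if __name__ == "__main__":
    import sys

    for line in errata_warnings(["instructions", "ls_tlb_flush.all"]):
        print(line, file=sys.stderr)
```

Keeping the lookup separate from the printing makes it easy to gate the output behind a verbose option, or to suppress it for tools that parse perf's text output.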
* Re: [PATCH 0/3] perf vendor events amd: Address event errata
From: Ian Rogers @ 2025-05-08 15:34 UTC
To: Sandipan Das
Cc: linux-perf-users, linux-kernel, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang,
Stephane Eranian, Ravi Bangoria, Ananth Narayan
On Thu, May 8, 2025 at 3:56 AM Sandipan Das <sandipan.das@amd.com> wrote:
> I agree that events like IC hits and misses are generally useful and am
> fine with the idea of keeping them but my concern is that unless users
> read the event description, there is no way for them to know if the
> perf output that they are seeing may be unreliable. There is also no
> guarantee that such events will be fixed in a future uarch. From a
> quick glance, I couldn't find a mechanism that makes perf stat/report
> show a warning for named events with known issues.
So I'd forgotten the flow, but I'm rediscovering it. We do have an
"Errata" JSON field, as shown in:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/arm64/ampere/ampereone/memory.json?h=perf-tools-next#n2
```
{
    "ArchStdEvent": "LD_RETIRED",
    "Errata": "Errata AC03_CPU_52",
    "BriefDescription": "Instruction architecturally executed, condition code check pass, load. Impacted by errata -"
},
```
It doesn't impact perf stat/record but it does get added to the event
description for perf list:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/jevents.py?h=perf-tools-next#n340
```
if 'Errata' in jd:
    extra_desc += ' Spec update: ' + jd['Errata']
```
which means the perf list description ends up as "Instruction
architecturally executed, condition code check pass, load. Impacted by
errata - Spec update: Errata AC03_CPU_52". We could change this so
that the errata is kept as a distinct field in perf's encoded JSON
data, and then display the errata when perf stat/record parses the
event. I'd be a little worried about this breaking things that parse
perf's text output, but the impact would be limited to Zen 5, Ampere
and older Intel CPUs. We could also make the errata output conditional
on passing a verbose flag to perf. Would `perf list` support alone work
for you, or would the perf stat/record changes be a requirement for
keeping these events?
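The current and proposed behaviour described above can be sketched side by side in Python. `list_description` approximates the two jevents.py lines quoted above; the surrounding jevents.py code is simplified away, so treat it as illustrative rather than exact. `stat_errata_note` is entirely hypothetical and models the option of keeping Errata as a distinct field and only reporting it when a verbose flag is set.

```python
from typing import Optional


def list_description(jd: dict) -> str:
    """Approximation of how jevents.py appends the Errata field for perf list."""
    desc = jd.get("BriefDescription", "")
    extra_desc = ""
    if 'Errata' in jd:
        extra_desc += ' Spec update: ' + jd['Errata']
    return desc + extra_desc


def stat_errata_note(jd: dict, verbose: bool) -> Optional[str]:
    """Hypothetical: report a distinct Errata field only in verbose mode."""
    errata = jd.get("Errata")
    if errata is None or not verbose:
        return None
    name = jd.get("EventName") or jd.get("ArchStdEvent", "?")
    return f"{name}: {errata}"


# The AmpereOne entry quoted above.
ld_retired = {
    "ArchStdEvent": "LD_RETIRED",
    "Errata": "Errata AC03_CPU_52",
    "BriefDescription": "Instruction architecturally executed, condition "
                        "code check pass, load. Impacted by errata -",
}

if __name__ == "__main__":
    print(list_description(ld_retired))
    print(stat_errata_note(ld_retired, verbose=True))
```

Running `list_description` on the quoted entry reproduces the `perf list` text given above.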
Thanks,
Ian
* Re: [PATCH 0/3] perf vendor events amd: Address event errata
From: Sandipan Das @ 2025-05-09 8:40 UTC
To: Ian Rogers
Cc: linux-perf-users, linux-kernel, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang,
Stephane Eranian, Ravi Bangoria, Ananth Narayan
On 5/8/2025 9:04 PM, Ian Rogers wrote:
> [...]
> We could also make the errata output conditional on
> passing a verbose flag to perf. Would just `perf list` support work
> for you or would the perf stat/record changes be a requirement for
> keeping these events?
>
Adding "Errata" tags to the affected events is a good start.
We can sort out the perf stat/record changes eventually.
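As a starting point, re-adding one of the removed events with an "Errata" field might look like the entry printed below. This is only a sketch. `with_errata` is a hypothetical helper, the "Erratum 1583" string format is an assumption, and the field is the same "Errata" key used by the Ampere JSON shown earlier in the thread.

```python
import json


def with_errata(entry: dict, erratum: str) -> dict:
    """Return a copy of a JSON event entry tagged with an Errata field."""
    tagged = dict(entry)  # leave the original entry untouched
    tagged["Errata"] = erratum
    return tagged


# One of the entries removed by PATCH 1/3.
ic_miss = {
    "EventName": "ic_tag_hit_miss.instruction_cache_miss",
    "EventCode": "0x18e",
    "BriefDescription": "Instruction cache misses.",
    "UMask": "0x18",
}

if __name__ == "__main__":
    print(json.dumps(with_errata(ic_miss, "Erratum 1583"), indent=4))
```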