* [RFC] perf tools: About encodings of legacy event names
@ 2025-02-20 0:38 Namhyung Kim
2025-02-20 6:37 ` Ian Rogers
0 siblings, 1 reply; 12+ messages in thread
From: Namhyung Kim @ 2025-02-20 0:38 UTC (permalink / raw)
To: linux-perf-users, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo
Cc: linux-kernel, Ian Rogers, Mark Rutland, Alexander Shishkin,
Jiri Olsa, Adrian Hunter, Kan Liang, Andi Kleen
Hello,
Ian and I have been discussing the behaviors of event encodings and it's
hard to find a point so far that we can agree on. So I'd like to hear
your opinion to resolve this. If you happen to have some time, you can
follow the thread here:
https://lore.kernel.org/linux-perf-users/20250109222109.567031-5-irogers@google.com/#r
Basically there are some events that were defined in the perf ABI.
PERF_COUNT_HW_CPU_CYCLES, PERF_COUNT_HW_INSTRUCTIONS, ...
So perf tools use those (legacy) encodings with the matching names like
"cycles" (or "cpu-cycles"), "instructions", etc.
On the another hand, it has wildcard matching for event names in PMUs so
that users can conveniently use events without specifying PMU names.
For example, "event1" would expand to "pmuX/event1/", "pmuY/event1/", etc
as long as the PMUs have the event in sysfs or JSON.
And there are platforms where "cycles" event is defined in a (core) PMU
(like "blah/cycles") and the event has a different behavior than the
legacy encoding. Then it has to choose which encoding is better for the
given name. But it's a general issue for the legacy event names.
Q. what should it do with "cycles"?
-----------------------------------
1. just use the legacy encoding (PERF_COUNT_HW_CPU_CYCLES).
2. expand to (possibly multiple) PMU events (like "cpu/cycles/") and
use the encoding defined there.
I think the #1 is the current behavior. The upside is it's simple and
intuitive. But it's different than other names and can make confusion.
Users should use the full name ("cpu/cycles/") if they need sysfs one.
The #2 can make the parsing code simpler and the behavior consistent.
Also it can override the (possibly broken) behavior of the legacy event
on some platforms. No way (and reason?) to use the legacy encodings.
Also Ian says `perf list` shows "cpu-cycles" and "cpu/cpu-cycles/"
together which assumes they are the same. But using #1 can result in a
different behavior respectively. Probably better to make it consistent.
I tried to summarize the issues concisely but may miss some issues. I
hope Ian can fill it up in case I missed something important.
Can you please share your opinion about what would be the best behavior?
Thanks,
Namhyung
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [RFC] perf tools: About encodings of legacy event names 2025-02-20 0:38 [RFC] perf tools: About encodings of legacy event names Namhyung Kim @ 2025-02-20 6:37 ` Ian Rogers 2025-02-24 15:01 ` Arnaldo Carvalho de Melo 0 siblings, 1 reply; 12+ messages in thread From: Ian Rogers @ 2025-02-20 6:37 UTC (permalink / raw) To: Namhyung Kim Cc: linux-perf-users, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, linux-kernel, Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang, Andi Kleen On Wed, Feb 19, 2025 at 4:38 PM Namhyung Kim <namhyung@kernel.org> wrote: > > Hello, > > Ian and I have been discussing the behaviors of event encodings and it's > hard to find a point so far that we can agree on. So I'd like to hear > your opinion to resolve this. If you happen to have some time, you can > follow the thread here: > > https://lore.kernel.org/linux-perf-users/20250109222109.567031-5-irogers@google.com/#r > > Basically there are some events that were defined in the perf ABI. > > PERF_COUNT_HW_CPU_CYCLES, PERF_COUNT_HW_INSTRUCTIONS, ... > > So perf tools use those (legacy) encodings with the matching names like > "cycles" (or "cpu-cycles"), "instructions", etc. > > On the another hand, it has wildcard matching for event names in PMUs so > that users can conveniently use events without specifying PMU names. > For example, "event1" would expand to "pmuX/event1/", "pmuY/event1/", etc > as long as the PMUs have the event in sysfs or JSON. > > And there are platforms where "cycles" event is defined in a (core) PMU > (like "blah/cycles") and the event has a different behavior than the > legacy encoding. Then it has to choose which encoding is better for the > given name. But it's a general issue for the legacy event names. > > Q. what should it do with "cycles"? > ----------------------------------- > 1. just use the legacy encoding (PERF_COUNT_HW_CPU_CYCLES). > > 2. expand to (possibly multiple) PMU events (like "cpu/cycles/") and > use the encoding defined there. > > I think the #1 is the current behavior. The upside is it's simple and > intuitive. But it's different than other names and can make confusion. > Users should use the full name ("cpu/cycles/") if they need sysfs one. So the code keeps changing, for example, the removal of BPF events. I think the important change and the thing that has brought us here is the addition of what Intel call hybrid and ARM call BIG.little. ARM have got architectural events and so the same event encoding on BIG and little cores. On x86 the e-core (atom) and p-cores have different event encodings. In the original hybrid work, pushed on for the launch of Alderlake and reviewed by Jiri and Arnaldo with no involvement from me, Intel wanted the event names to work transparently. So a cycles event would be gathered on the e-core and p-core with a command line option to merge the legacy event cycles on both cores. To be specific the expected behavior of: $ perf (stat|record) -e cycles ... would be as if: $ perf (stat|record) -e cpu_atom/cycles/,cpu_core/cycles/ ... An unfortunate thing in the hybrid code was that it was hardcoded to PMU names of cpu_atom and cpu_core, but what to do for metrics? The original proposal was that metrics would have a PMU name and all names would be relative to that, but what about uncore events? What about metrics that refer to other metrics? A decision was made to not to have PMUs implicitly in metrics and the events in the metric would match those given to perf's -e command line. This also greatly simplifies testing events when creating a metric. I set about rewriting the hybrid code not to use hard coded PMU names but to discover core PMUs at runtime. This was to make the metric and event parsing code as generic as possible, but this had an unintended consequence. ARM's core PMU had previously not been seen as handling legacy events like cycles, and appeared as an uncore PMU. When there are both legacy and sysfs/json events then previously the legacy events had priority. This broke events like cycles on Apple M processors where the legacy events were broken and the sysfs ones not. Yes this is a driver bug, but everybody told me a change in behavior of the tool was needed to fix it, that fix was to make sysfs/json events have priority over legacy events. I prioritized fixing event parsing when a PMU was specified but given "cycles" means "cpu_atom/cycles/ and cpu_core/cycles/" for hybrid, why would the prioritization be different without a PMU specified? I knew of this tech debt and separately RISC-V was also interested to have sysfs/json be the priority so that the legacy to config encoding could exist more in the perf tool than the PMU driver. I'm a SIG vice-chair for RISC-V and hope to push things forward for RISC-V when I can. Given that ARM had originally required the prioritization, Intel signed off on this (with a huge number of Intel test expectations updated as a consequence), RISC-V desiring consistency I thought there was a pretty broad consensus to move forward with having the same prioritization of legacy and sysfs/json events for event names without PMUs as those with it. In doing this change I made: $ perf (stat|record) -e cycles ... behave like: $ perf (stat|record) -e */cycles/ ... This is the behavior with sysfs/json events (ie not legacy). For example: $ perf (stat|record) -e inst_retired.any ... in the event parsing code behaves like: $ perf (stat|record) -e */inst_retired.any/ ... That is every PMU advertising the event inst_retired.any (in the sysfs events directory or in json) will have it opened by the tool. My intent was that "cycles" behaving like "*/cycles/" was that it would match the change already done for hybrid of "cycles" behaving like "cpu_atom/cycles/,cpu_core/cycles/". However, this change caused an issue on ARM Neoverse that Linus couldn't tolerate and so reverted the change. Specifically ARM Neoverse's L3 cache PMU advertises a "cycles" event and such an event will fail to open with perf record. Specifying the PMU with the event avoids the wild card behavior and fixes the issue but it was put over strongly that "cycles" without a PMU should only ever mean CPU cycles. This missed hybrid/BIG.little type systems, but one could imagine in that case CPU means all core PMUs. As with not wanting hybrid PMU names hard coded, I dislike special cases for events. Not least as there are an exciting plethora of legacy names inside the perf tool particularly for legacy cache events. Similarly ARM core PMUs will advertise a "cpu_cycles" event, while "cycles" and "cpu-cycles" are both legacy names. It seems less than obvious that using a hyphen or an underscore would change the PMU wildcard behavior of an event name. That is the legacy "cpu-cycles" event would wild card only with core PMUs while sysfs "cpu_cycles" event would wild card with all PMUs. Anyway, the proposals to move forward on this were: 1) from me, ARM should rename their cycles event on their L3 cache PMU. They have a bus_cycles event on this PMU and so cpu_cycles would seem like a less ambiguous name. Leo wasn't keen on this in case it broke something depending on the event name. We've recently been talking of renaming swathes of ARM's PMUs, and moving things around in sysfs, so I don't feel renaming 1 event is massive but I feel such a change should be coming from ARM. 2) from Arnaldo, with perf record make it so that when events fail to open it is just warned about. This has been added to the patch series. It turns out that if no events open it breaks perf's own tests so the patch series warns when events fail to open and fails if the eventual evlist is empty or only contains dummy events. 3) my interpretation of what Linus and Namhyung are asking for, make certain events special so "cycles" only ever means the cycles on core PMUs. The definition of special seems to be that all legacy names are special, although Linus never explicitly said this he just mentioned cycles - in fact I pushed him on what he meant and he responded by blocking my email. This is a change in the wild carding behavior of the event lookup, but Namhyung has also said that without a PMU the legacy event encodings should be used. An issue here is that we wanted to avoid legacy encodings as they previously broke Apple M, another issue is that RISC-V want to avoid legacy encodings so that the mappings can live in the perf tool. > The #2 can make the parsing code simpler and the behavior consistent. > Also it can override the (possibly broken) behavior of the legacy event > on some platforms. No way (and reason?) to use the legacy encodings. > > Also Ian says `perf list` shows "cpu-cycles" and "cpu/cpu-cycles/" > together which assumes they are the same. But using #1 can result in a > different behavior respectively. Probably better to make it consistent. I could note that previously the event parsing and event printing code lived in the same file. It is odd to me that "cpu-cycles" should mean legacy and "cpu/cpu-cycles/" (that has a sysfs event on Intel) uses the sysfs/json encoding. A separate change I'm working on is making it so that tracepoints can appear in metrics. Currently the colon in a tracepoint is treated as an event modifier in the metrics code thereby breaking the events. Rather than re-invent modifier parsing for events in metrics it seems to make sense to just parse events individually. It would make sense that the logic in metrics to deduplicate events in metrics like: "MetricName": "cache_hit_percent", "MetricExpr": "100 * cache_hits/(cache_hits + cache_misses)", where the cache_hits event is duplicated, would work off of the perf_event_attr rather than the event name as currently happens. Using different encodings would defeat this - although you are unlikely to use different event names within the same metric, we do try to optimize and share metric formulas as much as possible. I think across perf's history the PMU has been treated as an optional prefix meaning (1) wild carding is normally expected and (2) event encodings shouldn't vary because of how the wild carding happens (the optional thing would be a required thing if it changes behavior, for example on Apple M CPUs to work around the PMU driver issue). > I tried to summarize the issues concisely but may miss some issues. I > hope Ian can fill it up in case I missed something important. > > Can you please share your opinion about what would be the best behavior? I'd very much appreciate guidance. This issue matters to me for consistency, cleanliness, tech debt,.. but is strategic for RISC-V. The series Namhyung linked to has the tags: Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Tested-by: James Clark <james.clark@linaro.org> Tested-by: Leo Yan <leo.yan@arm.com> Tested-by: Atish Patra <atishp@rivosinc.com> It is reapplying a patch that lived on linux-next for weeks/months and that should presumably mean that PowerPC and S390 are happy with it, as they are very good at reporting issues when things have gone awry. It carries a patch implementing Arnaldo's suggestion to make perf record failing to open a warning rather than fatal. It is unusual that patches in the perf tool have this level of review and this high number of tags. Thanks, Ian ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC] perf tools: About encodings of legacy event names 2025-02-20 6:37 ` Ian Rogers @ 2025-02-24 15:01 ` Arnaldo Carvalho de Melo 2025-02-24 17:36 ` Ian Rogers 2025-03-07 14:17 ` James Clark 0 siblings, 2 replies; 12+ messages in thread From: Arnaldo Carvalho de Melo @ 2025-02-24 15:01 UTC (permalink / raw) To: Ian Rogers Cc: Namhyung Kim, linux-perf-users, Peter Zijlstra, Ingo Molnar, linux-kernel, Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang, Andi Kleen On Wed, Feb 19, 2025 at 10:37:33PM -0800, Ian Rogers wrote: > On Wed, Feb 19, 2025 at 4:38 PM Namhyung Kim <namhyung@kernel.org> wrote: > > Ian and I have been discussing the behaviors of event encodings and it's > > hard to find a point so far that we can agree on. So I'd like to hear > > your opinion to resolve this. If you happen to have some time, you can > > follow the thread here: > > https://lore.kernel.org/linux-perf-users/20250109222109.567031-5-irogers@google.com/#r > > Basically there are some events that were defined in the perf ABI. > > PERF_COUNT_HW_CPU_CYCLES, PERF_COUNT_HW_INSTRUCTIONS, ... > > So perf tools use those (legacy) encodings with the matching names like > > "cycles" (or "cpu-cycles"), "instructions", etc. > > On the another hand, it has wildcard matching for event names in PMUs so > > that users can conveniently use events without specifying PMU names. > > For example, "event1" would expand to "pmuX/event1/", "pmuY/event1/", etc > > as long as the PMUs have the event in sysfs or JSON. > > And there are platforms where "cycles" event is defined in a (core) PMU > > (like "blah/cycles") and the event has a different behavior than the > > legacy encoding. Then it has to choose which encoding is better for the > > given name. But it's a general issue for the legacy event names. So we pick the "legacy" one, as always, and the one that is in a PMU needs to have its pmu name specified, no? > > Q. what should it do with "cycles"? > > ----------------------------------- > > 1. just use the legacy encoding (PERF_COUNT_HW_CPU_CYCLES). Right > > 2. expand to (possibly multiple) PMU events (like "cpu/cycles/") and > > use the encoding defined there. Nope > > I think the #1 is the current behavior. The upside is it's simple and > > intuitive. But it's different than other names and can make confusion. > > Users should use the full name ("cpu/cycles/") if they need sysfs one. Right > So the code keeps changing, for example, the removal of BPF events. I Humm, this seems like a different discussion. > think the important change and the thing that has brought us here is > the addition of what Intel call hybrid and ARM call BIG.little. ARM Right, the support for hybrid systems brought lots of problems, most people didn't have access to such systems and thus couldn't test patches, so the attempt was to keep the existing code working as it had been while allowing for the support for the new hybrid systems. As more people get access to hybrid systems we started noticing problems and working on fixing it, you did a lot of work in this area, highly appreciated. > have got architectural events and so the same event encoding on BIG > and little cores. On x86 the e-core (atom) and p-cores have different > event encodings. In the original hybrid work, pushed on for the launch > of Alderlake and reviewed by Jiri and Arnaldo with no involvement from > me, Intel wanted the event names to work transparently. So a cycles Without access to such systems, yes, see above. > event would be gathered on the e-core and p-core with a command line > option to merge the legacy event cycles on both cores. To be specific > the expected behavior of: > $ perf (stat|record) -e cycles ... > would be as if: > $ perf (stat|record) -e cpu_atom/cycles/,cpu_core/cycles/ ... Yes, and if somebody wants to profile in just one of those core types, just specify the "cpu_atom" or "cpu_core" to have it fully specified. > An unfortunate thing in the hybrid code was that it was hardcoded to > PMU names of cpu_atom and cpu_core, but what to do for metrics? The Yeah, metrics IIRC came before hybrid systems. > original proposal was that metrics would have a PMU name and all names > would be relative to that, but what about uncore events? What about > metrics that refer to other metrics? A decision was made to not to > have PMUs implicitly in metrics and the events in the metric would > match those given to perf's -e command line. This also greatly > simplifies testing events when creating a metric. > I set about rewriting the hybrid code not to use hard coded PMU names > but to discover core PMUs at runtime. This was to make the metric and > event parsing code as generic as possible, but this had an unintended > consequence. ARM's core PMU had previously not been seen as handling > legacy events like cycles, and appeared as an uncore PMU. When there > are both legacy and sysfs/json events then previously the legacy > events had priority. This broke events like cycles on Apple M > processors where the legacy events were broken and the sysfs ones not. > Yes this is a driver bug, but everybody told me a change in behavior > of the tool was needed to fix it, that fix was to make sysfs/json > events have priority over legacy events. I prioritized fixing event > parsing when a PMU was specified but given "cycles" means > "cpu_atom/cycles/ and cpu_core/cycles/" for hybrid, why would the > prioritization be different without a PMU specified? > I knew of this tech debt and separately RISC-V was also interested to > have sysfs/json be the priority so that the legacy to config encoding > could exist more in the perf tool than the PMU driver. I'm a SIG I saw them saying that supporting PERF_TYPE_HARDWARE counters was ok as they didn't want to break the perf tooling workflow, no? > vice-chair for RISC-V and hope to push things forward for RISC-V when > I can. Given that ARM had originally required the prioritization, > Intel signed off on this (with a huge number of Intel test > expectations updated as a consequence), RISC-V desiring consistency I > thought there was a pretty broad consensus to move forward with having > the same prioritization of legacy and sysfs/json events for event > names without PMUs as those with it. > In doing this change I made: > $ perf (stat|record) -e cycles ... > behave like: > $ perf (stat|record) -e */cycles/ ... > This is the behavior with sysfs/json events (ie not legacy). For example: > $ perf (stat|record) -e inst_retired.any ... > in the event parsing code behaves like: > $ perf (stat|record) -e */inst_retired.any/ ... > That is every PMU advertising the event inst_retired.any (in the sysfs > events directory or in json) will have it opened by the tool. > My intent was that "cycles" behaving like "*/cycles/" was that it > would match the change already done for hybrid of "cycles" behaving > like "cpu_atom/cycles/,cpu_core/cycles/". However, this change caused > an issue on ARM Neoverse that Linus couldn't tolerate and so reverted > the change. Specifically ARM Neoverse's L3 cache PMU advertises a > "cycles" event and such an event will fail to open with perf record. > Specifying the PMU with the event avoids the wild card behavior and > fixes the issue but it was put over strongly that "cycles" without a > PMU should only ever mean CPU cycles. This missed hybrid/BIG.little > type systems, but one could imagine in that case CPU means all core > PMUs. > As with not wanting hybrid PMU names hard coded, I dislike special > cases for events. Not least as there are an exciting plethora of Ok, but the desire for consistency clashes with how things have been for a long time, tools expect, scripts, etc, so we seem to need the special case for what has been called in these threads "legacy events". > legacy names inside the perf tool particularly for legacy cache > events. Similarly ARM core PMUs will advertise a "cpu_cycles" event, > while "cycles" and "cpu-cycles" are both legacy names. It seems less > than obvious that using a hyphen or an underscore would change the PMU > wildcard behavior of an event name. That is the legacy "cpu-cycles" > event would wild card only with core PMUs while sysfs "cpu_cycles" > event would wild card with all PMUs. > Anyway, the proposals to move forward on this were: > 1) from me, ARM should rename their cycles event on their L3 cache > PMU. They have a bus_cycles event on this PMU and so cpu_cycles would > seem like a less ambiguous name. Leo wasn't keen on this in case it > broke something depending on the event name. We've recently been > talking of renaming swathes of ARM's PMUs, and moving things around in > sysfs, so I don't feel renaming 1 event is massive but I feel such a > change should be coming from ARM. > 2) from Arnaldo, with perf record make it so that when events fail to > open it is just warned about. This has been added to the patch series. In verbose mode, IIRC. If the user wants some "cycles" event and it is not included in the tool output then, as with other cases, using -v to see if some extra debug message can clarify seems the usual workflow. > It turns out that if no events open it breaks perf's own tests so the > patch series warns when events fail to open and fails if the eventual > evlist is empty or only contains dummy events. Do you have a link to this thread? > 3) my interpretation of what Linus and Namhyung are asking for, make > certain events special so "cycles" only ever means the cycles on core > PMUs. The definition of special seems to be that all legacy names are > special, although Linus never explicitly said this he just mentioned I think that is fair to say, to try and make the events that most people use as they worked in the past, i.e. no surprise for people not interested in metrics, hybrid systems, and lots of other features in perf that most people seldomly, if at all, use. - Arnaldo > cycles - in fact I pushed him on what he meant and he responded by > blocking my email. This is a change in the wild carding behavior of > the event lookup, but Namhyung has also said that without a PMU the > legacy event encodings should be used. An issue here is that we wanted > to avoid legacy encodings as they previously broke Apple M, another > issue is that RISC-V want to avoid legacy encodings so that the > mappings can live in the perf tool. > > The #2 can make the parsing code simpler and the behavior consistent. > > Also it can override the (possibly broken) behavior of the legacy event > > on some platforms. No way (and reason?) to use the legacy encodings. > > Also Ian says `perf list` shows "cpu-cycles" and "cpu/cpu-cycles/" > > together which assumes they are the same. But using #1 can result in a > > different behavior respectively. Probably better to make it consistent. > I could note that previously the event parsing and event printing code > lived in the same file. It is odd to me that "cpu-cycles" should mean > legacy and "cpu/cpu-cycles/" (that has a sysfs event on Intel) uses > the sysfs/json encoding. A separate change I'm working on is making it > so that tracepoints can appear in metrics. Currently the colon in a > tracepoint is treated as an event modifier in the metrics code thereby > breaking the events. Rather than re-invent modifier parsing for events > in metrics it seems to make sense to just parse events individually. > It would make sense that the logic in metrics to deduplicate events in > metrics like: > "MetricName": "cache_hit_percent", > "MetricExpr": "100 * cache_hits/(cache_hits + cache_misses)", > where the cache_hits event is duplicated, would work off of the > perf_event_attr rather than the event name as currently happens. Using > different encodings would defeat this - although you are unlikely to > use different event names within the same metric, we do try to > optimize and share metric formulas as much as possible. I think across > perf's history the PMU has been treated as an optional prefix meaning > (1) wild carding is normally expected and (2) event encodings > shouldn't vary because of how the wild carding happens (the optional > thing would be a required thing if it changes behavior, for example on > Apple M CPUs to work around the PMU driver issue). > > > I tried to summarize the issues concisely but may miss some issues. I > > hope Ian can fill it up in case I missed something important. > > > > Can you please share your opinion about what would be the best behavior? > > I'd very much appreciate guidance. This issue matters to me for > consistency, cleanliness, tech debt,.. but is strategic for RISC-V. > The series Namhyung linked to has the tags: > Signed-off-by: Ian Rogers <irogers@google.com> > Reviewed-by: Kan Liang <kan.liang@linux.intel.com> > Tested-by: James Clark <james.clark@linaro.org> > Tested-by: Leo Yan <leo.yan@arm.com> > Tested-by: Atish Patra <atishp@rivosinc.com> > It is reapplying a patch that lived on linux-next for weeks/months and > that should presumably mean that PowerPC and S390 are happy with it, > as they are very good at reporting issues when things have gone awry. > It carries a patch implementing Arnaldo's suggestion to make perf > record failing to open a warning rather than fatal. It is unusual that > patches in the perf tool have this level of review and this high > number of tags. > > Thanks, > Ian ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC] perf tools: About encodings of legacy event names 2025-02-24 15:01 ` Arnaldo Carvalho de Melo @ 2025-02-24 17:36 ` Ian Rogers 2025-02-24 20:25 ` Arnaldo Carvalho de Melo 2025-03-07 5:24 ` Ian Rogers 2025-03-07 14:17 ` James Clark 1 sibling, 2 replies; 12+ messages in thread From: Ian Rogers @ 2025-02-24 17:36 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Namhyung Kim, linux-perf-users, Peter Zijlstra, Ingo Molnar, linux-kernel, Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang, Andi Kleen On Mon, Feb 24, 2025 at 7:01 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > > On Wed, Feb 19, 2025 at 10:37:33PM -0800, Ian Rogers wrote: > > On Wed, Feb 19, 2025 at 4:38 PM Namhyung Kim <namhyung@kernel.org> wrote: > > > Ian and I have been discussing the behaviors of event encodings and it's > > > hard to find a point so far that we can agree on. So I'd like to hear > > > your opinion to resolve this. If you happen to have some time, you can > > > follow the thread here: > > > > https://lore.kernel.org/linux-perf-users/20250109222109.567031-5-irogers@google.com/#r > > > > Basically there are some events that were defined in the perf ABI. > > > > PERF_COUNT_HW_CPU_CYCLES, PERF_COUNT_HW_INSTRUCTIONS, ... > > > > So perf tools use those (legacy) encodings with the matching names like > > > "cycles" (or "cpu-cycles"), "instructions", etc. > > > > On the another hand, it has wildcard matching for event names in PMUs so > > > that users can conveniently use events without specifying PMU names. > > > For example, "event1" would expand to "pmuX/event1/", "pmuY/event1/", etc > > > as long as the PMUs have the event in sysfs or JSON. > > > > And there are platforms where "cycles" event is defined in a (core) PMU > > > (like "blah/cycles") and the event has a different behavior than the > > > legacy encoding. Then it has to choose which encoding is better for the > > > given name. But it's a general issue for the legacy event names. > > So we pick the "legacy" one, as always, and the one that is in a PMU > needs to have its pmu name specified, no? > > > > Q. what should it do with "cycles"? > > > ----------------------------------- > > > 1. just use the legacy encoding (PERF_COUNT_HW_CPU_CYCLES). > > Right > > > > 2. expand to (possibly multiple) PMU events (like "cpu/cycles/") and > > > use the encoding defined there. > > Nope > > > > I think the #1 is the current behavior. The upside is it's simple and > > > intuitive. But it's different than other names and can make confusion. > > > Users should use the full name ("cpu/cycles/") if they need sysfs one. > > Right > > > So the code keeps changing, for example, the removal of BPF events. I > > Humm, this seems like a different discussion. > > > think the important change and the thing that has brought us here is > > the addition of what Intel call hybrid and ARM call BIG.little. ARM > > Right, the support for hybrid systems brought lots of problems, most > people didn't have access to such systems and thus couldn't test > patches, so the attempt was to keep the existing code working as it had > been while allowing for the support for the new hybrid systems. > > As more people get access to hybrid systems we started noticing problems > and working on fixing it, you did a lot of work in this area, highly > appreciated. > > > have got architectural events and so the same event encoding on BIG > > and little cores. On x86 the e-core (atom) and p-cores have different > > event encodings. In the original hybrid work, pushed on for the launch > > of Alderlake and reviewed by Jiri and Arnaldo with no involvement from > > me, Intel wanted the event names to work transparently. So a cycles > > Without access to such systems, yes, see above. > > > event would be gathered on the e-core and p-core with a command line > > option to merge the legacy event cycles on both cores. To be specific > > the expected behavior of: > > $ perf (stat|record) -e cycles ... > > would be as if: > > $ perf (stat|record) -e cpu_atom/cycles/,cpu_core/cycles/ ... > > Yes, and if somebody wants to profile in just one of those core types, > just specify the "cpu_atom" or "cpu_core" to have it fully specified. > > > An unfortunate thing in the hybrid code was that it was hardcoded to > > PMU names of cpu_atom and cpu_core, but what to do for metrics? The > > Yeah, metrics IIRC came before hybrid systems. > > > original proposal was that metrics would have a PMU name and all names > > would be relative to that, but what about uncore events? What about > > metrics that refer to other metrics? A decision was made to not to > > have PMUs implicitly in metrics and the events in the metric would > > match those given to perf's -e command line. This also greatly > > simplifies testing events when creating a metric. > > > I set about rewriting the hybrid code not to use hard coded PMU names > > but to discover core PMUs at runtime. This was to make the metric and > > event parsing code as generic as possible, but this had an unintended > > consequence. ARM's core PMU had previously not been seen as handling > > legacy events like cycles, and appeared as an uncore PMU. When there > > > are both legacy and sysfs/json events then previously the legacy > > events had priority. This broke events like cycles on Apple M > > processors where the legacy events were broken and the sysfs ones not. > > Yes this is a driver bug, but everybody told me a change in behavior > > of the tool was needed to fix it, that fix was to make sysfs/json > > events have priority over legacy events. I prioritized fixing event > > parsing when a PMU was specified but given "cycles" means > > "cpu_atom/cycles/ and cpu_core/cycles/" for hybrid, why would the > > prioritization be different without a PMU specified? > > > I knew of this tech debt and separately RISC-V was also interested to > > have sysfs/json be the priority so that the legacy to config encoding > > could exist more in the perf tool than the PMU driver. I'm a SIG > > I saw them saying that supporting PERF_TYPE_HARDWARE counters was ok as > they didn't want to break the perf tooling workflow, no? No. The request has always been that they don't want the PMU driver to do the PERF_TYPE_HARDWARE/PERF_COUNT_HW_* to PMU/config=??? mapping given the diversity of hardware and to keep the PMU driver simple. Here is Atish's feedback: https://lore.kernel.org/lkml/CAHBxVyH1q5CRW3emWTZu1oLZEfTWWVRH6B0OVggFxt-0NRzvwQ@mail.gmail.com/ """ If the overriding legacy with JSON is available, each future vendor may just provide the json file instead of modifying the driver. """ Because of the pain in landing the reverted patch then RISC-V has had to work with perf's behavior as it is, they need to ship products. I think things would have been simpler in their driver and the wider ecosystem if this hadn't been forced to happen. > > vice-chair for RISC-V and hope to push things forward for RISC-V when > > I can. Given that ARM had originally required the prioritization, > > Intel signed off on this (with a huge number of Intel test > > expectations updated as a consequence), RISC-V desiring consistency I > > thought there was a pretty broad consensus to move forward with having > > the same prioritization of legacy and sysfs/json events for event > > names without PMUs as those with it. > > > In doing this change I made: > > $ perf (stat|record) -e cycles ... > > behave like: > > $ perf (stat|record) -e */cycles/ ... > > This is the behavior with sysfs/json events (ie not legacy). For example: > > $ perf (stat|record) -e inst_retired.any ... > > in the event parsing code behaves like: > > $ perf (stat|record) -e */inst_retired.any/ ... > > That is every PMU advertising the event inst_retired.any (in the sysfs > > events directory or in json) will have it opened by the tool. > > > My intent was that "cycles" behaving like "*/cycles/" was that it > > would match the change already done for hybrid of "cycles" behaving > > like "cpu_atom/cycles/,cpu_core/cycles/". However, this change caused > > an issue on ARM Neoverse that Linus couldn't tolerate and so reverted > > the change. Specifically ARM Neoverse's L3 cache PMU advertises a > > "cycles" event and such an event will fail to open with perf record. > > > Specifying the PMU with the event avoids the wild card behavior and > > fixes the issue but it was put over strongly that "cycles" without a > > PMU should only ever mean CPU cycles. This missed hybrid/BIG.little > > type systems, but one could imagine in that case CPU means all core > > PMUs. > > > As with not wanting hybrid PMU names hard coded, I dislike special > > cases for events. Not least as there are an exciting plethora of > > Ok, but the desire for consistency clashes with how things have been for > a long time, tools expect, scripts, etc, so we seem to need the special > case for what has been called in these threads "legacy events". There is only one known issue of a problem for 1 event on 1 ARM model. If ARM renamed the event this wouldn't be an issue. The patch series works for this event but warns about it if you try to use it with perf record. > > legacy names inside the perf tool particularly for legacy cache > > events. Similarly ARM core PMUs will advertise a "cpu_cycles" event, > > while "cycles" and "cpu-cycles" are both legacy names. It seems less > > than obvious that using a hyphen or an underscore would change the PMU > > wildcard behavior of an event name. That is the legacy "cpu-cycles" > > event would wild card only with core PMUs while sysfs "cpu_cycles" > > event would wild card with all PMUs. > > > Anyway, the proposals to move forward on this were: > > > 1) from me, ARM should rename their cycles event on their L3 cache > > PMU. They have a bus_cycles event on this PMU and so cpu_cycles would > > seem like a less ambiguous name. Leo wasn't keen on this in case it > > broke something depending on the event name. We've recently been > > talking of renaming swathes of ARM's PMUs, and moving things around in > > sysfs, so I don't feel renaming 1 event is massive but I feel such a > > change should be coming from ARM. > > > > 2) from Arnaldo, with perf record make it so that when events fail to > > open it is just warned about. This has been added to the patch series. > > In verbose mode, IIRC. If the user wants some "cycles" event and it is > not included in the tool output then, as with other cases, using -v to > see if some extra debug message can clarify seems the usual workflow. So this has never made sense to me. To be specific, if I do: $ perf record -e data_read -a true then data_read is an uncore_imc_free_running event that doesn't support sampling. With the change you say then rather than a warning and failure, as currently happens, you are proposing that the command exits and doesn't warn. This is an extension of the only warn if verbose and ignore failing cycles event you mention. The problem is that there are scripts that look for the failure message, I think it is also surprising for users. Why do I know scripts check the output? perf's own tests started breaking because we weren't exiting when events failed to open. We routinely grep the perf output in those tests, I'm not aware of a specific case with perf record but given we do it with perf stat, then it doesn't seem unreasonable. For example, looking for "Access to performance monitoring and observability operations is limited." in the perf stat warning output: https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/shell/stat_all_pmu.sh?h=perf-tools-next#n31 but I mean it is routine as we rely on values like "<not supported>" being in the output, although I'd say more tests infer this from the exit code of the perf command being tested. Fwiw, it is common in Google's testing to not just expect an exit code but also a specific error message. > > It turns out that if no events open it breaks perf's own tests so the > > patch series warns when events fail to open and fails if the eventual > > evlist is empty or only contains dummy events. > > Do you have a link to this thread? It is in the cover letter: ``` v4: Rework the no events opening change from v3 to make it handle multiple dummy events. Sadly an evlist isn't empty if it just contains dummy events as the dummy event may be used with "perf record -e dummy .." as a way to determine whether permission issues exist. Other software events like cpu-clock would suffice for this, but the using dummy genie has left the bottle. Another problem is that we appear to have an excessive number of dummy events added, for example, we can likely avoid a dummy event and add sideband data to the original event. For auxtrace more dummy events may be opened too. Anyway, this has led to the approach taken in patch 3 where the number of dummy parsed events is computed. If the number of removed/failing-to-open non-dummy events matches the number of non-dummy events then we want to fail, but only if there are no parsed dummy events or if there was one then it must have opened. The math here is hard to read, but passes my manual testing. ``` James reported the issue: https://lore.kernel.org/lkml/823e66dc-9ff4-4168-be54-2e800aef0b28@linaro.org/ There was a discussion on the implementation here: https://lore.kernel.org/lkml/20250107180854.770470-4-irogers@google.com/ > > 3) my interpretation of what Linus and Namhyung are asking for, make > > certain events special so "cycles" only ever means the cycles on core > > PMUs. The definition of special seems to be that all legacy names are > > special, although Linus never explicitly said this he just mentioned > > I think that is fair to say, to try and make the events that most people > use as they worked in the past, i.e. no surprise for people not > interested in metrics, hybrid systems, and lots of other features in > perf that most people seldomly, if at all, use. I think this is what the series achieves: 1) encoding is consistent cycles == cpu/cycles/ - which I think has to be a goal for basic sanity; 2) the wildcard matching is consistent cpu_cycles will match on every PMU, cpu-cycles will match on every PMU, the behavior of hybrid is consistent with uncore, etc.; 3) "most people" is open to interpretation, I think this series fixes problems making "most people" happier. Specifically: 3.1) fixes a latent bug on ARM Apple-M CPUs due to broken legacy mappings, the whole reason the priority was flipped due to a broken PMU, 3.2) addresses RISC-V's issues, 3.3) allows ARM's neoverse not to fail for `perf record -e cycles ...` but warns about the PMU event inconsistency. The series also continues the push to make perf generic, avoid having special "hey if it is this do magic" type things, and so on. The problem with "no surprise" was that we've moved past a world of just 1 core PMU. A decision was made to make legacy events wildcard, and without my input Jiri Olsa and yourself reviewed and merged those changes. Fixing those changes to not be hardcoded to cpu_atom/cpu_core (arguably something that should never have been allowed to be merged in the 1st place given its obviously non-generic nature) broke ARM Apple-M and we made legacy events lower priority than sysfs/json to fix this as multiple people strongly requested (their proposal was just to revert 6 or more Linux releases worth of perf tool changes including adding Intel topdown metric support that I'd written). The move away from just a core PMU will continue as systems have greater heterogeneity and you are likely already spending more money on accelerators like GPUs than on your core PMU - something I keep pushing the perf tool to support, for example with hwmon and drm PMUs/events. Even 15 years ago this trend was clear. I think making it so that events fail to open and requiring a "-v" flag is surprising - "-v" is far too spammy for regular users. I think the perf_event_attr encoding of cpu-cycles not matching cpu/cpu-cycles/ is surprising. I think inst_retired.any matching events on all PMUs, but cycles only matching core PMUs (how should a user even know what these are?) is surprising. The series is trying to make "most people" happy with "no surprise" for them unless their hardware vendor happens to name an event on an uncore PMU to match a legacy event - there is only 1 known instance of this. There is a 4 character fix for that, specifically the cycles event should be "cpu_cycles" for consistency with the "bus_cycles" event, as cycles could imply either. The line of code where this 4 character change is necessary is here: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/perf/arm_dsu_pmu.c#n177 Given we live in a world where ARM PMU names mean we don't see core PMUs in perf list (very surprising behavior imo) the work to fix that will be a lot more invasive than a 1 event name change on an uncore PMU. Thanks, Ian > - Arnaldo > > > cycles - in fact I pushed him on what he meant and he responded by > > blocking my email. This is a change in the wild carding behavior of > > the event lookup, but Namhyung has also said that without a PMU the > > legacy event encodings should be used. An issue here is that we wanted > > to avoid legacy encodings as they previously broke Apple M, another > > issue is that RISC-V want to avoid legacy encodings so that the > > mappings can live in the perf tool. > > > > The #2 can make the parsing code simpler and the behavior consistent. > > > Also it can override the (possibly broken) behavior of the legacy event > > > on some platforms. No way (and reason?) to use the legacy encodings. > > > > Also Ian says `perf list` shows "cpu-cycles" and "cpu/cpu-cycles/" > > > together which assumes they are the same. But using #1 can result in a > > > different behavior respectively. Probably better to make it consistent. > > > I could note that previously the event parsing and event printing code > > lived in the same file. It is odd to me that "cpu-cycles" should mean > > legacy and "cpu/cpu-cycles/" (that has a sysfs event on Intel) uses > > the sysfs/json encoding. A separate change I'm working on is making it > > so that tracepoints can appear in metrics. Currently the colon in a > > tracepoint is treated as an event modifier in the metrics code thereby > > breaking the events. Rather than re-invent modifier parsing for events > > in metrics it seems to make sense to just parse events individually. > > It would make sense that the logic in metrics to deduplicate events in > > metrics like: > > "MetricName": "cache_hit_percent", > > "MetricExpr": "100 * cache_hits/(cache_hits + cache_misses)", > > where the cache_hits event is duplicated, would work off of the > > perf_event_attr rather than the event name as currently happens. Using > > different encodings would defeat this - although you are unlikely to > > use different event names within the same metric, we do try to > > optimize and share metric formulas as much as possible. I think across > > perf's history the PMU has been treated as an optional prefix meaning > > (1) wild carding is normally expected and (2) event encodings > > shouldn't vary because of how the wild carding happens (the optional > > thing would be a required thing if it changes behavior, for example on > > Apple M CPUs to work around the PMU driver issue). > > > > > I tried to summarize the issues concisely but may miss some issues. I > > > hope Ian can fill it up in case I missed something important. > > > > > > Can you please share your opinion about what would be the best behavior? > > > > I'd very much appreciate guidance. This issue matters to me for > > consistency, cleanliness, tech debt,.. but is strategic for RISC-V. > > The series Namhyung linked to has the tags: > > Signed-off-by: Ian Rogers <irogers@google.com> > > Reviewed-by: Kan Liang <kan.liang@linux.intel.com> > > Tested-by: James Clark <james.clark@linaro.org> > > Tested-by: Leo Yan <leo.yan@arm.com> > > Tested-by: Atish Patra <atishp@rivosinc.com> > > It is reapplying a patch that lived on linux-next for weeks/months and > > that should presumably mean that PowerPC and S390 are happy with it, > > as they are very good at reporting issues when things have gone awry. > > It carries a patch implementing Arnaldo's suggestion to make perf > > record failing to open a warning rather than fatal. It is unusual that > > patches in the perf tool have this level of review and this high > > number of tags. > > > > Thanks, > > Ian ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC] perf tools: About encodings of legacy event names 2025-02-24 17:36 ` Ian Rogers @ 2025-02-24 20:25 ` Arnaldo Carvalho de Melo 2025-02-24 21:34 ` Ian Rogers 2025-03-07 5:24 ` Ian Rogers 1 sibling, 1 reply; 12+ messages in thread From: Arnaldo Carvalho de Melo @ 2025-02-24 20:25 UTC (permalink / raw) To: Ian Rogers Cc: Namhyung Kim, linux-perf-users, Peter Zijlstra, Ingo Molnar, linux-kernel, Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang, Andi Kleen On Mon, Feb 24, 2025 at 09:36:16AM -0800, Ian Rogers wrote: > On Mon, Feb 24, 2025 at 7:01 AM Arnaldo Carvalho de Melo > <acme@kernel.org> wrote: > > > > On Wed, Feb 19, 2025 at 10:37:33PM -0800, Ian Rogers wrote: > > > On Wed, Feb 19, 2025 at 4:38 PM Namhyung Kim <namhyung@kernel.org> wrote: > > > > Ian and I have been discussing the behaviors of event encodings and it's > > > > hard to find a point so far that we can agree on. So I'd like to hear > > > > your opinion to resolve this. If you happen to have some time, you can > > > > follow the thread here: > > > > > > https://lore.kernel.org/linux-perf-users/20250109222109.567031-5-irogers@google.com/#r > > > > > > Basically there are some events that were defined in the perf ABI. > > > > > > PERF_COUNT_HW_CPU_CYCLES, PERF_COUNT_HW_INSTRUCTIONS, ... > > > > > > So perf tools use those (legacy) encodings with the matching names like > > > > "cycles" (or "cpu-cycles"), "instructions", etc. > > > > > > On the another hand, it has wildcard matching for event names in PMUs so > > > > that users can conveniently use events without specifying PMU names. > > > > For example, "event1" would expand to "pmuX/event1/", "pmuY/event1/", etc > > > > as long as the PMUs have the event in sysfs or JSON. > > > > > > And there are platforms where "cycles" event is defined in a (core) PMU > > > > (like "blah/cycles") and the event has a different behavior than the > > > > legacy encoding. Then it has to choose which encoding is better for the > > > > given name. But it's a general issue for the legacy event names. > > > > So we pick the "legacy" one, as always, and the one that is in a PMU > > needs to have its pmu name specified, no? > > > > > > Q. what should it do with "cycles"? > > > > ----------------------------------- > > > > 1. just use the legacy encoding (PERF_COUNT_HW_CPU_CYCLES). > > > > Right > > > > > > 2. expand to (possibly multiple) PMU events (like "cpu/cycles/") and > > > > use the encoding defined there. > > > > Nope > > > > > > I think the #1 is the current behavior. The upside is it's simple and > > > > intuitive. But it's different than other names and can make confusion. > > > > Users should use the full name ("cpu/cycles/") if they need sysfs one. > > > > Right > > > > > So the code keeps changing, for example, the removal of BPF events. I > > > > Humm, this seems like a different discussion. > > > > > think the important change and the thing that has brought us here is > > > the addition of what Intel call hybrid and ARM call BIG.little. ARM > > > > Right, the support for hybrid systems brought lots of problems, most > > people didn't have access to such systems and thus couldn't test > > patches, so the attempt was to keep the existing code working as it had > > been while allowing for the support for the new hybrid systems. > > > > As more people get access to hybrid systems we started noticing problems > > and working on fixing it, you did a lot of work in this area, highly > > appreciated. > > > > > have got architectural events and so the same event encoding on BIG > > > and little cores. On x86 the e-core (atom) and p-cores have different > > > event encodings. In the original hybrid work, pushed on for the launch > > > of Alderlake and reviewed by Jiri and Arnaldo with no involvement from > > > me, Intel wanted the event names to work transparently. So a cycles > > > > Without access to such systems, yes, see above. > > > > > event would be gathered on the e-core and p-core with a command line > > > option to merge the legacy event cycles on both cores. To be specific > > > the expected behavior of: > > > $ perf (stat|record) -e cycles ... > > > would be as if: > > > $ perf (stat|record) -e cpu_atom/cycles/,cpu_core/cycles/ ... > > > > Yes, and if somebody wants to profile in just one of those core types, > > just specify the "cpu_atom" or "cpu_core" to have it fully specified. > > > > > An unfortunate thing in the hybrid code was that it was hardcoded to > > > PMU names of cpu_atom and cpu_core, but what to do for metrics? The > > > > Yeah, metrics IIRC came before hybrid systems. > > > > > original proposal was that metrics would have a PMU name and all names > > > would be relative to that, but what about uncore events? What about > > > metrics that refer to other metrics? A decision was made to not to > > > have PMUs implicitly in metrics and the events in the metric would > > > match those given to perf's -e command line. This also greatly > > > simplifies testing events when creating a metric. > > > > > I set about rewriting the hybrid code not to use hard coded PMU names > > > but to discover core PMUs at runtime. This was to make the metric and > > > event parsing code as generic as possible, but this had an unintended > > > consequence. ARM's core PMU had previously not been seen as handling > > > legacy events like cycles, and appeared as an uncore PMU. When there > > > > > are both legacy and sysfs/json events then previously the legacy > > > events had priority. This broke events like cycles on Apple M > > > processors where the legacy events were broken and the sysfs ones not. > > > Yes this is a driver bug, but everybody told me a change in behavior > > > of the tool was needed to fix it, that fix was to make sysfs/json > > > events have priority over legacy events. I prioritized fixing event > > > parsing when a PMU was specified but given "cycles" means > > > "cpu_atom/cycles/ and cpu_core/cycles/" for hybrid, why would the > > > prioritization be different without a PMU specified? > > > > > I knew of this tech debt and separately RISC-V was also interested to > > > have sysfs/json be the priority so that the legacy to config encoding > > > could exist more in the perf tool than the PMU driver. I'm a SIG > > > > I saw them saying that supporting PERF_TYPE_HARDWARE counters was ok as > > they didn't want to break the perf tooling workflow, no? > > No. The request has always been that they don't want the PMU driver to > do the PERF_TYPE_HARDWARE/PERF_COUNT_HW_* to PMU/config=??? mapping > given the diversity of hardware and to keep the PMU driver simple. > Here is Atish's feedback: > https://lore.kernel.org/lkml/CAHBxVyH1q5CRW3emWTZu1oLZEfTWWVRH6B0OVggFxt-0NRzvwQ@mail.gmail.com/ > """ > If the overriding legacy with JSON is available, each future vendor > may just provide the json file instead of modifying the driver. > """ So, making users to figure out what is the best event to use for some specific processor was one of the major reasons for perf to come about, otherwise we would still be stuck with oprofile. While it is clear that picking some "most important event and ratios of events" is super difficult, that was one of the ideas that made perf to come about. I wasn´t the architect of this, Ingo and Thomas were, with Peter then tryhing to make the kernel part sane. I just tried to follow those principles while not getting in the way of people wanting to have a common ground for observability in Linux. Having all these JSON tables was in place was a major concession to that original vision. Changing things as we go, with all these changes in the hardware landscape, some fleeting, some reinforced, is super difficult and leads to all this pain in trying to provide a sane experience to people using these tools. I don't think we can have it super consistent. It is super useful as is, or at least that what people say. So we need to keep on improving it while not telling people used to its workflows that they should re-learn the tool because at some point a decision was not properly made. > Because of the pain in landing the reverted patch then RISC-V has had > to work with perf's behavior as it is, they need to ship products. I you can present it like that, or you can say that they wanted the API to be different but since it wasn't like they liked they had to comply. Differently phrased he finally understood the value of the current API, maybe. > think things would have been simpler in their driver and the wider > ecosystem if this hadn't been forced to happen. > > > vice-chair for RISC-V and hope to push things forward for RISC-V when > > > I can. Given that ARM had originally required the prioritization, > > > Intel signed off on this (with a huge number of Intel test > > > expectations updated as a consequence), RISC-V desiring consistency I > > > thought there was a pretty broad consensus to move forward with having > > > the same prioritization of legacy and sysfs/json events for event > > > names without PMUs as those with it. > > > > > In doing this change I made: > > > $ perf (stat|record) -e cycles ... > > > behave like: > > > $ perf (stat|record) -e */cycles/ ... > > > This is the behavior with sysfs/json events (ie not legacy). For example: > > > $ perf (stat|record) -e inst_retired.any ... > > > in the event parsing code behaves like: > > > $ perf (stat|record) -e */inst_retired.any/ ... > > > That is every PMU advertising the event inst_retired.any (in the sysfs > > > events directory or in json) will have it opened by the tool. > > > > > My intent was that "cycles" behaving like "*/cycles/" was that it > > > would match the change already done for hybrid of "cycles" behaving > > > like "cpu_atom/cycles/,cpu_core/cycles/". However, this change caused > > > an issue on ARM Neoverse that Linus couldn't tolerate and so reverted > > > the change. Specifically ARM Neoverse's L3 cache PMU advertises a > > > "cycles" event and such an event will fail to open with perf record. > > > > > Specifying the PMU with the event avoids the wild card behavior and > > > fixes the issue but it was put over strongly that "cycles" without a > > > PMU should only ever mean CPU cycles. This missed hybrid/BIG.little > > > type systems, but one could imagine in that case CPU means all core > > > PMUs. > > > > > As with not wanting hybrid PMU names hard coded, I dislike special > > > cases for events. Not least as there are an exciting plethora of > > > > Ok, but the desire for consistency clashes with how things have been for > > a long time, tools expect, scripts, etc, so we seem to need the special > > case for what has been called in these threads "legacy events". > There is only one known issue of a problem for 1 event on 1 ARM model. > If ARM renamed the event this wouldn't be an issue. The patch series > works for this event but warns about it if you try to use it with perf > record. Well, maybe just the event name shouldn't be what determines if it should be used in some cases? Maybe some other characteristic should be probed for in each case? Have to stop here, not enough bandwidth. - Arnaldo ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC] perf tools: About encodings of legacy event names 2025-02-24 20:25 ` Arnaldo Carvalho de Melo @ 2025-02-24 21:34 ` Ian Rogers 2025-02-24 23:28 ` Ian Rogers 0 siblings, 1 reply; 12+ messages in thread From: Ian Rogers @ 2025-02-24 21:34 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Namhyung Kim, linux-perf-users, Peter Zijlstra, Ingo Molnar, linux-kernel, Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang, Andi Kleen On Mon, Feb 24, 2025 at 12:25 PM Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > > On Mon, Feb 24, 2025 at 09:36:16AM -0800, Ian Rogers wrote: > > On Mon, Feb 24, 2025 at 7:01 AM Arnaldo Carvalho de Melo > > <acme@kernel.org> wrote: > > > > > > On Wed, Feb 19, 2025 at 10:37:33PM -0800, Ian Rogers wrote: > > > > On Wed, Feb 19, 2025 at 4:38 PM Namhyung Kim <namhyung@kernel.org> wrote: > > > > > Ian and I have been discussing the behaviors of event encodings and it's > > > > > hard to find a point so far that we can agree on. So I'd like to hear > > > > > your opinion to resolve this. If you happen to have some time, you can > > > > > follow the thread here: > > > > > > > > https://lore.kernel.org/linux-perf-users/20250109222109.567031-5-irogers@google.com/#r > > > > > > > > Basically there are some events that were defined in the perf ABI. > > > > > > > > PERF_COUNT_HW_CPU_CYCLES, PERF_COUNT_HW_INSTRUCTIONS, ... > > > > > > > > So perf tools use those (legacy) encodings with the matching names like > > > > > "cycles" (or "cpu-cycles"), "instructions", etc. > > > > > > > > On the another hand, it has wildcard matching for event names in PMUs so > > > > > that users can conveniently use events without specifying PMU names. > > > > > For example, "event1" would expand to "pmuX/event1/", "pmuY/event1/", etc > > > > > as long as the PMUs have the event in sysfs or JSON. > > > > > > > > And there are platforms where "cycles" event is defined in a (core) PMU > > > > > (like "blah/cycles") and the event has a different behavior than the > > > > > legacy encoding. Then it has to choose which encoding is better for the > > > > > given name. But it's a general issue for the legacy event names. > > > > > > So we pick the "legacy" one, as always, and the one that is in a PMU > > > needs to have its pmu name specified, no? > > > > > > > > Q. what should it do with "cycles"? > > > > > ----------------------------------- > > > > > 1. just use the legacy encoding (PERF_COUNT_HW_CPU_CYCLES). > > > > > > Right > > > > > > > > 2. expand to (possibly multiple) PMU events (like "cpu/cycles/") and > > > > > use the encoding defined there. > > > > > > Nope > > > > > > > > I think the #1 is the current behavior. The upside is it's simple and > > > > > intuitive. But it's different than other names and can make confusion. > > > > > Users should use the full name ("cpu/cycles/") if they need sysfs one. > > > > > > Right > > > > > > > So the code keeps changing, for example, the removal of BPF events. I > > > > > > Humm, this seems like a different discussion. > > > > > > > think the important change and the thing that has brought us here is > > > > the addition of what Intel call hybrid and ARM call BIG.little. ARM > > > > > > Right, the support for hybrid systems brought lots of problems, most > > > people didn't have access to such systems and thus couldn't test > > > patches, so the attempt was to keep the existing code working as it had > > > been while allowing for the support for the new hybrid systems. > > > > > > As more people get access to hybrid systems we started noticing problems > > > and working on fixing it, you did a lot of work in this area, highly > > > appreciated. > > > > > > > have got architectural events and so the same event encoding on BIG > > > > and little cores. On x86 the e-core (atom) and p-cores have different > > > > event encodings. In the original hybrid work, pushed on for the launch > > > > of Alderlake and reviewed by Jiri and Arnaldo with no involvement from > > > > me, Intel wanted the event names to work transparently. So a cycles > > > > > > Without access to such systems, yes, see above. > > > > > > > event would be gathered on the e-core and p-core with a command line > > > > option to merge the legacy event cycles on both cores. To be specific > > > > the expected behavior of: > > > > $ perf (stat|record) -e cycles ... > > > > would be as if: > > > > $ perf (stat|record) -e cpu_atom/cycles/,cpu_core/cycles/ ... > > > > > > Yes, and if somebody wants to profile in just one of those core types, > > > just specify the "cpu_atom" or "cpu_core" to have it fully specified. > > > > > > > An unfortunate thing in the hybrid code was that it was hardcoded to > > > > PMU names of cpu_atom and cpu_core, but what to do for metrics? The > > > > > > Yeah, metrics IIRC came before hybrid systems. > > > > > > > original proposal was that metrics would have a PMU name and all names > > > > would be relative to that, but what about uncore events? What about > > > > metrics that refer to other metrics? A decision was made to not to > > > > have PMUs implicitly in metrics and the events in the metric would > > > > match those given to perf's -e command line. This also greatly > > > > simplifies testing events when creating a metric. > > > > > > > I set about rewriting the hybrid code not to use hard coded PMU names > > > > but to discover core PMUs at runtime. This was to make the metric and > > > > event parsing code as generic as possible, but this had an unintended > > > > consequence. ARM's core PMU had previously not been seen as handling > > > > legacy events like cycles, and appeared as an uncore PMU. When there > > > > > > > are both legacy and sysfs/json events then previously the legacy > > > > events had priority. This broke events like cycles on Apple M > > > > processors where the legacy events were broken and the sysfs ones not. > > > > Yes this is a driver bug, but everybody told me a change in behavior > > > > of the tool was needed to fix it, that fix was to make sysfs/json > > > > events have priority over legacy events. I prioritized fixing event > > > > parsing when a PMU was specified but given "cycles" means > > > > "cpu_atom/cycles/ and cpu_core/cycles/" for hybrid, why would the > > > > prioritization be different without a PMU specified? > > > > > > > I knew of this tech debt and separately RISC-V was also interested to > > > > have sysfs/json be the priority so that the legacy to config encoding > > > > could exist more in the perf tool than the PMU driver. I'm a SIG > > > > > > I saw them saying that supporting PERF_TYPE_HARDWARE counters was ok as > > > they didn't want to break the perf tooling workflow, no? > > > > No. The request has always been that they don't want the PMU driver to > > do the PERF_TYPE_HARDWARE/PERF_COUNT_HW_* to PMU/config=??? mapping > > given the diversity of hardware and to keep the PMU driver simple. > > Here is Atish's feedback: > > https://lore.kernel.org/lkml/CAHBxVyH1q5CRW3emWTZu1oLZEfTWWVRH6B0OVggFxt-0NRzvwQ@mail.gmail.com/ > > """ > > If the overriding legacy with JSON is available, each future vendor > > may just provide the json file instead of modifying the driver. > > """ > > So, making users to figure out what is the best event to use for some > specific processor was one of the major reasons for perf to come about, > otherwise we would still be stuck with oprofile. Tbh, I think OProfile's major issue was a lack of active development. There were arguments that perf was going to be more integrated into the kernel, but there are no arch events, for example, in arch/arm64/events. Perf provided new energy for something started as a student project, it was and is good for having done that. > While it is clear that picking some "most important event and ratios of > events" is super difficult, that was one of the ideas that made perf to > come about. > > I wasn´t the architect of this, Ingo and Thomas were, with Peter then > tryhing to make the kernel part sane. > > I just tried to follow those principles while not getting in the way of > people wanting to have a common ground for observability in Linux. > > Having all these JSON tables was in place was a major concession to that > original vision. > > Changing things as we go, with all these changes in the hardware > landscape, some fleeting, some reinforced, is super difficult and leads > to all this pain in trying to provide a sane experience to people using > these tools. > > I don't think we can have it super consistent. > > It is super useful as is, or at least that what people say. > > So we need to keep on improving it while not telling people used to its > workflows that they should re-learn the tool because at some point a > decision was not properly made. So yes, but no. If you are used to perf on a pentium core 2 and then move to a hybrid alderlake system there has to be some notion of what cycles means on each core type. We had to change the tool to support this. ARM dealt with this long before Intel hybrid ever did, but they did so with no legacy event support just sysfs/json. Of course the wildcard behavior means that all the PMUs that supply an event will have it selected and people have been happy with this a long time. We used to have BPF events but no longer. It seemed like a fruitful thing, it became obsolete and was removed. We have legacy encodings where LLC means L2: https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/evsel.c?h=perf-tools-next#n801 Fine at the time, now seems silly. LLC seems like a reasonable PMU name, but event parsing will trip over it in the clumsy way it does with raw events and "read" (raw event of 0xEAD). Event parsing used to broken for events like "llc-llc" now you can have that as an event now. Anyway, I dislike that you are arguing that I am trying to tell people to change their way of using the tool, it implies the patch series breaks something. If you feel this then you should provide an example. Yes Linus had a problem with the cycles event on Neoverse and now that, as you suggested, is lowered to be a warning. The original patch landed for weeks on linux-next without complaint. I am not asking people to "re-lean the tool", I am trying to fix things and make order out of something less than that. > > Because of the pain in landing the reverted patch then RISC-V has had > > to work with perf's behavior as it is, they need to ship products. I > > you can present it like that, or you can say that they wanted the API to > be different but since it wasn't like they liked they had to comply. > > Differently phrased he finally understood the value of the current API, > maybe. So Atish spoke to us at LPC 2023: https://lpc.events/event/17/ Mark Rutland also spoke to me at that time explaining how unhappy he was that ARM Apple-M was broken. I combined both issues in our discussions. Atish has also spoken to Namhyung and myself face-to-face during the RISC-V conference. These issues were discussed verbally before I wrote patches, before Atish made assumptions, and I think we both believe the proposals were and are reasonable. So let me completely disagree with your rephrasing. It is rude to imply that Atish and myself somehow are trying to push an agenda of changing and breaking an API. What has always been done is an attempt at pragmatism, such as cleaning up the hard coded issues with the original hybrid support, fixing Apple-M, .. > > think things would have been simpler in their driver and the wider > > ecosystem if this hadn't been forced to happen. > > > > > vice-chair for RISC-V and hope to push things forward for RISC-V when > > > > I can. Given that ARM had originally required the prioritization, > > > > Intel signed off on this (with a huge number of Intel test > > > > expectations updated as a consequence), RISC-V desiring consistency I > > > > thought there was a pretty broad consensus to move forward with having > > > > the same prioritization of legacy and sysfs/json events for event > > > > names without PMUs as those with it. > > > > > > > In doing this change I made: > > > > $ perf (stat|record) -e cycles ... > > > > behave like: > > > > $ perf (stat|record) -e */cycles/ ... > > > > This is the behavior with sysfs/json events (ie not legacy). For example: > > > > $ perf (stat|record) -e inst_retired.any ... > > > > in the event parsing code behaves like: > > > > $ perf (stat|record) -e */inst_retired.any/ ... > > > > That is every PMU advertising the event inst_retired.any (in the sysfs > > > > events directory or in json) will have it opened by the tool. > > > > > > > My intent was that "cycles" behaving like "*/cycles/" was that it > > > > would match the change already done for hybrid of "cycles" behaving > > > > like "cpu_atom/cycles/,cpu_core/cycles/". However, this change caused > > > > an issue on ARM Neoverse that Linus couldn't tolerate and so reverted > > > > the change. Specifically ARM Neoverse's L3 cache PMU advertises a > > > > "cycles" event and such an event will fail to open with perf record. > > > > > > > Specifying the PMU with the event avoids the wild card behavior and > > > > fixes the issue but it was put over strongly that "cycles" without a > > > > PMU should only ever mean CPU cycles. This missed hybrid/BIG.little > > > > type systems, but one could imagine in that case CPU means all core > > > > PMUs. > > > > > > > As with not wanting hybrid PMU names hard coded, I dislike special > > > > cases for events. Not least as there are an exciting plethora of > > > > > > Ok, but the desire for consistency clashes with how things have been for > > > a long time, tools expect, scripts, etc, so we seem to need the special > > > case for what has been called in these threads "legacy events". > > > There is only one known issue of a problem for 1 event on 1 ARM model. > > If ARM renamed the event this wouldn't be an issue. The patch series > > works for this event but warns about it if you try to use it with perf > > record. > > Well, maybe just the event name shouldn't be what determines if it > should be used in some cases? Maybe some other characteristic should be > probed for in each case? Do you want to explain this? We have events, without a PMU all events wildcard on every PMU. This is existing and long standing behavior. Uncore PMUs rely on it. Core PMUs with say a json event like inst_retired.any rely on it. You want a magic list of events that only work on core PMUs? Fine, what is it? I hate the idea as why does cpu_cycles mean something different than cpu-cycles, but if that is the only way to land the series then let's do it. You had been arguing for lowering the failure to just being a warning and not doing this. I do not understand where you stand. My stance is hopefully clear with the patch series. If you and Namhyung are worried that Linus seeing a warning message is a blocker the easiest fix is to rename 1 ARM uncore event. We could hide warnings and move them to verbose, imo that's gross. We can have a magic list of events, similarly imo gross but at least it unblocks this. > Have to stop here, not enough bandwidth. Thanks for making some time on this, Ian ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC] perf tools: About encodings of legacy event names 2025-02-24 21:34 ` Ian Rogers @ 2025-02-24 23:28 ` Ian Rogers 0 siblings, 0 replies; 12+ messages in thread From: Ian Rogers @ 2025-02-24 23:28 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Namhyung Kim, linux-perf-users, Peter Zijlstra, Ingo Molnar, linux-kernel, Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang, Andi Kleen On Mon, Feb 24, 2025 at 1:34 PM Ian Rogers <irogers@google.com> wrote: > > On Mon, Feb 24, 2025 at 12:25 PM Arnaldo Carvalho de Melo > <acme@kernel.org> wrote: > > > > On Mon, Feb 24, 2025 at 09:36:16AM -0800, Ian Rogers wrote: > > > On Mon, Feb 24, 2025 at 7:01 AM Arnaldo Carvalho de Melo > > > <acme@kernel.org> wrote: > > > > > > > > On Wed, Feb 19, 2025 at 10:37:33PM -0800, Ian Rogers wrote: > > > > > On Wed, Feb 19, 2025 at 4:38 PM Namhyung Kim <namhyung@kernel.org> wrote: > > > > > > Ian and I have been discussing the behaviors of event encodings and it's > > > > > > hard to find a point so far that we can agree on. So I'd like to hear > > > > > > your opinion to resolve this. If you happen to have some time, you can > > > > > > follow the thread here: > > > > > > > > > > https://lore.kernel.org/linux-perf-users/20250109222109.567031-5-irogers@google.com/#r > > > > > > > > > > Basically there are some events that were defined in the perf ABI. > > > > > > > > > > PERF_COUNT_HW_CPU_CYCLES, PERF_COUNT_HW_INSTRUCTIONS, ... > > > > > > > > > > So perf tools use those (legacy) encodings with the matching names like > > > > > > "cycles" (or "cpu-cycles"), "instructions", etc. > > > > > > > > > > On the another hand, it has wildcard matching for event names in PMUs so > > > > > > that users can conveniently use events without specifying PMU names. > > > > > > For example, "event1" would expand to "pmuX/event1/", "pmuY/event1/", etc > > > > > > as long as the PMUs have the event in sysfs or JSON. > > > > > > > > > > And there are platforms where "cycles" event is defined in a (core) PMU > > > > > > (like "blah/cycles") and the event has a different behavior than the > > > > > > legacy encoding. Then it has to choose which encoding is better for the > > > > > > given name. But it's a general issue for the legacy event names. > > > > > > > > So we pick the "legacy" one, as always, and the one that is in a PMU > > > > needs to have its pmu name specified, no? > > > > > > > > > > Q. what should it do with "cycles"? > > > > > > ----------------------------------- > > > > > > 1. just use the legacy encoding (PERF_COUNT_HW_CPU_CYCLES). > > > > > > > > Right > > > > > > > > > > 2. expand to (possibly multiple) PMU events (like "cpu/cycles/") and > > > > > > use the encoding defined there. > > > > > > > > Nope > > > > > > > > > > I think the #1 is the current behavior. The upside is it's simple and > > > > > > intuitive. But it's different than other names and can make confusion. > > > > > > Users should use the full name ("cpu/cycles/") if they need sysfs one. > > > > > > > > Right > > > > > > > > > So the code keeps changing, for example, the removal of BPF events. I > > > > > > > > Humm, this seems like a different discussion. > > > > > > > > > think the important change and the thing that has brought us here is > > > > > the addition of what Intel call hybrid and ARM call BIG.little. ARM > > > > > > > > Right, the support for hybrid systems brought lots of problems, most > > > > people didn't have access to such systems and thus couldn't test > > > > patches, so the attempt was to keep the existing code working as it had > > > > been while allowing for the support for the new hybrid systems. > > > > > > > > As more people get access to hybrid systems we started noticing problems > > > > and working on fixing it, you did a lot of work in this area, highly > > > > appreciated. > > > > > > > > > have got architectural events and so the same event encoding on BIG > > > > > and little cores. On x86 the e-core (atom) and p-cores have different > > > > > event encodings. In the original hybrid work, pushed on for the launch > > > > > of Alderlake and reviewed by Jiri and Arnaldo with no involvement from > > > > > me, Intel wanted the event names to work transparently. So a cycles > > > > > > > > Without access to such systems, yes, see above. > > > > > > > > > event would be gathered on the e-core and p-core with a command line > > > > > option to merge the legacy event cycles on both cores. To be specific > > > > > the expected behavior of: > > > > > $ perf (stat|record) -e cycles ... > > > > > would be as if: > > > > > $ perf (stat|record) -e cpu_atom/cycles/,cpu_core/cycles/ ... > > > > > > > > Yes, and if somebody wants to profile in just one of those core types, > > > > just specify the "cpu_atom" or "cpu_core" to have it fully specified. > > > > > > > > > An unfortunate thing in the hybrid code was that it was hardcoded to > > > > > PMU names of cpu_atom and cpu_core, but what to do for metrics? The > > > > > > > > Yeah, metrics IIRC came before hybrid systems. > > > > > > > > > original proposal was that metrics would have a PMU name and all names > > > > > would be relative to that, but what about uncore events? What about > > > > > metrics that refer to other metrics? A decision was made to not to > > > > > have PMUs implicitly in metrics and the events in the metric would > > > > > match those given to perf's -e command line. This also greatly > > > > > simplifies testing events when creating a metric. > > > > > > > > > I set about rewriting the hybrid code not to use hard coded PMU names > > > > > but to discover core PMUs at runtime. This was to make the metric and > > > > > event parsing code as generic as possible, but this had an unintended > > > > > consequence. ARM's core PMU had previously not been seen as handling > > > > > legacy events like cycles, and appeared as an uncore PMU. When there > > > > > > > > > are both legacy and sysfs/json events then previously the legacy > > > > > events had priority. This broke events like cycles on Apple M > > > > > processors where the legacy events were broken and the sysfs ones not. > > > > > Yes this is a driver bug, but everybody told me a change in behavior > > > > > of the tool was needed to fix it, that fix was to make sysfs/json > > > > > events have priority over legacy events. I prioritized fixing event > > > > > parsing when a PMU was specified but given "cycles" means > > > > > "cpu_atom/cycles/ and cpu_core/cycles/" for hybrid, why would the > > > > > prioritization be different without a PMU specified? > > > > > > > > > I knew of this tech debt and separately RISC-V was also interested to > > > > > have sysfs/json be the priority so that the legacy to config encoding > > > > > could exist more in the perf tool than the PMU driver. I'm a SIG > > > > > > > > I saw them saying that supporting PERF_TYPE_HARDWARE counters was ok as > > > > they didn't want to break the perf tooling workflow, no? > > > > > > No. The request has always been that they don't want the PMU driver to > > > do the PERF_TYPE_HARDWARE/PERF_COUNT_HW_* to PMU/config=??? mapping > > > given the diversity of hardware and to keep the PMU driver simple. > > > Here is Atish's feedback: > > > https://lore.kernel.org/lkml/CAHBxVyH1q5CRW3emWTZu1oLZEfTWWVRH6B0OVggFxt-0NRzvwQ@mail.gmail.com/ > > > """ > > > If the overriding legacy with JSON is available, each future vendor > > > may just provide the json file instead of modifying the driver. > > > """ > > > > So, making users to figure out what is the best event to use for some > > specific processor was one of the major reasons for perf to come about, > > otherwise we would still be stuck with oprofile. > > Tbh, I think OProfile's major issue was a lack of active development. > There were arguments that perf was going to be more integrated into > the kernel, but there are no arch events, for example, in > arch/arm64/events. Perf provided new energy for something started as a > student project, it was and is good for having done that. > > > While it is clear that picking some "most important event and ratios of > > events" is super difficult, that was one of the ideas that made perf to > > come about. > > > > I wasn´t the architect of this, Ingo and Thomas were, with Peter then > > tryhing to make the kernel part sane. > > > > I just tried to follow those principles while not getting in the way of > > people wanting to have a common ground for observability in Linux. > > > > Having all these JSON tables was in place was a major concession to that > > original vision. Fwiw, I believe the original vision was seen as being flawed - at least that's what's come up in discussions I've had and Google has a policy to avoid legacy events. Specifically, a legacy event like instructions or branch-misses what could go wrong? Well does the instructions count contain speculatively executed instructions or not? Similarly for branch-misses. Cache misses, can they be speculative or not? The idea that there can be a counter and it gives just one value across vendors and models just doesn't exist in practice. On ARM in their json you'll see INST_RETIRED and INST_SPEC to deal with this, the exact behavior of the event can be seen in its description. What about the legacy instructions event? Well there's no description but it happens that it is mapped onto the retired instruction count, as it is for Intel. So the PMUs are trying for some consistency but you are crossing your fingers. If a PMU lacked a retired instruction count then why not use the speculative instruction count? If you look at the events for perf stat -ddd, on the well supported Intel (Skylake) and AMD (Turin): Supported on both: instructions, cycles, branches, branch-misses, L1-dcache-loads, L1-dcache-load-misses, L1-icache-load-misses, dTLB-loads, dTLB-load-misses, iTLB-loads, iTLB-load-misses. Supported only on Intel: LLC-loads, LLC-load-misses - presumably LLC means L2, so the event name is misleading Supported only on AMD: stalled-cycles-frontend, stalled-cycles-backed, L1-icache-loads, L1-dcache-prefetches Supported on neither: L1-dcache-prefetch-misses For metrics we added the "Default" metric group to deal with this diversity. I think "Default" also makes sense for events as: 1) it is more intention revealing to have a json event with a description rather than legacy event, 2) we can avoid events like branches and branch-misses that cause multiplexing on Intel p-cores when gathered with TopdownL1 (another surprise is this combination hides the multiplexing percentages), 3) legacy events just don't really make sense when we're asking for something (by default) like L1-dcache-prefetch-misses and no PMUs support it, 4) we can optionally set precision on events. People seem to be wildly ignorant of what precision architectures support, such as Linus copying cycles:ppp from Intel to ARM and expecting the ppps to actually mean something. Arguably it would be a service if perf failed if a precision greater than the PMU's max_precise is given - that's not practical given how precision works for AMD. Anyway, the idea that legacy events should make a comeback, this is best achieved, as ARM do, by having standard json events with standard names. Losing descriptions and adding ambiguity with kernel tables is a step backward imo. Thanks, Ian > > Changing things as we go, with all these changes in the hardware > > landscape, some fleeting, some reinforced, is super difficult and leads > > to all this pain in trying to provide a sane experience to people using > > these tools. > > > > I don't think we can have it super consistent. > > > > It is super useful as is, or at least that what people say. > > > > So we need to keep on improving it while not telling people used to its > > workflows that they should re-learn the tool because at some point a > > decision was not properly made. > > So yes, but no. If you are used to perf on a pentium core 2 and then > move to a hybrid alderlake system there has to be some notion of what > cycles means on each core type. We had to change the tool to support > this. ARM dealt with this long before Intel hybrid ever did, but they > did so with no legacy event support just sysfs/json. Of course the > wildcard behavior means that all the PMUs that supply an event will > have it selected and people have been happy with this a long time. > We used to have BPF events but no longer. It seemed like a fruitful > thing, it became obsolete and was removed. > We have legacy encodings where LLC means L2: > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/evsel.c?h=perf-tools-next#n801 > Fine at the time, now seems silly. LLC seems like a reasonable PMU > name, but event parsing will trip over it in the clumsy way it does > with raw events and "read" (raw event of 0xEAD). Event parsing used to > broken for events like "llc-llc" now you can have that as an event > now. > > Anyway, I dislike that you are arguing that I am trying to tell people > to change their way of using the tool, it implies the patch series > breaks something. If you feel this then you should provide an example. > Yes Linus had a problem with the cycles event on Neoverse and now > that, as you suggested, is lowered to be a warning. The original patch > landed for weeks on linux-next without complaint. I am not asking > people to "re-lean the tool", I am trying to fix things and make order > out of something less than that. > > > > Because of the pain in landing the reverted patch then RISC-V has had > > > to work with perf's behavior as it is, they need to ship products. I > > > > you can present it like that, or you can say that they wanted the API to > > be different but since it wasn't like they liked they had to comply. > > > > Differently phrased he finally understood the value of the current API, > > maybe. > > So Atish spoke to us at LPC 2023: > https://lpc.events/event/17/ > Mark Rutland also spoke to me at that time explaining how unhappy he > was that ARM Apple-M was broken. I combined both issues in our > discussions. > Atish has also spoken to Namhyung and myself face-to-face during the > RISC-V conference. > These issues were discussed verbally before I wrote patches, before > Atish made assumptions, and I think we both believe the proposals were > and are reasonable. > > So let me completely disagree with your rephrasing. It is rude to > imply that Atish and myself somehow are trying to push an agenda of > changing and breaking an API. What has always been done is an attempt > at pragmatism, such as cleaning up the hard coded issues with the > original hybrid support, fixing Apple-M, .. > > > > think things would have been simpler in their driver and the wider > > > ecosystem if this hadn't been forced to happen. > > > > > > > vice-chair for RISC-V and hope to push things forward for RISC-V when > > > > > I can. Given that ARM had originally required the prioritization, > > > > > Intel signed off on this (with a huge number of Intel test > > > > > expectations updated as a consequence), RISC-V desiring consistency I > > > > > thought there was a pretty broad consensus to move forward with having > > > > > the same prioritization of legacy and sysfs/json events for event > > > > > names without PMUs as those with it. > > > > > > > > > In doing this change I made: > > > > > $ perf (stat|record) -e cycles ... > > > > > behave like: > > > > > $ perf (stat|record) -e */cycles/ ... > > > > > This is the behavior with sysfs/json events (ie not legacy). For example: > > > > > $ perf (stat|record) -e inst_retired.any ... > > > > > in the event parsing code behaves like: > > > > > $ perf (stat|record) -e */inst_retired.any/ ... > > > > > That is every PMU advertising the event inst_retired.any (in the sysfs > > > > > events directory or in json) will have it opened by the tool. > > > > > > > > > My intent was that "cycles" behaving like "*/cycles/" was that it > > > > > would match the change already done for hybrid of "cycles" behaving > > > > > like "cpu_atom/cycles/,cpu_core/cycles/". However, this change caused > > > > > an issue on ARM Neoverse that Linus couldn't tolerate and so reverted > > > > > the change. Specifically ARM Neoverse's L3 cache PMU advertises a > > > > > "cycles" event and such an event will fail to open with perf record. > > > > > > > > > Specifying the PMU with the event avoids the wild card behavior and > > > > > fixes the issue but it was put over strongly that "cycles" without a > > > > > PMU should only ever mean CPU cycles. This missed hybrid/BIG.little > > > > > type systems, but one could imagine in that case CPU means all core > > > > > PMUs. > > > > > > > > > As with not wanting hybrid PMU names hard coded, I dislike special > > > > > cases for events. Not least as there are an exciting plethora of > > > > > > > > Ok, but the desire for consistency clashes with how things have been for > > > > a long time, tools expect, scripts, etc, so we seem to need the special > > > > case for what has been called in these threads "legacy events". > > > > > There is only one known issue of a problem for 1 event on 1 ARM model. > > > If ARM renamed the event this wouldn't be an issue. The patch series > > > works for this event but warns about it if you try to use it with perf > > > record. > > > > Well, maybe just the event name shouldn't be what determines if it > > should be used in some cases? Maybe some other characteristic should be > > probed for in each case? > > Do you want to explain this? We have events, without a PMU all events > wildcard on every PMU. This is existing and long standing behavior. > Uncore PMUs rely on it. Core PMUs with say a json event like > inst_retired.any rely on it. > You want a magic list of events that only work on core PMUs? Fine, > what is it? I hate the idea as why does cpu_cycles mean something > different than cpu-cycles, but if that is the only way to land the > series then let's do it. You had been arguing for lowering the failure > to just being a warning and not doing this. I do not understand where > you stand. My stance is hopefully clear with the patch series. If you > and Namhyung are worried that Linus seeing a warning message is a > blocker the easiest fix is to rename 1 ARM uncore event. We could hide > warnings and move them to verbose, imo that's gross. We can have a > magic list of events, similarly imo gross but at least it unblocks > this. > > > Have to stop here, not enough bandwidth. > > Thanks for making some time on this, > Ian ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC] perf tools: About encodings of legacy event names 2025-02-24 17:36 ` Ian Rogers 2025-02-24 20:25 ` Arnaldo Carvalho de Melo @ 2025-03-07 5:24 ` Ian Rogers 1 sibling, 0 replies; 12+ messages in thread From: Ian Rogers @ 2025-03-07 5:24 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: Namhyung Kim, linux-perf-users, Peter Zijlstra, Ingo Molnar, linux-kernel, Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang, Andi Kleen, Dapeng Mi, Thomas Falcon, Atish Patra, James Clark, Leo Yan On Mon, Feb 24, 2025 at 9:36 AM Ian Rogers <irogers@google.com> wrote: > > The series is trying to make "most people" happy with "no surprise" So the patch [1] fixes a number of event grouping bugs present on all Intel CPUs post Icelake (released April 2021). I think the code isn't trivial, and I've tried to simplify everything as much as possible. Again it shows the event parsing isn't a static problem and I am trying to contribute fixes which this RFC on [2] is holding up. On Alderlake I see problems: ``` $ perf stat -vv -e 'instructions,cpu_core/instructions/,cpu_atom/instructions/' perf test -w noploop Using CPUID GenuineIntel-6-B7-1 ... ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 config 0xa00000001 (cpu_atom/PERF_COUNT_HW_INSTRUCTIONS/) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 ------------------------------------------------------------ sys_perf_event_open: pid 1157109 cpu -1 group_fd -1 flags 0x8 = 5 ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 config 0x400000001 (cpu_core/PERF_COUNT_HW_INSTRUCTIONS/) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 ------------------------------------------------------------ sys_perf_event_open: pid 1157109 cpu -1 group_fd -1 flags 0x8 = 6 ------------------------------------------------------------ perf_event_attr: type 4 (cpu_core) size 136 config 0xc0 (instructions) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 ------------------------------------------------------------ sys_perf_event_open: pid 1157109 cpu -1 group_fd -1 flags 0x8 = 7 ------------------------------------------------------------ perf_event_attr: type 10 (cpu_atom) size 136 config 0xc0 (instructions) sample_type IDENTIFIER read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING disabled 1 inherit 1 enable_on_exec 1 ------------------------------------------------------------ ... Performance counter stats for 'perf test -w noploop': 13,323,070,728 cpu_atom/instructions/ (1.87%) 26,695,170,453 cpu_core/instructions/ (98.13%) 26,695,170,453 cpu_core/instructions/ (98.13%) 13,323,070,728 cpu_atom/instructions/ (1.87%) 1.007358136 seconds time elapsed 1.003129000 seconds user 0.004012000 seconds sys ``` That is the instructions event, literally the 2nd legacy event after cycles, getting 2 different encodings but the stat output showing the events as if they are the same. The multiplexing numbers obviously look off. There is still work to do. Specifying a topdown event twice in an event list currently breaks (e.g. `perf stat -e "slots,slots" ..`). I see situations in perf-tools-next where a wildcard tries to open a legacy event on even uncore PMUs. Given [2] under pins so much I think merging it has to be a priority. I suspect it needs rebasing and checking in the light of the changes in [1]. Given it has extensive reviewing and testing by vendors, it helps RISC-V and ARM Apple-M, I hope that after 2 weeks of this thread that hasn't stimulated additional comment (beyond the original comments on the series) things can move forward and fixing things can continue. Thanks, Ian [1] https://lore.kernel.org/lkml/20250307023906.1135613-3-irogers@google.com/ [2] https://lore.kernel.org/linux-perf-users/20250109222109.567031-5-irogers@google.com/#r ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC] perf tools: About encodings of legacy event names 2025-02-24 15:01 ` Arnaldo Carvalho de Melo 2025-02-24 17:36 ` Ian Rogers @ 2025-03-07 14:17 ` James Clark 2025-03-07 15:10 ` Arnaldo Carvalho de Melo 1 sibling, 1 reply; 12+ messages in thread From: James Clark @ 2025-03-07 14:17 UTC (permalink / raw) To: Arnaldo Carvalho de Melo, Ian Rogers, Namhyung Kim Cc: linux-perf-users, Peter Zijlstra, Ingo Molnar, linux-kernel, Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang, Andi Kleen On 24/02/2025 3:01 pm, Arnaldo Carvalho de Melo wrote: > On Wed, Feb 19, 2025 at 10:37:33PM -0800, Ian Rogers wrote: >> On Wed, Feb 19, 2025 at 4:38 PM Namhyung Kim <namhyung@kernel.org> wrote: >>> Ian and I have been discussing the behaviors of event encodings and it's >>> hard to find a point so far that we can agree on. So I'd like to hear >>> your opinion to resolve this. If you happen to have some time, you can >>> follow the thread here: > >>> https://lore.kernel.org/linux-perf-users/20250109222109.567031-5-irogers@google.com/#r > >>> Basically there are some events that were defined in the perf ABI. > >>> PERF_COUNT_HW_CPU_CYCLES, PERF_COUNT_HW_INSTRUCTIONS, ... > >>> So perf tools use those (legacy) encodings with the matching names like >>> "cycles" (or "cpu-cycles"), "instructions", etc. > >>> On the another hand, it has wildcard matching for event names in PMUs so >>> that users can conveniently use events without specifying PMU names. >>> For example, "event1" would expand to "pmuX/event1/", "pmuY/event1/", etc >>> as long as the PMUs have the event in sysfs or JSON. > >>> And there are platforms where "cycles" event is defined in a (core) PMU >>> (like "blah/cycles") and the event has a different behavior than the >>> legacy encoding. Then it has to choose which encoding is better for the >>> given name. But it's a general issue for the legacy event names. > > So we pick the "legacy" one, as always, and the one that is in a PMU > needs to have its pmu name specified, no? > >>> Q. what should it do with "cycles"? >>> ----------------------------------- >>> 1. just use the legacy encoding (PERF_COUNT_HW_CPU_CYCLES). > > Right > >>> 2. expand to (possibly multiple) PMU events (like "cpu/cycles/") and >>> use the encoding defined there. > > Nope > >>> I think the #1 is the current behavior. The upside is it's simple and >>> intuitive. But it's different than other names and can make confusion. >>> Users should use the full name ("cpu/cycles/") if they need sysfs one. > > Right > >> So the code keeps changing, for example, the removal of BPF events. I > > Humm, this seems like a different discussion. > >> think the important change and the thing that has brought us here is >> the addition of what Intel call hybrid and ARM call BIG.little. ARM > > Right, the support for hybrid systems brought lots of problems, most > people didn't have access to such systems and thus couldn't test > patches, so the attempt was to keep the existing code working as it had > been while allowing for the support for the new hybrid systems. > > As more people get access to hybrid systems we started noticing problems > and working on fixing it, you did a lot of work in this area, highly > appreciated. > >> have got architectural events and so the same event encoding on BIG >> and little cores. On x86 the e-core (atom) and p-cores have different >> event encodings. In the original hybrid work, pushed on for the launch >> of Alderlake and reviewed by Jiri and Arnaldo with no involvement from >> me, Intel wanted the event names to work transparently. So a cycles > > Without access to such systems, yes, see above. > >> event would be gathered on the e-core and p-core with a command line >> option to merge the legacy event cycles on both cores. To be specific >> the expected behavior of: >> $ perf (stat|record) -e cycles ... >> would be as if: >> $ perf (stat|record) -e cpu_atom/cycles/,cpu_core/cycles/ ... > > Yes, and if somebody wants to profile in just one of those core types, > just specify the "cpu_atom" or "cpu_core" to have it fully specified. > >> An unfortunate thing in the hybrid code was that it was hardcoded to >> PMU names of cpu_atom and cpu_core, but what to do for metrics? The > > Yeah, metrics IIRC came before hybrid systems. > >> original proposal was that metrics would have a PMU name and all names >> would be relative to that, but what about uncore events? What about >> metrics that refer to other metrics? A decision was made to not to >> have PMUs implicitly in metrics and the events in the metric would >> match those given to perf's -e command line. This also greatly >> simplifies testing events when creating a metric. > >> I set about rewriting the hybrid code not to use hard coded PMU names >> but to discover core PMUs at runtime. This was to make the metric and >> event parsing code as generic as possible, but this had an unintended >> consequence. ARM's core PMU had previously not been seen as handling >> legacy events like cycles, and appeared as an uncore PMU. When there > >> are both legacy and sysfs/json events then previously the legacy >> events had priority. This broke events like cycles on Apple M >> processors where the legacy events were broken and the sysfs ones not. >> Yes this is a driver bug, but everybody told me a change in behavior >> of the tool was needed to fix it, that fix was to make sysfs/json >> events have priority over legacy events. I prioritized fixing event >> parsing when a PMU was specified but given "cycles" means >> "cpu_atom/cycles/ and cpu_core/cycles/" for hybrid, why would the >> prioritization be different without a PMU specified? > >> I knew of this tech debt and separately RISC-V was also interested to >> have sysfs/json be the priority so that the legacy to config encoding >> could exist more in the perf tool than the PMU driver. I'm a SIG > > I saw them saying that supporting PERF_TYPE_HARDWARE counters was ok as > they didn't want to break the perf tooling workflow, no? > Doesn't most of the discussion stem from this particular point? I also understood it that way, that risc-v folks agreed it was better to support these to make all existing software work, not just Perf. Maybe one issue was calling them 'legacy' events in the first place, and I'm not sure if there is complete consensus that these are legacy. Can't they continue be the short easy list of events likely to be common across platforms? If there is an issue with some of them being wrong in some places we can move forward from that by making sure new platforms do it right, rather than changing the logic for everyone to fix that bug. For the argument that Google prefers to use the sysfs events because of these differences, I don't think there is anything preventing that kind of use today? Or at least not for the main priority flip proposed, but maybe there are some smaller adjacent bugs that can be fixed up separately. Thanks James ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC] perf tools: About encodings of legacy event names 2025-03-07 14:17 ` James Clark @ 2025-03-07 15:10 ` Arnaldo Carvalho de Melo 2025-03-07 18:48 ` Ian Rogers 0 siblings, 1 reply; 12+ messages in thread From: Arnaldo Carvalho de Melo @ 2025-03-07 15:10 UTC (permalink / raw) To: James Clark Cc: Ian Rogers, Namhyung Kim, linux-perf-users, Peter Zijlstra, Ingo Molnar, linux-kernel, Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang, Andi Kleen On Fri, Mar 07, 2025 at 02:17:22PM +0000, James Clark wrote: > On 24/02/2025 3:01 pm, Arnaldo Carvalho de Melo wrote: > > On Wed, Feb 19, 2025 at 10:37:33PM -0800, Ian Rogers wrote: > > > I knew of this tech debt and separately RISC-V was also interested to > > > have sysfs/json be the priority so that the legacy to config encoding > > > could exist more in the perf tool than the PMU driver. I'm a SIG > > I saw them saying that supporting PERF_TYPE_HARDWARE counters was ok as > > they didn't want to break the perf tooling workflow, no? > Doesn't most of the discussion stem from this particular point? I also > understood it that way, that risc-v folks agreed it was better to support > these to make all existing software work, not just Perf. That is my understanding, and I agree with them and with you. > Maybe one issue was calling them 'legacy' events in the first place, and I'm > not sure if there is complete consensus that these are legacy. I don't see them as "legacy". > Can't they continue be the short easy list of events likely to be common across > platforms? That is my understanding of the original intent, yes. A first approximation, those who want to dig deeper, well, learn more about the architecture, learn about the extensive support for vendor/JSON events, sysfs ones, how to properly configure them taking advantage of the high level of flexibility both perf, the tool and perf the kernel subsystem allows them to be used, in groups, leader sampling, multiplexing or not, etc. But lots of developers seem to be OK with just the default events or using those aliases for expected events across architectures, sometimes specifying :ppp as a hint that if there are more precise events in this architecture, please use them, for instance. > If there is an issue with some of them being wrong in some places > we can move forward from that by making sure new platforms do it right, And adding special case for broken things when we know that some event named "cycles" shouldn't be used for sampling, for instance. > rather than changing the logic for everyone to fix that bug. Right. And again, if something doesn't work for a while in some architecture, its just a matter of specifying the name of the event in full form, with the PMU prefix, etc. > For the argument that Google prefers to use the sysfs events because of > these differences, I don't think there is anything preventing that kind of > use today? Indeed. > Or at least not for the main priority flip proposed, but maybe > there are some smaller adjacent bugs that can be fixed up separately. Yes, and work in this area is greatly appreciated. - Arnaldo ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC] perf tools: About encodings of legacy event names 2025-03-07 15:10 ` Arnaldo Carvalho de Melo @ 2025-03-07 18:48 ` Ian Rogers 2025-03-07 22:10 ` Ian Rogers 0 siblings, 1 reply; 12+ messages in thread From: Ian Rogers @ 2025-03-07 18:48 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: James Clark, Namhyung Kim, linux-perf-users, Peter Zijlstra, Ingo Molnar, linux-kernel, Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang, Andi Kleen On Fri, Mar 7, 2025 at 7:10 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > > On Fri, Mar 07, 2025 at 02:17:22PM +0000, James Clark wrote: > > On 24/02/2025 3:01 pm, Arnaldo Carvalho de Melo wrote: > > > On Wed, Feb 19, 2025 at 10:37:33PM -0800, Ian Rogers wrote: > > > > I knew of this tech debt and separately RISC-V was also interested to > > > > have sysfs/json be the priority so that the legacy to config encoding > > > > could exist more in the perf tool than the PMU driver. I'm a SIG > > > > I saw them saying that supporting PERF_TYPE_HARDWARE counters was ok as > > > they didn't want to break the perf tooling workflow, no? > > > Doesn't most of the discussion stem from this particular point? I also > > understood it that way, that risc-v folks agreed it was better to support > > these to make all existing software work, not just Perf. > > That is my understanding, and I agree with them and with you. This is describing what RISC-V have been forced into doing: 1) to support non-perf tooling, 2) because the perf is inconsistent in priority with legacy and sysfs/json events. Their preference has been to move these problems into the tool not the PMU driver. What you are saying here is to ignore their preference. I've already quoted them in this thread saying this, but this keeps being ignored. Here is my previous message: https://lore.kernel.org/lkml/CAP-5=fXSgpZaAgickZSWgjt-2iTWK7FFZc65_HG3QhrTg1mtBw@mail.gmail.com/ > > Maybe one issue was calling them 'legacy' events in the first place, and I'm > > not sure if there is complete consensus that these are legacy. > > I don't see them as "legacy". So let me say this is really distracting from the intent in the series. The series is: 1) trying to clean up wild carding ambiguity - not making it dependent on the name of the event being parsed, the behavior of `cpu_cycles` matches that of `cpu-cycles` 2) trying to make the legacy vs sysfs/json prioritization consistent - making it so that `cpu_core/instructions/` encoding matches `instructions` as we display both of these as cpu_core/instructions/ and it is confusing to a user that different encodings were used. We also pattern match perf_event_attr config values in places like: https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/arch/x86/util/topdown.c?h=perf-tools-next#n38 so >1 config for the same event means such pattern matching needs to consider all cases. There is now a "Make Legacy Events Great Again" (MLEGA) effort that is standing in the way of clean up work. As already stated but repeating, why is MLEGA a bad thing: 1) legacy events lack descriptions and are open for interpretation. For example, do the events include counts for things done speculatively? 2) it is unneeded. Vendors can choose to name events the same name in sysfs and json. ARM are achieving pretty much all of the same thing with architecture standard events but in their use they will have appropriate event descriptions for each model giving all the caveats for the event. When something is common we can encode it in the common json we don't need legacy events for this: https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/common/common?h=perf-tools-next 3) LLC doesn't mean L2, it nearly always means L3, the event names have become obsolete and confusing. More MLEGA means more of this. 4) PMUs have only ever supported a subset of the legacy events. We have to make use of legacy events in `perf stat` not fail when they are implicitly added as default events and via the -ddd options. 5) multiple encodings/PMU types for the same thing complicates things like topdown event ordering, that is a kernel/PMU restriction, and metric event deduplication. 6) legacy events are broken on ARM Apple-M and have been broken on Juno boards. 7) architectures trying to push complexity into user land (RISC-V) are being forced to push it into the kernel/driver. Is MLEGA relevant here? Well if you want legacy events to be > sysfs/json then yes. For wild carding I don't see why MLEGA cares. Do I want to push on MLEGA? No, and I think the reasons above are why it hasn't happened in over 10 years. > > Can't they continue be the short easy list of events likely to be common across > > platforms? > > That is my understanding of the original intent, yes. > > A first approximation, those who want to dig deeper, well, learn more > about the architecture, learn about the extensive support for > vendor/JSON events, sysfs ones, how to properly configure them taking > advantage of the high level of flexibility both perf, the tool and perf > the kernel subsystem allows them to be used, in groups, leader sampling, > multiplexing or not, etc. > > But lots of developers seem to be OK with just the default events or > using those aliases for expected events across architectures, sometimes > specifying :ppp as a hint that if there are more precise events in this > architecture, please use them, for instance. When and where have I said that I don't want to support events like instructions and cycles? See above, consistent wild carding and the encoding priority are the only issues here. > > If there is an issue with some of them being wrong in some places > > we can move forward from that by making sure new platforms do it right, > > And adding special case for broken things when we know that some event > named "cycles" shouldn't be used for sampling, for instance. What is this? A new framework for special casing PMUs and events, where we're maintaining lists of broken PMUs and changing encodings? And tooling like event sorting, metrics, is all supposed to just work with this? Are we going to write json for this? Who is writing/testing it for Apple-M? Special cases should be the exception and not an expected norm. > > rather than changing the logic for everyone to fix that bug. > > Right. And again, if something doesn't work for a while in some > architecture, its just a matter of specifying the name of the event in > full form, with the PMU prefix, etc. So MLEGA would like sysfs/json when they are broken? This is just silly, if something is broken we should just not use it. Having 2 ways of stating something and expecting different behaviors from them is clearly brittle. > > For the argument that Google prefers to use the sysfs events because of > > these differences, I don't think there is anything preventing that kind of > > use today? > > Indeed. I explained that in the context of why legacy events are wrong. I've repeated it above. This is not addressing the issues of wild carding and the encoding priority. > > Or at least not for the main priority flip proposed, but maybe > > there are some smaller adjacent bugs that can be fixed up separately. > > Yes, and work in this area is greatly appreciated. I don't know what your proposals are and to my eyes none of them have ever existed, no one has created them in over 10 years. I am trying to fix wild carding and the encoding priority. Bike shedding on MLEGA, please can we move it to a separate email thread. Thanks, Ian ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [RFC] perf tools: About encodings of legacy event names 2025-03-07 18:48 ` Ian Rogers @ 2025-03-07 22:10 ` Ian Rogers 0 siblings, 0 replies; 12+ messages in thread From: Ian Rogers @ 2025-03-07 22:10 UTC (permalink / raw) To: Arnaldo Carvalho de Melo Cc: James Clark, Namhyung Kim, linux-perf-users, Peter Zijlstra, Ingo Molnar, linux-kernel, Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang, Andi Kleen On Fri, Mar 7, 2025 at 10:48 AM Ian Rogers <irogers@google.com> wrote: > > On Fri, Mar 7, 2025 at 7:10 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote: > > > > On Fri, Mar 07, 2025 at 02:17:22PM +0000, James Clark wrote: > > > On 24/02/2025 3:01 pm, Arnaldo Carvalho de Melo wrote: > > > > On Wed, Feb 19, 2025 at 10:37:33PM -0800, Ian Rogers wrote: > > > > > I knew of this tech debt and separately RISC-V was also interested to > > > > > have sysfs/json be the priority so that the legacy to config encoding > > > > > could exist more in the perf tool than the PMU driver. I'm a SIG > > > > > > I saw them saying that supporting PERF_TYPE_HARDWARE counters was ok as > > > > they didn't want to break the perf tooling workflow, no? > > > > > Doesn't most of the discussion stem from this particular point? I also > > > understood it that way, that risc-v folks agreed it was better to support > > > these to make all existing software work, not just Perf. > > > > That is my understanding, and I agree with them and with you. > > This is describing what RISC-V have been forced into doing: > 1) to support non-perf tooling, > 2) because the perf is inconsistent in priority with legacy and > sysfs/json events. > > Their preference has been to move these problems into the tool not the > PMU driver. What you are saying here is to ignore their preference. > I've already quoted them in this thread saying this, but this keeps > being ignored. Here is my previous message: > https://lore.kernel.org/lkml/CAP-5=fXSgpZaAgickZSWgjt-2iTWK7FFZc65_HG3QhrTg1mtBw@mail.gmail.com/ > > > > Maybe one issue was calling them 'legacy' events in the first place, and I'm > > > not sure if there is complete consensus that these are legacy. > > > > I don't see them as "legacy". > > So let me say this is really distracting from the intent in the > series. The series is: > 1) trying to clean up wild carding ambiguity - not making it dependent > on the name of the event being parsed, the behavior of `cpu_cycles` > matches that of `cpu-cycles` > 2) trying to make the legacy vs sysfs/json prioritization consistent - > making it so that `cpu_core/instructions/` encoding matches > `instructions` as we display both of these as cpu_core/instructions/ > and it is confusing to a user that different encodings were used. We > also pattern match perf_event_attr config values in places like: > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/arch/x86/util/topdown.c?h=perf-tools-next#n38 > so >1 config for the same event means such pattern matching needs to > consider all cases. > > There is now a "Make Legacy Events Great Again" (MLEGA) effort that > is standing in the way of clean up work. As already stated but > repeating, why is MLEGA a bad thing: > 1) legacy events lack descriptions and are open for interpretation. > For example, do the events include counts for things done > speculatively? > 2) it is unneeded. Vendors can choose to name events the same name in > sysfs and json. ARM are achieving pretty much all of the same thing > with architecture standard events but in their use they will have > appropriate event descriptions for each model giving all the caveats > for the event. When something is common we can encode it in the common > json we don't need legacy events for this: > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/arch/common/common?h=perf-tools-next > 3) LLC doesn't mean L2, it nearly always means L3, the event names > have become obsolete and confusing. More MLEGA means more of this. > 4) PMUs have only ever supported a subset of the legacy events. We > have to make use of legacy events in `perf stat` not fail when they > are implicitly added as default events and via the -ddd options. > 5) multiple encodings/PMU types for the same thing complicates things > like topdown event ordering, that is a kernel/PMU restriction, and > metric event deduplication. > 6) legacy events are broken on ARM Apple-M and have been broken on Juno boards. To be clear here. If legacy events have priority and the encoding is broken in the driver, the tool needs to have workarounds for specific models, or the user needs to specify a PMU, we complicate event config matching, etc. If the event encoding is broken in the json we simply fix the json - no need to introduce additional tool complications. Beside a potential consistency of name across models, something that has never existed due to PMUs not implementing most legacy events and ambiguity over what an event is, I see no advantage to legacy events and major drawbacks. Thanks, Ian > 7) architectures trying to push complexity into user land (RISC-V) are > being forced to push it into the kernel/driver. > > Is MLEGA relevant here? Well if you want legacy events to be > > sysfs/json then yes. For wild carding I don't see why MLEGA cares. Do > I want to push on MLEGA? No, and I think the reasons above are why it > hasn't happened in over 10 years. > > > > Can't they continue be the short easy list of events likely to be common across > > > platforms? > > > > That is my understanding of the original intent, yes. > > > > A first approximation, those who want to dig deeper, well, learn more > > about the architecture, learn about the extensive support for > > vendor/JSON events, sysfs ones, how to properly configure them taking > > advantage of the high level of flexibility both perf, the tool and perf > > the kernel subsystem allows them to be used, in groups, leader sampling, > > multiplexing or not, etc. > > > > But lots of developers seem to be OK with just the default events or > > using those aliases for expected events across architectures, sometimes > > specifying :ppp as a hint that if there are more precise events in this > > architecture, please use them, for instance. > > When and where have I said that I don't want to support events like > instructions and cycles? See above, consistent wild carding and the > encoding priority are the only issues here. > > > > If there is an issue with some of them being wrong in some places > > > we can move forward from that by making sure new platforms do it right, > > > > And adding special case for broken things when we know that some event > > named "cycles" shouldn't be used for sampling, for instance. > > What is this? A new framework for special casing PMUs and events, > where we're maintaining lists of broken PMUs and changing encodings? > And tooling like event sorting, metrics, is all supposed to just work > with this? Are we going to write json for this? Who is writing/testing > it for Apple-M? > > Special cases should be the exception and not an expected norm. > > > > rather than changing the logic for everyone to fix that bug. > > > > Right. And again, if something doesn't work for a while in some > > architecture, its just a matter of specifying the name of the event in > > full form, with the PMU prefix, etc. > > So MLEGA would like sysfs/json when they are broken? This is just > silly, if something is broken we should just not use it. Having 2 ways > of stating something and expecting different behaviors from them is > clearly brittle. > > > > For the argument that Google prefers to use the sysfs events because of > > > these differences, I don't think there is anything preventing that kind of > > > use today? > > > > Indeed. > > I explained that in the context of why legacy events are wrong. I've > repeated it above. This is not addressing the issues of wild carding > and the encoding priority. > > > > Or at least not for the main priority flip proposed, but maybe > > > there are some smaller adjacent bugs that can be fixed up separately. > > > > Yes, and work in this area is greatly appreciated. > > I don't know what your proposals are and to my eyes none of them have > ever existed, no one has created them in over 10 years. > I am trying to fix wild carding and the encoding priority. > Bike shedding on MLEGA, please can we move it to a separate email thread. > > Thanks, > Ian ^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2025-03-07 22:10 UTC | newest] Thread overview: 12+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-02-20 0:38 [RFC] perf tools: About encodings of legacy event names Namhyung Kim 2025-02-20 6:37 ` Ian Rogers 2025-02-24 15:01 ` Arnaldo Carvalho de Melo 2025-02-24 17:36 ` Ian Rogers 2025-02-24 20:25 ` Arnaldo Carvalho de Melo 2025-02-24 21:34 ` Ian Rogers 2025-02-24 23:28 ` Ian Rogers 2025-03-07 5:24 ` Ian Rogers 2025-03-07 14:17 ` James Clark 2025-03-07 15:10 ` Arnaldo Carvalho de Melo 2025-03-07 18:48 ` Ian Rogers 2025-03-07 22:10 ` Ian Rogers
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).