linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 00/28] Legacy hardware/cache events as json
@ 2025-09-23 22:32 Ian Rogers
  2025-09-23 22:32 ` [PATCH v6 01/28] perf stat: Allow retry for default events Ian Rogers
                   ` (28 more replies)
  0 siblings, 29 replies; 38+ messages in thread
From: Ian Rogers @ 2025-09-23 22:32 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, James Clark, Xu Yang,
	Thomas Falcon, Andi Kleen, linux-kernel, linux-perf-users,
	Atish Patra, Beeman Strong, Leo Yan, Vince Weaver

Mirroring similar work for software events in commit 6e9fa4131abb
("perf parse-events: Remove non-json software events"). These changes
migrate the legacy hardware and cache events to json.  With no hard
coded legacy hardware or cache events the wild card, case
insensitivity, etc. is consistent for events. This does, however, mean
events like cycles will wild card against all PMUs. A change doing the
same was originally posted and merged from:
https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
and reverted by Linus in commit 4f1b067359ac ("Revert "perf
parse-events: Prefer sysfs/JSON hardware events over legacy"") due to
his dislike for the cycles behavior on ARM with perf record. Earlier
patches in this series make perf record event opening failures
non-fatal and hide the cycles event's failure to open on ARM in perf
record, so it is expected the behavior will now be transparent in perf
record on ARM. perf stat with a cycles event will wildcard open the
event on all PMUs, however, with default events the cycles event will
only be opened on core PMUs.

The key motivation for these patches is so that if, for example, you
run `perf stat -e cpu-cycles ...` on a hybrid x86 and the results are
printed out with "cpu_core/cpu-cycles/" and "cpu_atom/cpu-cycles/",
the perf_event_attr for cpu-cycles and cpu_core/cpu-cycles/ be the
same, similarly for the cpu_atom event. Prior to these patches the
event with a PMU prefers sysfs/json over legacy encodings while with
no PMU legacy encodings are preferred - these are different encodings
on x86.

The change to support legacy events with PMUs was done to clean up
Intel's hybrid PMU implementation. Having sysfs/json events with
increased priority to legacy was requested by Mark Rutland
 <mark.rutland@arm.com> to fix Apple-M PMU issues wrt broken legacy
events on that PMU. It is believed the PMU driver is now fixed, but
this has only been confirmed on ARM Juno boards. It was requested that
RISC-V be able to add events to the perf tool json so the PMU driver
didn't need to map legacy events to config encodings:
https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/
This patch series achieves this.

A previous series of patches decreasing legacy hardware event
priorities was posted in:
https://lore.kernel.org/lkml/20250416045117.876775-1-irogers@google.com/
Namhyung Kim <namhyung@kernel.org> mentioned that hardware and
software events can be implemented similarly:
https://lore.kernel.org/lkml/aIJmJns2lopxf3EK@google.com/
and this patch series achieves this.

Note, patch 2 (perf parse-events: Fix legacy cache events if event is
duplicated in a PMU) fixes a function deleted by patch 17 (perf
parse-events: Remove hard coded legacy hardware and cache
parsing). Adding the json exposed an issue when legacy cache (not
legacy hardware) and sysfs/json events exist. The fix is necessary to
keep tests passing through the series. It is also posted for backports
to stable trees.

The perf list behavior includes a lot more information and events. The
before behavior on a hybrid alderlake is:
```
$ perf list hw

List of pre-defined events (to be used in -e or -M):

  branch-instructions OR branches                    [Hardware event]
  branch-misses                                      [Hardware event]
  bus-cycles                                         [Hardware event]
  cache-misses                                       [Hardware event]
  cache-references                                   [Hardware event]
  cpu-cycles OR cycles                               [Hardware event]
  instructions                                       [Hardware event]
  ref-cycles                                         [Hardware event]
$ perf list hwcache

List of pre-defined events (to be used in -e or -M):


cache:
  L1-dcache-loads OR cpu_atom/L1-dcache-loads/
  L1-dcache-stores OR cpu_atom/L1-dcache-stores/
  L1-icache-loads OR cpu_atom/L1-icache-loads/
  L1-icache-load-misses OR cpu_atom/L1-icache-load-misses/
  LLC-loads OR cpu_atom/LLC-loads/
  LLC-load-misses OR cpu_atom/LLC-load-misses/
  LLC-stores OR cpu_atom/LLC-stores/
  LLC-store-misses OR cpu_atom/LLC-store-misses/
  dTLB-loads OR cpu_atom/dTLB-loads/
  dTLB-load-misses OR cpu_atom/dTLB-load-misses/
  dTLB-stores OR cpu_atom/dTLB-stores/
  dTLB-store-misses OR cpu_atom/dTLB-store-misses/
  iTLB-load-misses OR cpu_atom/iTLB-load-misses/
  branch-loads OR cpu_atom/branch-loads/
  branch-load-misses OR cpu_atom/branch-load-misses/
  L1-dcache-loads OR cpu_core/L1-dcache-loads/
  L1-dcache-load-misses OR cpu_core/L1-dcache-load-misses/
  L1-dcache-stores OR cpu_core/L1-dcache-stores/
  L1-icache-load-misses OR cpu_core/L1-icache-load-misses/
  LLC-loads OR cpu_core/LLC-loads/
  LLC-load-misses OR cpu_core/LLC-load-misses/
  LLC-stores OR cpu_core/LLC-stores/
  LLC-store-misses OR cpu_core/LLC-store-misses/
  dTLB-loads OR cpu_core/dTLB-loads/
  dTLB-load-misses OR cpu_core/dTLB-load-misses/
  dTLB-stores OR cpu_core/dTLB-stores/
  dTLB-store-misses OR cpu_core/dTLB-store-misses/
  iTLB-load-misses OR cpu_core/iTLB-load-misses/
  branch-loads OR cpu_core/branch-loads/
  branch-load-misses OR cpu_core/branch-load-misses/
  node-loads OR cpu_core/node-loads/
  node-load-misses OR cpu_core/node-load-misses/
```
and after it is:
```
$ perf list hw

legacy hardware:
  branch-instructions
       [Retired branch instructions [This event is an alias of branches].
        Unit: cpu_atom]
  branch-misses
       [Mispredicted branch instructions. Unit: cpu_atom]
  branches
       [Retired branch instructions [This event is an alias of
        branch-instructions]. Unit: cpu_atom]
  bus-cycles
       [Bus cycles,which can be different from total cycles. Unit: cpu_atom]
  cache-misses
       [Cache misses. Usually this indicates Last Level Cache misses; this is
        intended to be used in conjunction with the
        PERF_COUNT_HW_CACHE_REFERENCES event to calculate cache miss rates.
        Unit: cpu_atom]
  cache-references
       [Cache accesses. Usually this indicates Last Level Cache accesses but
        this may vary depending on your CPU. This may include prefetches and
        coherency messages; again this depends on the design of your CPU.
        Unit: cpu_atom]
  cpu-cycles
       [Total cycles. Be wary of what happens during CPU frequency scaling
        [This event is an alias of cycles]. Unit: cpu_atom]
  cycles
       [Total cycles. Be wary of what happens during CPU frequency scaling
        [This event is an alias of cpu-cycles]. Unit: cpu_atom]
  instructions
       [Retired instructions. Be careful,these can be affected by various
        issues,most notably hardware interrupt counts. Unit: cpu_atom]
  ref-cycles
       [Total cycles; not affected by CPU frequency scaling. Unit: cpu_atom]
  branch-instructions
       [Retired branch instructions [This event is an alias of branches].
        Unit: cpu_core]
  branch-misses
       [Mispredicted branch instructions. Unit: cpu_core]
  branches
       [Retired branch instructions [This event is an alias of
        branch-instructions]. Unit: cpu_core]
  bus-cycles
       [Bus cycles,which can be different from total cycles. Unit: cpu_core]
  cache-misses
       [Cache misses. Usually this indicates Last Level Cache misses; this is
        intended to be used in conjunction with the
        PERF_COUNT_HW_CACHE_REFERENCES event to calculate cache miss rates.
        Unit: cpu_core]
  cache-references
       [Cache accesses. Usually this indicates Last Level Cache accesses but
        this may vary depending on your CPU. This may include prefetches and
        coherency messages; again this depends on the design of your CPU.
        Unit: cpu_core]
  cpu-cycles
       [Total cycles. Be wary of what happens during CPU frequency scaling
        [This event is an alias of cycles]. Unit: cpu_core]
  cycles
       [Total cycles. Be wary of what happens during CPU frequency scaling
        [This event is an alias of cpu-cycles]. Unit: cpu_core]
  instructions
       [Retired instructions. Be careful,these can be affected by various
        issues,most notably hardware interrupt counts. Unit: cpu_core]
  ref-cycles
       [Total cycles; not affected by CPU frequency scaling. Unit: cpu_core]
$ perf list hwcache

legacy cache:
  branch-load-misses
       [Branch prediction unit read misses. Unit: cpu_atom]
  branch-loads
       [Branch prediction unit read accesses. Unit: cpu_atom]
  dtlb-load-misses
       [Data TLB read misses. Unit: cpu_atom]
  dtlb-loads
       [Data TLB read accesses. Unit: cpu_atom]
  dtlb-store-misses
       [Data TLB write misses. Unit: cpu_atom]
  dtlb-stores
       [Data TLB write accesses. Unit: cpu_atom]
  itlb-load-misses
       [Instruction TLB read misses. Unit: cpu_atom]
  l1-dcache-loads
       [Level 1 data cache read accesses. Unit: cpu_atom]
  l1-dcache-stores
       [Level 1 data cache write accesses. Unit: cpu_atom]
  l1-icache-load-misses
       [Level 1 instruction cache read misses. Unit: cpu_atom]
  l1-icache-loads
       [Level 1 instruction cache read accesses. Unit: cpu_atom]
  llc-load-misses
       [Last level cache read misses. Unit: cpu_atom]
  llc-loads
       [Last level cache read accesses. Unit: cpu_atom]
  llc-store-misses
       [Last level cache write misses. Unit: cpu_atom]
  llc-stores
       [Last level cache write accesses. Unit: cpu_atom]
  branch-load-misses
       [Branch prediction unit read misses. Unit: cpu_core]
  branch-loads
       [Branch prediction unit read accesses. Unit: cpu_core]
  dtlb-load-misses
       [Data TLB read misses. Unit: cpu_core]
  dtlb-loads
       [Data TLB read accesses. Unit: cpu_core]
  dtlb-store-misses
       [Data TLB write misses. Unit: cpu_core]
  dtlb-stores
       [Data TLB write accesses. Unit: cpu_core]
  itlb-load-misses
       [Instruction TLB read misses. Unit: cpu_core]
  l1-dcache-load-misses
       [Level 1 data cache read misses. Unit: cpu_core]
  l1-dcache-loads
       [Level 1 data cache read accesses. Unit: cpu_core]
  l1-dcache-stores
       [Level 1 data cache write accesses. Unit: cpu_core]
  l1-icache-load-misses
       [Level 1 instruction cache read misses. Unit: cpu_core]
  llc-load-misses
       [Last level cache read misses. Unit: cpu_core]
  llc-loads
       [Last level cache read accesses. Unit: cpu_core]
  llc-store-misses
       [Last level cache write misses. Unit: cpu_core]
  llc-stores
       [Last level cache write accesses. Unit: cpu_core]
  node-load-misses
       [Local memory read misses. Unit: cpu_core]
  node-loads
       [Local memory read accesses. Unit: cpu_core]
```

v6. Fix x86 hybrid mismatched number of evsels for the case a PMU is
    specified. Add patches to make failures in the parse-events test
    easier to diagnose. Reorder the perf stat default events patch to
    come earlier.

v5. Add patch for retrying default events, fixing regression when
    non-root and paranoid. Make cycles to cpu-cycles test event change
    (to avoid non-core ARM events) the default on all architectures
    (suggested by Namhyung). Switch all non-test cases to specifying a
    PMU. Improvements to the parse-events test including core PMU
    parsing support for architectures without a "cpu" PMU.
    https://lore.kernel.org/lkml/20250923041844.400164-1-irogers@google.com/

v4: Fixes for matching hard coded metrics in stat-shadow. Make the
    default "cycles" event string on ARM "cpu-cycles" which is the
    same legacy event but avoids name collisions on ARM PMUs. To
    support this, use evlist__new_default for the no command line
    event case in `perf record` and `perf top`. Make
    evlist__new_default only scan core PMUs.
    https://lore.kernel.org/lkml/20250914181121.1952748-1-irogers@google.com/#t

v3: Deprecate the legacy cache events that aren't shown in the
    previous perf list to avoid the perf list output being too verbose.
    https://lore.kernel.org/lkml/20250828205930.4007284-1-irogers@google.com/

v2: Additional details to the cover letter. Credit to Vince Weaver
    added to the commit message for the event details. Additional
    patches to clean up perf_pmu new_alias by removing an unused term
    scanner argument and avoid stdio usage.
    https://lore.kernel.org/lkml/20250828163225.3839073-1-irogers@google.com/

v1: https://lore.kernel.org/lkml/20250828064231.1762997-1-irogers@google.com/

Ian Rogers (28):
  perf stat: Allow retry for default events
  perf parse-events: Fix legacy cache events if event is duplicated in a
    PMU
  perf perf_api_probe: Avoid scanning all PMUs, try software PMU first
  perf stat: Avoid wildcarding PMUs for default events
  perf record: Skip don't fail for events that don't open
  perf jevents: Support copying the source json files to OUTPUT
  perf pmu: Don't eagerly parse event terms
  perf parse-events: Remove unused FILE input argument to scanner
  perf pmu: Use fd rather than FILE from new_alias
  perf pmu: Factor term parsing into a perf_event_attr into a helper
  perf parse-events: Add terms for legacy hardware and cache config
    values
  perf jevents: Add legacy json terms and default_core event table
    helper
  perf pmu: Add and use legacy_terms in alias information
  perf jevents: Add legacy-hardware and legacy-cache json
  perf print-events: Remove print_hwcache_events
  perf print-events: Remove print_symbol_events
  perf parse-events: Remove hard coded legacy hardware and cache parsing
  perf record: Use evlist__new_default when no events specified
  perf top: Use evlist__new_default when no events specified
  perf evlist: Avoid scanning all PMUs for evlist__new_default
  perf evsel: Improvements to __evsel__match
  perf test parse-events: Use evsel__match for legacy events
  perf test parse-events: Without a PMU use cpu-cycles rather than
    cycles
  perf test parse-events: Remove cpu PMU requirement
  perf test: Switch cycles event to cpu-cycles
  perf test: Clean up test_..config helpers
  perf test parse-events: Add evlist test helper
  perf test parse-events: Add evsel test helper

 tools/perf/Makefile.perf                      |   21 +-
 tools/perf/arch/x86/util/intel-pt.c           |    2 +-
 tools/perf/builtin-list.c                     |   34 +-
 tools/perf/builtin-record.c                   |   97 +-
 tools/perf/builtin-stat.c                     |  171 +-
 tools/perf/builtin-top.c                      |    8 +-
 tools/perf/pmu-events/Build                   |   24 +-
 .../arch/common/common/legacy-hardware.json   |   72 +
 tools/perf/pmu-events/empty-pmu-events.c      | 2771 ++++++++++++++++-
 tools/perf/pmu-events/jevents.py              |   32 +
 tools/perf/pmu-events/make_legacy_cache.py    |  129 +
 tools/perf/pmu-events/pmu-events.h            |    1 +
 tools/perf/tests/code-reading.c               |    2 +-
 tools/perf/tests/keep-tracking.c              |    2 +-
 tools/perf/tests/parse-events.c               | 2010 ++++++------
 tools/perf/tests/perf-time-to-tsc.c           |    4 +-
 tools/perf/tests/pmu-events.c                 |   24 +-
 tools/perf/tests/pmu.c                        |    3 +-
 tools/perf/tests/switch-tracking.c            |    2 +-
 tools/perf/util/evlist.c                      |   18 +-
 tools/perf/util/evsel.c                       |   21 +-
 tools/perf/util/parse-events.c                |  282 +-
 tools/perf/util/parse-events.h                |   22 +-
 tools/perf/util/parse-events.l                |   54 +-
 tools/perf/util/parse-events.y                |  114 +-
 tools/perf/util/perf_api_probe.c              |   27 +-
 tools/perf/util/pmu.c                         |  309 +-
 tools/perf/util/print-events.c                |  112 -
 tools/perf/util/print-events.h                |    4 -
 29 files changed, 4523 insertions(+), 1849 deletions(-)
 create mode 100644 tools/perf/pmu-events/arch/common/common/legacy-hardware.json
 create mode 100755 tools/perf/pmu-events/make_legacy_cache.py

-- 
2.51.0.534.gc79095c0ca-goog


^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2025-10-03 11:26 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-23 22:32 [PATCH v6 00/28] Legacy hardware/cache events as json Ian Rogers
2025-09-23 22:32 ` [PATCH v6 01/28] perf stat: Allow retry for default events Ian Rogers
2025-09-23 22:32 ` [PATCH v6 02/28] perf parse-events: Fix legacy cache events if event is duplicated in a PMU Ian Rogers
2025-09-23 22:32 ` [PATCH v6 03/28] perf perf_api_probe: Avoid scanning all PMUs, try software PMU first Ian Rogers
2025-09-23 22:32 ` [PATCH v6 04/28] perf stat: Avoid wildcarding PMUs for default events Ian Rogers
2025-09-23 22:32 ` [PATCH v6 05/28] perf record: Skip don't fail for events that don't open Ian Rogers
2025-09-23 22:32 ` [PATCH v6 06/28] perf jevents: Support copying the source json files to OUTPUT Ian Rogers
2025-09-23 22:32 ` [PATCH v6 07/28] perf pmu: Don't eagerly parse event terms Ian Rogers
2025-09-23 22:32 ` [PATCH v6 08/28] perf parse-events: Remove unused FILE input argument to scanner Ian Rogers
2025-09-23 22:32 ` [PATCH v6 09/28] perf pmu: Use fd rather than FILE from new_alias Ian Rogers
2025-09-23 22:32 ` [PATCH v6 10/28] perf pmu: Factor term parsing into a perf_event_attr into a helper Ian Rogers
2025-09-23 22:32 ` [PATCH v6 11/28] perf parse-events: Add terms for legacy hardware and cache config values Ian Rogers
2025-09-23 22:32 ` [PATCH v6 12/28] perf jevents: Add legacy json terms and default_core event table helper Ian Rogers
2025-09-23 22:32 ` [PATCH v6 13/28] perf pmu: Add and use legacy_terms in alias information Ian Rogers
2025-09-23 22:32 ` [PATCH v6 14/28] perf jevents: Add legacy-hardware and legacy-cache json Ian Rogers
2025-09-23 22:32 ` [PATCH v6 15/28] perf print-events: Remove print_hwcache_events Ian Rogers
2025-09-23 22:33 ` [PATCH v6 16/28] perf print-events: Remove print_symbol_events Ian Rogers
2025-09-23 22:33 ` [PATCH v6 17/28] perf parse-events: Remove hard coded legacy hardware and cache parsing Ian Rogers
2025-09-23 22:33 ` [PATCH v6 18/28] perf record: Use evlist__new_default when no events specified Ian Rogers
2025-09-23 22:33 ` [PATCH v6 19/28] perf top: " Ian Rogers
2025-09-23 22:33 ` [PATCH v6 20/28] perf evlist: Avoid scanning all PMUs for evlist__new_default Ian Rogers
2025-09-23 22:33 ` [PATCH v6 21/28] perf evsel: Improvements to __evsel__match Ian Rogers
2025-09-23 22:33 ` [PATCH v6 22/28] perf test parse-events: Use evsel__match for legacy events Ian Rogers
2025-09-23 22:33 ` [PATCH v6 23/28] perf test parse-events: Without a PMU use cpu-cycles rather than cycles Ian Rogers
2025-09-23 22:33 ` [PATCH v6 24/28] perf test parse-events: Remove cpu PMU requirement Ian Rogers
2025-09-23 22:33 ` [PATCH v6 25/28] perf test: Switch cycles event to cpu-cycles Ian Rogers
2025-09-23 22:33 ` [PATCH v6 26/28] perf test: Clean up test_..config helpers Ian Rogers
2025-09-23 22:33 ` [PATCH v6 27/28] perf test parse-events: Add evlist test helper Ian Rogers
2025-09-23 22:33 ` [PATCH v6 28/28] perf test parse-events: Add evsel " Ian Rogers
2025-10-01 13:37 ` [PATCH v6 00/28] Legacy hardware/cache events as json James Clark
2025-10-01 15:12   ` Ian Rogers
2025-10-01 20:55     ` Ian Rogers
2025-10-02 14:05       ` James Clark
2025-10-02 15:46         ` Ian Rogers
2025-10-02 17:58           ` Ian Rogers
2025-10-02 20:10             ` Ian Rogers
2025-10-03  9:20               ` James Clark
2025-10-03 11:26                 ` James Clark

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).