linux-perf-users.vger.kernel.org archive mirror
* [PATCH v5 0/4] Prefer sysfs/JSON events also when no PMU is provided
@ 2025-01-09 22:21 Ian Rogers
  2025-01-09 22:21 ` [PATCH v5 1/4] perf evsel: Add pmu_name helper Ian Rogers
                   ` (5 more replies)
  0 siblings, 6 replies; 53+ messages in thread
From: Ian Rogers @ 2025-01-09 22:21 UTC
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, James Clark, Ze Gao,
	Weilin Wang, Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, bpf, Aditya Bodkhe

At the RISC-V summit the topic of keeping event data out of the
RISC-V PMU kernel driver came up. There is a preference for sysfs/JSON
events taking priority when no PMU is provided, so that legacy events
may be supported via JSON. Originally, Mark Rutland also expressed at
LPC 2023 that doing this would resolve bugs on ARM Apple M-series
processors, but James Clark more recently tested this and believes the
driver issues there may not have existed or have been resolved. In
any case, it is inconsistent that event names given with a PMU avoid
legacy encodings, while the legacy encodings take priority when
wildcarding PMUs (i.e. when the event name is given without a PMU).

The patch doing this work was reverted in a v6.10 release candidate
because, even though it had been posted for weeks and had been in
linux-next for weeks without issue, Linus was in the habit of using
explicit legacy events with unsupported precision options on his
Neoverse-N1. This machine has SLC PMU events for bus and CPU cycles,
which ARM decided to call bus_cycles and cycles, the latter also
being a legacy event name. ARM hasn't renamed the cycles event to the
more consistent cpu_cycles, which would have avoided the problem. With
these changes the problematic event will now be skipped, a large
warning produced, and perf record will continue with the other PMU
events. This solution was proposed by Arnaldo.

Two minor changes have been added to help with the error message and
to work around issues occurring with "perf stat metrics (shadow stat)
test".

The patches have only been tested on my x86 non-hybrid laptop.

v5: Follow Namhyung's suggestion and ignore the case where command
    line dummy events fail to open alongside other events that all
    fail to open. Note, the Tested-by tags are left on the series as
    v4 and v5 only change an error case that doesn't occur in
    testing; I tested that case manually.

v4: Rework the no-events-opening change from v3 to make it handle
    multiple dummy events. Sadly, an evlist isn't empty if it only
    contains dummy events, as the dummy event may be used with "perf
    record -e dummy .." as a way to determine whether permission
    issues exist. Other software events like cpu-clock would suffice
    for this, but the genie of using dummy has left the bottle.

    Another problem is that we appear to add an excessive number of
    dummy events; for example, we can likely avoid a dummy event and
    add sideband data to the original event. For auxtrace, more dummy
    events may be opened too. Anyway, this has led to the approach
    taken in patch 3, where the number of parsed dummy events is
    computed. If the number of removed/failing-to-open non-dummy
    events matches the number of non-dummy events then we want to
    fail, but only if there are no parsed dummy events or, if there
    was one, it must have opened. The math here is hard to read, but
    it passes my manual testing.

v3: Make no events opening for perf record a failure as suggested by
    James Clark and Aditya Bodkhe <Aditya.Bodkhe1@ibm.com>. Also,
    rebase.

v2: Rebase and add tested-by tags from James Clark, Leo Yan and Atish
    Patra who have tested on RISC-V and ARM CPUs, including the
    problem case from before.

Ian Rogers (4):
  perf evsel: Add pmu_name helper
  perf stat: Fix find_stat for mixed legacy/non-legacy events
  perf record: Skip don't fail for events that don't open
  perf parse-events: Reapply "Prefer sysfs/JSON hardware events over
    legacy"

 tools/perf/builtin-record.c    | 47 ++++++++++++++++++---
 tools/perf/util/evsel.c        | 10 +++++
 tools/perf/util/evsel.h        |  1 +
 tools/perf/util/parse-events.c | 26 +++++++++---
 tools/perf/util/parse-events.l | 76 +++++++++++++++++-----------------
 tools/perf/util/parse-events.y | 60 ++++++++++++++++++---------
 tools/perf/util/pmus.c         | 20 +++++++--
 tools/perf/util/stat-shadow.c  |  3 +-
 8 files changed, 169 insertions(+), 74 deletions(-)

-- 
2.47.1.613.gc27f4b7a9f-goog



* [PATCH v5 1/4] perf evsel: Add pmu_name helper
  2025-01-09 22:21 [PATCH v5 0/4] Prefer sysfs/JSON events also when no PMU is provided Ian Rogers
@ 2025-01-09 22:21 ` Ian Rogers
  2025-01-09 22:21 ` [PATCH v5 2/4] perf stat: Fix find_stat for mixed legacy/non-legacy events Ian Rogers
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 53+ messages in thread
From: Ian Rogers @ 2025-01-09 22:21 UTC
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, James Clark, Ze Gao,
	Weilin Wang, Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, bpf, Aditya Bodkhe
  Cc: Leo Yan, Atish Patra

Add a helper to get the name of the evsel's PMU. When there's no
sysfs PMU, it falls back to the parse_events event_type helper.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Tested-by: Leo Yan <leo.yan@arm.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
---
 tools/perf/util/evsel.c | 10 ++++++++++
 tools/perf/util/evsel.h |  1 +
 2 files changed, 11 insertions(+)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index bc144388f892..026cf9a9893c 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -237,6 +237,16 @@ int evsel__object_config(size_t object_size, int (*init)(struct evsel *evsel),
 	return 0;
 }
 
+const char *evsel__pmu_name(const struct evsel *evsel)
+{
+	struct perf_pmu *pmu = evsel__find_pmu(evsel);
+
+	if (pmu)
+		return pmu->name;
+
+	return event_type(evsel->core.attr.type);
+}
+
 #define FD(e, x, y) (*(int *)xyarray__entry(e->core.fd, x, y))
 
 int __evsel__sample_size(u64 sample_type)
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 5e789fa80590..2dd108a14b89 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -236,6 +236,7 @@ int evsel__object_config(size_t object_size,
 			 void (*fini)(struct evsel *evsel));
 
 struct perf_pmu *evsel__find_pmu(const struct evsel *evsel);
+const char *evsel__pmu_name(const struct evsel *evsel);
 bool evsel__is_aux_event(const struct evsel *evsel);
 
 struct evsel *evsel__new_idx(struct perf_event_attr *attr, int idx);
-- 
2.47.1.613.gc27f4b7a9f-goog



* [PATCH v5 2/4] perf stat: Fix find_stat for mixed legacy/non-legacy events
  2025-01-09 22:21 [PATCH v5 0/4] Prefer sysfs/JSON events also when no PMU is provided Ian Rogers
  2025-01-09 22:21 ` [PATCH v5 1/4] perf evsel: Add pmu_name helper Ian Rogers
@ 2025-01-09 22:21 ` Ian Rogers
  2025-01-09 22:21 ` [PATCH v5 3/4] perf record: Skip don't fail for events that don't open Ian Rogers
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 53+ messages in thread
From: Ian Rogers @ 2025-01-09 22:21 UTC
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, James Clark, Ze Gao,
	Weilin Wang, Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, bpf, Aditya Bodkhe
  Cc: Leo Yan, Atish Patra

Legacy events typically don't have a PMU when added, leading to
mismatched legacy/non-legacy cases in find_stat. Use evsel__find_pmu
to make sure the evsel's PMU is looked up. Update the evsel__find_pmu
code to look for the PMU using the extended config type or, for legacy
hardware/hw_cache events on non-hybrid systems, to just use the core
PMU.

Before:
```
$ perf stat -e cycles,cpu/instructions/ -a sleep 1
 Performance counter stats for 'system wide':

       215,309,764      cycles
        44,326,491      cpu/instructions/

       1.002555314 seconds time elapsed
```
After:
```
$ perf stat -e cycles,cpu/instructions/ -a sleep 1

 Performance counter stats for 'system wide':

       990,676,332      cycles
     1,235,762,487      cpu/instructions/                #    1.25  insn per cycle

       1.002667198 seconds time elapsed
```

Fixes: 3612ca8e2935 ("perf stat: Fix the hard-coded metrics calculation on the hybrid")
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Tested-by: Leo Yan <leo.yan@arm.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
---
 tools/perf/util/pmus.c        | 20 +++++++++++++++++---
 tools/perf/util/stat-shadow.c |  3 ++-
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
index b493da0d22ef..60d81d69503e 100644
--- a/tools/perf/util/pmus.c
+++ b/tools/perf/util/pmus.c
@@ -710,11 +710,25 @@ char *perf_pmus__default_pmu_name(void)
 struct perf_pmu *evsel__find_pmu(const struct evsel *evsel)
 {
 	struct perf_pmu *pmu = evsel->pmu;
+	bool legacy_core_type;
 
-	if (!pmu) {
-		pmu = perf_pmus__find_by_type(evsel->core.attr.type);
-		((struct evsel *)evsel)->pmu = pmu;
+	if (pmu)
+		return pmu;
+
+	pmu = perf_pmus__find_by_type(evsel->core.attr.type);
+	legacy_core_type =
+		evsel->core.attr.type == PERF_TYPE_HARDWARE ||
+		evsel->core.attr.type == PERF_TYPE_HW_CACHE;
+	if (!pmu && legacy_core_type) {
+		if (perf_pmus__supports_extended_type()) {
+			u32 type = evsel->core.attr.config >> PERF_PMU_TYPE_SHIFT;
+
+			pmu = perf_pmus__find_by_type(type);
+		} else {
+			pmu = perf_pmus__find_core_pmu();
+		}
 	}
+	((struct evsel *)evsel)->pmu = pmu;
 	return pmu;
 }
 
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index fa8b2a1048ff..d83bda5824d2 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -151,6 +151,7 @@ static double find_stat(const struct evsel *evsel, int aggr_idx, enum stat_type
 {
 	struct evsel *cur;
 	int evsel_ctx = evsel_context(evsel);
+	struct perf_pmu *evsel_pmu = evsel__find_pmu(evsel);
 
 	evlist__for_each_entry(evsel->evlist, cur) {
 		struct perf_stat_aggr *aggr;
@@ -177,7 +178,7 @@ static double find_stat(const struct evsel *evsel, int aggr_idx, enum stat_type
 		 * Except the SW CLOCK events,
 		 * ignore if not the PMU we're looking for.
 		 */
-		if ((type != STAT_NSECS) && (evsel->pmu != cur->pmu))
+		if ((type != STAT_NSECS) && (evsel_pmu != evsel__find_pmu(cur)))
 			continue;
 
 		aggr = &cur->stats->aggr[aggr_idx];
-- 
2.47.1.613.gc27f4b7a9f-goog



* [PATCH v5 3/4] perf record: Skip don't fail for events that don't open
  2025-01-09 22:21 [PATCH v5 0/4] Prefer sysfs/JSON events also when no PMU is provided Ian Rogers
  2025-01-09 22:21 ` [PATCH v5 1/4] perf evsel: Add pmu_name helper Ian Rogers
  2025-01-09 22:21 ` [PATCH v5 2/4] perf stat: Fix find_stat for mixed legacy/non-legacy events Ian Rogers
@ 2025-01-09 22:21 ` Ian Rogers
  2025-01-10  1:25   ` Namhyung Kim
  2025-01-09 22:21 ` [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy" Ian Rogers
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 53+ messages in thread
From: Ian Rogers @ 2025-01-09 22:21 UTC
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, James Clark, Ze Gao,
	Weilin Wang, Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, bpf, Aditya Bodkhe
  Cc: Leo Yan, Atish Patra

Whilst for many tools it is expected behavior that failing to open a
perf event is an error, ARM decided to name PMU events the same as
legacy events and then failed to rename such events on a server uncore
SLC PMU. As perf's default behavior when no PMU is specified is to
open the event on all PMUs that advertise/"have" the event, this
yielded failures when trying to make the priority of legacy and
sysfs/JSON events uniform, something requested by RISC-V and ARM. A
legacy event user on ARM hardware may find their event opened on an
uncore PMU, which for perf record will fail. Arnaldo suggested skipping
such events, which this patch implements. Rather than making the
skipping conditional on running on ARM, it is done on all
architectures, as such a fundamental behavioral difference could lead
to problems with tools built on/depending on perf.

An example of perf record failing to open events on x86 is:
```
$ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
Error:
Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
"dmesg | grep -i perf" may provide additional information.

Error:
Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
"dmesg | grep -i perf" may provide additional information.

Error:
Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
The LLC-prefetch-read event is not supported.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]

$ perf report --stats
Aggregated stats:
               TOTAL events:      17255
                MMAP events:        284  ( 1.6%)
                COMM events:       1961  (11.4%)
                EXIT events:          1  ( 0.0%)
                FORK events:       1960  (11.4%)
              SAMPLE events:         87  ( 0.5%)
               MMAP2 events:      12836  (74.4%)
             KSYMBOL events:         83  ( 0.5%)
           BPF_EVENT events:         36  ( 0.2%)
      FINISHED_ROUND events:          2  ( 0.0%)
            ID_INDEX events:          1  ( 0.0%)
          THREAD_MAP events:          1  ( 0.0%)
             CPU_MAP events:          1  ( 0.0%)
           TIME_CONV events:          1  ( 0.0%)
       FINISHED_INIT events:          1  ( 0.0%)
cycles stats:
              SAMPLE events:         87
```

If all events fail to open, then perf record will fail:
```
$ perf record -e LLC-prefetch-read true
Error:
Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
The LLC-prefetch-read event is not supported.
Error:
Failure to open any events for recording
```

As an evlist may have dummy events that open when all command line
events fail, we ignore dummy events when detecting whether at least
some events opened. This still permits the dummy event on its own to
be used as a permission check:
```
$ perf record -e dummy true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.046 MB perf.data ]
```
but allows failure when a dummy event is implicitly inserted or when
there are insufficient permissions to open it:
```
$ perf record -e LLC-prefetch-read -a true
Error:
Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
The LLC-prefetch-read event is not supported.
Error:
Failure to open any events for recording
```

The issue with legacy events is that RISC-V wants the driver not to
have mappings from legacy to non-legacy config encodings for each
vendor/model, due to size, complexity and difficulty of updating. It
was reported that on ARM Apple M-series CPUs the legacy mapping in the
driver was broken and that sysfs/JSON events should always take
precedence; however, it isn't clear this is still the case. It is the
case that, without working around this issue, a legacy event like
cycles without a PMU can encode differently than when specified with a
PMU: the non-PMU version favors legacy encodings, the PMU one avoids
them.

The patch removes events and then adjusts the idx value for each
evsel. This is done so that the dense xyarrays used for file
descriptors, etc. don't contain broken entries. As event opening
happens relatively late in the record process, any idx value used
before the open will now be stale, so it is expected that there are
latent bugs hidden behind this change; the change is best effort. As
the only vendor with broken event names is ARM, this will principally
affect ARM users. They will also see warning messages like those above
because of the uncore PMU advertising legacy event names.

Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: James Clark <james.clark@linaro.org>
Tested-by: Leo Yan <leo.yan@arm.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
---
 tools/perf/builtin-record.c | 47 ++++++++++++++++++++++++++++++++-----
 1 file changed, 41 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 5db1aedf48df..c0b8249a3787 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -961,7 +961,6 @@ static int record__config_tracking_events(struct record *rec)
 	 */
 	if (opts->target.initial_delay || target__has_cpu(&opts->target) ||
 	    perf_pmus__num_core_pmus() > 1) {
-
 		/*
 		 * User space tasks can migrate between CPUs, so when tracing
 		 * selected CPUs, sideband for all CPUs is still needed.
@@ -1366,6 +1365,7 @@ static int record__open(struct record *rec)
 	struct perf_session *session = rec->session;
 	struct record_opts *opts = &rec->opts;
 	int rc = 0;
+	bool skipped = false;
 
 	evlist__for_each_entry(evlist, pos) {
 try_again:
@@ -1381,15 +1381,50 @@ static int record__open(struct record *rec)
 			        pos = evlist__reset_weak_group(evlist, pos, true);
 				goto try_again;
 			}
-			rc = -errno;
 			evsel__open_strerror(pos, &opts->target, errno, msg, sizeof(msg));
-			ui__error("%s\n", msg);
-			goto out;
+			ui__error("Failure to open event '%s' on PMU '%s' which will be removed.\n%s\n",
+				  evsel__name(pos), evsel__pmu_name(pos), msg);
+			pos->skippable = true;
+			skipped = true;
+		} else {
+			pos->supported = true;
 		}
-
-		pos->supported = true;
 	}
 
+	if (skipped) {
+		struct evsel *tmp;
+		int idx = 0;
+		bool evlist_empty = true;
+
+		/* Remove evsels that failed to open and update indices. */
+		evlist__for_each_entry_safe(evlist, tmp, pos) {
+			if (pos->skippable) {
+				evlist__remove(evlist, pos);
+				continue;
+			}
+
+			/*
+			 * Note, dummy events may be command line parsed or
+			 * added by the tool. We care about supporting `perf
+			 * record -e dummy` which may be used as a permission
+			 * check. Dummy events that are added to the command
+			 * line and opened along with other events that fail,
+			 * will still fail as if the dummy events were tool
+			 * added events for the sake of code simplicity.
+			 */
+			if (!evsel__is_dummy_event(pos))
+				evlist_empty = false;
+		}
+		evlist__for_each_entry(evlist, pos) {
+			pos->core.idx = idx++;
+		}
+		/* If list is empty then fail. */
+		if (evlist_empty) {
+			ui__error("Failure to open any events for recording.\n");
+			rc = -1;
+			goto out;
+		}
+	}
 	if (symbol_conf.kptr_restrict && !evlist__exclude_kernel(evlist)) {
 		pr_warning(
 "WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,\n"
-- 
2.47.1.613.gc27f4b7a9f-goog



* [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-01-09 22:21 [PATCH v5 0/4] Prefer sysfs/JSON events also when no PMU is provided Ian Rogers
                   ` (2 preceding siblings ...)
  2025-01-09 22:21 ` [PATCH v5 3/4] perf record: Skip don't fail for events that don't open Ian Rogers
@ 2025-01-09 22:21 ` Ian Rogers
  2025-01-10 19:40   ` Namhyung Kim
  2025-01-29 22:05 ` [PATCH v5 0/4] Prefer sysfs/JSON events also when no PMU is provided Namhyung Kim
  2025-01-30 17:46 ` Namhyung Kim
  5 siblings, 1 reply; 53+ messages in thread
From: Ian Rogers @ 2025-01-09 22:21 UTC
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, James Clark, Ze Gao,
	Weilin Wang, Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, bpf, Aditya Bodkhe
  Cc: Atish Patra, Leo Yan, Beeman Strong, Arnaldo Carvalho de Melo

Originally posted and merged from:
https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
This reverts commit 4f1b067359ac8364cdb7f9fda41085fa85789d0f although
the patch is now smaller due to related fixes being applied in commit
22a4db3c3603 ("perf evsel: Add alternate_hw_config and use in
evsel__match").
The original commit message was:

It was requested that RISC-V be able to add events to the perf tool so
the PMU driver didn't need to map legacy events to config encodings:
https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/

This change makes the priority of events specified without a PMU the
same as those specified with a PMU, namely sysfs and JSON events are
checked first before using the legacy encoding.

The hw_term is made more generic as a hardware_event that encodes a
pair of string and int value, allowing parse_events_multi_pmu_add to
fall back on a known encoding when adding via sysfs/JSON fails for
core events. As this covers PE_VALUE_SYM_HW, that token is removed and
the related code simplified.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Tested-by: James Clark <james.clark@linaro.org>
Tested-by: Leo Yan <leo.yan@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Beeman Strong <beeman@rivosinc.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/parse-events.c | 26 +++++++++---
 tools/perf/util/parse-events.l | 76 +++++++++++++++++-----------------
 tools/perf/util/parse-events.y | 60 ++++++++++++++++++---------
 3 files changed, 98 insertions(+), 64 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 1e23faa364b1..3a60fca53cfa 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1545,8 +1545,8 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
 	struct list_head *list = NULL;
 	struct perf_pmu *pmu = NULL;
 	YYLTYPE *loc = loc_;
-	int ok = 0;
-	const char *config;
+	int ok = 0, core_ok = 0;
+	const char *tmp;
 	struct parse_events_terms parsed_terms;
 
 	*listp = NULL;
@@ -1559,15 +1559,15 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
 			return ret;
 	}
 
-	config = strdup(event_name);
-	if (!config)
+	tmp = strdup(event_name);
+	if (!tmp)
 		goto out_err;
 
 	if (parse_events_term__num(&term,
 				   PARSE_EVENTS__TERM_TYPE_USER,
-				   config, /*num=*/1, /*novalue=*/true,
+				   tmp, /*num=*/1, /*novalue=*/true,
 				   loc, /*loc_val=*/NULL) < 0) {
-		zfree(&config);
+		zfree(&tmp);
 		goto out_err;
 	}
 	list_add_tail(&term->list, &parsed_terms.terms);
@@ -1598,6 +1598,8 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
 			pr_debug("%s -> %s/%s/\n", event_name, pmu->name, sb.buf);
 			strbuf_release(&sb);
 			ok++;
+			if (pmu->is_core)
+				core_ok++;
 		}
 	}
 
@@ -1614,6 +1616,18 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
 		}
 	}
 
+	if (hw_config != PERF_COUNT_HW_MAX && !core_ok) {
+		/*
+		 * The event wasn't found on core PMUs but it has a hardware
+		 * config version to try.
+		 */
+		if (!parse_events_add_numeric(parse_state, list,
+						PERF_TYPE_HARDWARE, hw_config,
+						const_parsed_terms,
+						/*wildcard=*/true))
+			ok++;
+	}
+
 out_err:
 	parse_events_terms__exit(&parsed_terms);
 	if (ok)
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index bf7f73548605..29082a22ccc9 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -113,12 +113,12 @@ do {								\
 	yyless(0);						\
 } while (0)
 
-static int sym(yyscan_t scanner, int type, int config)
+static int sym(yyscan_t scanner, int config)
 {
 	YYSTYPE *yylval = parse_events_get_lval(scanner);
 
-	yylval->num = (type << 16) + config;
-	return type == PERF_TYPE_HARDWARE ? PE_VALUE_SYM_HW : PE_VALUE_SYM_SW;
+	yylval->num = config;
+	return PE_VALUE_SYM_SW;
 }
 
 static int term(yyscan_t scanner, enum parse_events__term_type type)
@@ -129,13 +129,13 @@ static int term(yyscan_t scanner, enum parse_events__term_type type)
 	return PE_TERM;
 }
 
-static int hw_term(yyscan_t scanner, int config)
+static int hw(yyscan_t scanner, int config)
 {
 	YYSTYPE *yylval = parse_events_get_lval(scanner);
 	char *text = parse_events_get_text(scanner);
 
-	yylval->hardware_term.str = strdup(text);
-	yylval->hardware_term.num = PERF_TYPE_HARDWARE + config;
+	yylval->hardware_event.str = strdup(text);
+	yylval->hardware_event.num = config;
 	return PE_TERM_HW;
 }
 
@@ -324,16 +324,16 @@ aux-output		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT); }
 aux-action		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_ACTION); }
 aux-sample-size		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE); }
 metric-id		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_METRIC_ID); }
-cpu-cycles|cycles				{ return hw_term(yyscanner, PERF_COUNT_HW_CPU_CYCLES); }
-stalled-cycles-frontend|idle-cycles-frontend	{ return hw_term(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); }
-stalled-cycles-backend|idle-cycles-backend	{ return hw_term(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); }
-instructions					{ return hw_term(yyscanner, PERF_COUNT_HW_INSTRUCTIONS); }
-cache-references				{ return hw_term(yyscanner, PERF_COUNT_HW_CACHE_REFERENCES); }
-cache-misses					{ return hw_term(yyscanner, PERF_COUNT_HW_CACHE_MISSES); }
-branch-instructions|branches			{ return hw_term(yyscanner, PERF_COUNT_HW_BRANCH_INSTRUCTIONS); }
-branch-misses					{ return hw_term(yyscanner, PERF_COUNT_HW_BRANCH_MISSES); }
-bus-cycles					{ return hw_term(yyscanner, PERF_COUNT_HW_BUS_CYCLES); }
-ref-cycles					{ return hw_term(yyscanner, PERF_COUNT_HW_REF_CPU_CYCLES); }
+cpu-cycles|cycles				{ return hw(yyscanner, PERF_COUNT_HW_CPU_CYCLES); }
+stalled-cycles-frontend|idle-cycles-frontend	{ return hw(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); }
+stalled-cycles-backend|idle-cycles-backend	{ return hw(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); }
+instructions					{ return hw(yyscanner, PERF_COUNT_HW_INSTRUCTIONS); }
+cache-references				{ return hw(yyscanner, PERF_COUNT_HW_CACHE_REFERENCES); }
+cache-misses					{ return hw(yyscanner, PERF_COUNT_HW_CACHE_MISSES); }
+branch-instructions|branches			{ return hw(yyscanner, PERF_COUNT_HW_BRANCH_INSTRUCTIONS); }
+branch-misses					{ return hw(yyscanner, PERF_COUNT_HW_BRANCH_MISSES); }
+bus-cycles					{ return hw(yyscanner, PERF_COUNT_HW_BUS_CYCLES); }
+ref-cycles					{ return hw(yyscanner, PERF_COUNT_HW_REF_CPU_CYCLES); }
 r{num_raw_hex}		{ return str(yyscanner, PE_RAW); }
 r0x{num_raw_hex}	{ return str(yyscanner, PE_RAW); }
 ,			{ return ','; }
@@ -377,28 +377,28 @@ r0x{num_raw_hex}	{ return str(yyscanner, PE_RAW); }
 <<EOF>>			{ BEGIN(INITIAL); }
 }
 
-cpu-cycles|cycles				{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES); }
-stalled-cycles-frontend|idle-cycles-frontend	{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); }
-stalled-cycles-backend|idle-cycles-backend	{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); }
-instructions					{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS); }
-cache-references				{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_REFERENCES); }
-cache-misses					{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_MISSES); }
-branch-instructions|branches			{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_BRANCH_INSTRUCTIONS); }
-branch-misses					{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_BRANCH_MISSES); }
-bus-cycles					{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_BUS_CYCLES); }
-ref-cycles					{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_REF_CPU_CYCLES); }
-cpu-clock					{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CPU_CLOCK); }
-task-clock					{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_TASK_CLOCK); }
-page-faults|faults				{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_PAGE_FAULTS); }
-minor-faults					{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_PAGE_FAULTS_MIN); }
-major-faults					{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_PAGE_FAULTS_MAJ); }
-context-switches|cs				{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CONTEXT_SWITCHES); }
-cpu-migrations|migrations			{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CPU_MIGRATIONS); }
-alignment-faults				{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_ALIGNMENT_FAULTS); }
-emulation-faults				{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_EMULATION_FAULTS); }
-dummy						{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_DUMMY); }
-bpf-output					{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_BPF_OUTPUT); }
-cgroup-switches					{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CGROUP_SWITCHES); }
+cpu-cycles|cycles				{ return hw(yyscanner, PERF_COUNT_HW_CPU_CYCLES); }
+stalled-cycles-frontend|idle-cycles-frontend	{ return hw(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); }
+stalled-cycles-backend|idle-cycles-backend	{ return hw(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); }
+instructions					{ return hw(yyscanner, PERF_COUNT_HW_INSTRUCTIONS); }
+cache-references				{ return hw(yyscanner, PERF_COUNT_HW_CACHE_REFERENCES); }
+cache-misses					{ return hw(yyscanner, PERF_COUNT_HW_CACHE_MISSES); }
+branch-instructions|branches			{ return hw(yyscanner, PERF_COUNT_HW_BRANCH_INSTRUCTIONS); }
+branch-misses					{ return hw(yyscanner, PERF_COUNT_HW_BRANCH_MISSES); }
+bus-cycles					{ return hw(yyscanner, PERF_COUNT_HW_BUS_CYCLES); }
+ref-cycles					{ return hw(yyscanner, PERF_COUNT_HW_REF_CPU_CYCLES); }
+cpu-clock					{ return sym(yyscanner, PERF_COUNT_SW_CPU_CLOCK); }
+task-clock					{ return sym(yyscanner, PERF_COUNT_SW_TASK_CLOCK); }
+page-faults|faults				{ return sym(yyscanner, PERF_COUNT_SW_PAGE_FAULTS); }
+minor-faults					{ return sym(yyscanner, PERF_COUNT_SW_PAGE_FAULTS_MIN); }
+major-faults					{ return sym(yyscanner, PERF_COUNT_SW_PAGE_FAULTS_MAJ); }
+context-switches|cs				{ return sym(yyscanner, PERF_COUNT_SW_CONTEXT_SWITCHES); }
+cpu-migrations|migrations			{ return sym(yyscanner, PERF_COUNT_SW_CPU_MIGRATIONS); }
+alignment-faults				{ return sym(yyscanner, PERF_COUNT_SW_ALIGNMENT_FAULTS); }
+emulation-faults				{ return sym(yyscanner, PERF_COUNT_SW_EMULATION_FAULTS); }
+dummy						{ return sym(yyscanner, PERF_COUNT_SW_DUMMY); }
+bpf-output					{ return sym(yyscanner, PERF_COUNT_SW_BPF_OUTPUT); }
+cgroup-switches					{ return sym(yyscanner, PERF_COUNT_SW_CGROUP_SWITCHES); }
 
 {lc_type}			{ return str(yyscanner, PE_LEGACY_CACHE); }
 {lc_type}-{lc_op_result}	{ return str(yyscanner, PE_LEGACY_CACHE); }
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index f888cbb076d6..d2ef1890007e 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -55,7 +55,7 @@ static void free_list_evsel(struct list_head* list_evsel)
 %}
 
 %token PE_START_EVENTS PE_START_TERMS
-%token PE_VALUE PE_VALUE_SYM_HW PE_VALUE_SYM_SW PE_TERM
+%token PE_VALUE PE_VALUE_SYM_SW PE_TERM
 %token PE_EVENT_NAME
 %token PE_RAW PE_NAME
 %token PE_MODIFIER_EVENT PE_MODIFIER_BP PE_BP_COLON PE_BP_SLASH
@@ -65,11 +65,9 @@ static void free_list_evsel(struct list_head* list_evsel)
 %token PE_DRV_CFG_TERM
 %token PE_TERM_HW
 %type <num> PE_VALUE
-%type <num> PE_VALUE_SYM_HW
 %type <num> PE_VALUE_SYM_SW
 %type <mod> PE_MODIFIER_EVENT
 %type <term_type> PE_TERM
-%type <num> value_sym
 %type <str> PE_RAW
 %type <str> PE_NAME
 %type <str> PE_LEGACY_CACHE
@@ -85,6 +83,7 @@ static void free_list_evsel(struct list_head* list_evsel)
 %type <list_terms> opt_pmu_config
 %destructor { parse_events_terms__delete ($$); } <list_terms>
 %type <list_evsel> event_pmu
+%type <list_evsel> event_legacy_hardware
 %type <list_evsel> event_legacy_symbol
 %type <list_evsel> event_legacy_cache
 %type <list_evsel> event_legacy_mem
@@ -102,8 +101,8 @@ static void free_list_evsel(struct list_head* list_evsel)
 %destructor { free_list_evsel ($$); } <list_evsel>
 %type <tracepoint_name> tracepoint_name
 %destructor { free ($$.sys); free ($$.event); } <tracepoint_name>
-%type <hardware_term> PE_TERM_HW
-%destructor { free ($$.str); } <hardware_term>
+%type <hardware_event> PE_TERM_HW
+%destructor { free ($$.str); } <hardware_event>
 
 %union
 {
@@ -118,10 +117,10 @@ static void free_list_evsel(struct list_head* list_evsel)
 		char *sys;
 		char *event;
 	} tracepoint_name;
-	struct hardware_term {
+	struct hardware_event {
 		char *str;
 		u64 num;
-	} hardware_term;
+	} hardware_event;
 }
 %%
 
@@ -264,6 +263,7 @@ PE_EVENT_NAME event_def
 event_def
 
 event_def: event_pmu |
+	   event_legacy_hardware |
 	   event_legacy_symbol |
 	   event_legacy_cache sep_dc |
 	   event_legacy_mem sep_dc |
@@ -306,24 +306,45 @@ PE_NAME sep_dc
 	$$ = list;
 }
 
-value_sym:
-PE_VALUE_SYM_HW
+event_legacy_hardware:
+PE_TERM_HW opt_pmu_config
+{
+	/* List of created evsels. */
+	struct list_head *list = NULL;
+	int err = parse_events_multi_pmu_add(_parse_state, $1.str, $1.num, $2, &list, &@1);
+
+	free($1.str);
+	parse_events_terms__delete($2);
+	if (err)
+		PE_ABORT(err);
+
+	$$ = list;
+}
 |
-PE_VALUE_SYM_SW
+PE_TERM_HW sep_dc
+{
+	struct list_head *list;
+	int err;
+
+	err = parse_events_multi_pmu_add(_parse_state, $1.str, $1.num, NULL, &list, &@1);
+	free($1.str);
+	if (err)
+		PE_ABORT(err);
+	$$ = list;
+}
 
 event_legacy_symbol:
-value_sym '/' event_config '/'
+PE_VALUE_SYM_SW '/' event_config '/'
 {
 	struct list_head *list;
-	int type = $1 >> 16;
-	int config = $1 & 255;
 	int err;
-	bool wildcard = (type == PERF_TYPE_HARDWARE || type == PERF_TYPE_HW_CACHE);
 
 	list = alloc_list();
 	if (!list)
 		YYNOMEM;
-	err = parse_events_add_numeric(_parse_state, list, type, config, $3, wildcard);
+	err = parse_events_add_numeric(_parse_state, list,
+				/*type=*/PERF_TYPE_SOFTWARE, /*config=*/$1,
+				$3, /*wildcard=*/false);
 	parse_events_terms__delete($3);
 	if (err) {
 		free_list_evsel(list);
@@ -332,18 +353,17 @@ value_sym '/' event_config '/'
 	$$ = list;
 }
 |
-value_sym sep_slash_slash_dc
+PE_VALUE_SYM_SW sep_slash_slash_dc
 {
 	struct list_head *list;
-	int type = $1 >> 16;
-	int config = $1 & 255;
-	bool wildcard = (type == PERF_TYPE_HARDWARE || type == PERF_TYPE_HW_CACHE);
 	int err;
 
 	list = alloc_list();
 	if (!list)
 		YYNOMEM;
-	err = parse_events_add_numeric(_parse_state, list, type, config, /*head_config=*/NULL, wildcard);
+	err = parse_events_add_numeric(_parse_state, list,
+				/*type=*/PERF_TYPE_SOFTWARE, /*config=*/$1,
+				/*head_config=*/NULL, /*wildcard=*/false);
 	if (err)
 		PE_ABORT(err);
 	$$ = list;
-- 
2.47.1.613.gc27f4b7a9f-goog
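The removed `value_sym` actions recovered both the event type and the config from one token value with `$1 >> 16` and `$1 & 255`. That implies the old lexer packed them as `(type << 16) | config`; the packing side is not shown in this excerpt and is assumed below. After the change, hardware names are lexed as `PE_TERM_HW` and passed to `parse_events_multi_pmu_add()`, so only software symbols reach `event_legacy_symbol`. The type is then fixed at `PERF_TYPE_SOFTWARE` and the token value is the config itself. A minimal standalone sketch of the two decodes follows. `pack_sym()`, `decode_old()`, `decode_new()` and the `SK_*` constants are hypothetical, with values chosen to mirror `<linux/perf_event.h>`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants; values mirror perf_type_id / perf_hw_id / perf_sw_ids. */
enum { SK_TYPE_HARDWARE = 0, SK_TYPE_SOFTWARE = 1, SK_TYPE_HW_CACHE = 3 };
enum { SK_HW_INSTRUCTIONS = 1 };
enum { SK_SW_CONTEXT_SWITCHES = 3, SK_SW_DUMMY = 9 };

struct decoded_sym {
	int type;
	int config;
	bool wildcard;   /* open on every PMU that has the event */
};

/* Assumed lexer-side packing (hypothetical): type in bits 16+, config in the low byte. */
static int pack_sym(int type, int config)
{
	return (type << 16) | config;
}

/* Old rule: split the packed token; hardware and cache types were wildcarded. */
static struct decoded_sym decode_old(int tok)
{
	struct decoded_sym d;

	d.type = tok >> 16;
	d.config = tok & 255;
	d.wildcard = (d.type == SK_TYPE_HARDWARE || d.type == SK_TYPE_HW_CACHE);
	return d;
}

/* New rule: only software symbols remain, so the token is the config itself. */
static struct decoded_sym decode_new(int tok)
{
	struct decoded_sym d = { .type = SK_TYPE_SOFTWARE, .config = tok, .wildcard = false };

	return d;
}
```

One side effect of the old scheme, visible in the mask, is that configs were limited to the low 8 bits; the new rule passes the software config through unmasked.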


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 3/4] perf record: Skip don't fail for events that don't open
  2025-01-09 22:21 ` [PATCH v5 3/4] perf record: Skip don't fail for events that don't open Ian Rogers
@ 2025-01-10  1:25   ` Namhyung Kim
  2025-01-10  4:44     ` Ian Rogers
  2025-01-10 14:18     ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 53+ messages in thread
From: Namhyung Kim @ 2025-01-10  1:25 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	bpf, Aditya Bodkhe, Leo Yan, Atish Patra

On Thu, Jan 09, 2025 at 02:21:08PM -0800, Ian Rogers wrote:
> Whilst for many tools it is an expected behavior that failure to open
> a perf event is a failure, ARM decided to name PMU events the same as
> legacy events and then failed to rename such events on a server uncore
> SLC PMU. As perf's default behavior when no PMU is specified is to
> open the event on all PMUs that advertise/"have" the event, this
> yielded failures when trying to make the priority of legacy and
> sysfs/json events uniform - something requested by RISC-V and ARM. A
> legacy event user on ARM hardware may find their event opened on an
> uncore PMU which for perf record will fail. Arnaldo suggested skipping
> such events which this patch implements. Rather than have the skipping
> conditional on running on ARM, the skipping is done on all
> architectures as such a fundamental behavioral difference could lead
> to problems with tools built/depending on perf.
> 
> An example of perf record failing to open events on x86 is:
> ```
> $ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
> Error:
> Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
> The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> "dmesg | grep -i perf" may provide additional information.
> 
> Error:
> Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
> The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> "dmesg | grep -i perf" may provide additional information.
> 
> Error:
> Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> The LLC-prefetch-read event is not supported.
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]

I'm afraid this can be too noisy.

> 
> $ perf report --stats
> Aggregated stats:
>                TOTAL events:      17255
>                 MMAP events:        284  ( 1.6%)
>                 COMM events:       1961  (11.4%)
>                 EXIT events:          1  ( 0.0%)
>                 FORK events:       1960  (11.4%)
>               SAMPLE events:         87  ( 0.5%)
>                MMAP2 events:      12836  (74.4%)
>              KSYMBOL events:         83  ( 0.5%)
>            BPF_EVENT events:         36  ( 0.2%)
>       FINISHED_ROUND events:          2  ( 0.0%)
>             ID_INDEX events:          1  ( 0.0%)
>           THREAD_MAP events:          1  ( 0.0%)
>              CPU_MAP events:          1  ( 0.0%)
>            TIME_CONV events:          1  ( 0.0%)
>        FINISHED_INIT events:          1  ( 0.0%)
> cycles stats:
>               SAMPLE events:         87
> ```
> 
> If all events fail to open then the perf record will fail:
> ```
> $ perf record -e LLC-prefetch-read true
> Error:
> Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> The LLC-prefetch-read event is not supported.
> Error:
> Failure to open any events for recording
> ```
> 
> As an evlist may have dummy events that open when all command line
> events fail we ignore dummy events when detecting if at least some
> events open. This still permits the dummy event on its own to be used
> as a permission check:
> ```
> $ perf record -e dummy true
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.046 MB perf.data ]
> ```
> but allows failure when a dummy event is implicitly inserted or when
> there are insufficient permissions to open it:
> ```
> $ perf record -e LLC-prefetch-read -a true
> Error:
> Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> The LLC-prefetch-read event is not supported.
> Error:
> Failure to open any events for recording
> ```
> 
> The issue with legacy events is that on RISC-V they want the driver to
> not have mappings from legacy to non-legacy config encodings for each
> vendor/model due to size, complexity and difficulty to update. It was
> reported that on ARM Apple-M? CPUs the legacy mapping in the driver
> was broken and the sysfs/json events should always take precedence,
> however, it isn't clear this is still the case. It is the case that
> without working around this issue a legacy event like cycles without a
> PMU can encode differently than when specified with a PMU - the
> non-PMU version favoring legacy encodings, the PMU one avoiding legacy
> encodings.
> 
> The patch removes events and then adjusts the idx value for each
> evsel. This is done so that the dense xyarrays used for file
> descriptors, etc. don't contain broken entries. As event opening
> happens relatively late in the record process, use of the idx value
> before the open will have become corrupted, so it is expected there
> are latent bugs hidden behind this change - the change is best
> effort. As the only vendor that has broken event names is ARM, this
> will principally affect ARM users. They will also experience warning
> messages like those above because of the uncore PMU advertising legacy
> event names.
> 
> Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
> Signed-off-by: Ian Rogers <irogers@google.com>
> Tested-by: James Clark <james.clark@linaro.org>
> Tested-by: Leo Yan <leo.yan@arm.com>
> Tested-by: Atish Patra <atishp@rivosinc.com>
> ---
>  tools/perf/builtin-record.c | 47 ++++++++++++++++++++++++++++++++-----
>  1 file changed, 41 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index 5db1aedf48df..c0b8249a3787 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -961,7 +961,6 @@ static int record__config_tracking_events(struct record *rec)
>  	 */
>  	if (opts->target.initial_delay || target__has_cpu(&opts->target) ||
>  	    perf_pmus__num_core_pmus() > 1) {
> -
>  		/*
>  		 * User space tasks can migrate between CPUs, so when tracing
>  		 * selected CPUs, sideband for all CPUs is still needed.
> @@ -1366,6 +1365,7 @@ static int record__open(struct record *rec)
>  	struct perf_session *session = rec->session;
>  	struct record_opts *opts = &rec->opts;
>  	int rc = 0;
> +	bool skipped = false;
>  
>  	evlist__for_each_entry(evlist, pos) {
>  try_again:
> @@ -1381,15 +1381,50 @@ static int record__open(struct record *rec)
>  			        pos = evlist__reset_weak_group(evlist, pos, true);
>  				goto try_again;
>  			}
> -			rc = -errno;
>  			evsel__open_strerror(pos, &opts->target, errno, msg, sizeof(msg));
> -			ui__error("%s\n", msg);
> -			goto out;
> +			ui__error("Failure to open event '%s' on PMU '%s' which will be removed.\n%s\n",
> +				  evsel__name(pos), evsel__pmu_name(pos), msg);

How about changing it to pr_debug() and add below ...


> +			pos->skippable = true;
> +			skipped = true;
> +		} else {
> +			pos->supported = true;
>  		}
> -
> -		pos->supported = true;
>  	}
>  
> +	if (skipped) {
> +		struct evsel *tmp;
> +		int idx = 0;
> +		bool evlist_empty = true;
> +
> +		/* Remove evsels that failed to open and update indices. */
> +		evlist__for_each_entry_safe(evlist, tmp, pos) {
> +			if (pos->skippable) {
> +				evlist__remove(evlist, pos);
> +				continue;
> +			}
> +
> +			/*
> +			 * Note, dummy events may be command line parsed or
> +			 * added by the tool. We care about supporting `perf
> +			 * record -e dummy` which may be used as a permission
> +			 * check. Dummy events that are added to the command
> +			 * line and opened along with other events that fail,
> +			 * will still fail as if the dummy events were tool
> +			 * added events for the sake of code simplicity.
> +			 */
> +			if (!evsel__is_dummy_event(pos))
> +				evlist_empty = false;
> +		}
> +		evlist__for_each_entry(evlist, pos) {
> +			pos->core.idx = idx++;
> +		}
> +		/* If list is empty then fail. */
> +		if (evlist_empty) {
> +			ui__error("Failure to open any events for recording.\n");
> +			rc = -1;
> +			goto out;
> +		}

... ?

		if (!verbose)
			ui__warning("Removed some unsupported events, use -v for details.\n");

Thanks,
Namhyung


> +	}
>  	if (symbol_conf.kptr_restrict && !evlist__exclude_kernel(evlist)) {
>  		pr_warning(
>  "WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,\n"
> -- 
> 2.47.1.613.gc27f4b7a9f-goog
> 
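The removal pass in `record__open()` above marks failed evsels `skippable`, removes them, renumbers `core.idx` so the per-evsel arrays stay dense, and fails only if nothing but dummy events remain. The sketch below models that logic on a plain array rather than perf's linked `evlist`. `struct sketch_evsel`, `remove_skipped()` and `open_pass_failed()` are hypothetical names, and the sketch renumbers in the same pass as the removal, where the patch uses a second loop.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct evsel: only the fields the removal pass reads. */
struct sketch_evsel {
	const char *name;
	bool skippable;   /* set when the open failed */
	bool is_dummy;    /* tool-added or command-line dummy event */
	int idx;          /* dense index into per-evsel arrays (fds, ids, ...) */
};

/*
 * Compact the array in place, dropping skippable entries and renumbering idx
 * so it stays dense. Returns the new length; *only_dummies is set when no
 * non-dummy event survived.
 */
static size_t remove_skipped(struct sketch_evsel *evs, size_t n, bool *only_dummies)
{
	size_t out = 0;

	*only_dummies = true;
	for (size_t i = 0; i < n; i++) {
		if (evs[i].skippable)
			continue;
		if (!evs[i].is_dummy)
			*only_dummies = false;
		evs[out] = evs[i];
		evs[out].idx = (int)out;
		out++;
	}
	return out;
}

/*
 * Returns true when recording should fail. As in the patch, the dummy-only
 * check runs only if something was skipped, so a lone dummy still succeeds.
 */
static bool open_pass_failed(struct sketch_evsel *evs, size_t *n)
{
	bool skipped = false;
	bool only_dummies;

	for (size_t i = 0; i < *n; i++)
		skipped |= evs[i].skippable;
	if (!skipped)
		return false;
	*n = remove_skipped(evs, *n, &only_dummies);
	return only_dummies;
}
```

The cases exercised by the checks mirror the commit message: a mixed list keeps `cycles` plus the dummy, a list whose only survivor is a dummy fails, and `perf record -e dummy` alone succeeds because nothing was skipped.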


* Re: [PATCH v5 3/4] perf record: Skip don't fail for events that don't open
  2025-01-10  1:25   ` Namhyung Kim
@ 2025-01-10  4:44     ` Ian Rogers
  2025-01-10 18:55       ` Namhyung Kim
  2025-01-10 14:18     ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 53+ messages in thread
From: Ian Rogers @ 2025-01-10  4:44 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	bpf, Aditya Bodkhe, Leo Yan, Atish Patra

On Thu, Jan 9, 2025 at 5:25 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Thu, Jan 09, 2025 at 02:21:08PM -0800, Ian Rogers wrote:
> > Whilst for many tools it is an expected behavior that failure to open
> > a perf event is a failure, ARM decided to name PMU events the same as
> > legacy events and then failed to rename such events on a server uncore
> > SLC PMU. As perf's default behavior when no PMU is specified is to
> > open the event on all PMUs that advertise/"have" the event, this
> > yielded failures when trying to make the priority of legacy and
> > sysfs/json events uniform - something requested by RISC-V and ARM. A
> > legacy event user on ARM hardware may find their event opened on an
> > uncore PMU which for perf record will fail. Arnaldo suggested skipping
> > such events which this patch implements. Rather than have the skipping
> > conditional on running on ARM, the skipping is done on all
> > architectures as such a fundamental behavioral difference could lead
> > to problems with tools built/depending on perf.
> >
> > An example of perf record failing to open events on x86 is:
> > ```
> > $ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
> > Error:
> > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
> > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > "dmesg | grep -i perf" may provide additional information.
> >
> > Error:
> > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
> > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > "dmesg | grep -i perf" may provide additional information.
> >
> > Error:
> > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > The LLC-prefetch-read event is not supported.
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]
>
> I'm afraid this can be too noisy.

The intention is to be noisy:
1) it matches the existing behavior, anything else is potentially a regression;
2) it only happens if trying to record on a PMU/event that doesn't
support recording, something that is currently an error and so we're
not motivated to change the behavior as no-one should be using it;
3) for the wildcard case the only offender is ARM's SLC PMU and the
appropriate fix there has always been to make the CPU cycles event
name match the bus_cycles event name by calling it cpu_cycles -
something that doesn't conflict with a core PMU event name, the thing
that has introduced all these problems, patches, long email exchanges,
unfixed inconsistencies, etc. If the errors aren't noisy then there
is little motivation for the ARM SLC PMU's event name to be fixed.

Thanks,
Ian
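To illustrate the wildcard case discussed here, the sketch below models an event given without a PMU being added on every PMU that advertises the name through sysfs/JSON, with core PMUs falling back to the legacy encoding when they do not advertise it. This is a simplified model under stated assumptions, not perf's `parse_events_multi_pmu_add()`; the PMU table, `match_pmus()` and the PMU names used in the checks are hypothetical. It shows why an uncore PMU advertising `cycles` is picked up by a plain `-e cycles`, while a `cpu_cycles` name would not be.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical PMU: a name, whether it is a core PMU, and its sysfs/JSON event names. */
struct sketch_pmu {
	const char *name;
	bool is_core;
	const char *const *events;   /* NULL-terminated list */
};

enum sketch_encoding { ENC_SYSFS_JSON, ENC_LEGACY };

struct sketch_match {
	size_t pmu;                  /* index into the PMU table */
	enum sketch_encoding enc;    /* encoding the event would get on that PMU */
};

static bool pmu_has_event(const struct sketch_pmu *pmu, const char *event)
{
	for (const char *const *e = pmu->events; *e; e++)
		if (strcmp(*e, event) == 0)
			return true;
	return false;
}

/*
 * Add an event given without a PMU on every PMU that advertises the name,
 * preferring the sysfs/JSON encoding; a core PMU that does not advertise a
 * legacy name falls back to the legacy encoding. out[] must hold npmus entries.
 */
static size_t match_pmus(const struct sketch_pmu *pmus, size_t npmus, const char *event,
			 bool is_legacy_name, struct sketch_match *out)
{
	size_t n = 0;

	for (size_t i = 0; i < npmus; i++) {
		if (pmu_has_event(&pmus[i], event))
			out[n++] = (struct sketch_match){ i, ENC_SYSFS_JSON };
		else if (pmus[i].is_core && is_legacy_name)
			out[n++] = (struct sketch_match){ i, ENC_LEGACY };
	}
	return n;
}
```

With the uncore PMU advertising `bus_cycles` and `cycles`, `cycles` matches two PMUs, and the uncore entry is the one `perf record` then fails to open; once it is renamed to `cpu_cycles`, only the core PMU matches.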


* Re: [PATCH v5 3/4] perf record: Skip don't fail for events that don't open
  2025-01-10  1:25   ` Namhyung Kim
  2025-01-10  4:44     ` Ian Rogers
@ 2025-01-10 14:18     ` Arnaldo Carvalho de Melo
  2025-01-10 16:42       ` Ian Rogers
  1 sibling, 1 reply; 53+ messages in thread
From: Arnaldo Carvalho de Melo @ 2025-01-10 14:18 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Linus Torvalds, Ian Rogers, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	bpf, Aditya Bodkhe, Leo Yan, Atish Patra

Adding Linus to the CC list as he participated in this discussion in the
past, so a heads up about changes in this area that are being further
discussed.

On Thu, Jan 09, 2025 at 05:25:03PM -0800, Namhyung Kim wrote:
> On Thu, Jan 09, 2025 at 02:21:08PM -0800, Ian Rogers wrote:
> > Whilst for many tools it is an expected behavior that failure to open
> > a perf event is a failure, ARM decided to name PMU events the same as
> > legacy events and then failed to rename such events on a server uncore
> > SLC PMU. As perf's default behavior when no PMU is specified is to
> > open the event on all PMUs that advertise/"have" the event, this
> > yielded failures when trying to make the priority of legacy and
> > sysfs/json events uniform - something requested by RISC-V and ARM. A
> > legacy event user on ARM hardware may find their event opened on an
> > uncore PMU which for perf record will fail. Arnaldo suggested skipping
> > such events which this patch implements. Rather than have the skipping
> > conditional on running on ARM, the skipping is done on all
> > architectures as such a fundamental behavioral difference could lead
> > to problems with tools built/depending on perf.
> > 
> > An example of perf record failing to open events on x86 is:
> > ```
> > $ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
> > Error:
> > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
> > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > "dmesg | grep -i perf" may provide additional information.
> > 
> > Error:
> > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
> > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > "dmesg | grep -i perf" may provide additional information.
> > 
> > Error:
> > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > The LLC-prefetch-read event is not supported.
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]
> 
> I'm afraid this can be too noisy.

Agreed.
 
> > $ perf report --stats
> > Aggregated stats:
> >                TOTAL events:      17255
> >                 MMAP events:        284  ( 1.6%)
> >                 COMM events:       1961  (11.4%)
> >                 EXIT events:          1  ( 0.0%)
> >                 FORK events:       1960  (11.4%)
> >               SAMPLE events:         87  ( 0.5%)
> >                MMAP2 events:      12836  (74.4%)
> >              KSYMBOL events:         83  ( 0.5%)
> >            BPF_EVENT events:         36  ( 0.2%)
> >       FINISHED_ROUND events:          2  ( 0.0%)
> >             ID_INDEX events:          1  ( 0.0%)
> >           THREAD_MAP events:          1  ( 0.0%)
> >              CPU_MAP events:          1  ( 0.0%)
> >            TIME_CONV events:          1  ( 0.0%)
> >        FINISHED_INIT events:          1  ( 0.0%)
> > cycles stats:
> >               SAMPLE events:         87
> > ```
> > 
> > If all events fail to open then the perf record will fail:
> > ```
> > $ perf record -e LLC-prefetch-read true
> > Error:
> > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > The LLC-prefetch-read event is not supported.
> > Error:
> > Failure to open any events for recording
> > ```
> > 
> > As an evlist may have dummy events that open when all command line
> > events fail we ignore dummy events when detecting if at least some
> > events open. This still permits the dummy event on its own to be used
> > as a permission check:
> > ```
> > $ perf record -e dummy true
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 0.046 MB perf.data ]
> > ```
> > but allows failure when a dummy event is implicitly inserted or when
> > there are insufficient permissions to open it:
> > ```
> > $ perf record -e LLC-prefetch-read -a true
> > Error:
> > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > The LLC-prefetch-read event is not supported.
> > Error:
> > Failure to open any events for recording
> > ```
> > 
> > The issue with legacy events is that on RISC-V they want the driver to
> > not have mappings from legacy to non-legacy config encodings for each
> > vendor/model due to size, complexity and difficulty to update. It was
> > reported that on ARM Apple-M? CPUs the legacy mapping in the driver
> > was broken and the sysfs/json events should always take precedence,
> > however, it isn't clear this is still the case. It is the case that
> > without working around this issue a legacy event like cycles without a
> > PMU can encode differently than when specified with a PMU - the
> > non-PMU version favoring legacy encodings, the PMU one avoiding legacy
> > encodings.
> > 
> > The patch removes events and then adjusts the idx value for each
> > evsel. This is done so that the dense xyarrays used for file
> > descriptors, etc. don't contain broken entries. As event opening
> > happens relatively late in the record process, use of the idx value
> > before the open will have become corrupted, so it is expected there
> > are latent bugs hidden behind this change - the change is best
> > effort. As the only vendor that has broken event names is ARM, this
> > will principally affect ARM users. They will also experience warning
> > messages like those above because of the uncore PMU advertising legacy
> > event names.
> > 
> > Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > Tested-by: James Clark <james.clark@linaro.org>
> > Tested-by: Leo Yan <leo.yan@arm.com>
> > Tested-by: Atish Patra <atishp@rivosinc.com>
> > ---
> >  tools/perf/builtin-record.c | 47 ++++++++++++++++++++++++++++++++-----
> >  1 file changed, 41 insertions(+), 6 deletions(-)
> > 
> > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> > index 5db1aedf48df..c0b8249a3787 100644
> > --- a/tools/perf/builtin-record.c
> > +++ b/tools/perf/builtin-record.c
> > @@ -961,7 +961,6 @@ static int record__config_tracking_events(struct record *rec)
> >  	 */
> >  	if (opts->target.initial_delay || target__has_cpu(&opts->target) ||
> >  	    perf_pmus__num_core_pmus() > 1) {
> > -
> >  		/*
> >  		 * User space tasks can migrate between CPUs, so when tracing
> >  		 * selected CPUs, sideband for all CPUs is still needed.
> > @@ -1366,6 +1365,7 @@ static int record__open(struct record *rec)
> >  	struct perf_session *session = rec->session;
> >  	struct record_opts *opts = &rec->opts;
> >  	int rc = 0;
> > +	bool skipped = false;
> >  
> >  	evlist__for_each_entry(evlist, pos) {
> >  try_again:
> > @@ -1381,15 +1381,50 @@ static int record__open(struct record *rec)
> >  			        pos = evlist__reset_weak_group(evlist, pos, true);
> >  				goto try_again;
> >  			}
> > -			rc = -errno;
> >  			evsel__open_strerror(pos, &opts->target, errno, msg, sizeof(msg));
> > -			ui__error("%s\n", msg);
> > -			goto out;
> > +			ui__error("Failure to open event '%s' on PMU '%s' which will be removed.\n%s\n",
> > +				  evsel__name(pos), evsel__pmu_name(pos), msg);
 
> How about changing it to pr_debug() and add below ...

That sounds better.
 
> > +			pos->skippable = true;
> > +			skipped = true;
> > +		} else {
> > +			pos->supported = true;
> >  		}
> > -
> > -		pos->supported = true;
> >  	}
> >  
> > +	if (skipped) {
> > +		struct evsel *tmp;
> > +		int idx = 0;
> > +		bool evlist_empty = true;
> > +
> > +		/* Remove evsels that failed to open and update indices. */
> > +		evlist__for_each_entry_safe(evlist, tmp, pos) {
> > +			if (pos->skippable) {
> > +				evlist__remove(evlist, pos);
> > +				continue;
> > +			}
> > +
> > +			/*
> > +			 * Note, dummy events may be command line parsed or
> > +			 * added by the tool. We care about supporting `perf
> > +			 * record -e dummy` which may be used as a permission
> > +			 * check. Dummy events that are added to the command
> > +			 * line and opened along with other events that fail,
> > +			 * will still fail as if the dummy events were tool
> > +			 * added events for the sake of code simplicity.
> > +			 */
> > +			if (!evsel__is_dummy_event(pos))
> > +				evlist_empty = false;
> > +		}
> > +		evlist__for_each_entry(evlist, pos) {
> > +			pos->core.idx = idx++;
> > +		}
> > +		/* If list is empty then fail. */
> > +		if (evlist_empty) {
> > +			ui__error("Failure to open any events for recording.\n");
> > +			rc = -1;
> > +			goto out;
> > +		}

> ... ?

> 		if (!verbose)
> 			ui__warning("Removed some unsupported events, use -v for details.\n");

And even this one would be best left for cases where we can determine
that it's a new situation, i.e. one that should work and not the ones we
know that will not work already and thus so far didn't alarm the user
into thinking something is wrong.

Having the ones we know will fail as pr_debug() seems enough, I'd say.

- Arnaldo
 
> Thanks,
> Namhyung
> 
> 
> > +	}
> >  	if (symbol_conf.kptr_restrict && !evlist__exclude_kernel(evlist)) {
> >  		pr_warning(
> >  "WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,\n"
> > -- 
> > 2.47.1.613.gc27f4b7a9f-goog
> > 
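The two logging policies in this exchange can be contrasted with a small model. Namhyung suggests per-event failures at debug level plus one summary warning when `-v` is not given, while Arnaldo prefers debug-only output for failures that are already known to occur. In the sketch, `sketch_log()` is a hypothetical stand-in for perf's `pr_debug()`/`ui__warning()`, the `summarize` flag selects between the two proposals, and the per-event message wording is simplified.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical levels standing in for pr_debug() and ui__warning(). */
enum sketch_level { LVL_DEBUG, LVL_WARNING };

struct sketch_logger {
	int verbose;     /* like perf's -v count */
	size_t printed;  /* lines that actually reached the user */
};

/* Warnings always print; debug lines print only when verbose. */
static void sketch_log(struct sketch_logger *lg, enum sketch_level lvl, const char *msg)
{
	if (lvl == LVL_WARNING || lg->verbose > 0) {
		fprintf(stderr, "%s\n", msg);
		lg->printed++;
	}
}

/*
 * Report events removed after failing to open. With summarize set (Namhyung's
 * proposal) a non-verbose run gets a single summary line; with it clear
 * (Arnaldo's preference for known failures) a non-verbose run prints nothing.
 */
static void report_skipped(struct sketch_logger *lg, const char *const *failed,
			   size_t nfailed, bool summarize)
{
	char buf[256];

	for (size_t i = 0; i < nfailed; i++) {
		snprintf(buf, sizeof(buf), "Failure to open event '%s', removing it.", failed[i]);
		sketch_log(lg, LVL_DEBUG, buf);
	}
	if (nfailed > 0 && summarize && lg->verbose == 0)
		sketch_log(lg, LVL_WARNING,
			   "Removed some unsupported events, use -v for details.");
}
```

For the three failures in the x86 example, a default run under the first policy prints one line, a `-v` run prints the three per-event lines, and the second policy prints nothing without `-v`.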


* Re: [PATCH v5 3/4] perf record: Skip don't fail for events that don't open
  2025-01-10 14:18     ` Arnaldo Carvalho de Melo
@ 2025-01-10 16:42       ` Ian Rogers
  2025-01-10 19:26         ` Namhyung Kim
  0 siblings, 1 reply; 53+ messages in thread
From: Ian Rogers @ 2025-01-10 16:42 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Namhyung Kim, Linus Torvalds, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	bpf, Aditya Bodkhe, Leo Yan, Atish Patra

On Fri, Jan 10, 2025 at 6:18 AM Arnaldo Carvalho de Melo
<acme@kernel.org> wrote:
>
> Adding Linus to the CC list as he participated in this discussion in the
> past, so a heads up about changes in this area that are being further
> discussed.

Linus blocks my email so I'm not sure of the point.

> On Thu, Jan 09, 2025 at 05:25:03PM -0800, Namhyung Kim wrote:
> > On Thu, Jan 09, 2025 at 02:21:08PM -0800, Ian Rogers wrote:
> > > Whilst for many tools it is an expected behavior that failure to open
> > > a perf event is a failure, ARM decided to name PMU events the same as
> > > legacy events and then failed to rename such events on a server uncore
> > > SLC PMU. As perf's default behavior when no PMU is specified is to
> > > open the event on all PMUs that advertise/"have" the event, this
> > > yielded failures when trying to make the priority of legacy and
> > > sysfs/json events uniform - something requested by RISC-V and ARM. A
> > > legacy event user on ARM hardware may find their event opened on an
> > > uncore PMU which for perf record will fail. Arnaldo suggested skipping
> > > such events which this patch implements. Rather than have the skipping
> > > conditional on running on ARM, the skipping is done on all
> > > architectures as such a fundamental behavioral difference could lead
> > > to problems with tools built/depending on perf.
> > >
> > > An example of perf record failing to open events on x86 is:
> > > ```
> > > $ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
> > > Error:
> > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
> > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > "dmesg | grep -i perf" may provide additional information.
> > >
> > > Error:
> > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
> > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > "dmesg | grep -i perf" may provide additional information.
> > >
> > > Error:
> > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > The LLC-prefetch-read event is not supported.
> > > [ perf record: Woken up 1 times to write data ]
> > > [ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]
> >
> > I'm afraid this can be too noisy.
>
> Agreed.
>
> > > $ perf report --stats
> > > Aggregated stats:
> > >                TOTAL events:      17255
> > >                 MMAP events:        284  ( 1.6%)
> > >                 COMM events:       1961  (11.4%)
> > >                 EXIT events:          1  ( 0.0%)
> > >                 FORK events:       1960  (11.4%)
> > >               SAMPLE events:         87  ( 0.5%)
> > >                MMAP2 events:      12836  (74.4%)
> > >              KSYMBOL events:         83  ( 0.5%)
> > >            BPF_EVENT events:         36  ( 0.2%)
> > >       FINISHED_ROUND events:          2  ( 0.0%)
> > >             ID_INDEX events:          1  ( 0.0%)
> > >           THREAD_MAP events:          1  ( 0.0%)
> > >              CPU_MAP events:          1  ( 0.0%)
> > >            TIME_CONV events:          1  ( 0.0%)
> > >        FINISHED_INIT events:          1  ( 0.0%)
> > > cycles stats:
> > >               SAMPLE events:         87
> > > ```
> > >
> > > If all events fail to open then the perf record will fail:
> > > ```
> > > $ perf record -e LLC-prefetch-read true
> > > Error:
> > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > The LLC-prefetch-read event is not supported.
> > > Error:
> > > Failure to open any events for recording
> > > ```
> > >
> > > As an evlist may have dummy events that open when all command line
> > > events fail we ignore dummy events when detecting if at least some
> > > events open. This still permits the dummy event on its own to be used
> > > as a permission check:
> > > ```
> > > $ perf record -e dummy true
> > > [ perf record: Woken up 1 times to write data ]
> > > [ perf record: Captured and wrote 0.046 MB perf.data ]
> > > ```
> > > but allows failure when a dummy event is implicitly inserted or when
> > > there are insufficient permissions to open it:
> > > ```
> > > $ perf record -e LLC-prefetch-read -a true
> > > Error:
> > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > The LLC-prefetch-read event is not supported.
> > > Error:
> > > Failure to open any events for recording
> > > ```
> > >
> > > The issue with legacy events is that on RISC-V they want the driver to
> > > not have mappings from legacy to non-legacy config encodings for each
> > > vendor/model due to size, complexity and difficulty to update. It was
> > > reported that on ARM Apple-M? CPUs the legacy mapping in the driver
> > > was broken and the sysfs/json events should always take precedence,
> > > however, it isn't clear this is still the case. It is the case that
> > > without working around this issue a legacy event like cycles without a
> > > PMU can encode differently than when specified with a PMU - the
> > > non-PMU version favoring legacy encodings, the PMU one avoiding legacy
> > > encodings.
> > >
> > > The patch removes events and then adjusts the idx value for each
> > > evsel. This is done so that the dense xyarrays used for file
> > > descriptors, etc. don't contain broken entries. As event opening
> > > happens relatively late in the record process, use of the idx value
> > > before the open will have become corrupted, so it is expected there
> > > are latent bugs hidden behind this change - the change is best
> > > effort. As the only vendor that has broken event names is ARM, this
> > > will principally affect ARM users. They will also experience warning
> > > messages like those above because of the uncore PMU advertising legacy
> > > event names.
> > >
> > > Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
> > > Signed-off-by: Ian Rogers <irogers@google.com>
> > > Tested-by: James Clark <james.clark@linaro.org>
> > > Tested-by: Leo Yan <leo.yan@arm.com>
> > > Tested-by: Atish Patra <atishp@rivosinc.com>
> > > ---
> > >  tools/perf/builtin-record.c | 47 ++++++++++++++++++++++++++++++++-----
> > >  1 file changed, 41 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> > > index 5db1aedf48df..c0b8249a3787 100644
> > > --- a/tools/perf/builtin-record.c
> > > +++ b/tools/perf/builtin-record.c
> > > @@ -961,7 +961,6 @@ static int record__config_tracking_events(struct record *rec)
> > >      */
> > >     if (opts->target.initial_delay || target__has_cpu(&opts->target) ||
> > >         perf_pmus__num_core_pmus() > 1) {
> > > -
> > >             /*
> > >              * User space tasks can migrate between CPUs, so when tracing
> > >              * selected CPUs, sideband for all CPUs is still needed.
> > > @@ -1366,6 +1365,7 @@ static int record__open(struct record *rec)
> > >     struct perf_session *session = rec->session;
> > >     struct record_opts *opts = &rec->opts;
> > >     int rc = 0;
> > > +   bool skipped = false;
> > >
> > >     evlist__for_each_entry(evlist, pos) {
> > >  try_again:
> > > @@ -1381,15 +1381,50 @@ static int record__open(struct record *rec)
> > >                             pos = evlist__reset_weak_group(evlist, pos, true);
> > >                             goto try_again;
> > >                     }
> > > -                   rc = -errno;
> > >                     evsel__open_strerror(pos, &opts->target, errno, msg, sizeof(msg));
> > > -                   ui__error("%s\n", msg);
> > > -                   goto out;
> > > +                   ui__error("Failure to open event '%s' on PMU '%s' which will be removed.\n%s\n",
> > > +                             evsel__name(pos), evsel__pmu_name(pos), msg);
>
> > How about changing it to pr_debug() and add below ...
>
> That sounds better.
>
> > > +                   pos->skippable = true;
> > > +                   skipped = true;
> > > +           } else {
> > > +                   pos->supported = true;
> > >             }
> > > -
> > > -           pos->supported = true;
> > >     }
> > >
> > > +   if (skipped) {
> > > +           struct evsel *tmp;
> > > +           int idx = 0;
> > > +           bool evlist_empty = true;
> > > +
> > > +           /* Remove evsels that failed to open and update indices. */
> > > +           evlist__for_each_entry_safe(evlist, tmp, pos) {
> > > +                   if (pos->skippable) {
> > > +                           evlist__remove(evlist, pos);
> > > +                           continue;
> > > +                   }
> > > +
> > > +                   /*
> > > +                    * Note, dummy events may be command line parsed or
> > > +                    * added by the tool. We care about supporting `perf
> > > +                    * record -e dummy` which may be used as a permission
> > > +                    * check. Dummy events that are added to the command
> > > +                    * line and opened along with other events that fail,
> > > +                    * will still fail as if the dummy events were tool
> > > +                    * added events for the sake of code simplicity.
> > > +                    */
> > > +                   if (!evsel__is_dummy_event(pos))
> > > +                           evlist_empty = false;
> > > +           }
> > > +           evlist__for_each_entry(evlist, pos) {
> > > +                   pos->core.idx = idx++;
> > > +           }
> > > +           /* If list is empty then fail. */
> > > +           if (evlist_empty) {
> > > +                   ui__error("Failure to open any events for recording.\n");
> > > +                   rc = -1;
> > > +                   goto out;
> > > +           }
>
> > ... ?
>
> >               if (!verbose)
> >                       ui__warning("Removed some unsupported events, use -v for details.\n");
>
> And even this one would be best left for cases where we can determine
> that its a new situation, i.e. one that should work and not the ones we
> know that will not work already and thus so far didn't alarm the user
> into thinking something is wrong.
>
> Having the ones we know will fail as pr_debug() seems enough, I'd say.

This means that:
```
$ perf record -e data_read,LLC-prefetch-read -a sleep 0.1
```
will fail (as data_read is a memory controller event and the LLC
doesn't support sampling) with something like:
```
Error:
Failure to open any events for recording
```
Which feels a bit minimal. As I already mentioned, it is also a
behavior change and so has the potential to break scripts dependent on
the failure information.

A patch lowering the priority of error messages should be independent
of the 4 changes here. I'd be happy if someone follows this series
with a patch doing it.
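
To illustrate why that command ends with "Failure to open any events for recording", here is a minimal, self-contained sketch of the skip-then-compact flow in the quoted record__open() hunk. It is not perf code: struct sketch_evsel and compact_evsels() are hypothetical stand-ins for struct evsel and the evlist__remove()/idx renumbering loop. Entries that failed to open are dropped, the survivors get dense indices, and the open is only treated as a failure when something was skipped and nothing but dummy events survived.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct evsel, reduced to what the sketch needs. */
struct sketch_evsel {
	const char *name;
	bool is_dummy;  /* e.g. a dummy event inserted for sideband data */
	bool open_ok;   /* outcome of the (simulated) open */
	int idx;        /* dense index into per-event arrays such as fds */
};

/*
 * Drop entries that failed to open and renumber the survivors so their
 * indices stay dense. Returns the number of surviving entries, or -1 when
 * something was skipped and no non-dummy event survived.
 */
static int compact_evsels(struct sketch_evsel *evs, int n)
{
	bool skipped = false, have_real_event = false;
	int kept = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		if (!evs[i].open_ok) {
			skipped = true;  /* "skippable": removed from the list */
			continue;
		}
		evs[kept] = evs[i];
		evs[kept].idx = kept;
		if (!evs[kept].is_dummy)
			have_real_event = true;
		kept++;
	}
	/* Like the patch, only treat this as fatal if something was skipped. */
	if (skipped && !have_real_event)
		return -1;
	return kept;
}
```

Assuming a tool-inserted dummy event opens while data_read and LLC-prefetch-read both fail, only the dummy survives and the sketch returns -1. A lone `-e dummy` never takes the skip path, which is why it still works as a permission check.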

Thanks,
Ian

* Re: [PATCH v5 3/4] perf record: Skip don't fail for events that don't open
  2025-01-10  4:44     ` Ian Rogers
@ 2025-01-10 18:55       ` Namhyung Kim
  2025-01-10 19:18         ` Ian Rogers
  0 siblings, 1 reply; 53+ messages in thread
From: Namhyung Kim @ 2025-01-10 18:55 UTC (permalink / raw)
  To: Ian Rogers, James Clark, Leo Yan
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	bpf, Aditya Bodkhe, Leo Yan, Atish Patra

On Thu, Jan 09, 2025 at 08:44:38PM -0800, Ian Rogers wrote:
> On Thu, Jan 9, 2025 at 5:25 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Thu, Jan 09, 2025 at 02:21:08PM -0800, Ian Rogers wrote:
> > > Whilst for many tools it is an expected behavior that failure to open
> > > a perf event is a failure, ARM decided to name PMU events the same as
> > > legacy events and then failed to rename such events on a server uncore
> > > SLC PMU. As perf's default behavior when no PMU is specified is to
> > > open the event on all PMUs that advertise/"have" the event, this
> > > yielded failures when trying to make the priority of legacy and
> > > sysfs/json events uniform - something requested by RISC-V and ARM. A
> > > legacy event user on ARM hardware may find their event opened on an
> > > uncore PMU which for perf record will fail. Arnaldo suggested skipping
> > > such events which this patch implements. Rather than have the skipping
> > > conditional on running on ARM, the skipping is done on all
> > > architectures as such a fundamental behavioral difference could lead
> > > to problems with tools built/depending on perf.
> > >
> > > An example of perf record failing to open events on x86 is:
> > > ```
> > > $ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
> > > Error:
> > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
> > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > "dmesg | grep -i perf" may provide additional information.
> > >
> > > Error:
> > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
> > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > "dmesg | grep -i perf" may provide additional information.
> > >
> > > Error:
> > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > The LLC-prefetch-read event is not supported.
> > > [ perf record: Woken up 1 times to write data ]
> > > [ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]
> >
> > I'm afraid this can be too noisy.
> 
> The intention is to be noisy:
> 1) it matches the existing behavior, anything else is potentially a regression;

Well.. I think you're changing the behavior. :)  Also currently it just
fails on the first event so it won't be too noisy.

  $ perf record -e data_read,data_write,LLC-prefetch-read -a sleep 0.1
  event syntax error: 'data_read,data_write,LLC-prefetch-read'
                       \___ Bad event name
  
  Unable to find event on a PMU of 'data_read'
  Run 'perf list' for a list of valid events
  
   Usage: perf record [<options>] [<command>]
      or: perf record [<options>] -- <command> [<options>]
  
      -e, --event <event>   event selector. use 'perf list' to list available events


> 2) it only happens if trying to record on a PMU/event that doesn't
> support recording, something that is currently an error and so we're
> not motivated to change the behavior as no-one should be using it;

It was caught by Linus, so we know at least one (very important) user.


> 3) for the wildcard case the only offender is ARM's SLC PMU and the
> appropriate fix there has always been to make the CPU cycle's event
> name match the bus_cycles event name by calling it cpu_cycles -
> something that doesn't conflict with a core PMU event name, the thing
> that has introduced all these problems, patches, long email exchanges,
> unfixed inconsistencies, etc.. If the errors aren't noisy then there
> is little motivation for the ARM SLC PMU's event name to be fixed.

I understand your concern but I'm not sure it's the best way to fix the
issue.

James, Leo, is there any chance you can rename the SLC cycles event in
the kernel driver?

Thanks,
Namhyung


* Re: [PATCH v5 3/4] perf record: Skip don't fail for events that don't open
  2025-01-10 18:55       ` Namhyung Kim
@ 2025-01-10 19:18         ` Ian Rogers
  2025-01-14 19:29           ` Namhyung Kim
  0 siblings, 1 reply; 53+ messages in thread
From: Ian Rogers @ 2025-01-10 19:18 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: James Clark, Leo Yan, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Adrian Hunter, Kan Liang, Ze Gao, Weilin Wang,
	Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, bpf, Aditya Bodkhe, Atish Patra

On Fri, Jan 10, 2025 at 10:55 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Thu, Jan 09, 2025 at 08:44:38PM -0800, Ian Rogers wrote:
> > On Thu, Jan 9, 2025 at 5:25 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > On Thu, Jan 09, 2025 at 02:21:08PM -0800, Ian Rogers wrote:
> > > > Whilst for many tools it is an expected behavior that failure to open
> > > > a perf event is a failure, ARM decided to name PMU events the same as
> > > > legacy events and then failed to rename such events on a server uncore
> > > > SLC PMU. As perf's default behavior when no PMU is specified is to
> > > > open the event on all PMUs that advertise/"have" the event, this
> > > > yielded failures when trying to make the priority of legacy and
> > > > sysfs/json events uniform - something requested by RISC-V and ARM. A
> > > > legacy event user on ARM hardware may find their event opened on an
> > > > uncore PMU which for perf record will fail. Arnaldo suggested skipping
> > > > such events which this patch implements. Rather than have the skipping
> > > > conditional on running on ARM, the skipping is done on all
> > > > architectures as such a fundamental behavioral difference could lead
> > > > to problems with tools built/depending on perf.
> > > >
> > > > An example of perf record failing to open events on x86 is:
> > > > ```
> > > > $ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
> > > > Error:
> > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
> > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > > "dmesg | grep -i perf" may provide additional information.
> > > >
> > > > Error:
> > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
> > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > > "dmesg | grep -i perf" may provide additional information.
> > > >
> > > > Error:
> > > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > > The LLC-prefetch-read event is not supported.
> > > > [ perf record: Woken up 1 times to write data ]
> > > > [ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]
> > >
> > > I'm afraid this can be too noisy.
> >
> > The intention is to be noisy:
> > 1) it matches the existing behavior, anything else is potentially a regression;
>
> Well.. I think you're changing the behavior. :)  Also currently it just
> fails on the first event so it won't be too noisy.
>
>   $ perf record -e data_read,data_write,LLC-prefetch-read -a sleep 0.1
>   event syntax error: 'data_read,data_write,LLC-prefetch-read'
>                        \___ Bad event name
>
>   Unable to find event on a PMU of 'data_read'
>   Run 'perf list' for a list of valid events
>
>    Usage: perf record [<options>] [<command>]
>       or: perf record [<options>] -- <command> [<options>]
>
>       -e, --event <event>   event selector. use 'perf list' to list available events

Fwiw, this error is an event parsing error not an event opening error.
You need to select an uncore event, I was using data_read which exists
in the uncore_imc_free_running PMUs on Intel tigerlake. Here is the
existing error message:
```
$ perf record -e data_read -a true
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
"dmesg | grep -i perf" may provide additional information.
```
and here it is with the series:
```
$ perf record -e data_read -a true
Error:
Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
"dmesg | grep -i perf" may provide additional information.

Error:
Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
"dmesg | grep -i perf" may provide additional information.

Error:
Failure to open any events for recording.
```
and here is what it would be with pr_debug:
```
$ perf record -e data_read -a true
Error:
Failure to open any events for recording.
```
I believe this last output is worse because:
1) If not all events fail to open there is no error reported unless I
know to run with -v, which will also bring a bunch more noise with it,
2) I don't see the PMU / event name and "Invalid argument" indicating
what has gone wrong again unless I know to run with -v and get all the
verbose noise with that.

Yes it is noisy on 1 platform for 1 event due to an ARM PMU event name
bug that ARM should have long ago fixed. That should be fixed rather
than hiding errors and making users think they are recording samples
when silently they're not - or they need to search through verbose
output to try to find out if something broke.

> > 2) it only happens if trying to record on a PMU/event that doesn't
> > support recording, something that is currently an error and so we're
> > not motivated to change the behavior as no-one should be using it;
>
> It was caught by Linus, so we know at least one (very important) user.

If they care enough then specifying the PMU with the event will avoid
any warning and has always been a fix for this issue. It was the first
proposed workaround for Linus.
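
A rough sketch of the difference between a bare event name and a PMU-qualified one, as described here: a bare name is wildcarded across every PMU that advertises it, while the `pmu/event/` form selects a single PMU. The PMU names and event tables below are made up for illustration (core_pmu and slc_pmu stand in for a core PMU and an uncore SLC PMU that both advertise cycles); real matching in perf also handles aliases, PMU name wildcards and more.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical PMUs; names and event lists are illustrative only. */
struct sketch_pmu {
	const char *name;
	const char *events[3];  /* NULL-terminated */
};

static const struct sketch_pmu sketch_pmus[] = {
	{ "core_pmu", { "cycles", "instructions", NULL } },
	{ "slc_pmu",  { "cycles", "bus_cycles", NULL } },
};

static int pmu_has_event(const struct sketch_pmu *pmu, const char *event)
{
	for (size_t i = 0; pmu->events[i] != NULL; i++)
		if (strcmp(pmu->events[i], event) == 0)
			return 1;
	return 0;
}

/*
 * Count the PMUs an event string would be opened on. "pmu/event/" names a
 * single PMU; a bare "event" is wildcarded across every PMU advertising it.
 */
static int count_matching_pmus(const char *spec)
{
	char pmu[64], event[64];
	int explicit_pmu, matches = 0;

	assert(spec != NULL);
	explicit_pmu = sscanf(spec, "%63[^/]/%63[^/]", pmu, event) == 2;
	for (size_t i = 0; i < sizeof(sketch_pmus) / sizeof(sketch_pmus[0]); i++) {
		if (explicit_pmu && strcmp(sketch_pmus[i].name, pmu) != 0)
			continue;
		if (pmu_has_event(&sketch_pmus[i], explicit_pmu ? event : spec))
			matches++;
	}
	return matches;
}
```

In this model `cycles` resolves to two PMUs, one of which cannot sample, whereas `core_pmu/cycles/` only ever touches the core PMU.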

> > 3) for the wildcard case the only offender is ARM's SLC PMU and the
> > appropriate fix there has always been to make the CPU cycle's event
> > name match the bus_cycles event name by calling it cpu_cycles -
> > something that doesn't conflict with a core PMU event name, the thing
> > that has introduced all these problems, patches, long email exchanges,
> > unfixed inconsistencies, etc.. If the errors aren't noisy then there
> > is little motivation for the ARM SLC PMU's event name to be fixed.
>
> I understand your concern but I'm not sure it's the best way to fix the
> issue.

Right, I'm similarly concerned about hiding legitimate warning/error
messages because of 1 event on 1 PMU on 1 architecture because of how
perf gets driven by 1 user. Yes, when something breaks you can wade through
the verbose output but imo the verbose output was never intended to be
used in that way.

Thanks,
Ian

* Re: [PATCH v5 3/4] perf record: Skip don't fail for events that don't open
  2025-01-10 16:42       ` Ian Rogers
@ 2025-01-10 19:26         ` Namhyung Kim
  2025-01-10 21:33           ` Ian Rogers
  0 siblings, 1 reply; 53+ messages in thread
From: Namhyung Kim @ 2025-01-10 19:26 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Arnaldo Carvalho de Melo, Linus Torvalds, Peter Zijlstra,
	Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, Kan Liang, James Clark, Ze Gao, Weilin Wang,
	Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, bpf, Aditya Bodkhe, Leo Yan,
	Atish Patra

On Fri, Jan 10, 2025 at 08:42:02AM -0800, Ian Rogers wrote:
> On Fri, Jan 10, 2025 at 6:18 AM Arnaldo Carvalho de Melo
> <acme@kernel.org> wrote:
> >
> > Adding Linus to the CC list as he participated in this discussion in the
> > past, so a heads up about changes in this area that are being further
> > discussed.
> 
> Linus blocks my email so I'm not sure of the point.

That's unfortunate, but he should be able to see others' replies.

> 
> > On Thu, Jan 09, 2025 at 05:25:03PM -0800, Namhyung Kim wrote:
> > > On Thu, Jan 09, 2025 at 02:21:08PM -0800, Ian Rogers wrote:
> > > > Whilst for many tools it is an expected behavior that failure to open
> > > > a perf event is a failure, ARM decided to name PMU events the same as
> > > > legacy events and then failed to rename such events on a server uncore
> > > > SLC PMU. As perf's default behavior when no PMU is specified is to
> > > > open the event on all PMUs that advertise/"have" the event, this
> > > > yielded failures when trying to make the priority of legacy and
> > > > sysfs/json events uniform - something requested by RISC-V and ARM. A
> > > > legacy event user on ARM hardware may find their event opened on an
> > > > uncore PMU which for perf record will fail. Arnaldo suggested skipping
> > > > such events which this patch implements. Rather than have the skipping
> > > > conditional on running on ARM, the skipping is done on all
> > > > architectures as such a fundamental behavioral difference could lead
> > > > to problems with tools built/depending on perf.
> > > >
> > > > An example of perf record failing to open events on x86 is:
> > > > ```
> > > > $ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
> > > > Error:
> > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
> > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > > "dmesg | grep -i perf" may provide additional information.
> > > >
> > > > Error:
> > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
> > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > > "dmesg | grep -i perf" may provide additional information.
> > > >
> > > > Error:
> > > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > > The LLC-prefetch-read event is not supported.
> > > > [ perf record: Woken up 1 times to write data ]
> > > > [ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]
> > >
> > > I'm afraid this can be too noisy.
> >
> > Agreed.
> >
> > > > $ perf report --stats
> > > > Aggregated stats:
> > > >                TOTAL events:      17255
> > > >                 MMAP events:        284  ( 1.6%)
> > > >                 COMM events:       1961  (11.4%)
> > > >                 EXIT events:          1  ( 0.0%)
> > > >                 FORK events:       1960  (11.4%)
> > > >               SAMPLE events:         87  ( 0.5%)
> > > >                MMAP2 events:      12836  (74.4%)
> > > >              KSYMBOL events:         83  ( 0.5%)
> > > >            BPF_EVENT events:         36  ( 0.2%)
> > > >       FINISHED_ROUND events:          2  ( 0.0%)
> > > >             ID_INDEX events:          1  ( 0.0%)
> > > >           THREAD_MAP events:          1  ( 0.0%)
> > > >              CPU_MAP events:          1  ( 0.0%)
> > > >            TIME_CONV events:          1  ( 0.0%)
> > > >        FINISHED_INIT events:          1  ( 0.0%)
> > > > cycles stats:
> > > >               SAMPLE events:         87
> > > > ```
> > > >
> > > > If all events fail to open then the perf record will fail:
> > > > ```
> > > > $ perf record -e LLC-prefetch-read true
> > > > Error:
> > > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > > The LLC-prefetch-read event is not supported.
> > > > Error:
> > > > Failure to open any events for recording
> > > > ```
> > > >
> > > > As an evlist may have dummy events that open when all command line
> > > > events fail we ignore dummy events when detecting if at least some
> > > > events open. This still permits the dummy event on its own to be used
> > > > as a permission check:
> > > > ```
> > > > $ perf record -e dummy true
> > > > [ perf record: Woken up 1 times to write data ]
> > > > [ perf record: Captured and wrote 0.046 MB perf.data ]
> > > > ```
> > > > but allows failure when a dummy event is implicitly inserted or when
> > > > there are insufficient permissions to open it:
> > > > ```
> > > > $ perf record -e LLC-prefetch-read -a true
> > > > Error:
> > > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > > The LLC-prefetch-read event is not supported.
> > > > Error:
> > > > Failure to open any events for recording
> > > > ```
> > > >
> > > > The issue with legacy events is that on RISC-V they want the driver to
> > > > not have mappings from legacy to non-legacy config encodings for each
> > > > vendor/model due to size, complexity and difficulty to update. It was
> > > > reported that on ARM Apple-M? CPUs the legacy mapping in the driver
> > > > was broken and the sysfs/json events should always take precedence,
> > > > however, it isn't clear this is still the case. It is the case that
> > > > without working around this issue a legacy event like cycles without a
> > > > PMU can encode differently than when specified with a PMU - the
> > > > non-PMU version favoring legacy encodings, the PMU one avoiding legacy
> > > > encodings.
> > > >
> > > > The patch removes events and then adjusts the idx value for each
> > > > evsel. This is done so that the dense xyarrays used for file
> > > > descriptors, etc. don't contain broken entries. As event opening
> > > > happens relatively late in the record process, use of the idx value
> > > > before the open will have become corrupted, so it is expected there
> > > > are latent bugs hidden behind this change - the change is best
> > > > effort. As the only vendor that has broken event names is ARM, this
> > > > will principally affect ARM users. They will also experience warning
> > > > messages like those above because of the uncore PMU advertising legacy
> > > > event names.
> > > >
> > > > Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
> > > > Signed-off-by: Ian Rogers <irogers@google.com>
> > > > Tested-by: James Clark <james.clark@linaro.org>
> > > > Tested-by: Leo Yan <leo.yan@arm.com>
> > > > Tested-by: Atish Patra <atishp@rivosinc.com>
> > > > ---
> > > >  tools/perf/builtin-record.c | 47 ++++++++++++++++++++++++++++++++-----
> > > >  1 file changed, 41 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> > > > index 5db1aedf48df..c0b8249a3787 100644
> > > > --- a/tools/perf/builtin-record.c
> > > > +++ b/tools/perf/builtin-record.c
> > > > @@ -961,7 +961,6 @@ static int record__config_tracking_events(struct record *rec)
> > > >      */
> > > >     if (opts->target.initial_delay || target__has_cpu(&opts->target) ||
> > > >         perf_pmus__num_core_pmus() > 1) {
> > > > -
> > > >             /*
> > > >              * User space tasks can migrate between CPUs, so when tracing
> > > >              * selected CPUs, sideband for all CPUs is still needed.
> > > > @@ -1366,6 +1365,7 @@ static int record__open(struct record *rec)
> > > >     struct perf_session *session = rec->session;
> > > >     struct record_opts *opts = &rec->opts;
> > > >     int rc = 0;
> > > > +   bool skipped = false;
> > > >
> > > >     evlist__for_each_entry(evlist, pos) {
> > > >  try_again:
> > > > @@ -1381,15 +1381,50 @@ static int record__open(struct record *rec)
> > > >                             pos = evlist__reset_weak_group(evlist, pos, true);
> > > >                             goto try_again;
> > > >                     }
> > > > -                   rc = -errno;
> > > >                     evsel__open_strerror(pos, &opts->target, errno, msg, sizeof(msg));
> > > > -                   ui__error("%s\n", msg);
> > > > -                   goto out;
> > > > +                   ui__error("Failure to open event '%s' on PMU '%s' which will be removed.\n%s\n",
> > > > +                             evsel__name(pos), evsel__pmu_name(pos), msg);
> >
> > > How about changing it to pr_debug() and add below ...
> >
> > That sounds better.
> >
> > > > +                   pos->skippable = true;
> > > > +                   skipped = true;
> > > > +           } else {
> > > > +                   pos->supported = true;
> > > >             }
> > > > -
> > > > -           pos->supported = true;
> > > >     }
> > > >
> > > > +   if (skipped) {
> > > > +           struct evsel *tmp;
> > > > +           int idx = 0;
> > > > +           bool evlist_empty = true;
> > > > +
> > > > +           /* Remove evsels that failed to open and update indices. */
> > > > +           evlist__for_each_entry_safe(evlist, tmp, pos) {
> > > > +                   if (pos->skippable) {
> > > > +                           evlist__remove(evlist, pos);
> > > > +                           continue;
> > > > +                   }
> > > > +
> > > > +                   /*
> > > > +                    * Note, dummy events may be command line parsed or
> > > > +                    * added by the tool. We care about supporting `perf
> > > > +                    * record -e dummy` which may be used as a permission
> > > > +                    * check. Dummy events that are added to the command
> > > > +                    * line and opened along with other events that fail,
> > > > +                    * will still fail as if the dummy events were tool
> > > > +                    * added events for the sake of code simplicity.
> > > > +                    */
> > > > +                   if (!evsel__is_dummy_event(pos))
> > > > +                           evlist_empty = false;
> > > > +           }
> > > > +           evlist__for_each_entry(evlist, pos) {
> > > > +                   pos->core.idx = idx++;
> > > > +           }
> > > > +           /* If list is empty then fail. */
> > > > +           if (evlist_empty) {
> > > > +                   ui__error("Failure to open any events for recording.\n");
> > > > +                   rc = -1;
> > > > +                   goto out;
> > > > +           }
> >
> > > ... ?
> >
> > >               if (!verbose)
> > >                       ui__warning("Removed some unsupported events, use -v for details.\n");
> >
> > And even this one would be best left for cases where we can determine
> > that it's a new situation, i.e. one that should work and not the ones we
> > know that will not work already and thus so far didn't alarm the user
> > into thinking something is wrong.
> >
> > Having the ones we know will fail as pr_debug() seems enough, I'd say.
> 
> This means that:
> ```
> $ perf record -e data_read,LLC-prefetch-read -a sleep 0.1
> ```
> will fail (as data_read is a memory controller event and the LLC
> doesn't support sampling) with something like:
> ```
> Error:
> Failure to open any events for recording
> ```
> Which feels a bit minimal. As I already mentioned, it is also a
> behavior change and so has the potential to break scripts dependent on
> the failure information.

I don't think it's about failure behavior, the concern is the error
messages.  It can take up too much screen space when users give a long list
of invalid events.  And unfortunately the current error message for
checking dmesg is not very helpful.

Anyway you can add this line too: "Use -v to see the details."
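
One possible shape for the compromise being discussed, sketched under assumptions: report_failures() and struct open_failure are hypothetical, and the summary wording only follows the suggestion above. With verbose set, each failure is reported with its PMU, as in v5; otherwise a single summary line is produced. Writing into a caller-supplied buffer just keeps the sketch testable; perf itself would go through ui__error(), ui__warning() or pr_debug().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record of one event that failed to open. */
struct open_failure {
	const char *event;
	const char *pmu;
};

/*
 * Format failure messages into buf: one line per failure when verbose,
 * otherwise a single summary line. Returns the number of characters
 * produced (this may exceed len - 1 if the output was truncated).
 */
static int report_failures(const struct open_failure *f, int n, bool verbose,
			   char *buf, size_t len)
{
	int used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	if (n <= 0)
		return 0;
	if (!verbose)
		return snprintf(buf, len,
				"Removed %d unsupported event(s). Use -v to see the details.\n",
				n);
	for (int i = 0; i < n && (size_t)used < len; i++)
		used += snprintf(buf + used, len - used,
				 "Failure to open event '%s' on PMU '%s' which will be removed.\n",
				 f[i].event, f[i].pmu);
	return used;
}
```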

> 
> A patch lowering the priority of error messages should be independent
> of the 4 changes here. I'd be happy if someone follows this series
> with a patch doing it.

I think the error behavior is a part of this change.

Thanks,
Namhyung


* Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-01-09 22:21 ` [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy" Ian Rogers
@ 2025-01-10 19:40   ` Namhyung Kim
  2025-01-10 19:52     ` Atish Kumar Patra
  2025-01-10 22:15     ` Ian Rogers
  0 siblings, 2 replies; 53+ messages in thread
From: Namhyung Kim @ 2025-01-10 19:40 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	bpf, Aditya Bodkhe, Atish Patra, Leo Yan, Beeman Strong,
	Arnaldo Carvalho de Melo

On Thu, Jan 09, 2025 at 02:21:09PM -0800, Ian Rogers wrote:
> Originally posted and merged from:
> https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
> This reverts commit 4f1b067359ac8364cdb7f9fda41085fa85789d0f although
> the patch is now smaller due to related fixes being applied in commit
> 22a4db3c3603 ("perf evsel: Add alternate_hw_config and use in
> evsel__match").
> The original commit message was:
> 
> It was requested that RISC-V be able to add events to the perf tool so
> the PMU driver didn't need to map legacy events to config encodings:
> https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/
> 
> This change makes the priority of events specified without a PMU the
> same as those specified with a PMU, namely sysfs and JSON events are
> checked first before using the legacy encoding.
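The ordering described in this commit message can be modelled in a few lines. The sketch below mirrors the `ok`/`core_ok` counting that the patch adds to `parse_events_multi_pmu_add()`: every PMU's sysfs/JSON events are tried first, and the legacy hardware encoding is added only when no core PMU matched. The types, `resolve_event()` and `NO_LEGACY_CONFIG` (standing in for `PERF_COUNT_HW_MAX`) are hypothetical; the real function also handles terms, wildcarding and error reporting.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model types; perf's real struct perf_pmu is far richer. */
struct model_event {
	const char *name;
	unsigned long long config;	/* encoding taken from sysfs/JSON */
};

struct model_pmu {
	const char *name;
	int is_core;
	const struct model_event *events;
	size_t nr_events;
};

/* "No legacy encoding", playing the role of PERF_COUNT_HW_MAX. */
#define NO_LEGACY_CONFIG (-1)

struct resolution {
	int ok;			/* events added, including a legacy fallback */
	int core_ok;		/* sysfs/JSON matches on core PMUs */
	int used_legacy;	/* 1 if the legacy hardware encoding was added */
};

static struct resolution resolve_event(const struct model_pmu *pmus,
				       size_t nr_pmus, const char *name,
				       int legacy_config)
{
	struct resolution r = { 0, 0, 0 };

	assert(name != NULL);
	/* Step 1: sysfs/JSON events on every PMU take priority. */
	for (size_t i = 0; i < nr_pmus; i++) {
		for (size_t j = 0; j < pmus[i].nr_events; j++) {
			if (strcmp(pmus[i].events[j].name, name) == 0) {
				r.ok++;
				if (pmus[i].is_core)
					r.core_ok++;
				break;
			}
		}
	}

	/* Step 2: fall back to the legacy encoding only if no core PMU matched. */
	if (legacy_config != NO_LEGACY_CONFIG && !r.core_ok) {
		r.used_legacy = 1;
		r.ok++;
	}
	return r;
}
```

Note the consequence for a name such as `cycles` that only a non-core PMU defines: the non-core event and the legacy core event are both added, which matches the situation the cover letter describes for the SLC PMU.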

I'm still not convinced that we need this change, given these
troubles.  If it's because RISC-V cannot define the legacy hardware
events in the kernel driver, why not use a different name in JSON and
ask users to specify that name explicitly?  Something like:

  $ perf record -e riscv-cycles ...

Thanks,
Namhyung

> 
> The hw_term is made more generic as a hardware_event that encodes a
> pair of string and int value, allowing parse_events_multi_pmu_add to
> fall back on a known encoding when the sysfs/JSON adding fails for
> core events. As this covers PE_VALUE_SYM_HW, that token is removed and
> related code simplified.
> 
> Signed-off-by: Ian Rogers <irogers@google.com>
> Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
> Tested-by: Atish Patra <atishp@rivosinc.com>
> Tested-by: James Clark <james.clark@linaro.org>
> Tested-by: Leo Yan <leo.yan@arm.com>
> Cc: Adrian Hunter <adrian.hunter@intel.com>
> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> Cc: Beeman Strong <beeman@rivosinc.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Jiri Olsa <jolsa@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> ---
>  tools/perf/util/parse-events.c | 26 +++++++++---
>  tools/perf/util/parse-events.l | 76 +++++++++++++++++-----------------
>  tools/perf/util/parse-events.y | 60 ++++++++++++++++++---------
>  3 files changed, 98 insertions(+), 64 deletions(-)
> 
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index 1e23faa364b1..3a60fca53cfa 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -1545,8 +1545,8 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
>  	struct list_head *list = NULL;
>  	struct perf_pmu *pmu = NULL;
>  	YYLTYPE *loc = loc_;
> -	int ok = 0;
> -	const char *config;
> +	int ok = 0, core_ok = 0;
> +	const char *tmp;
>  	struct parse_events_terms parsed_terms;
>  
>  	*listp = NULL;
> @@ -1559,15 +1559,15 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
>  			return ret;
>  	}
>  
> -	config = strdup(event_name);
> -	if (!config)
> +	tmp = strdup(event_name);
> +	if (!tmp)
>  		goto out_err;
>  
>  	if (parse_events_term__num(&term,
>  				   PARSE_EVENTS__TERM_TYPE_USER,
> -				   config, /*num=*/1, /*novalue=*/true,
> +				   tmp, /*num=*/1, /*novalue=*/true,
>  				   loc, /*loc_val=*/NULL) < 0) {
> -		zfree(&config);
> +		zfree(&tmp);
>  		goto out_err;
>  	}
>  	list_add_tail(&term->list, &parsed_terms.terms);
> @@ -1598,6 +1598,8 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
>  			pr_debug("%s -> %s/%s/\n", event_name, pmu->name, sb.buf);
>  			strbuf_release(&sb);
>  			ok++;
> +			if (pmu->is_core)
> +				core_ok++;
>  		}
>  	}
>  
> @@ -1614,6 +1616,18 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
>  		}
>  	}
>  
> +	if (hw_config != PERF_COUNT_HW_MAX && !core_ok) {
> +		/*
> +		 * The event wasn't found on core PMUs but it has a hardware
> +		 * config version to try.
> +		 */
> +		if (!parse_events_add_numeric(parse_state, list,
> +						PERF_TYPE_HARDWARE, hw_config,
> +						const_parsed_terms,
> +						/*wildcard=*/true))
> +			ok++;
> +	}
> +
>  out_err:
>  	parse_events_terms__exit(&parsed_terms);
>  	if (ok)
> diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
> index bf7f73548605..29082a22ccc9 100644
> --- a/tools/perf/util/parse-events.l
> +++ b/tools/perf/util/parse-events.l
> @@ -113,12 +113,12 @@ do {								\
>  	yyless(0);						\
>  } while (0)
>  
> -static int sym(yyscan_t scanner, int type, int config)
> +static int sym(yyscan_t scanner, int config)
>  {
>  	YYSTYPE *yylval = parse_events_get_lval(scanner);
>  
> -	yylval->num = (type << 16) + config;
> -	return type == PERF_TYPE_HARDWARE ? PE_VALUE_SYM_HW : PE_VALUE_SYM_SW;
> +	yylval->num = config;
> +	return PE_VALUE_SYM_SW;
>  }
>  
>  static int term(yyscan_t scanner, enum parse_events__term_type type)
> @@ -129,13 +129,13 @@ static int term(yyscan_t scanner, enum parse_events__term_type type)
>  	return PE_TERM;
>  }
>  
> -static int hw_term(yyscan_t scanner, int config)
> +static int hw(yyscan_t scanner, int config)
>  {
>  	YYSTYPE *yylval = parse_events_get_lval(scanner);
>  	char *text = parse_events_get_text(scanner);
>  
> -	yylval->hardware_term.str = strdup(text);
> -	yylval->hardware_term.num = PERF_TYPE_HARDWARE + config;
> +	yylval->hardware_event.str = strdup(text);
> +	yylval->hardware_event.num = config;
>  	return PE_TERM_HW;
>  }
>  
> @@ -324,16 +324,16 @@ aux-output		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT); }
>  aux-action		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_ACTION); }
>  aux-sample-size		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE); }
>  metric-id		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_METRIC_ID); }
> -cpu-cycles|cycles				{ return hw_term(yyscanner, PERF_COUNT_HW_CPU_CYCLES); }
> -stalled-cycles-frontend|idle-cycles-frontend	{ return hw_term(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); }
> -stalled-cycles-backend|idle-cycles-backend	{ return hw_term(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); }
> -instructions					{ return hw_term(yyscanner, PERF_COUNT_HW_INSTRUCTIONS); }
> -cache-references				{ return hw_term(yyscanner, PERF_COUNT_HW_CACHE_REFERENCES); }
> -cache-misses					{ return hw_term(yyscanner, PERF_COUNT_HW_CACHE_MISSES); }
> -branch-instructions|branches			{ return hw_term(yyscanner, PERF_COUNT_HW_BRANCH_INSTRUCTIONS); }
> -branch-misses					{ return hw_term(yyscanner, PERF_COUNT_HW_BRANCH_MISSES); }
> -bus-cycles					{ return hw_term(yyscanner, PERF_COUNT_HW_BUS_CYCLES); }
> -ref-cycles					{ return hw_term(yyscanner, PERF_COUNT_HW_REF_CPU_CYCLES); }
> +cpu-cycles|cycles				{ return hw(yyscanner, PERF_COUNT_HW_CPU_CYCLES); }
> +stalled-cycles-frontend|idle-cycles-frontend	{ return hw(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); }
> +stalled-cycles-backend|idle-cycles-backend	{ return hw(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); }
> +instructions					{ return hw(yyscanner, PERF_COUNT_HW_INSTRUCTIONS); }
> +cache-references				{ return hw(yyscanner, PERF_COUNT_HW_CACHE_REFERENCES); }
> +cache-misses					{ return hw(yyscanner, PERF_COUNT_HW_CACHE_MISSES); }
> +branch-instructions|branches			{ return hw(yyscanner, PERF_COUNT_HW_BRANCH_INSTRUCTIONS); }
> +branch-misses					{ return hw(yyscanner, PERF_COUNT_HW_BRANCH_MISSES); }
> +bus-cycles					{ return hw(yyscanner, PERF_COUNT_HW_BUS_CYCLES); }
> +ref-cycles					{ return hw(yyscanner, PERF_COUNT_HW_REF_CPU_CYCLES); }
>  r{num_raw_hex}		{ return str(yyscanner, PE_RAW); }
>  r0x{num_raw_hex}	{ return str(yyscanner, PE_RAW); }
>  ,			{ return ','; }
> @@ -377,28 +377,28 @@ r0x{num_raw_hex}	{ return str(yyscanner, PE_RAW); }
>  <<EOF>>			{ BEGIN(INITIAL); }
>  }
>  
> -cpu-cycles|cycles				{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES); }
> -stalled-cycles-frontend|idle-cycles-frontend	{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); }
> -stalled-cycles-backend|idle-cycles-backend	{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); }
> -instructions					{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS); }
> -cache-references				{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_REFERENCES); }
> -cache-misses					{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_MISSES); }
> -branch-instructions|branches			{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_BRANCH_INSTRUCTIONS); }
> -branch-misses					{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_BRANCH_MISSES); }
> -bus-cycles					{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_BUS_CYCLES); }
> -ref-cycles					{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_REF_CPU_CYCLES); }
> -cpu-clock					{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CPU_CLOCK); }
> -task-clock					{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_TASK_CLOCK); }
> -page-faults|faults				{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_PAGE_FAULTS); }
> -minor-faults					{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_PAGE_FAULTS_MIN); }
> -major-faults					{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_PAGE_FAULTS_MAJ); }
> -context-switches|cs				{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CONTEXT_SWITCHES); }
> -cpu-migrations|migrations			{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CPU_MIGRATIONS); }
> -alignment-faults				{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_ALIGNMENT_FAULTS); }
> -emulation-faults				{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_EMULATION_FAULTS); }
> -dummy						{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_DUMMY); }
> -bpf-output					{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_BPF_OUTPUT); }
> -cgroup-switches					{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CGROUP_SWITCHES); }
> +cpu-cycles|cycles				{ return hw(yyscanner, PERF_COUNT_HW_CPU_CYCLES); }
> +stalled-cycles-frontend|idle-cycles-frontend	{ return hw(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); }
> +stalled-cycles-backend|idle-cycles-backend	{ return hw(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); }
> +instructions					{ return hw(yyscanner, PERF_COUNT_HW_INSTRUCTIONS); }
> +cache-references				{ return hw(yyscanner, PERF_COUNT_HW_CACHE_REFERENCES); }
> +cache-misses					{ return hw(yyscanner, PERF_COUNT_HW_CACHE_MISSES); }
> +branch-instructions|branches			{ return hw(yyscanner, PERF_COUNT_HW_BRANCH_INSTRUCTIONS); }
> +branch-misses					{ return hw(yyscanner, PERF_COUNT_HW_BRANCH_MISSES); }
> +bus-cycles					{ return hw(yyscanner, PERF_COUNT_HW_BUS_CYCLES); }
> +ref-cycles					{ return hw(yyscanner, PERF_COUNT_HW_REF_CPU_CYCLES); }
> +cpu-clock					{ return sym(yyscanner, PERF_COUNT_SW_CPU_CLOCK); }
> +task-clock					{ return sym(yyscanner, PERF_COUNT_SW_TASK_CLOCK); }
> +page-faults|faults				{ return sym(yyscanner, PERF_COUNT_SW_PAGE_FAULTS); }
> +minor-faults					{ return sym(yyscanner, PERF_COUNT_SW_PAGE_FAULTS_MIN); }
> +major-faults					{ return sym(yyscanner, PERF_COUNT_SW_PAGE_FAULTS_MAJ); }
> +context-switches|cs				{ return sym(yyscanner, PERF_COUNT_SW_CONTEXT_SWITCHES); }
> +cpu-migrations|migrations			{ return sym(yyscanner, PERF_COUNT_SW_CPU_MIGRATIONS); }
> +alignment-faults				{ return sym(yyscanner, PERF_COUNT_SW_ALIGNMENT_FAULTS); }
> +emulation-faults				{ return sym(yyscanner, PERF_COUNT_SW_EMULATION_FAULTS); }
> +dummy						{ return sym(yyscanner, PERF_COUNT_SW_DUMMY); }
> +bpf-output					{ return sym(yyscanner, PERF_COUNT_SW_BPF_OUTPUT); }
> +cgroup-switches					{ return sym(yyscanner, PERF_COUNT_SW_CGROUP_SWITCHES); }
>  
>  {lc_type}			{ return str(yyscanner, PE_LEGACY_CACHE); }
>  {lc_type}-{lc_op_result}	{ return str(yyscanner, PE_LEGACY_CACHE); }
> diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
> index f888cbb076d6..d2ef1890007e 100644
> --- a/tools/perf/util/parse-events.y
> +++ b/tools/perf/util/parse-events.y
> @@ -55,7 +55,7 @@ static void free_list_evsel(struct list_head* list_evsel)
>  %}
>  
>  %token PE_START_EVENTS PE_START_TERMS
> -%token PE_VALUE PE_VALUE_SYM_HW PE_VALUE_SYM_SW PE_TERM
> +%token PE_VALUE PE_VALUE_SYM_SW PE_TERM
>  %token PE_EVENT_NAME
>  %token PE_RAW PE_NAME
>  %token PE_MODIFIER_EVENT PE_MODIFIER_BP PE_BP_COLON PE_BP_SLASH
> @@ -65,11 +65,9 @@ static void free_list_evsel(struct list_head* list_evsel)
>  %token PE_DRV_CFG_TERM
>  %token PE_TERM_HW
>  %type <num> PE_VALUE
> -%type <num> PE_VALUE_SYM_HW
>  %type <num> PE_VALUE_SYM_SW
>  %type <mod> PE_MODIFIER_EVENT
>  %type <term_type> PE_TERM
> -%type <num> value_sym
>  %type <str> PE_RAW
>  %type <str> PE_NAME
>  %type <str> PE_LEGACY_CACHE
> @@ -85,6 +83,7 @@ static void free_list_evsel(struct list_head* list_evsel)
>  %type <list_terms> opt_pmu_config
>  %destructor { parse_events_terms__delete ($$); } <list_terms>
>  %type <list_evsel> event_pmu
> +%type <list_evsel> event_legacy_hardware
>  %type <list_evsel> event_legacy_symbol
>  %type <list_evsel> event_legacy_cache
>  %type <list_evsel> event_legacy_mem
> @@ -102,8 +101,8 @@ static void free_list_evsel(struct list_head* list_evsel)
>  %destructor { free_list_evsel ($$); } <list_evsel>
>  %type <tracepoint_name> tracepoint_name
>  %destructor { free ($$.sys); free ($$.event); } <tracepoint_name>
> -%type <hardware_term> PE_TERM_HW
> -%destructor { free ($$.str); } <hardware_term>
> +%type <hardware_event> PE_TERM_HW
> +%destructor { free ($$.str); } <hardware_event>
>  
>  %union
>  {
> @@ -118,10 +117,10 @@ static void free_list_evsel(struct list_head* list_evsel)
>  		char *sys;
>  		char *event;
>  	} tracepoint_name;
> -	struct hardware_term {
> +	struct hardware_event {
>  		char *str;
>  		u64 num;
> -	} hardware_term;
> +	} hardware_event;
>  }
>  %%
>  
> @@ -264,6 +263,7 @@ PE_EVENT_NAME event_def
>  event_def
>  
>  event_def: event_pmu |
> +	   event_legacy_hardware |
>  	   event_legacy_symbol |
>  	   event_legacy_cache sep_dc |
>  	   event_legacy_mem sep_dc |
> @@ -306,24 +306,45 @@ PE_NAME sep_dc
>  	$$ = list;
>  }
>  
> -value_sym:
> -PE_VALUE_SYM_HW
> +event_legacy_hardware:
> +PE_TERM_HW opt_pmu_config
> +{
> +	/* List of created evsels. */
> +	struct list_head *list = NULL;
> +	int err = parse_events_multi_pmu_add(_parse_state, $1.str, $1.num, $2, &list, &@1);
> +
> +	free($1.str);
> +	parse_events_terms__delete($2);
> +	if (err)
> +		PE_ABORT(err);
> +
> +	$$ = list;
> +}
>  |
> -PE_VALUE_SYM_SW
> +PE_TERM_HW sep_dc
> +{
> +	struct list_head *list;
> +	int err;
> +
> +	err = parse_events_multi_pmu_add(_parse_state, $1.str, $1.num, NULL, &list, &@1);
> +	free($1.str);
> +	if (err)
> +		PE_ABORT(err);
> +	$$ = list;
> +}
>  
>  event_legacy_symbol:
> -value_sym '/' event_config '/'
> +PE_VALUE_SYM_SW '/' event_config '/'
>  {
>  	struct list_head *list;
> -	int type = $1 >> 16;
> -	int config = $1 & 255;
>  	int err;
> -	bool wildcard = (type == PERF_TYPE_HARDWARE || type == PERF_TYPE_HW_CACHE);
>  
>  	list = alloc_list();
>  	if (!list)
>  		YYNOMEM;
> -	err = parse_events_add_numeric(_parse_state, list, type, config, $3, wildcard);
> +	err = parse_events_add_numeric(_parse_state, list,
> +				/*type=*/PERF_TYPE_SOFTWARE, /*config=*/$1,
> +				$3, /*wildcard=*/false);
>  	parse_events_terms__delete($3);
>  	if (err) {
>  		free_list_evsel(list);
> @@ -332,18 +353,17 @@ value_sym '/' event_config '/'
>  	$$ = list;
>  }
>  |
> -value_sym sep_slash_slash_dc
> +PE_VALUE_SYM_SW sep_slash_slash_dc
>  {
>  	struct list_head *list;
> -	int type = $1 >> 16;
> -	int config = $1 & 255;
> -	bool wildcard = (type == PERF_TYPE_HARDWARE || type == PERF_TYPE_HW_CACHE);
>  	int err;
>  
>  	list = alloc_list();
>  	if (!list)
>  		YYNOMEM;
> -	err = parse_events_add_numeric(_parse_state, list, type, config, /*head_config=*/NULL, wildcard);
> +	err = parse_events_add_numeric(_parse_state, list,
> +				/*type=*/PERF_TYPE_SOFTWARE, /*config=*/$1,
> +				/*head_config=*/NULL, /*wildcard=*/false);
>  	if (err)
>  		PE_ABORT(err);
>  	$$ = list;
> -- 
> 2.47.1.613.gc27f4b7a9f-goog
> 
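The lexer change above replaces a packed token value with a type implied by the token itself. A small worked model of the old packing, using hypothetical helpers `old_pack()`, `old_unpack_type()` and `old_unpack_config()` that mirror the removed `(type << 16) + config` in `sym()` and the `$1 >> 16` / `$1 & 255` decoding in the grammar: the config is packed into 16 bits but decoded through an 8-bit mask, so it only round-trips while the config stays below 256. The legacy config values listed in the lexer are all small, so this was harmless in practice. With the patch, `PE_VALUE_SYM_SW` always means `PERF_TYPE_SOFTWARE` and the value is just the config. Likewise, since `PERF_TYPE_HARDWARE` is 0 in `<linux/perf_event.h>`, the old `PERF_TYPE_HARDWARE + config` in `hw_term()` was numerically just `config`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the old sym(): type in the high bits, config below. */
static uint64_t old_pack(int type, int config)
{
	assert(type >= 0 && config >= 0);
	return ((uint64_t)type << 16) + (uint64_t)config;
}

/* Mirrors the removed "int type = $1 >> 16;" */
static int old_unpack_type(uint64_t num)
{
	return (int)(num >> 16);
}

/* Mirrors the removed "int config = $1 & 255;" (only 8 bits kept). */
static int old_unpack_config(uint64_t num)
{
	return (int)(num & 255);
}
```

For example, `context-switches` (type `PERF_TYPE_SOFTWARE` = 1, config `PERF_COUNT_SW_CONTEXT_SWITCHES` = 3) packs to `0x10003` and decodes back to (1, 3).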


* Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-01-10 19:40   ` Namhyung Kim
@ 2025-01-10 19:52     ` Atish Kumar Patra
  2025-01-13 20:56       ` Namhyung Kim
  2025-01-10 22:15     ` Ian Rogers
  1 sibling, 1 reply; 53+ messages in thread
From: Atish Kumar Patra @ 2025-01-10 19:52 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ian Rogers, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	bpf, Aditya Bodkhe, Leo Yan, Beeman Strong,
	Arnaldo Carvalho de Melo

On Fri, Jan 10, 2025 at 11:40 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Thu, Jan 09, 2025 at 02:21:09PM -0800, Ian Rogers wrote:
> > Originally posted and merged from:
> > https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
> > This reverts commit 4f1b067359ac8364cdb7f9fda41085fa85789d0f although
> > the patch is now smaller due to related fixes being applied in commit
> > 22a4db3c3603 ("perf evsel: Add alternate_hw_config and use in
> > evsel__match").
> > The original commit message was:
> >
> > It was requested that RISC-V be able to add events to the perf tool so
> > the PMU driver didn't need to map legacy events to config encodings:
> > https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/
> >
> > This change makes the priority of events specified without a PMU the
> > same as those specified with a PMU, namely sysfs and JSON events are
> > checked first before using the legacy encoding.
>
> I'm still not convinced that we need this change, given these
> troubles.  If it's because RISC-V cannot define the legacy hardware
> events in the kernel driver, why not use a different name in JSON and

When the discussion happened a year back, we tried to avoid defining
the legacy hardware events in the kernel driver. However, we agreed
that we have to define them anyway for other reasons (legacy usage +
virtualization), as described here[1]. I have improved the driver so
that it can handle legacy events either from the driver or from a JSON
file (via this patch) if available. If this patch is available, a
platform vendor can choose to encode the legacy events in JSON.
Otherwise, it has to specify them in the driver. I will try to send
the series today/tomorrow.

This patch will help avoid proliferation of legacy event usage in the
long run, but it is no longer strictly necessary for RISC-V. If this
patch is accepted, there is hope that we can stop specifying encodings
in the driver in the distant future. For now, however, we have to
define them in the driver for the reasons described in [1].

[1] https://lore.kernel.org/lkml/20241026121758.143259-1-irogers@google.com/T/#m653a6b98919a365a361a698032502bd26af9f6ba
> ask users to use the name specifically?  Something like:
>
>   $ perf record -e riscv-cycles ...
>

That was the first alternative I proposed back at Plumbers 2022 :).
But the conclusion was that we don't want users to learn new ways of
running perf on RISC-V, which makes sense to me as well.

> Thanks,
> Namhyung
>
> >
> > The hw_term is made more generic as a hardware_event that encodes a
> > pair of string and int value, allowing parse_events_multi_pmu_add to
> > fall back on a known encoding when the sysfs/JSON adding fails for
> > core events. As this covers PE_VALUE_SYM_HW, that token is removed and
> > related code simplified.
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
> > Tested-by: Atish Patra <atishp@rivosinc.com>
> > Tested-by: James Clark <james.clark@linaro.org>
> > Tested-by: Leo Yan <leo.yan@arm.com>
> > Cc: Adrian Hunter <adrian.hunter@intel.com>
> > Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> > Cc: Beeman Strong <beeman@rivosinc.com>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Jiri Olsa <jolsa@kernel.org>
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Cc: Namhyung Kim <namhyung@kernel.org>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> > ---
> >  tools/perf/util/parse-events.c | 26 +++++++++---
> >  tools/perf/util/parse-events.l | 76 +++++++++++++++++-----------------
> >  tools/perf/util/parse-events.y | 60 ++++++++++++++++++---------
> >  3 files changed, 98 insertions(+), 64 deletions(-)
> >
> > diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> > index 1e23faa364b1..3a60fca53cfa 100644
> > --- a/tools/perf/util/parse-events.c
> > +++ b/tools/perf/util/parse-events.c
> > @@ -1545,8 +1545,8 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
> >       struct list_head *list = NULL;
> >       struct perf_pmu *pmu = NULL;
> >       YYLTYPE *loc = loc_;
> > -     int ok = 0;
> > -     const char *config;
> > +     int ok = 0, core_ok = 0;
> > +     const char *tmp;
> >       struct parse_events_terms parsed_terms;
> >
> >       *listp = NULL;
> > @@ -1559,15 +1559,15 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
> >                       return ret;
> >       }
> >
> > -     config = strdup(event_name);
> > -     if (!config)
> > +     tmp = strdup(event_name);
> > +     if (!tmp)
> >               goto out_err;
> >
> >       if (parse_events_term__num(&term,
> >                                  PARSE_EVENTS__TERM_TYPE_USER,
> > -                                config, /*num=*/1, /*novalue=*/true,
> > +                                tmp, /*num=*/1, /*novalue=*/true,
> >                                  loc, /*loc_val=*/NULL) < 0) {
> > -             zfree(&config);
> > +             zfree(&tmp);
> >               goto out_err;
> >       }
> >       list_add_tail(&term->list, &parsed_terms.terms);
> > @@ -1598,6 +1598,8 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
> >                       pr_debug("%s -> %s/%s/\n", event_name, pmu->name, sb.buf);
> >                       strbuf_release(&sb);
> >                       ok++;
> > +                     if (pmu->is_core)
> > +                             core_ok++;
> >               }
> >       }
> >
> > @@ -1614,6 +1616,18 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
> >               }
> >       }
> >
> > +     if (hw_config != PERF_COUNT_HW_MAX && !core_ok) {
> > +             /*
> > +              * The event wasn't found on core PMUs but it has a hardware
> > +              * config version to try.
> > +              */
> > +             if (!parse_events_add_numeric(parse_state, list,
> > +                                             PERF_TYPE_HARDWARE, hw_config,
> > +                                             const_parsed_terms,
> > +                                             /*wildcard=*/true))
> > +                     ok++;
> > +     }
> > +
> >  out_err:
> >       parse_events_terms__exit(&parsed_terms);
> >       if (ok)
> > diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
> > index bf7f73548605..29082a22ccc9 100644
> > --- a/tools/perf/util/parse-events.l
> > +++ b/tools/perf/util/parse-events.l
> > @@ -113,12 +113,12 @@ do {                                                            \
> >       yyless(0);                                              \
> >  } while (0)
> >
> > -static int sym(yyscan_t scanner, int type, int config)
> > +static int sym(yyscan_t scanner, int config)
> >  {
> >       YYSTYPE *yylval = parse_events_get_lval(scanner);
> >
> > -     yylval->num = (type << 16) + config;
> > -     return type == PERF_TYPE_HARDWARE ? PE_VALUE_SYM_HW : PE_VALUE_SYM_SW;
> > +     yylval->num = config;
> > +     return PE_VALUE_SYM_SW;
> >  }
> >
> >  static int term(yyscan_t scanner, enum parse_events__term_type type)
> > @@ -129,13 +129,13 @@ static int term(yyscan_t scanner, enum parse_events__term_type type)
> >       return PE_TERM;
> >  }
> >
> > -static int hw_term(yyscan_t scanner, int config)
> > +static int hw(yyscan_t scanner, int config)
> >  {
> >       YYSTYPE *yylval = parse_events_get_lval(scanner);
> >       char *text = parse_events_get_text(scanner);
> >
> > -     yylval->hardware_term.str = strdup(text);
> > -     yylval->hardware_term.num = PERF_TYPE_HARDWARE + config;
> > +     yylval->hardware_event.str = strdup(text);
> > +     yylval->hardware_event.num = config;
> >       return PE_TERM_HW;
> >  }
> >
> > @@ -324,16 +324,16 @@ aux-output              { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT); }
> >  aux-action           { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_ACTION); }
> >  aux-sample-size              { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE); }
> >  metric-id            { return term(yyscanner, PARSE_EVENTS__TERM_TYPE_METRIC_ID); }
> > -cpu-cycles|cycles                            { return hw_term(yyscanner, PERF_COUNT_HW_CPU_CYCLES); }
> > -stalled-cycles-frontend|idle-cycles-frontend { return hw_term(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); }
> > -stalled-cycles-backend|idle-cycles-backend   { return hw_term(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); }
> > -instructions                                 { return hw_term(yyscanner, PERF_COUNT_HW_INSTRUCTIONS); }
> > -cache-references                             { return hw_term(yyscanner, PERF_COUNT_HW_CACHE_REFERENCES); }
> > -cache-misses                                 { return hw_term(yyscanner, PERF_COUNT_HW_CACHE_MISSES); }
> > -branch-instructions|branches                 { return hw_term(yyscanner, PERF_COUNT_HW_BRANCH_INSTRUCTIONS); }
> > -branch-misses                                        { return hw_term(yyscanner, PERF_COUNT_HW_BRANCH_MISSES); }
> > -bus-cycles                                   { return hw_term(yyscanner, PERF_COUNT_HW_BUS_CYCLES); }
> > -ref-cycles                                   { return hw_term(yyscanner, PERF_COUNT_HW_REF_CPU_CYCLES); }
> > +cpu-cycles|cycles                            { return hw(yyscanner, PERF_COUNT_HW_CPU_CYCLES); }
> > +stalled-cycles-frontend|idle-cycles-frontend { return hw(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); }
> > +stalled-cycles-backend|idle-cycles-backend   { return hw(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); }
> > +instructions                                 { return hw(yyscanner, PERF_COUNT_HW_INSTRUCTIONS); }
> > +cache-references                             { return hw(yyscanner, PERF_COUNT_HW_CACHE_REFERENCES); }
> > +cache-misses                                 { return hw(yyscanner, PERF_COUNT_HW_CACHE_MISSES); }
> > +branch-instructions|branches                 { return hw(yyscanner, PERF_COUNT_HW_BRANCH_INSTRUCTIONS); }
> > +branch-misses                                        { return hw(yyscanner, PERF_COUNT_HW_BRANCH_MISSES); }
> > +bus-cycles                                   { return hw(yyscanner, PERF_COUNT_HW_BUS_CYCLES); }
> > +ref-cycles                                   { return hw(yyscanner, PERF_COUNT_HW_REF_CPU_CYCLES); }
> >  r{num_raw_hex}               { return str(yyscanner, PE_RAW); }
> >  r0x{num_raw_hex}     { return str(yyscanner, PE_RAW); }
> >  ,                    { return ','; }
> > @@ -377,28 +377,28 @@ r0x{num_raw_hex}        { return str(yyscanner, PE_RAW); }
> >  <<EOF>>                      { BEGIN(INITIAL); }
> >  }
> >
> > -cpu-cycles|cycles                            { return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES); }
> > -stalled-cycles-frontend|idle-cycles-frontend { return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); }
> > -stalled-cycles-backend|idle-cycles-backend   { return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); }
> > -instructions                                 { return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS); }
> > -cache-references                             { return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_REFERENCES); }
> > -cache-misses                                 { return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_MISSES); }
> > -branch-instructions|branches                 { return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_BRANCH_INSTRUCTIONS); }
> > -branch-misses                                        { return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_BRANCH_MISSES); }
> > -bus-cycles                                   { return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_BUS_CYCLES); }
> > -ref-cycles                                   { return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_REF_CPU_CYCLES); }
> > -cpu-clock                                    { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CPU_CLOCK); }
> > -task-clock                                   { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_TASK_CLOCK); }
> > -page-faults|faults                           { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_PAGE_FAULTS); }
> > -minor-faults                                 { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_PAGE_FAULTS_MIN); }
> > -major-faults                                 { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_PAGE_FAULTS_MAJ); }
> > -context-switches|cs                          { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CONTEXT_SWITCHES); }
> > -cpu-migrations|migrations                    { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CPU_MIGRATIONS); }
> > -alignment-faults                             { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_ALIGNMENT_FAULTS); }
> > -emulation-faults                             { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_EMULATION_FAULTS); }
> > -dummy                                                { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_DUMMY); }
> > -bpf-output                                   { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_BPF_OUTPUT); }
> > -cgroup-switches                                      { return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CGROUP_SWITCHES); }
> > +cpu-cycles|cycles                            { return hw(yyscanner, PERF_COUNT_HW_CPU_CYCLES); }
> > +stalled-cycles-frontend|idle-cycles-frontend { return hw(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); }
> > +stalled-cycles-backend|idle-cycles-backend   { return hw(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); }
> > +instructions                                 { return hw(yyscanner, PERF_COUNT_HW_INSTRUCTIONS); }
> > +cache-references                             { return hw(yyscanner, PERF_COUNT_HW_CACHE_REFERENCES); }
> > +cache-misses                                 { return hw(yyscanner, PERF_COUNT_HW_CACHE_MISSES); }
> > +branch-instructions|branches                 { return hw(yyscanner, PERF_COUNT_HW_BRANCH_INSTRUCTIONS); }
> > +branch-misses                                        { return hw(yyscanner, PERF_COUNT_HW_BRANCH_MISSES); }
> > +bus-cycles                                   { return hw(yyscanner, PERF_COUNT_HW_BUS_CYCLES); }
> > +ref-cycles                                   { return hw(yyscanner, PERF_COUNT_HW_REF_CPU_CYCLES); }
> > +cpu-clock                                    { return sym(yyscanner, PERF_COUNT_SW_CPU_CLOCK); }
> > +task-clock                                   { return sym(yyscanner, PERF_COUNT_SW_TASK_CLOCK); }
> > +page-faults|faults                           { return sym(yyscanner, PERF_COUNT_SW_PAGE_FAULTS); }
> > +minor-faults                                 { return sym(yyscanner, PERF_COUNT_SW_PAGE_FAULTS_MIN); }
> > +major-faults                                 { return sym(yyscanner, PERF_COUNT_SW_PAGE_FAULTS_MAJ); }
> > +context-switches|cs                          { return sym(yyscanner, PERF_COUNT_SW_CONTEXT_SWITCHES); }
> > +cpu-migrations|migrations                    { return sym(yyscanner, PERF_COUNT_SW_CPU_MIGRATIONS); }
> > +alignment-faults                             { return sym(yyscanner, PERF_COUNT_SW_ALIGNMENT_FAULTS); }
> > +emulation-faults                             { return sym(yyscanner, PERF_COUNT_SW_EMULATION_FAULTS); }
> > +dummy                                                { return sym(yyscanner, PERF_COUNT_SW_DUMMY); }
> > +bpf-output                                   { return sym(yyscanner, PERF_COUNT_SW_BPF_OUTPUT); }
> > +cgroup-switches                                      { return sym(yyscanner, PERF_COUNT_SW_CGROUP_SWITCHES); }
> >
> >  {lc_type}                    { return str(yyscanner, PE_LEGACY_CACHE); }
> >  {lc_type}-{lc_op_result}     { return str(yyscanner, PE_LEGACY_CACHE); }
> > diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
> > index f888cbb076d6..d2ef1890007e 100644
> > --- a/tools/perf/util/parse-events.y
> > +++ b/tools/perf/util/parse-events.y
> > @@ -55,7 +55,7 @@ static void free_list_evsel(struct list_head* list_evsel)
> >  %}
> >
> >  %token PE_START_EVENTS PE_START_TERMS
> > -%token PE_VALUE PE_VALUE_SYM_HW PE_VALUE_SYM_SW PE_TERM
> > +%token PE_VALUE PE_VALUE_SYM_SW PE_TERM
> >  %token PE_EVENT_NAME
> >  %token PE_RAW PE_NAME
> >  %token PE_MODIFIER_EVENT PE_MODIFIER_BP PE_BP_COLON PE_BP_SLASH
> > @@ -65,11 +65,9 @@ static void free_list_evsel(struct list_head* list_evsel)
> >  %token PE_DRV_CFG_TERM
> >  %token PE_TERM_HW
> >  %type <num> PE_VALUE
> > -%type <num> PE_VALUE_SYM_HW
> >  %type <num> PE_VALUE_SYM_SW
> >  %type <mod> PE_MODIFIER_EVENT
> >  %type <term_type> PE_TERM
> > -%type <num> value_sym
> >  %type <str> PE_RAW
> >  %type <str> PE_NAME
> >  %type <str> PE_LEGACY_CACHE
> > @@ -85,6 +83,7 @@ static void free_list_evsel(struct list_head* list_evsel)
> >  %type <list_terms> opt_pmu_config
> >  %destructor { parse_events_terms__delete ($$); } <list_terms>
> >  %type <list_evsel> event_pmu
> > +%type <list_evsel> event_legacy_hardware
> >  %type <list_evsel> event_legacy_symbol
> >  %type <list_evsel> event_legacy_cache
> >  %type <list_evsel> event_legacy_mem
> > @@ -102,8 +101,8 @@ static void free_list_evsel(struct list_head* list_evsel)
> >  %destructor { free_list_evsel ($$); } <list_evsel>
> >  %type <tracepoint_name> tracepoint_name
> >  %destructor { free ($$.sys); free ($$.event); } <tracepoint_name>
> > -%type <hardware_term> PE_TERM_HW
> > -%destructor { free ($$.str); } <hardware_term>
> > +%type <hardware_event> PE_TERM_HW
> > +%destructor { free ($$.str); } <hardware_event>
> >
> >  %union
> >  {
> > @@ -118,10 +117,10 @@ static void free_list_evsel(struct list_head* list_evsel)
> >               char *sys;
> >               char *event;
> >       } tracepoint_name;
> > -     struct hardware_term {
> > +     struct hardware_event {
> >               char *str;
> >               u64 num;
> > -     } hardware_term;
> > +     } hardware_event;
> >  }
> >  %%
> >
> > @@ -264,6 +263,7 @@ PE_EVENT_NAME event_def
> >  event_def
> >
> >  event_def: event_pmu |
> > +        event_legacy_hardware |
> >          event_legacy_symbol |
> >          event_legacy_cache sep_dc |
> >          event_legacy_mem sep_dc |
> > @@ -306,24 +306,45 @@ PE_NAME sep_dc
> >       $$ = list;
> >  }
> >
> > -value_sym:
> > -PE_VALUE_SYM_HW
> > +event_legacy_hardware:
> > +PE_TERM_HW opt_pmu_config
> > +{
> > +     /* List of created evsels. */
> > +     struct list_head *list = NULL;
> > +     int err = parse_events_multi_pmu_add(_parse_state, $1.str, $1.num, $2, &list, &@1);
> > +
> > +     free($1.str);
> > +     parse_events_terms__delete($2);
> > +     if (err)
> > +             PE_ABORT(err);
> > +
> > +     $$ = list;
> > +}
> >  |
> > -PE_VALUE_SYM_SW
> > +PE_TERM_HW sep_dc
> > +{
> > +     struct list_head *list;
> > +     int err;
> > +
> > +     err = parse_events_multi_pmu_add(_parse_state, $1.str, $1.num, NULL, &list, &@1);
> > +     free($1.str);
> > +     if (err)
> > +             PE_ABORT(err);
> > +     $$ = list;
> > +}
> >
> >  event_legacy_symbol:
> > -value_sym '/' event_config '/'
> > +PE_VALUE_SYM_SW '/' event_config '/'
> >  {
> >       struct list_head *list;
> > -     int type = $1 >> 16;
> > -     int config = $1 & 255;
> >       int err;
> > -     bool wildcard = (type == PERF_TYPE_HARDWARE || type == PERF_TYPE_HW_CACHE);
> >
> >       list = alloc_list();
> >       if (!list)
> >               YYNOMEM;
> > -     err = parse_events_add_numeric(_parse_state, list, type, config, $3, wildcard);
> > +     err = parse_events_add_numeric(_parse_state, list,
> > +                             /*type=*/PERF_TYPE_SOFTWARE, /*config=*/$1,
> > +                             $3, /*wildcard=*/false);
> >       parse_events_terms__delete($3);
> >       if (err) {
> >               free_list_evsel(list);
> > @@ -332,18 +353,17 @@ value_sym '/' event_config '/'
> >       $$ = list;
> >  }
> >  |
> > -value_sym sep_slash_slash_dc
> > +PE_VALUE_SYM_SW sep_slash_slash_dc
> >  {
> >       struct list_head *list;
> > -     int type = $1 >> 16;
> > -     int config = $1 & 255;
> > -     bool wildcard = (type == PERF_TYPE_HARDWARE || type == PERF_TYPE_HW_CACHE);
> >       int err;
> >
> >       list = alloc_list();
> >       if (!list)
> >               YYNOMEM;
> > -     err = parse_events_add_numeric(_parse_state, list, type, config, /*head_config=*/NULL, wildcard);
> > +     err = parse_events_add_numeric(_parse_state, list,
> > +                             /*type=*/PERF_TYPE_SOFTWARE, /*config=*/$1,
> > +                             /*head_config=*/NULL, /*wildcard=*/false);
> >       if (err)
> >               PE_ABORT(err);
> >       $$ = list;
> > --
> > 2.47.1.613.gc27f4b7a9f-goog
> >
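Editorial sketch: in the lexer rules above, every legacy hardware name keeps its spelling. `hw()` produces a `PE_TERM_HW` token that carries both the string and the `PERF_COUNT_HW_*` number, while software names still reduce to a bare number (`PE_VALUE_SYM_SW`). The self-contained C sketch below illustrates that split with a simplified alias table. `struct token`, `classify_event_name()` and the table are hypothetical names, not perf's code. Only a subset of names is shown, and the numbers are copied from the UAPI `enum perf_hw_id` / `enum perf_sw_ids` in `<linux/perf_event.h>`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical token kinds mirroring PE_TERM_HW and PE_VALUE_SYM_SW. */
enum token_kind { TOK_NONE, TOK_TERM_HW, TOK_VALUE_SYM_SW };

struct token {
	enum token_kind kind;
	const char *str; /* kept only for hardware names, like hardware_event.str */
	uint64_t num;    /* legacy config number */
};

struct legacy_name {
	const char *names; /* '|'-separated aliases, as in the flex rules */
	enum token_kind kind;
	uint64_t config;
};

/* Subset of the lexer table; values match <linux/perf_event.h>. */
static const struct legacy_name legacy_names[] = {
	{ "cpu-cycles|cycles",            TOK_TERM_HW,      0 }, /* PERF_COUNT_HW_CPU_CYCLES */
	{ "instructions",                 TOK_TERM_HW,      1 }, /* PERF_COUNT_HW_INSTRUCTIONS */
	{ "branch-instructions|branches", TOK_TERM_HW,      4 }, /* PERF_COUNT_HW_BRANCH_INSTRUCTIONS */
	{ "bus-cycles",                   TOK_TERM_HW,      6 }, /* PERF_COUNT_HW_BUS_CYCLES */
	{ "ref-cycles",                   TOK_TERM_HW,      9 }, /* PERF_COUNT_HW_REF_CPU_CYCLES */
	{ "task-clock",                   TOK_VALUE_SYM_SW, 1 }, /* PERF_COUNT_SW_TASK_CLOCK */
	{ "context-switches|cs",          TOK_VALUE_SYM_SW, 3 }, /* PERF_COUNT_SW_CONTEXT_SWITCHES */
	{ "dummy",                        TOK_VALUE_SYM_SW, 9 }, /* PERF_COUNT_SW_DUMMY */
};

/* Return non-zero if name equals one of the '|'-separated aliases in list. */
static int alias_match(const char *list, const char *name)
{
	size_t len = strlen(name);
	const char *p = list;

	while (*p) {
		const char *end = strchr(p, '|');
		size_t n = end ? (size_t)(end - p) : strlen(p);

		if (n == len && strncmp(p, name, n) == 0)
			return 1;
		if (!end)
			break;
		p = end + 1;
	}
	return 0;
}

/* Classify an event name the way the hw()/sym() lexer actions would. */
static struct token classify_event_name(const char *name)
{
	struct token tok = { TOK_NONE, NULL, 0 };

	for (size_t i = 0; i < sizeof(legacy_names) / sizeof(legacy_names[0]); i++) {
		if (!alias_match(legacy_names[i].names, name))
			continue;
		tok.kind = legacy_names[i].kind;
		tok.num = legacy_names[i].config;
		/* Hardware names keep their spelling so sysfs/JSON can be tried first. */
		tok.str = tok.kind == TOK_TERM_HW ? name : NULL;
		break;
	}
	return tok;
}
```

As we read the diff, keeping the string is what lets the new `event_legacy_hardware` rule pass the name to `parse_events_multi_pmu_add()`. That allows a sysfs/JSON event with the same name to be found before the legacy number is used.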


* Re: [PATCH v5 3/4] perf record: Skip don't fail for events that don't open
  2025-01-10 19:26         ` Namhyung Kim
@ 2025-01-10 21:33           ` Ian Rogers
  2025-01-13 20:51             ` Namhyung Kim
  0 siblings, 1 reply; 53+ messages in thread
From: Ian Rogers @ 2025-01-10 21:33 UTC
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Linus Torvalds, Peter Zijlstra,
	Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, Kan Liang, James Clark, Ze Gao, Weilin Wang,
	Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, bpf, Aditya Bodkhe, Leo Yan,
	Atish Patra

On Fri, Jan 10, 2025 at 11:26 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Fri, Jan 10, 2025 at 08:42:02AM -0800, Ian Rogers wrote:
> > On Fri, Jan 10, 2025 at 6:18 AM Arnaldo Carvalho de Melo
> > <acme@kernel.org> wrote:
> > >
> > > Adding Linus to the CC list as he participated in this discussion in the
> > > past, so a heads up about changes in this area that are being further
> > > discussed.
> >
> > Linus blocks my email so I'm not sure of the point.
>
> That's unfortunate, but he should be able to see others' replies.
>
> >
> > > On Thu, Jan 09, 2025 at 05:25:03PM -0800, Namhyung Kim wrote:
> > > > On Thu, Jan 09, 2025 at 02:21:08PM -0800, Ian Rogers wrote:
> > > > > Whilst for many tools it is an expected behavior that failure to open
> > > > > a perf event is a failure, ARM decided to name PMU events the same as
> > > > > legacy events and then failed to rename such events on a server uncore
> > > > > SLC PMU. As perf's default behavior when no PMU is specified is to
> > > > > open the event on all PMUs that advertise/"have" the event, this
> > > > > yielded failures when trying to make the priority of legacy and
> > > > > sysfs/json events uniform - something requested by RISC-V and ARM. A
> > > > > legacy event user on ARM hardware may find their event opened on an
> > > > > uncore PMU which for perf record will fail. Arnaldo suggested skipping
> > > > > such events which this patch implements. Rather than have the skipping
> > > > > conditional on running on ARM, the skipping is done on all
> > > > > architectures as such a fundamental behavioral difference could lead
> > > > > to problems with tools built/depending on perf.
> > > > >
> > > > > An example of perf record failing to open events on x86 is:
> > > > > ```
> > > > > $ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
> > > > > Error:
> > > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
> > > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > > > "dmesg | grep -i perf" may provide additional information.
> > > > >
> > > > > Error:
> > > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
> > > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > > > "dmesg | grep -i perf" may provide additional information.
> > > > >
> > > > > Error:
> > > > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > > > The LLC-prefetch-read event is not supported.
> > > > > [ perf record: Woken up 1 times to write data ]
> > > > > [ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]
> > > >
> > > > I'm afraid this can be too noisy.
> > >
> > > Agreed.
> > >
> > > > > $ perf report --stats
> > > > > Aggregated stats:
> > > > >                TOTAL events:      17255
> > > > >                 MMAP events:        284  ( 1.6%)
> > > > >                 COMM events:       1961  (11.4%)
> > > > >                 EXIT events:          1  ( 0.0%)
> > > > >                 FORK events:       1960  (11.4%)
> > > > >               SAMPLE events:         87  ( 0.5%)
> > > > >                MMAP2 events:      12836  (74.4%)
> > > > >              KSYMBOL events:         83  ( 0.5%)
> > > > >            BPF_EVENT events:         36  ( 0.2%)
> > > > >       FINISHED_ROUND events:          2  ( 0.0%)
> > > > >             ID_INDEX events:          1  ( 0.0%)
> > > > >           THREAD_MAP events:          1  ( 0.0%)
> > > > >              CPU_MAP events:          1  ( 0.0%)
> > > > >            TIME_CONV events:          1  ( 0.0%)
> > > > >        FINISHED_INIT events:          1  ( 0.0%)
> > > > > cycles stats:
> > > > >               SAMPLE events:         87
> > > > > ```
> > > > >
> > > > > If all events fail to open then the perf record will fail:
> > > > > ```
> > > > > $ perf record -e LLC-prefetch-read true
> > > > > Error:
> > > > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > > > The LLC-prefetch-read event is not supported.
> > > > > Error:
> > > > > Failure to open any events for recording
> > > > > ```
> > > > >
> > > > > As an evlist may have dummy events that open when all command line
> > > > > events fail, we ignore dummy events when detecting if at least some
> > > > > events open. This still permits the dummy event on its own to be used
> > > > > as a permission check:
> > > > > ```
> > > > > $ perf record -e dummy true
> > > > > [ perf record: Woken up 1 times to write data ]
> > > > > [ perf record: Captured and wrote 0.046 MB perf.data ]
> > > > > ```
> > > > > but allows failure when a dummy event is implicitly inserted or when
> > > > > there are insufficient permissions to open it:
> > > > > ```
> > > > > $ perf record -e LLC-prefetch-read -a true
> > > > > Error:
> > > > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > > > The LLC-prefetch-read event is not supported.
> > > > > Error:
> > > > > Failure to open any events for recording
> > > > > ```
> > > > >
> > > > > The issue with legacy events is that on RISC-V they want the driver to
> > > > > not have mappings from legacy to non-legacy config encodings for each
> > > > > vendor/model due to size, complexity and difficulty to update. It was
> > > > > reported that on ARM Apple-M? CPUs the legacy mapping in the driver
> > > > > was broken and the sysfs/json events should always take precedence,
> > > > > however, it isn't clear this is still the case. It is the case that
> > > > > without working around this issue a legacy event like cycles without a
> > > > > PMU can encode differently than when specified with a PMU - the
> > > > > non-PMU version favoring legacy encodings, the PMU one avoiding legacy
> > > > > encodings.
> > > > >
> > > > > The patch removes events and then adjusts the idx value for each
> > > > > evsel. This is done so that the dense xyarrays used for file
> > > > > descriptors, etc. don't contain broken entries. As event opening
> > > > > happens relatively late in the record process, any idx value used
> > > > > before the open may no longer be valid after the removal, so it is
> > > > > expected there are latent bugs hidden behind this change; the change
> > > > > is best effort. As the only vendor that has broken event names is ARM,
> > > > > this will principally affect ARM users. They will also experience warning
> > > > > messages like those above because of the uncore PMU advertising legacy
> > > > > event names.
> > > > >
> > > > > Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
> > > > > Signed-off-by: Ian Rogers <irogers@google.com>
> > > > > Tested-by: James Clark <james.clark@linaro.org>
> > > > > Tested-by: Leo Yan <leo.yan@arm.com>
> > > > > Tested-by: Atish Patra <atishp@rivosinc.com>
> > > > > ---
> > > > >  tools/perf/builtin-record.c | 47 ++++++++++++++++++++++++++++++++-----
> > > > >  1 file changed, 41 insertions(+), 6 deletions(-)
> > > > >
> > > > > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> > > > > index 5db1aedf48df..c0b8249a3787 100644
> > > > > --- a/tools/perf/builtin-record.c
> > > > > +++ b/tools/perf/builtin-record.c
> > > > > @@ -961,7 +961,6 @@ static int record__config_tracking_events(struct record *rec)
> > > > >      */
> > > > >     if (opts->target.initial_delay || target__has_cpu(&opts->target) ||
> > > > >         perf_pmus__num_core_pmus() > 1) {
> > > > > -
> > > > >             /*
> > > > >              * User space tasks can migrate between CPUs, so when tracing
> > > > >              * selected CPUs, sideband for all CPUs is still needed.
> > > > > @@ -1366,6 +1365,7 @@ static int record__open(struct record *rec)
> > > > >     struct perf_session *session = rec->session;
> > > > >     struct record_opts *opts = &rec->opts;
> > > > >     int rc = 0;
> > > > > +   bool skipped = false;
> > > > >
> > > > >     evlist__for_each_entry(evlist, pos) {
> > > > >  try_again:
> > > > > @@ -1381,15 +1381,50 @@ static int record__open(struct record *rec)
> > > > >                             pos = evlist__reset_weak_group(evlist, pos, true);
> > > > >                             goto try_again;
> > > > >                     }
> > > > > -                   rc = -errno;
> > > > >                     evsel__open_strerror(pos, &opts->target, errno, msg, sizeof(msg));
> > > > > -                   ui__error("%s\n", msg);
> > > > > -                   goto out;
> > > > > +                   ui__error("Failure to open event '%s' on PMU '%s' which will be removed.\n%s\n",
> > > > > +                             evsel__name(pos), evsel__pmu_name(pos), msg);
> > >
> > > > How about changing it to pr_debug() and add below ...
> > >
> > > That sounds better.
> > >
> > > > > +                   pos->skippable = true;
> > > > > +                   skipped = true;
> > > > > +           } else {
> > > > > +                   pos->supported = true;
> > > > >             }
> > > > > -
> > > > > -           pos->supported = true;
> > > > >     }
> > > > >
> > > > > +   if (skipped) {
> > > > > +           struct evsel *tmp;
> > > > > +           int idx = 0;
> > > > > +           bool evlist_empty = true;
> > > > > +
> > > > > +           /* Remove evsels that failed to open and update indices. */
> > > > > +           evlist__for_each_entry_safe(evlist, tmp, pos) {
> > > > > +                   if (pos->skippable) {
> > > > > +                           evlist__remove(evlist, pos);
> > > > > +                           continue;
> > > > > +                   }
> > > > > +
> > > > > +                   /*
> > > > > +                    * Note, dummy events may be command line parsed or
> > > > > +                    * added by the tool. We care about supporting `perf
> > > > > +                    * record -e dummy` which may be used as a permission
> > > > > +                    * check. Dummy events that are added to the command
> > > > > +                    * line and opened along with other events that fail,
> > > > > +                    * will still fail as if the dummy events were tool
> > > > > +                    * added events for the sake of code simplicity.
> > > > > +                    */
> > > > > +                   if (!evsel__is_dummy_event(pos))
> > > > > +                           evlist_empty = false;
> > > > > +           }
> > > > > +           evlist__for_each_entry(evlist, pos) {
> > > > > +                   pos->core.idx = idx++;
> > > > > +           }
> > > > > +           /* If list is empty then fail. */
> > > > > +           if (evlist_empty) {
> > > > > +                   ui__error("Failure to open any events for recording.\n");
> > > > > +                   rc = -1;
> > > > > +                   goto out;
> > > > > +           }
> > >
> > > > ... ?
> > >
> > > >               if (!verbose)
> > > >                       ui__warning("Removed some unsupported events, use -v for details.\n");
> > >
> > > And even this one would be best left for cases where we can determine
> > > that it's a new situation, i.e. one that should work and not the ones we
> > > know that will not work already and thus so far didn't alarm the user
> > > into thinking something is wrong.
> > >
> > > Having the ones we know will fail as pr_debug() seems enough, I'd say.
> >
> > This means that:
> > ```
> > $ perf record -e data_read,LLC-prefetch-read -a sleep 0.1
> > ```
> > will fail (as data_read is a memory controller event and the LLC
> > doesn't support sampling) with something like:
> > ```
> > Error:
> > Failure to open any events for recording
> > ```
> > Which feels a bit minimal. As I already mentioned, it is also a
> > behavior change and so has the potential to break scripts dependent on
> > the failure information.
>
> I don't think it's about failure behavior, the concern is the error
> messages.  It can take too much screen space when users give a long list
> of invalid events.  And unfortunately the current error message for
> checking dmesg is not very helpful.

Making the dmesg message more useful is a separate issue. The error
message only happens when things are broken, and I think having an
error message is better than having none, or than users somehow having
to know to wade through verbose output. I think this is very clear in:
https://lore.kernel.org/lkml/CAP-5=fVr43v8gkqi8SXVaNKnkO+cooQVqx3xUFJ-BtgxGHX90g@mail.gmail.com/
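Editorial sketch: to make the behavior under debate concrete, here is a minimal C version of the skip-and-compact step that patch 3/4 adds to `record__open()`. Events that failed to open are dropped, and the survivors get dense `idx` values. Recording fails only if something was skipped and nothing but dummy events remains. `struct sketch_evsel` and `skip_failed_events()` are hypothetical, array-based stand-ins for perf's evlist/evsel code. Messages go to stderr here, while perf reports them with `ui__error()` at open time.

```c
#include <stdbool.h>
#include <stdio.h>

struct sketch_evsel {
	const char *name;
	bool opened;   /* did the (simulated) open succeed? */
	bool is_dummy; /* dummy/tracking events don't count as real events */
	int idx;       /* dense index, as used for the fd xyarrays */
};

/*
 * Drop events that failed to open and renumber the survivors densely.
 * Fail only if something was skipped and no non-dummy event is left, so
 * a lone "-e dummy" still works as a permission check.
 * Returns the new event count, or -1 on failure.
 */
static int skip_failed_events(struct sketch_evsel *evs, int n)
{
	int kept = 0;
	bool skipped = false;
	bool have_real_event = false;

	for (int i = 0; i < n; i++) {
		if (!evs[i].opened) {
			fprintf(stderr, "Failure to open event '%s' which will be removed.\n",
				evs[i].name);
			skipped = true;
			continue;
		}
		evs[kept] = evs[i];
		evs[kept].idx = kept; /* keep indices dense after removal */
		if (!evs[kept].is_dummy)
			have_real_event = true;
		kept++;
	}
	if (skipped && !have_real_event) {
		fprintf(stderr, "Failure to open any events for recording.\n");
		return -1;
	}
	return kept;
}
```

The renumbering is the part the commit message calls best effort. Any `idx` captured before the open still refers to the old positions.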

> Anyway you can add this line too: "Use -v to see the details."

So silently failing and then expecting users to scrape verbose output
is a fairly significant behavior change for the tool.

> >
> > A patch lowering the priority of error messages should be independent
> > of the 4 changes here. I'd be happy if someone follows this series
> > with a patch doing it.
>
> I think the error behavior is a part of this change.

I disagree with it, so I think you need to address my comments.

Thanks,
Ian


* Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-01-10 19:40   ` Namhyung Kim
  2025-01-10 19:52     ` Atish Kumar Patra
@ 2025-01-10 22:15     ` Ian Rogers
  2025-01-13 22:01       ` Namhyung Kim
  1 sibling, 1 reply; 53+ messages in thread
From: Ian Rogers @ 2025-01-10 22:15 UTC
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	bpf, Aditya Bodkhe, Atish Patra, Leo Yan, Beeman Strong,
	Arnaldo Carvalho de Melo

On Fri, Jan 10, 2025 at 11:40 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Thu, Jan 09, 2025 at 02:21:09PM -0800, Ian Rogers wrote:
> > Originally posted and merged from:
> > https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
> > This reverts commit 4f1b067359ac8364cdb7f9fda41085fa85789d0f although
> > the patch is now smaller due to related fixes being applied in commit
> > 22a4db3c3603 ("perf evsel: Add alternate_hw_config and use in
> > evsel__match").
> > The original commit message was:
> >
> > It was requested that RISC-V be able to add events to the perf tool so
> > the PMU driver didn't need to map legacy events to config encodings:
> > https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/
> >
> > This change makes the priority of events specified without a PMU the
> > same as those specified with a PMU, namely sysfs and JSON events are
> > checked first before using the legacy encoding.
>
> I'm still not convinced that we need this change despite these
> troubles.  If it's because RISC-V cannot define the legacy hardware
> events in the kernel driver, why not use a different name in JSON and
> ask users to use the name specifically?  Something like:
>
>   $ perf record -e riscv-cycles ...

So ARM and RISC-V are more than able to speak for themselves and have
their tags on the series, but let's recap why I'm motivated to do this
change:

1) perf supported legacy events;
2) perf supported sysfs and json events, but at a lower priority than
legacy events;
3) hybrid support was added but in a way where all the hybrid PMUs
needed to be known, assumptions about PMU were implicit and baked into
the tool;
4) metric support for hybrid was going in a similar implicit direction
and I objected: what would cycles mean in a metric if the core PMU was
implicit? Rather than pursue this the hybrid code was overhauled, PMUs
became more of a thing and we added a notion of a "core" PMU which
would support legacy events;
5) ARM core PMUs differ in naming, etc. from just about every other
platform. Their core events had been programmed as if they were
uncore events, i.e. without the legacy priority. Fixing hybrid, and
fixing ARM PMUs to know they supported legacy events, broke perf on
Apple-M? series due to a PMU driver issue with legacy events:
https://lore.kernel.org/lkml/08f1f185-e259-4014-9ca4-6411d5c1bc65@marcan.st/
"Perf broke on all Apple ARM64 systems (tested almost everything), and
according to maz also on Juno (so, probably all big.LITTLE) since
v6.5."
6) sysfs/json events were made the priority over legacy to unbreak
perf on Apple-M? CPUs, but only if the PMU is specified:
https://lore.kernel.org/r/20231123042922.834425-1-irogers@google.com
   Reported-by: Hector Martin <marcan@marcan.st>
   Signed-off-by: Ian Rogers <irogers@google.com>
   Tested-by: Hector Martin <marcan@marcan.st>
   Tested-by: Marc Zyngier <maz@kernel.org>
   Acked-by: Mark Rutland <mark.rutland@arm.com>

This gets us to the current code where I can trivially get an
inconsistency. Here on Intel with no PMU in the event name:
```
$ perf stat -vv -e cpu-cycles true
Using CPUID GenuineIntel-6-8D-1
Control descriptor is not initialized
------------------------------------------------------------
perf_event_attr:
  type                             0 (PERF_TYPE_HARDWARE)
  size                             136
  config                           0 (PERF_COUNT_HW_CPU_CYCLES)
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid 752915  cpu -1  group_fd -1  flags 0x8 = 3
cpu-cycles: -1: 1293076 273429 273429
cpu-cycles: 1293076 273429 273429

 Performance counter stats for 'true':

         1,293,076      cpu-cycles

       0.000809752 seconds time elapsed

       0.000841000 seconds user
       0.000000000 seconds sys
```

Here with a PMU event name:
```
$ sudo perf stat -vv -e cpu/cpu-cycles/ true
Using CPUID GenuineIntel-6-8D-1
Attempt to add: cpu/cpu-cycles=0/
..after resolving event: cpu/event=0x3c/
Control descriptor is not initialized
------------------------------------------------------------
perf_event_attr:
  type                             4 (cpu)
  size                             136
  config                           0x3c (cpu-cycles)
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  exclude_guest                    1
------------------------------------------------------------
sys_perf_event_open: pid 752839  cpu -1  group_fd -1  flags 0x8 = 3
cpu/cpu-cycles/: -1: 1421235 531150 531150
cpu/cpu-cycles/: 1421235 531150 531150

 Performance counter stats for 'true':

         1,421,235      cpu/cpu-cycles/

       0.001292908 seconds time elapsed

       0.001340000 seconds user
       0.000000000 seconds sys
```

That is, the event without a PMU is opened as type=0/config=0 (legacy)
while the event with a PMU is opened as type=4/config=0x3c (sysfs
encoding). Now let's cross our fingers and hope that in the driver they
are really the same thing. I object to the idea that there should be two
different priorities for sysfs/json and legacy depending on whether a
PMU is or isn't specified in the event name. The priority could be
legacy then sysfs/json, or it could be sysfs/json then legacy, but it
should be the same regardless of whether the PMU is put in the event
name. The PMU in the event name should be optional; for example, we may
or may not show it in the stat output. The encoding was consistent
prior to the Apple-M? fix, and this patch aims to make it consistent
once more. Given the ARM bug mentioned above, it should also make
Apple-M? CPUs behave the same whether or not the PMU is specified, as
it avoids the same legacy event issue, which may only exist on old
kernels. RISC-V is motivated by not wanting hard-coded legacy events in
the driver for all potential vendors and models.
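Editorial sketch: the single priority argued for above can be illustrated in C. For each PMU, a sysfs/JSON alias of the event is tried first, and only a core PMU falls back to the legacy encoding. The same function is used whether or not a PMU is named. The structures, `resolve_event()` and the `uncore_slc` PMU in the checks are hypothetical. The sketch also leaves out the extended type that perf places in the upper 32 bits of legacy configs on hybrid systems.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Matches PERF_TYPE_HARDWARE in <linux/perf_event.h>. */
#define SKETCH_TYPE_HARDWARE 0

struct alias {
	const char *name; /* sysfs/JSON event name, e.g. "cpu-cycles" */
	uint64_t config;  /* its encoding on this PMU, e.g. 0x3c */
};

struct pmu {
	const char *name;
	uint32_t type;               /* what the PMU's sysfs "type" file would report */
	int is_core;                 /* only core PMUs accept legacy encodings here */
	const struct alias *aliases; /* terminated by an entry with a NULL name */
};

struct encoding {
	uint32_t type;
	uint64_t config;
};

static int find_alias(const struct pmu *pmu, const char *event, uint64_t *config)
{
	for (const struct alias *a = pmu->aliases; a && a->name; a++) {
		if (strcmp(a->name, event) == 0) {
			*config = a->config;
			return 1;
		}
	}
	return 0;
}

/* sysfs/JSON first, legacy second; -1 if this PMU cannot encode the event. */
static int encode_on_pmu(const struct pmu *pmu, const char *event,
			 int64_t legacy_config, struct encoding *out)
{
	uint64_t config = 0;

	if (find_alias(pmu, event, &config)) {
		out->type = pmu->type;
		out->config = config;
		return 0;
	}
	if (pmu->is_core && legacy_config >= 0) {
		out->type = SKETCH_TYPE_HARDWARE;
		out->config = (uint64_t)legacy_config;
		return 0;
	}
	return -1;
}

/*
 * Same priority with or without a PMU: pmu_name == NULL wildcards over all
 * PMUs, otherwise only the named PMU is tried. legacy_config < 0 means the
 * name has no legacy encoding. Returns the number of encodings written.
 */
static size_t resolve_event(const struct pmu *pmus, size_t npmus,
			    const char *pmu_name, const char *event,
			    int64_t legacy_config, struct encoding *out, size_t max)
{
	size_t n = 0;

	for (size_t i = 0; i < npmus && n < max; i++) {
		if (pmu_name && strcmp(pmus[i].name, pmu_name) != 0)
			continue;
		if (encode_on_pmu(&pmus[i], event, legacy_config, &out[n]) == 0)
			n++;
	}
	return n;
}
```

With `pmu_name == NULL`, the event is wildcarded across every PMU that can encode it. That is also how an uncore PMU advertising an event called `cycles` gets opened, which is the ARM SLC case that patch 3/4 works around.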

Thanks,
Ian


* Re: [PATCH v5 3/4] perf record: Skip don't fail for events that don't open
  2025-01-10 21:33           ` Ian Rogers
@ 2025-01-13 20:51             ` Namhyung Kim
  2025-01-13 23:04               ` Ian Rogers
  0 siblings, 1 reply; 53+ messages in thread
From: Namhyung Kim @ 2025-01-13 20:51 UTC
  To: Ian Rogers
  Cc: Arnaldo Carvalho de Melo, Linus Torvalds, Peter Zijlstra,
	Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, Kan Liang, James Clark, Ze Gao, Weilin Wang,
	Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, bpf, Aditya Bodkhe, Leo Yan,
	Atish Patra

Hi Ian,

On Fri, Jan 10, 2025 at 01:33:57PM -0800, Ian Rogers wrote:
> On Fri, Jan 10, 2025 at 11:26 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Fri, Jan 10, 2025 at 08:42:02AM -0800, Ian Rogers wrote:
> > > On Fri, Jan 10, 2025 at 6:18 AM Arnaldo Carvalho de Melo
> > > <acme@kernel.org> wrote:
> > > >
> > > > Adding Linus to the CC list as he participated in this discussion in the
> > > > past, so a heads up about changes in this area that are being further
> > > > discussed.
> > >
> > > Linus blocks my email so I'm not sure of the point.
> >
> > That's unfortunate, but he should be able to see others' replies.
> >
> > >
> > > > On Thu, Jan 09, 2025 at 05:25:03PM -0800, Namhyung Kim wrote:
> > > > > On Thu, Jan 09, 2025 at 02:21:08PM -0800, Ian Rogers wrote:
> > > > > > Whilst for many tools it is an expected behavior that failure to open
> > > > > > a perf event is a failure, ARM decided to name PMU events the same as
> > > > > > legacy events and then failed to rename such events on a server uncore
> > > > > > SLC PMU. As perf's default behavior when no PMU is specified is to
> > > > > > open the event on all PMUs that advertise/"have" the event, this
> > > > > > yielded failures when trying to make the priority of legacy and
> > > > > > sysfs/json events uniform - something requested by RISC-V and ARM. A
> > > > > > legacy event user on ARM hardware may find their event opened on an
> > > > > > uncore PMU which for perf record will fail. Arnaldo suggested skipping
> > > > > > such events which this patch implements. Rather than have the skipping
> > > > > > conditional on running on ARM, the skipping is done on all
> > > > > > architectures as such a fundamental behavioral difference could lead
> > > > > > to problems with tools built/depending on perf.
> > > > > >
> > > > > > An example of perf record failing to open events on x86 is:
> > > > > > ```
> > > > > > $ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
> > > > > > Error:
> > > > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
> > > > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > > > > "dmesg | grep -i perf" may provide additional information.
> > > > > >
> > > > > > Error:
> > > > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
> > > > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > > > > "dmesg | grep -i perf" may provide additional information.
> > > > > >
> > > > > > Error:
> > > > > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > > > > The LLC-prefetch-read event is not supported.
> > > > > > [ perf record: Woken up 1 times to write data ]
> > > > > > [ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]
> > > > >
> > > > > I'm afraid this can be too noisy.
> > > >
> > > > Agreed.
> > > >
> > > > > > $ perf report --stats
> > > > > > Aggregated stats:
> > > > > >                TOTAL events:      17255
> > > > > >                 MMAP events:        284  ( 1.6%)
> > > > > >                 COMM events:       1961  (11.4%)
> > > > > >                 EXIT events:          1  ( 0.0%)
> > > > > >                 FORK events:       1960  (11.4%)
> > > > > >               SAMPLE events:         87  ( 0.5%)
> > > > > >                MMAP2 events:      12836  (74.4%)
> > > > > >              KSYMBOL events:         83  ( 0.5%)
> > > > > >            BPF_EVENT events:         36  ( 0.2%)
> > > > > >       FINISHED_ROUND events:          2  ( 0.0%)
> > > > > >             ID_INDEX events:          1  ( 0.0%)
> > > > > >           THREAD_MAP events:          1  ( 0.0%)
> > > > > >              CPU_MAP events:          1  ( 0.0%)
> > > > > >            TIME_CONV events:          1  ( 0.0%)
> > > > > >        FINISHED_INIT events:          1  ( 0.0%)
> > > > > > cycles stats:
> > > > > >               SAMPLE events:         87
> > > > > > ```
> > > > > >
> > > > > > If all events fail to open then the perf record will fail:
> > > > > > ```
> > > > > > $ perf record -e LLC-prefetch-read true
> > > > > > Error:
> > > > > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > > > > The LLC-prefetch-read event is not supported.
> > > > > > Error:
> > > > > > Failure to open any events for recording
> > > > > > ```
> > > > > >
> > > > > > As an evlist may have dummy events that open when all command line
> > > > > > events fail, we ignore dummy events when detecting if at least some
> > > > > > events open. This still permits the dummy event on its own to be used
> > > > > > as a permission check:
> > > > > > ```
> > > > > > $ perf record -e dummy true
> > > > > > [ perf record: Woken up 1 times to write data ]
> > > > > > [ perf record: Captured and wrote 0.046 MB perf.data ]
> > > > > > ```
> > > > > > but allows failure when a dummy event is implicitly inserted or when
> > > > > > there are insufficient permissions to open it:
> > > > > > ```
> > > > > > $ perf record -e LLC-prefetch-read -a true
> > > > > > Error:
> > > > > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > > > > The LLC-prefetch-read event is not supported.
> > > > > > Error:
> > > > > > Failure to open any events for recording
> > > > > > ```
> > > > > >
> > > > > > The issue with legacy events is that on RISC-V they want the driver to
> > > > > > not have mappings from legacy to non-legacy config encodings for each
> > > > > > vendor/model due to size, complexity and difficulty to update. It was
> > > > > > reported that on ARM Apple-M? CPUs the legacy mapping in the driver
> > > > > > was broken and the sysfs/json events should always take precedence,
> > > > > > however, it isn't clear this is still the case. It is the case that
> > > > > > without working around this issue a legacy event like cycles without a
> > > > > > PMU can encode differently than when specified with a PMU - the
> > > > > > non-PMU version favoring legacy encodings, the PMU one avoiding legacy
> > > > > > encodings.
> > > > > >
> > > > > > The patch removes events and then adjusts the idx value for each
> > > > > > evsel. This is done so that the dense xyarrays used for file
> > > > > > descriptors, etc. don't contain broken entries. As event opening
> > > > > > happens relatively late in the record process, use of the idx value
> > > > > > before the open will have become corrupted, so it is expected there
> > > > > > are latent bugs hidden behind this change - the change is best
> > > > > > effort. As the only vendor that has broken event names is ARM, this
> > > > > > will principally affect ARM users. They will also experience warning
> > > > > > messages like those above because of the uncore PMU advertising legacy
> > > > > > event names.
> > > > > >
> > > > > > Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
> > > > > > Signed-off-by: Ian Rogers <irogers@google.com>
> > > > > > Tested-by: James Clark <james.clark@linaro.org>
> > > > > > Tested-by: Leo Yan <leo.yan@arm.com>
> > > > > > Tested-by: Atish Patra <atishp@rivosinc.com>
> > > > > > ---
> > > > > >  tools/perf/builtin-record.c | 47 ++++++++++++++++++++++++++++++++-----
> > > > > >  1 file changed, 41 insertions(+), 6 deletions(-)
> > > > > >
> > > > > > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> > > > > > index 5db1aedf48df..c0b8249a3787 100644
> > > > > > --- a/tools/perf/builtin-record.c
> > > > > > +++ b/tools/perf/builtin-record.c
> > > > > > @@ -961,7 +961,6 @@ static int record__config_tracking_events(struct record *rec)
> > > > > >      */
> > > > > >     if (opts->target.initial_delay || target__has_cpu(&opts->target) ||
> > > > > >         perf_pmus__num_core_pmus() > 1) {
> > > > > > -
> > > > > >             /*
> > > > > >              * User space tasks can migrate between CPUs, so when tracing
> > > > > >              * selected CPUs, sideband for all CPUs is still needed.
> > > > > > @@ -1366,6 +1365,7 @@ static int record__open(struct record *rec)
> > > > > >     struct perf_session *session = rec->session;
> > > > > >     struct record_opts *opts = &rec->opts;
> > > > > >     int rc = 0;
> > > > > > +   bool skipped = false;
> > > > > >
> > > > > >     evlist__for_each_entry(evlist, pos) {
> > > > > >  try_again:
> > > > > > @@ -1381,15 +1381,50 @@ static int record__open(struct record *rec)
> > > > > >                             pos = evlist__reset_weak_group(evlist, pos, true);
> > > > > >                             goto try_again;
> > > > > >                     }
> > > > > > -                   rc = -errno;
> > > > > >                     evsel__open_strerror(pos, &opts->target, errno, msg, sizeof(msg));
> > > > > > -                   ui__error("%s\n", msg);
> > > > > > -                   goto out;
> > > > > > +                   ui__error("Failure to open event '%s' on PMU '%s' which will be removed.\n%s\n",
> > > > > > +                             evsel__name(pos), evsel__pmu_name(pos), msg);
> > > >
> > > > > How about changing it to pr_debug() and add below ...
> > > >
> > > > That sounds better.
> > > >
> > > > > > +                   pos->skippable = true;
> > > > > > +                   skipped = true;
> > > > > > +           } else {
> > > > > > +                   pos->supported = true;
> > > > > >             }
> > > > > > -
> > > > > > -           pos->supported = true;
> > > > > >     }
> > > > > >
> > > > > > +   if (skipped) {
> > > > > > +           struct evsel *tmp;
> > > > > > +           int idx = 0;
> > > > > > +           bool evlist_empty = true;
> > > > > > +
> > > > > > +           /* Remove evsels that failed to open and update indices. */
> > > > > > +           evlist__for_each_entry_safe(evlist, tmp, pos) {
> > > > > > +                   if (pos->skippable) {
> > > > > > +                           evlist__remove(evlist, pos);
> > > > > > +                           continue;
> > > > > > +                   }
> > > > > > +
> > > > > > +                   /*
> > > > > > +                    * Note, dummy events may be command line parsed or
> > > > > > +                    * added by the tool. We care about supporting `perf
> > > > > > +                    * record -e dummy` which may be used as a permission
> > > > > > +                    * check. Dummy events that are added to the command
> > > > > > +                    * line and opened along with other events that fail,
> > > > > > +                    * will still fail as if the dummy events were tool
> > > > > > +                    * added events for the sake of code simplicity.
> > > > > > +                    */
> > > > > > +                   if (!evsel__is_dummy_event(pos))
> > > > > > +                           evlist_empty = false;
> > > > > > +           }
> > > > > > +           evlist__for_each_entry(evlist, pos) {
> > > > > > +                   pos->core.idx = idx++;
> > > > > > +           }
> > > > > > +           /* If list is empty then fail. */
> > > > > > +           if (evlist_empty) {
> > > > > > +                   ui__error("Failure to open any events for recording.\n");
> > > > > > +                   rc = -1;
> > > > > > +                   goto out;
> > > > > > +           }
> > > >
> > > > > ... ?
> > > >
> > > > >               if (!verbose)
> > > > >                       ui__warning("Removed some unsupported events, use -v for details.\n");
> > > >
> > > > And even this one would be best left for cases where we can determine
> > > > that its a new situation, i.e. one that should work and not the ones we
> > > > know that will not work already and thus so far didn't alarm the user
> > > > into thinking something is wrong.
> > > >
> > > > Having the ones we know will fail as pr_debug() seems enough, I'd say.
> > >
> > > This means that:
> > > ```
> > > $ perf record -e data_read,LLC-prefetch-read -a sleep 0.1
> > > ```
> > > will fail (as data_read is a memory controller event and the LLC
> > > doesn't support sampling) with something like:
> > > ```
> > > Error:
> > > Failure to open any events for recording
> > > ```
> > > Which feels a bit minimal. As I already mentioned, it is also a
> > > behavior change and so has the potential to break scripts dependent on
> > > the failure information.
> >
> > I don't think it's about failure behavior, the concern is the error
> > messages.  It can take too much screen space when users give a long list
> > of invalid events.  And unfortunately the current error message for
> > checking dmesg is not very helpful.
> 
> Making the dmesg message more useful is a separate issue. The error

Sure.

> message only happens when things are broken and I think having an
> error message is better than none, or somehow having to know to wade
> through verbose output. I think this is very clear in:
> https://lore.kernel.org/lkml/CAP-5=fVr43v8gkqi8SXVaNKnkO+cooQVqx3xUFJ-BtgxGHX90g@mail.gmail.com/
> 
> > Anyway you can add this line too: "Use -v to see the details."
> 
> So silently failing and then expecting users to scrape verbose output
> is a fairly significant behavior change for the tool.

I'm not saying I want silent failures.  It should say that it failed to parse
or open some events.  But I think it needs to handle repeated
failure messages.
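A minimal sketch of one way to deal with repeated failure messages, assuming failures are collected first rather than printed as they happen: group them by event name so that, for example, the `data_read` failures on `uncore_imc_free_running_0` and `uncore_imc_free_running_1` in the x86 example quoted later in this thread produce a single line. `struct open_failure`, `append()` and `summarize_failures()` are hypothetical names for illustration, not perf code, and the message wording is made up.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record of one failed open: the event and the PMU it was tried on. */
struct open_failure {
	const char *event;
	const char *pmu;
};

/* Append s to the NUL-terminated buf without writing past len bytes. */
static void append(char *buf, size_t len, const char *s)
{
	size_t cur = strlen(buf);

	if (cur + 1 < len)
		strncat(buf, s, len - cur - 1);
}

/*
 * Write one line per distinct event name into buf, listing every PMU the
 * event failed on. Returns the number of distinct events reported.
 */
static int summarize_failures(const struct open_failure *f, int n,
			      char *buf, size_t len)
{
	int reported = 0;

	if (len == 0)
		return 0;
	buf[0] = '\0';
	for (int i = 0; i < n; i++) {
		int seen = 0;

		/* Skip events already covered by an earlier entry. */
		for (int j = 0; j < i && !seen; j++)
			seen = !strcmp(f[i].event, f[j].event);
		if (seen)
			continue;

		append(buf, len, "Failed to open '");
		append(buf, len, f[i].event);
		append(buf, len, "' on PMUs:");
		/* Collect every PMU this event failed on, in input order. */
		for (int j = i; j < n; j++) {
			if (!strcmp(f[i].event, f[j].event)) {
				append(buf, len, " ");
				append(buf, len, f[j].pmu);
			}
		}
		append(buf, len, "\n");
		reported++;
	}
	return reported;
}
```

In perf the per-PMU detail could still go to pr_debug(), with only a summary like this shown by default; the quadratic scan is acceptable for the handful of events typically given on a command line.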

> 
> > >
> > > A patch lowering the priority of error messages should be independent
> > > of the 4 changes here. I'd be happy if someone follows this series
> > > with a patch doing it.
> >
> > I think the error behavior is a part of this change.
> 
> I disagree with it, so I think you need to address my comments.

Since you are changing the error behavior by skipping failed events, the
relevant error messages should be handled properly in this patchset.
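The post-open pass in the quoted diff (drop evsels marked skippable, renumber `core.idx` densely so the xyarrays have no holes, and fail if nothing but dummy events survives) can be sketched on a plain array. `struct fake_evsel` and `remove_failed_and_reindex()` are hypothetical simplifications; the real code walks a linked evlist with evlist__for_each_entry_safe() and calls evlist__remove().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct evsel with only the fields this pass needs. */
struct fake_evsel {
	const char *name;
	bool skippable; /* set when the open failed */
	bool is_dummy;  /* dummy events don't count as "something opened" */
	int idx;        /* dense index into per-evsel arrays (fds, ids, ...) */
};

/*
 * Compact the array in place, dropping skippable entries and renumbering
 * idx from 0. Returns the new count, or -1 if at least one event failed
 * and only dummy events (or nothing) remain. As in the diff, the dummy-only
 * check applies only when something was skipped, so a lone dummy event
 * that opened successfully still works as a permission check.
 */
static int remove_failed_and_reindex(struct fake_evsel *ev, int n)
{
	int kept = 0;
	bool skipped = false;
	bool only_dummies = true;

	for (int i = 0; i < n; i++) {
		if (ev[i].skippable) {
			skipped = true;
			continue;
		}
		if (!ev[i].is_dummy)
			only_dummies = false;
		ev[kept] = ev[i];
		ev[kept].idx = kept;
		kept++;
	}
	return (skipped && only_dummies) ? -1 : kept;
}
```

Renumbering after compaction is what makes any idx captured before the open stale, which is the latent-bug risk the commit message mentions.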

Thanks,
Namhyung


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-01-10 19:52     ` Atish Kumar Patra
@ 2025-01-13 20:56       ` Namhyung Kim
  0 siblings, 0 replies; 53+ messages in thread
From: Namhyung Kim @ 2025-01-13 20:56 UTC (permalink / raw)
  To: Atish Kumar Patra
  Cc: Ian Rogers, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	bpf, Aditya Bodkhe, Leo Yan, Beeman Strong,
	Arnaldo Carvalho de Melo

Hello,

On Fri, Jan 10, 2025 at 11:52:47AM -0800, Atish Kumar Patra wrote:
> On Fri, Jan 10, 2025 at 11:40 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Thu, Jan 09, 2025 at 02:21:09PM -0800, Ian Rogers wrote:
> > > Originally posted and merged from:
> > > https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
> > > This reverts commit 4f1b067359ac8364cdb7f9fda41085fa85789d0f although
> > > the patch is now smaller due to related fixes being applied in commit
> > > 22a4db3c3603 ("perf evsel: Add alternate_hw_config and use in
> > > evsel__match").
> > > The original commit message was:
> > >
> > > It was requested that RISC-V be able to add events to the perf tool so
> > > the PMU driver didn't need to map legacy events to config encodings:
> > > https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/
> > >
> > > This change makes the priority of events specified without a PMU the
> > > same as those specified with a PMU, namely sysfs and JSON events are
> > > checked first before using the legacy encoding.
> >
> > I'm still not convinced why we need this change despite these
> > troubles.  If it's because RISC-V cannot define the legacy hardware
> > events in the kernel driver, why not use a different name in JSON and
> 
> When the discussion happened a year back, we tried to avoid defining
> the legacy hardware events in the kernel driver. However, we agreed
> that we have to define them anyway for other reasons (legacy usage +
> virtualization) as described here [1]. I have improved the driver so
> that it can handle legacy events either from the driver or from a
> json file (via this patch) if available. If this patch is available,
> a platform vendor can choose to encode the legacy events in json.
> Otherwise, it has to specify them in the driver. I will try to send
> the series today/tomorrow.

Ok, thanks for the update.

> 
> This patch will help avoid the proliferation of legacy event usage in
> the long run, but it is no longer absolutely necessary for RISC-V.
> If this patch is accepted, there is hope that we can stop specifying
> the encodings in the driver in the distant future. However, we have
> to define them in the driver for the reasons described in [1].
> 
> [1] https://lore.kernel.org/lkml/20241026121758.143259-1-irogers@google.com/T/#m653a6b98919a365a361a698032502bd26af9f6ba
> > ask users to use the name specifically?  Something like:
> >
> >   $ perf record -e riscv-cycles ...
> >
> 
> That was the first alternative I proposed back in 2022 plumbers :).

I see, sorry for missing the earlier discussion.

Thanks,
Namhyung


> But it was concluded that we don't want users to learn new ways
> of running perf in RISC-V which makes sense to me as well.



* Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-01-10 22:15     ` Ian Rogers
@ 2025-01-13 22:01       ` Namhyung Kim
  2025-01-13 22:51         ` Ian Rogers
  0 siblings, 1 reply; 53+ messages in thread
From: Namhyung Kim @ 2025-01-13 22:01 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	bpf, Aditya Bodkhe, Atish Patra, Leo Yan, Beeman Strong,
	Arnaldo Carvalho de Melo

On Fri, Jan 10, 2025 at 02:15:18PM -0800, Ian Rogers wrote:
> On Fri, Jan 10, 2025 at 11:40 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Thu, Jan 09, 2025 at 02:21:09PM -0800, Ian Rogers wrote:
> > > Originally posted and merged from:
> > > https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
> > > This reverts commit 4f1b067359ac8364cdb7f9fda41085fa85789d0f although
> > > the patch is now smaller due to related fixes being applied in commit
> > > 22a4db3c3603 ("perf evsel: Add alternate_hw_config and use in
> > > evsel__match").
> > > The original commit message was:
> > >
> > > It was requested that RISC-V be able to add events to the perf tool so
> > > the PMU driver didn't need to map legacy events to config encodings:
> > > https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/
> > >
> > > This change makes the priority of events specified without a PMU the
> > > same as those specified with a PMU, namely sysfs and JSON events are
> > > checked first before using the legacy encoding.
> >
> > I'm still not convinced why we need this change despite these
> > troubles.  If it's because RISC-V cannot define the legacy hardware
> > events in the kernel driver, why not use a different name in JSON and
> > ask users to use the name specifically?  Something like:
> >
> >   $ perf record -e riscv-cycles ...
> 
> So ARM and RISC-V are more than able to speak for themselves and have
> their tags on the series, but let's recap why I'm motivated to do this
> change:
> 
> 1) perf supported legacy events;
> 2) perf supported sysfs and json events, but at a lower priority than
> legacy events;
> 3) hybrid support was added but in a way where all the hybrid PMUs
> needed to be known, assumptions about PMU were implicit and baked into
> the tool;
> 4) metric support for hybrid was going in a similar implicit direction
> and I objected, what would cycles mean in a metric if the core PMU was

If the legacy cycles event in a metric is a problem, can we change the
metric to be more specific?


> implicit? Rather than pursue this the hybrid code was overhauled, PMUs
> became more of a thing and we added a notion of a "core" PMU which
> would support legacy events;
> 5) ARM core PMUs differ in naming, etc. from just about every other
> platform. Their core events had been programmed as if they were
> uncore events - ie without the legacy priority. Fixing hybrid, and
> fixing ARM PMUs to know they supported legacy events, broke perf on
> Apple-M? series due to a PMU driver issue with legacy events:
> https://lore.kernel.org/lkml/08f1f185-e259-4014-9ca4-6411d5c1bc65@marcan.st/
> "Perf broke on all Apple ARM64 systems (tested almost everything), and
> according to maz also on Juno (so, probably all big.LITTLE) since
> v6.5."
> 6) sysfs/json events were made the priority over legacy to unbreak
> perf on Apple-M? CPUs, but only if the PMU is specified:
> https://lore.kernel.org/r/20231123042922.834425-1-irogers@google.com
>    Reported-by: Hector Martin <marcan@marcan.st>
>    Signed-off-by: Ian Rogers <irogers@google.com>
>    Tested-by: Hector Martin <marcan@marcan.st>
>    Tested-by: Marc Zyngier <maz@kernel.org>
>    Acked-by: Mark Rutland <mark.rutland@arm.com>

I think ARM/Apple-Mx is fine without this change, right?

> 
> This gets us to the current code where I can trivially get an
> inconsistency. Here on Intel with no PMU in the event name:
> ```
> $ perf stat -vv -e cpu-cycles true
> Using CPUID GenuineIntel-6-8D-1
> Control descriptor is not initialized
> ------------------------------------------------------------
> perf_event_attr:
>   type                             0 (PERF_TYPE_HARDWARE)
>   size                             136
>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid 752915  cpu -1  group_fd -1  flags 0x8 = 3
> cpu-cycles: -1: 1293076 273429 273429
> cpu-cycles: 1293076 273429 273429
> 
>  Performance counter stats for 'true':
> 
>          1,293,076      cpu-cycles
> 
>        0.000809752 seconds time elapsed
> 
>        0.000841000 seconds user
>        0.000000000 seconds sys
> ```
> 
> Here with a PMU event name:
> ```
> $ sudo perf stat -vv -e cpu/cpu-cycles/ true
> Using CPUID GenuineIntel-6-8D-1
> Attempt to add: cpu/cpu-cycles=0/
> ..after resolving event: cpu/event=0x3c/
> Control descriptor is not initialized
> ------------------------------------------------------------
> perf_event_attr:
>   type                             4 (cpu)
>   size                             136
>   config                           0x3c (cpu-cycles)
>   sample_type                      IDENTIFIER
>   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
>   exclude_guest                    1
> ------------------------------------------------------------
> sys_perf_event_open: pid 752839  cpu -1  group_fd -1  flags 0x8 = 3
> cpu/cpu-cycles/: -1: 1421235 531150 531150
> cpu/cpu-cycles/: 1421235 531150 531150
> 
>  Performance counter stats for 'true':
> 
>          1,421,235      cpu/cpu-cycles/
> 
>        0.001292908 seconds time elapsed
> 
>        0.001340000 seconds user
>        0.000000000 seconds sys
> ```
> 
> That is, the no-PMU event is opened as type=0/config=0 (legacy) while
> the PMU event is opened as type=4/config=0x3c (sysfs encoding). Now

I'm not sure it's a problem.  I think it works as expected...?


> let's cross our fingers and hope that in the driver they are really
> the same thing. I object to the idea that there should be two
> different priorities for sysfs/json and legacy depending on whether a
> PMU is or isn't specified in the event name. The priority could be
> legacy then sysfs/json, or it could be sysfs/json then legacy, but it
> should be the same regardless of whether the PMU is put in the event

Well, I think having PMU name in the event is a big difference.  Legacy
events were there since Day 1, I guess it's natural to think that an
event without PMU name means a legacy event and others should come with
PMU names explicitly.

Thanks,
Namhyung


> name. The PMU in the event name should be optional, for example we may
> or may not show it in the stat output. The encoding being consistent
> was the case prior to the Apple-M? fix and this patch aims to make it
> consistent once more. Given the ARM bug mentioned above, it should also
> fix Apple-M? CPUs whether or not the PMU is specified, as it will avoid the
> same legacy event issue that may only exist on old kernels. RISC-V is
> motivated because of not wanting hard coded legacy events in the
> driver for all potential vendors and models.
> 
> Thanks,
> Ian


* Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-01-13 22:01       ` Namhyung Kim
@ 2025-01-13 22:51         ` Ian Rogers
  2025-01-14  2:31           ` Ian Rogers
  0 siblings, 1 reply; 53+ messages in thread
From: Ian Rogers @ 2025-01-13 22:51 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	bpf, Aditya Bodkhe, Atish Patra, Leo Yan, Beeman Strong,
	Arnaldo Carvalho de Melo

On Mon, Jan 13, 2025 at 2:01 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Fri, Jan 10, 2025 at 02:15:18PM -0800, Ian Rogers wrote:
> > On Fri, Jan 10, 2025 at 11:40 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > On Thu, Jan 09, 2025 at 02:21:09PM -0800, Ian Rogers wrote:
> > > > Originally posted and merged from:
> > > > https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
> > > > This reverts commit 4f1b067359ac8364cdb7f9fda41085fa85789d0f although
> > > > the patch is now smaller due to related fixes being applied in commit
> > > > 22a4db3c3603 ("perf evsel: Add alternate_hw_config and use in
> > > > evsel__match").
> > > > The original commit message was:
> > > >
> > > > It was requested that RISC-V be able to add events to the perf tool so
> > > > the PMU driver didn't need to map legacy events to config encodings:
> > > > https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/
> > > >
> > > > This change makes the priority of events specified without a PMU the
> > > > same as those specified with a PMU, namely sysfs and JSON events are
> > > > checked first before using the legacy encoding.
> > >
> > > I'm still not convinced why we need this change despite these
> > > troubles.  If it's because RISC-V cannot define the legacy hardware
> > > events in the kernel driver, why not use a different name in JSON and
> > > ask users to use the name specifically?  Something like:
> > >
> > >   $ perf record -e riscv-cycles ...
> >
> > So ARM and RISC-V are more than able to speak for themselves and have
> > their tags on the series, but let's recap why I'm motivated to do this
> > change:
> >
> > 1) perf supported legacy events;
> > 2) perf supported sysfs and json events, but at a lower priority than
> > legacy events;
> > 3) hybrid support was added but in a way where all the hybrid PMUs
> > needed to be known, assumptions about PMU were implicit and baked into
> > the tool;
> > 4) metric support for hybrid was going in a similar implicit direction
> > and I objected, what would cycles mean in a metric if the core PMU was
>
> If the legacy cycles event in a metric is a problem, can we change the
> metric to be more specific?
>
>
> > implicit? Rather than pursue this the hybrid code was overhauled, PMUs
> > became more of a thing and we added a notion of a "core" PMU which
> > would support legacy events;
> > 5) ARM core PMUs differ in naming, etc. from just about every other
> > platform. Their core events had been programmed as if they were
> > uncore events - ie without the legacy priority. Fixing hybrid, and
> > fixing ARM PMUs to know they supported legacy events, broke perf on
> > Apple-M? series due to a PMU driver issue with legacy events:
> > https://lore.kernel.org/lkml/08f1f185-e259-4014-9ca4-6411d5c1bc65@marcan.st/
> > "Perf broke on all Apple ARM64 systems (tested almost everything), and
> > according to maz also on Juno (so, probably all big.LITTLE) since
> > v6.5."
> > 6) sysfs/json events were made the priority over legacy to unbreak
> > perf on Apple-M? CPUs, but only if the PMU is specified:
> > https://lore.kernel.org/r/20231123042922.834425-1-irogers@google.com
> >    Reported-by: Hector Martin <marcan@marcan.st>
> >    Signed-off-by: Ian Rogers <irogers@google.com>
> >    Tested-by: Hector Martin <marcan@marcan.st>
> >    Tested-by: Marc Zyngier <maz@kernel.org>
> >    Acked-by: Mark Rutland <mark.rutland@arm.com>
>
> I think ARM/Apple-Mx is fine without this change, right?
>
> >
> > This gets us to the current code where I can trivially get an
> > inconsistency. Here on Intel with no PMU in the event name:
> > ```
> > $ perf stat -vv -e cpu-cycles true
> > Using CPUID GenuineIntel-6-8D-1
> > Control descriptor is not initialized
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             0 (PERF_TYPE_HARDWARE)
> >   size                             136
> >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> >   sample_type                      IDENTIFIER
> >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> >   disabled                         1
> >   inherit                          1
> >   enable_on_exec                   1
> >   exclude_guest                    1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid 752915  cpu -1  group_fd -1  flags 0x8 = 3
> > cpu-cycles: -1: 1293076 273429 273429
> > cpu-cycles: 1293076 273429 273429
> >
> >  Performance counter stats for 'true':
> >
> >          1,293,076      cpu-cycles
> >
> >        0.000809752 seconds time elapsed
> >
> >        0.000841000 seconds user
> >        0.000000000 seconds sys
> > ```
> >
> > Here with a PMU event name:
> > ```
> > $ sudo perf stat -vv -e cpu/cpu-cycles/ true
> > Using CPUID GenuineIntel-6-8D-1
> > Attempt to add: cpu/cpu-cycles=0/
> > ..after resolving event: cpu/event=0x3c/
> > Control descriptor is not initialized
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             4 (cpu)
> >   size                             136
> >   config                           0x3c (cpu-cycles)
> >   sample_type                      IDENTIFIER
> >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> >   disabled                         1
> >   inherit                          1
> >   enable_on_exec                   1
> >   exclude_guest                    1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid 752839  cpu -1  group_fd -1  flags 0x8 = 3
> > cpu/cpu-cycles/: -1: 1421235 531150 531150
> > cpu/cpu-cycles/: 1421235 531150 531150
> >
> >  Performance counter stats for 'true':
> >
> >          1,421,235      cpu/cpu-cycles/
> >
> >        0.001292908 seconds time elapsed
> >
> >        0.001340000 seconds user
> >        0.000000000 seconds sys
> > ```
> >
> > That is, the no-PMU event is opened as type=0/config=0 (legacy) while
> > the PMU event is opened as type=4/config=0x3c (sysfs encoding). Now
>
> I'm not sure it's a problem.  I think it works as expected...?
>
>
> > let's cross our fingers and hope that in the driver they are really
> > the same thing. I object to the idea that there should be two
> > different priorities for sysfs/json and legacy depending on whether a
> > PMU is or isn't specified in the event name. The priority could be
> > legacy then sysfs/json, or it could be sysfs/json then legacy, but it
> > should be the same regardless of whether the PMU is put in the event
>
> Well, I think having PMU name in the event is a big difference.  Legacy
> events were there since Day 1, I guess it's natural to think that an
> event without PMU name means a legacy event and others should come with
> PMU names explicitly.

So then we're breaking event names when uniquify inserts a PMU name
in the stat output:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/stat-display.c?h=perf-tools-next#n932

The hybrid work, reviewed by Jiri and Arnaldo, had the explicit intent
that the perf tool supports a legacy event on a hybrid PMU, even
though the PMU doesn't advertise that legacy event through json or
sysfs.

Making it so that events without PMUs are only legacy events just
doesn't work. There are far too many existing uses of non-legacy
events without PMU, the metrics contain 100s of examples.

Prior to switching json/sysfs to being the priority when a PMU is
specified, it was the case that all encodings were the same, with or
without a PMU.

I don't think there is anything natural about assuming things about
event names. Take cycles, cpu-cycles and cpu_cycles:
 - cycles on x86 is only encoded via a legacy event;
 - cpu-cycles on Intel exists as a sysfs event, but cpu-cycles is also
a legacy event name;
 - cpu_cycles exists as a sysfs event on ARM but doesn't have a
corresponding legacy event name.

The difference in meaning of an event name can be as subtle as the
difference between a hyphen and an underscore. Given that we can't
break everybody's `perf <command> -e <event name> ..` command line,
nor should we break all the metrics, I think the most intuitive thing
is for cycles to behave the same with or without a PMU. For example,
there may be differences in accuracy between a fixed and a generic
counter, so the legacy event may only work with one counter while the
sysfs/json event uses all the counters, or vice versa. As explained,
the output code will or will not insert PMU names, treating them as
not mattering. Currently they do matter, as parsing gives different
perf_event_attr values and those can have differing kernel behaviors.
This patch fixes this.
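A toy model of the two priority orders being argued over, assuming an Intel machine where `cycles` exists only as a legacy name and `cpu-cycles` exists both as a legacy name and as a sysfs event (matching the quoted `perf stat -vv` output, which shows type=0/config=0 without a PMU and type=4/config=0x3c with one). The tables and `resolve()` are hypothetical; this is not perf's parse-events code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum encoding { ENC_NONE, ENC_LEGACY, ENC_SYSFS_JSON };

/* Illustrative name tables only; real perf reads sysfs and its JSON tables. */
static const char *const legacy_names[] = { "cycles", "cpu-cycles" };
static const char *const sysfs_json_names[] = { "cpu-cycles" };

static bool in_table(const char *const *table, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (!strcmp(table[i], name))
			return true;
	return false;
}

/*
 * legacy_first_without_pmu = true models the current behavior: with no PMU
 * in the event name the legacy encoding wins, with a PMU sysfs/JSON wins.
 * legacy_first_without_pmu = false models the patched behavior: sysfs/JSON
 * first in both cases, with legacy as the fallback.
 */
static enum encoding resolve(const char *name, bool pmu_given,
			     bool legacy_first_without_pmu)
{
	bool legacy = in_table(legacy_names,
			       sizeof(legacy_names) / sizeof(legacy_names[0]), name);
	bool sysfs_json = in_table(sysfs_json_names,
				   sizeof(sysfs_json_names) / sizeof(sysfs_json_names[0]),
				   name);

	if (legacy_first_without_pmu && !pmu_given && legacy)
		return ENC_LEGACY;
	if (sysfs_json)
		return ENC_SYSFS_JSON;
	if (legacy)
		return ENC_LEGACY;
	return ENC_NONE;
}
```

With the current order, `resolve("cpu-cycles", false, true)` yields ENC_LEGACY while `resolve("cpu-cycles", true, true)` yields ENC_SYSFS_JSON; with the patched order both yield ENC_SYSFS_JSON, and `cycles` falls back to ENC_LEGACY either way.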

Thanks,
Ian


* Re: [PATCH v5 3/4] perf record: Skip don't fail for events that don't open
  2025-01-13 20:51             ` Namhyung Kim
@ 2025-01-13 23:04               ` Ian Rogers
  2025-01-15 17:31                 ` Namhyung Kim
  0 siblings, 1 reply; 53+ messages in thread
From: Ian Rogers @ 2025-01-13 23:04 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Linus Torvalds, Peter Zijlstra,
	Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, Kan Liang, James Clark, Ze Gao, Weilin Wang,
	Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, bpf, Aditya Bodkhe, Leo Yan,
	Atish Patra

On Mon, Jan 13, 2025 at 12:51 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hi Ian,
>
> On Fri, Jan 10, 2025 at 01:33:57PM -0800, Ian Rogers wrote:
> > On Fri, Jan 10, 2025 at 11:26 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > On Fri, Jan 10, 2025 at 08:42:02AM -0800, Ian Rogers wrote:
> > > > On Fri, Jan 10, 2025 at 6:18 AM Arnaldo Carvalho de Melo
> > > > <acme@kernel.org> wrote:
> > > > >
> > > > > Adding Linus to the CC list as he participated in this discussion in the
> > > > > past, so a heads up about changes in this area that are being further
> > > > > discussed.
> > > >
> > > > Linus blocks my email so I'm not sure of the point.
> > >
> > > That's unfortunate, but he should be able to see others' reply.
> > >
> > > >
> > > > > On Thu, Jan 09, 2025 at 05:25:03PM -0800, Namhyung Kim wrote:
> > > > > > On Thu, Jan 09, 2025 at 02:21:08PM -0800, Ian Rogers wrote:
> > > > > > > Whilst for many tools it is an expected behavior that failure to open
> > > > > > > a perf event is a failure, ARM decided to name PMU events the same as
> > > > > > > legacy events and then failed to rename such events on a server uncore
> > > > > > > SLC PMU. As perf's default behavior when no PMU is specified is to
> > > > > > > open the event on all PMUs that advertise/"have" the event, this
> > > > > > > yielded failures when trying to make the priority of legacy and
> > > > > > > sysfs/json events uniform - something requested by RISC-V and ARM. A
> > > > > > > legacy event user on ARM hardware may find their event opened on an
> > > > > > > uncore PMU which for perf record will fail. Arnaldo suggested skipping
> > > > > > > such events which this patch implements. Rather than have the skipping
> > > > > > > conditional on running on ARM, the skipping is done on all
> > > > > > > architectures as such a fundamental behavioral difference could lead
> > > > > > > to problems with tools built/depending on perf.
> > > > > > >
> > > > > > > An example of perf record failing to open events on x86 is:
> > > > > > > ```
> > > > > > > $ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
> > > > > > > Error:
> > > > > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
> > > > > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > > > > > "dmesg | grep -i perf" may provide additional information.
> > > > > > >
> > > > > > > Error:
> > > > > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
> > > > > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > > > > > "dmesg | grep -i perf" may provide additional information.
> > > > > > >
> > > > > > > Error:
> > > > > > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > > > > > The LLC-prefetch-read event is not supported.
> > > > > > > [ perf record: Woken up 1 times to write data ]
> > > > > > > [ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]
> > > > > >
> > > > > > I'm afraid this can be too noisy.
> > > > >
> > > > > Agreed.
> > > > >
> > > > > > > $ perf report --stats
> > > > > > > Aggregated stats:
> > > > > > >                TOTAL events:      17255
> > > > > > >                 MMAP events:        284  ( 1.6%)
> > > > > > >                 COMM events:       1961  (11.4%)
> > > > > > >                 EXIT events:          1  ( 0.0%)
> > > > > > >                 FORK events:       1960  (11.4%)
> > > > > > >               SAMPLE events:         87  ( 0.5%)
> > > > > > >                MMAP2 events:      12836  (74.4%)
> > > > > > >              KSYMBOL events:         83  ( 0.5%)
> > > > > > >            BPF_EVENT events:         36  ( 0.2%)
> > > > > > >       FINISHED_ROUND events:          2  ( 0.0%)
> > > > > > >             ID_INDEX events:          1  ( 0.0%)
> > > > > > >           THREAD_MAP events:          1  ( 0.0%)
> > > > > > >              CPU_MAP events:          1  ( 0.0%)
> > > > > > >            TIME_CONV events:          1  ( 0.0%)
> > > > > > >        FINISHED_INIT events:          1  ( 0.0%)
> > > > > > > cycles stats:
> > > > > > >               SAMPLE events:         87
> > > > > > > ```
> > > > > > >
> > > > > > > If all events fail to open then the perf record will fail:
> > > > > > > ```
> > > > > > > $ perf record -e LLC-prefetch-read true
> > > > > > > Error:
> > > > > > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > > > > > The LLC-prefetch-read event is not supported.
> > > > > > > Error:
> > > > > > > Failure to open any events for recording
> > > > > > > ```
> > > > > > >
> > > > > > > As an evlist may have dummy events that open when all command line
> > > > > > > events fail we ignore dummy events when detecting if at least some
> > > > > > > events open. This still permits the dummy event on its own to be used
> > > > > > > as a permission check:
> > > > > > > ```
> > > > > > > $ perf record -e dummy true
> > > > > > > [ perf record: Woken up 1 times to write data ]
> > > > > > > [ perf record: Captured and wrote 0.046 MB perf.data ]
> > > > > > > ```
> > > > > > > but allows failure when a dummy event is implicitly inserted or when
> > > > > > > there are insufficient permissions to open it:
> > > > > > > ```
> > > > > > > $ perf record -e LLC-prefetch-read -a true
> > > > > > > Error:
> > > > > > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > > > > > The LLC-prefetch-read event is not supported.
> > > > > > > Error:
> > > > > > > Failure to open any events for recording
> > > > > > > ```
> > > > > > >
> > > > > > > The issue with legacy events is that on RISC-V they want the driver to
> > > > > > > not have mappings from legacy to non-legacy config encodings for each
> > > > > > > vendor/model due to size, complexity and difficulty to update. It was
> > > > > > > reported that on ARM Apple-M? CPUs the legacy mapping in the driver
> > > > > > > was broken and the sysfs/json events should always take precedence,
> > > > > > > however, it isn't clear this is still the case. It is the case that
> > > > > > > without working around this issue a legacy event like cycles without a
> > > > > > > PMU can encode differently than when specified with a PMU - the
> > > > > > > non-PMU version favoring legacy encodings, the PMU one avoiding legacy
> > > > > > > encodings.
> > > > > > >
> > > > > > > The patch removes events and then adjusts the idx value for each
> > > > > > > evsel. This is done so that the dense xyarrays used for file
> > > > > > > descriptors, etc. don't contain broken entries. As event opening
> > > > > > > happens relatively late in the record process, use of the idx value
> > > > > > > before the open will have become corrupted, so it is expected there
> > > > > > > are latent bugs hidden behind this change - the change is best
> > > > > > > effort. As the only vendor that has broken event names is ARM, this
> > > > > > > will principally affect ARM users. They will also experience warning
> > > > > > > messages like those above because of the uncore PMU advertising legacy
> > > > > > > event names.
> > > > > > >
> > > > > > > Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
> > > > > > > Signed-off-by: Ian Rogers <irogers@google.com>
> > > > > > > Tested-by: James Clark <james.clark@linaro.org>
> > > > > > > Tested-by: Leo Yan <leo.yan@arm.com>
> > > > > > > Tested-by: Atish Patra <atishp@rivosinc.com>
> > > > > > > ---
> > > > > > >  tools/perf/builtin-record.c | 47 ++++++++++++++++++++++++++++++++-----
> > > > > > >  1 file changed, 41 insertions(+), 6 deletions(-)
> > > > > > >
> > > > > > > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> > > > > > > index 5db1aedf48df..c0b8249a3787 100644
> > > > > > > --- a/tools/perf/builtin-record.c
> > > > > > > +++ b/tools/perf/builtin-record.c
> > > > > > > @@ -961,7 +961,6 @@ static int record__config_tracking_events(struct record *rec)
> > > > > > >      */
> > > > > > >     if (opts->target.initial_delay || target__has_cpu(&opts->target) ||
> > > > > > >         perf_pmus__num_core_pmus() > 1) {
> > > > > > > -
> > > > > > >             /*
> > > > > > >              * User space tasks can migrate between CPUs, so when tracing
> > > > > > >              * selected CPUs, sideband for all CPUs is still needed.
> > > > > > > @@ -1366,6 +1365,7 @@ static int record__open(struct record *rec)
> > > > > > >     struct perf_session *session = rec->session;
> > > > > > >     struct record_opts *opts = &rec->opts;
> > > > > > >     int rc = 0;
> > > > > > > +   bool skipped = false;
> > > > > > >
> > > > > > >     evlist__for_each_entry(evlist, pos) {
> > > > > > >  try_again:
> > > > > > > @@ -1381,15 +1381,50 @@ static int record__open(struct record *rec)
> > > > > > >                             pos = evlist__reset_weak_group(evlist, pos, true);
> > > > > > >                             goto try_again;
> > > > > > >                     }
> > > > > > > -                   rc = -errno;
> > > > > > >                     evsel__open_strerror(pos, &opts->target, errno, msg, sizeof(msg));
> > > > > > > -                   ui__error("%s\n", msg);
> > > > > > > -                   goto out;
> > > > > > > +                   ui__error("Failure to open event '%s' on PMU '%s' which will be removed.\n%s\n",
> > > > > > > +                             evsel__name(pos), evsel__pmu_name(pos), msg);
> > > > >
> > > > > > How about changing it to pr_debug() and add below ...
> > > > >
> > > > > That sounds better.
> > > > >
> > > > > > > +                   pos->skippable = true;
> > > > > > > +                   skipped = true;
> > > > > > > +           } else {
> > > > > > > +                   pos->supported = true;
> > > > > > >             }
> > > > > > > -
> > > > > > > -           pos->supported = true;
> > > > > > >     }
> > > > > > >
> > > > > > > +   if (skipped) {
> > > > > > > +           struct evsel *tmp;
> > > > > > > +           int idx = 0;
> > > > > > > +           bool evlist_empty = true;
> > > > > > > +
> > > > > > > +           /* Remove evsels that failed to open and update indices. */
> > > > > > > +           evlist__for_each_entry_safe(evlist, tmp, pos) {
> > > > > > > +                   if (pos->skippable) {
> > > > > > > +                           evlist__remove(evlist, pos);
> > > > > > > +                           continue;
> > > > > > > +                   }
> > > > > > > +
> > > > > > > +                   /*
> > > > > > > +                    * Note, dummy events may be command line parsed or
> > > > > > > +                    * added by the tool. We care about supporting `perf
> > > > > > > +                    * record -e dummy` which may be used as a permission
> > > > > > > +                    * check. Dummy events that are added to the command
> > > > > > > +                    * line and opened along with other events that fail,
> > > > > > > +                    * will still fail as if the dummy events were tool
> > > > > > > +                    * added events for the sake of code simplicity.
> > > > > > > +                    */
> > > > > > > +                   if (!evsel__is_dummy_event(pos))
> > > > > > > +                           evlist_empty = false;
> > > > > > > +           }
> > > > > > > +           evlist__for_each_entry(evlist, pos) {
> > > > > > > +                   pos->core.idx = idx++;
> > > > > > > +           }
> > > > > > > +           /* If list is empty then fail. */
> > > > > > > +           if (evlist_empty) {
> > > > > > > +                   ui__error("Failure to open any events for recording.\n");
> > > > > > > +                   rc = -1;
> > > > > > > +                   goto out;
> > > > > > > +           }
> > > > >
> > > > > > ... ?
> > > > >
> > > > > >               if (!verbose)
> > > > > >                       ui__warning("Removed some unsupported events, use -v for details.\n");
> > > > >
> > > > > And even this one would be best left for cases where we can determine
> > > > > that its a new situation, i.e. one that should work and not the ones we
> > > > > know that will not work already and thus so far didn't alarm the user
> > > > > into thinking something is wrong.
> > > > >
> > > > > Having the ones we know will fail as pr_debug() seems enough, I'd say.
> > > >
> > > > This means that:
> > > > ```
> > > > $ perf record -e data_read,LLC-prefetch-read -a sleep 0.1
> > > > ```
> > > > will fail (as data_read is a memory controller event and the LLC
> > > > doesn't support sampling) with something like:
> > > > ```
> > > > Error:
> > > > Failure to open any events for recording
> > > > ```
> > > > Which feels a bit minimal. As I already mentioned, it is also a
> > > > behavior change and so has the potential to break scripts dependent on
> > > > the failure information.
> > >
> > > I don't think it's about failure behavior, the concern is the error
> > > messages.  It can take too much screen space when users give a long list
> > > of invalid events.  And unfortunately the current error message for
> > > checking dmesg is not very helpful.
> >
> > Making the dmesg message more useful is a separate issue. The error
>
> Sure.
>
> > message only happens when things are broken and I think having an
> > error message is better than none, or somehow having to know to wade
> > through verbose output. I think this is very clear in:
> > https://lore.kernel.org/lkml/CAP-5=fVr43v8gkqi8SXVaNKnkO+cooQVqx3xUFJ-BtgxGHX90g@mail.gmail.com/
> >
> > > Anyway you can add this line too: "Use -v to see the details."
> >
> > So silently failing and then expecting users to scrape verbose output
> > is a fairly significant behavior change for the tool.
>
> I'm not saying I want silent failures.  It should say it fails to parse
> or open some events.  But I think it needs to care about repeating
> failure messages.
>
> >
> > > >
> > > > A patch lowering the priority of error messages should be independent
> > > > of the 4 changes here. I'd be happy if someone follows this series
> > > > with a patch doing it.
> > >
> > > I think the error behavior is a part of this change.
> >
> > I disagree with it, so I think you need to address my comments.
>
> You are changing the error behavior by skipping failed events then the
> relevant error messages should be handled properly in this patchset.

I'm not sure what you are asking and I'm not sure why it matters?
Previously you'd asked for all the output to be moved under verbose.

If I specify an event that doesn't work with perf record today then it
fails. With this patch it fails too. If that event is a core PMU event
then there will be an error message for each core PMU that doesn't
support the event. So I get 2 error messages on hybrid. This doesn't
feel egregious or warrant a new error message mechanism. I would like
evsels to support 1 or more PMUs, in which case this would be 1 error
message.
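A minimal sketch of the grouping described above, assuming a hypothetical `struct open_failure` that records one failed (event, PMU) pair; neither it nor `report_grouped()` exists in perf. It prints one line per distinct event, listing every PMU the open failed on, which is roughly what the error output could look like if an evsel were able to span more than one PMU.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record of one failed open: event name plus PMU name. */
struct open_failure {
	const char *event;
	const char *pmu;
};

/*
 * Print one line per distinct event name, listing all PMUs it failed on.
 * Returns the number of lines printed.
 */
static int report_grouped(FILE *out, const struct open_failure *f, int n)
{
	int lines = 0;

	for (int i = 0; i < n; i++) {
		int seen = 0;

		/* Skip events already reported by an earlier entry. */
		for (int j = 0; j < i && !seen; j++)
			seen = !strcmp(f[j].event, f[i].event);
		if (seen)
			continue;

		fprintf(out, "Failure to open event '%s' on PMU(s):", f[i].event);
		for (int k = i; k < n; k++) {
			if (!strcmp(f[k].event, f[i].event))
				fprintf(out, " %s", f[k].pmu);
		}
		fputc('\n', out);
		lines++;
	}
	return lines;
}
```

Fed the two `uncore_imc_free_running` failures for `data_read` and the `LLC-prefetch-read` failure from the earlier x86 example, this prints two lines instead of three separate error blocks.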

If I specify perf record today on an uncore event then perf record
fails and I get 1 error message for the uncore PMU. The new behavior
will be to get 1 error message per uncore PMU. If I'm on a server with
10s of uncore PMUs then maybe the message is spammy, but the command
fails today and will continue to fail with this series. I don't see a
motivation to change or optimize for this case and again, evsels that
support >1 PMU would be the most appropriate fix.

The only case where there is no message today but would be with this
patch series is for cycles on ARM's neoverse. There will be one
warning for the evsel on the SLC PMU. That's one warning and not many.

As I've said, if you want a more elaborate error reporting system then
take these patches and add it to them. There's a larger refactor to
make evsels support >1 PMU that would clean up the many events on
server uncore PMUs issue, but that shouldn't be part of this series
nor gate it. If you are trying to perf record on uncore PMUs then you
already have a problem, so I don't see why optimizing the error messages
for that mistake matters.

Thanks,
Ian


* Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-01-13 22:51         ` Ian Rogers
@ 2025-01-14  2:31           ` Ian Rogers
  2025-01-15 17:59             ` Namhyung Kim
  0 siblings, 1 reply; 53+ messages in thread
From: Ian Rogers @ 2025-01-14  2:31 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	bpf, Aditya Bodkhe, Atish Patra, Leo Yan, Beeman Strong,
	Arnaldo Carvalho de Melo

On Mon, Jan 13, 2025 at 2:51 PM Ian Rogers <irogers@google.com> wrote:
>
> On Mon, Jan 13, 2025 at 2:01 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Fri, Jan 10, 2025 at 02:15:18PM -0800, Ian Rogers wrote:
> > > On Fri, Jan 10, 2025 at 11:40 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > On Thu, Jan 09, 2025 at 02:21:09PM -0800, Ian Rogers wrote:
> > > > > Originally posted and merged from:
> > > > > https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
> > > > > This reverts commit 4f1b067359ac8364cdb7f9fda41085fa85789d0f although
> > > > > the patch is now smaller due to related fixes being applied in commit
> > > > > 22a4db3c3603 ("perf evsel: Add alternate_hw_config and use in
> > > > > evsel__match").
> > > > > The original commit message was:
> > > > >
> > > > > It was requested that RISC-V be able to add events to the perf tool so
> > > > > the PMU driver didn't need to map legacy events to config encodings:
> > > > > https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/
> > > > >
> > > > > This change makes the priority of events specified without a PMU the
> > > > > same as those specified with a PMU, namely sysfs and JSON events are
> > > > > checked first before using the legacy encoding.
> > > >
> > > > I'm still not convinced that we need this change despite these
> > > > troubles.  If it's because RISC-V cannot define the legacy hardware
> > > > events in the kernel driver, why not using a different name in JSON and
> > > > ask users to use the name specifically?  Something like:
> > > >
> > > >   $ perf record -e riscv-cycles ...
> > >
> > > So ARM and RISC-V are more than able to speak for themselves and have
> > > their tags on the series, but let's recap why I'm motivated to do this
> > > change:
> > >
> > > 1) perf supported legacy events;
> > > 2) perf supported sysfs and json events, but at a lower priority than
> > > legacy events;
> > > 3) hybrid support was added but in a way where all the hybrid PMUs
> > > needed to be known, assumptions about PMU were implicit and baked into
> > > the tool;
> > > 4) metric support for hybrid was going in a similar implicit direction
> > > and I objected, what would cycles mean in a metric if the core PMU was
> >
> > If the legacy cycles event in a metric is a problem, can we change the
> > metric to be more specific?
> >
> >
> > > implicit? Rather than pursue this the hybrid code was overhauled, PMUs
> > > became more of a thing and we added a notion of a "core" PMU which
> > > would support legacy events;
> > > 5) ARM core PMUs differ in naming, etc. from just about every other
> > > platform. Their core events had been programmed as if they were
> > > uncore events - ie without the legacy priority. Fixing hybrid, and
> > > fixing ARM PMUs to know they supported legacy events, broke perf on
> > > Apple-M? series due to a PMU driver issue with legacy events:
> > > https://lore.kernel.org/lkml/08f1f185-e259-4014-9ca4-6411d5c1bc65@marcan.st/
> > > "Perf broke on all Apple ARM64 systems (tested almost everything), and
> > > according to maz also on Juno (so, probably all big.LITTLE) since
> > > v6.5."
> > > 6) sysfs/json events were made the priority over legacy to unbreak
> > > perf on Apple-M? CPUs, but only if the PMU is specified:
> > > https://lore.kernel.org/r/20231123042922.834425-1-irogers@google.com
> > >    Reported-by: Hector Martin <marcan@marcan.st>
> > >    Signed-off-by: Ian Rogers <irogers@google.com>
> > >    Tested-by: Hector Martin <marcan@marcan.st>
> > >    Tested-by: Marc Zyngier <maz@kernel.org>
> > >    Acked-by: Mark Rutland <mark.rutland@arm.com>
> >
> > I think ARM/Apple-Mx is fine without this change, right?
> >
> > >
> > > This gets us to the current code where I can trivially get an
> > > inconsistency. Here on Intel with no PMU in the event name:
> > > ```
> > > $ perf stat -vv -e cpu-cycles true
> > > Using CPUID GenuineIntel-6-8D-1
> > > Control descriptor is not initialized
> > > ------------------------------------------------------------
> > > perf_event_attr:
> > >   type                             0 (PERF_TYPE_HARDWARE)
> > >   size                             136
> > >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > >   sample_type                      IDENTIFIER
> > >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > >   disabled                         1
> > >   inherit                          1
> > >   enable_on_exec                   1
> > >   exclude_guest                    1
> > > ------------------------------------------------------------
> > > sys_perf_event_open: pid 752915  cpu -1  group_fd -1  flags 0x8 = 3
> > > cpu-cycles: -1: 1293076 273429 273429
> > > cpu-cycles: 1293076 273429 273429
> > >
> > >  Performance counter stats for 'true':
> > >
> > >          1,293,076      cpu-cycles
> > >
> > >        0.000809752 seconds time elapsed
> > >
> > >        0.000841000 seconds user
> > >        0.000000000 seconds sys
> > > ```
> > >
> > > Here with a PMU event name:
> > > ```
> > > $ sudo perf stat -vv -e cpu/cpu-cycles/ true
> > > Using CPUID GenuineIntel-6-8D-1
> > > Attempt to add: cpu/cpu-cycles=0/
> > > ..after resolving event: cpu/event=0x3c/
> > > Control descriptor is not initialized
> > > ------------------------------------------------------------
> > > perf_event_attr:
> > >   type                             4 (cpu)
> > >   size                             136
> > >   config                           0x3c (cpu-cycles)
> > >   sample_type                      IDENTIFIER
> > >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > >   disabled                         1
> > >   inherit                          1
> > >   enable_on_exec                   1
> > >   exclude_guest                    1
> > > ------------------------------------------------------------
> > > sys_perf_event_open: pid 752839  cpu -1  group_fd -1  flags 0x8 = 3
> > > cpu/cpu-cycles/: -1: 1421235 531150 531150
> > > cpu/cpu-cycles/: 1421235 531150 531150
> > >
> > >  Performance counter stats for 'true':
> > >
> > >          1,421,235      cpu/cpu-cycles/
> > >
> > >        0.001292908 seconds time elapsed
> > >
> > >        0.001340000 seconds user
> > >        0.000000000 seconds sys
> > > ```
> > >
> > > That is the no PMU event is opened as type=0/config=0 (legacy) while
> > > the PMU event is opened as type=4/config=0x3c (sysfs encoding). Now
> >
> > I'm not sure it's a problem.  I think it works as expected...?
> >
> >
> > > let's cross our fingers and hope that in the driver they are really
> > > the same thing. I object to the idea that there should be two
> > > different priorities for sysfs/json and legacy depending on whether a
> > > PMU is or isn't specified in the event name. The priority could be
> > > legacy then sysfs/json, or it could be sysfs/json then legacy, but it
> > > should be the same regardless of whether the PMU is put in the event
> >
> > Well, I think having PMU name in the event is a big difference.  Legacy
> > events were there since Day 1, I guess it's natural to think that an
> > event without PMU name means a legacy event and others should come with
> > PMU names explicitly.
>
> So then we're breaking the event names by inserting a PMU name in
> uniquify in the stat output:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/stat-display.c?h=perf-tools-next#n932
>
> There was an explicit, and reviewed by Jiri and Arnaldo, intent with
> the hybrid work that using a legacy event with a hybrid PMU, even
> though the PMU doesn't advertise through json or sysfs the legacy
> event, the perf tool supports it.
>
> Making it so that events without PMUs are only legacy events just
> doesn't work. There are far too many existing uses of non-legacy
> events without PMU, the metrics contain 100s of examples.
>
> Prior to switching json/sysfs to being the priority when a PMU is
> specified, it was the case that all encodings were the same, with or
> without a PMU.
>
> I don't think there is anything natural about assuming things about
> event names. Take cycles, cpu-cycles and cpu_cycles:
>  - cycles on x86 is only encoded via a legacy event;
>  - cpu-cycles on Intel exists as a sysfs event, but cpu-cycles is also
> a legacy event name;
>  - cpu_cycles exists as a sysfs event on ARM but doesn't have a
> corresponding legacy event name.
>
> The difference in meaning of an event name can be as subtle as the
> difference between a hyphen and an underscore. Given that we can't
> break everybody's `perf <command> -e <event name> ..` command lines, nor
> should we break all the metrics, I think the most intuitive thing is for
> cycles to behave the same with or without a PMU. For example, there may
> be differences in accuracy between a fixed and generic counter and the
> legacy event may only work with one counter because of this while the
> sysfs/json event uses all the counters, or vice versa. As explained,
> in output code the tool will or will not insert PMU names treating
> them as not mattering. Currently they do matter as the parsing will
> give different perf_event_attr and those can have differing kernel
> behaviors. This patch fixes this.

An extra thought and I may be special. I specify event names without
PMUs first (less typing*), I may then see multiple outputs in
primarily perf stat or see it when adding --per-core or -A, if I care
I can specify the event name with the PMU to reduce the perf stat
output. Having it that the event encoding changes between those two
executions I think is surprising and inconsistent behavior. I don't
mind if the behavior is sysfs/json then legacy (current behavior) or
legacy then sysfs/json (behavior before the ARM Apple-M fix), ARM and
RISC-V prefer (or have preferred) the sysfs/json then legacy approach
hence pursuing it here.

Thanks,
Ian

* The bash completion of events:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/perf-completion.sh?h=perf-tools-next#n172
also skips PMU names. I suspect it is only a minority of users who
specify a PMU when specifying an event and it would be a pretty major
behavior change for them to have to switch from say inst_retired.any
to cpu/inst_retired.any/, listing all PMUs for hybrid, etc. Tbh, I'm
not sure what consistent alternative is really being presented as
things get mentioned that are either obviously breaking existing users
(all non-legacy events needing a PMU..) or obviously confusing (like
making the difference between a dash and underscore significant).


* Re: [PATCH v5 3/4] perf record: Skip don't fail for events that don't open
  2025-01-10 19:18         ` Ian Rogers
@ 2025-01-14 19:29           ` Namhyung Kim
  2025-01-14 23:55             ` Ian Rogers
  0 siblings, 1 reply; 53+ messages in thread
From: Namhyung Kim @ 2025-01-14 19:29 UTC (permalink / raw)
  To: Ian Rogers
  Cc: James Clark, Leo Yan, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Adrian Hunter, Kan Liang, Ze Gao, Weilin Wang,
	Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, bpf, Aditya Bodkhe, Atish Patra

On Fri, Jan 10, 2025 at 11:18:53AM -0800, Ian Rogers wrote:
> On Fri, Jan 10, 2025 at 10:55 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Thu, Jan 09, 2025 at 08:44:38PM -0800, Ian Rogers wrote:
> > > On Thu, Jan 9, 2025 at 5:25 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > On Thu, Jan 09, 2025 at 02:21:08PM -0800, Ian Rogers wrote:
> > > > > Whilst for many tools it is an expected behavior that failure to open
> > > > > a perf event is a failure, ARM decided to name PMU events the same as
> > > > > legacy events and then failed to rename such events on a server uncore
> > > > > SLC PMU. As perf's default behavior when no PMU is specified is to
> > > > > open the event on all PMUs that advertise/"have" the event, this
> > > > > yielded failures when trying to make the priority of legacy and
> > > > > sysfs/json events uniform - something requested by RISC-V and ARM. A
> > > > > legacy event user on ARM hardware may find their event opened on an
> > > > > uncore PMU which for perf record will fail. Arnaldo suggested skipping
> > > > > such events which this patch implements. Rather than have the skipping
> > > > > conditional on running on ARM, the skipping is done on all
> > > > > architectures as such a fundamental behavioral difference could lead
> > > > > to problems with tools built/depending on perf.
> > > > >
> > > > > An example of perf record failing to open events on x86 is:
> > > > > ```
> > > > > $ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
> > > > > Error:
> > > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
> > > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > > > "dmesg | grep -i perf" may provide additional information.
> > > > >
> > > > > Error:
> > > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
> > > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > > > "dmesg | grep -i perf" may provide additional information.
> > > > >
> > > > > Error:
> > > > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > > > The LLC-prefetch-read event is not supported.
> > > > > [ perf record: Woken up 1 times to write data ]
> > > > > [ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]
> > > >
> > > > I'm afraid this can be too noisy.
> > >
> > > The intention is to be noisy:
> > > 1) it matches the existing behavior, anything else is potentially a regression;
> >
> > Well.. I think you're changing the behavior. :)  Also currently it just
> > fails on the first event so it won't be too noisy.
> >
> >   $ perf record -e data_read,data_write,LLC-prefetch-read -a sleep 0.1
> >   event syntax error: 'data_read,data_write,LLC-prefetch-read'
> >                        \___ Bad event name
> >
> >   Unable to find event on a PMU of 'data_read'
> >   Run 'perf list' for a list of valid events
> >
> >    Usage: perf record [<options>] [<command>]
> >       or: perf record [<options>] -- <command> [<options>]
> >
> >       -e, --event <event>   event selector. use 'perf list' to list available events
> 
> Fwiw, this error is an event parsing error not an event opening error.
> You need to select an uncore event, I was using data_read which exists
> in the uncore_imc_free_running PMUs on Intel tigerlake. Here is the
> existing error message:
> ```
> $ perf record -e data_read -a true
> Error:
> The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> for event (data_read).
> "dmesg | grep -i perf" may provide additional information.
> ```
> and here it is with the series:
> ```
> $ perf record -e data_read -a true
> Error:
> Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0'
> which will be removed.
> The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> for event (data_read).
> "dmesg | grep -i perf" may provide additional information.
> 
> Error:
> Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1'
> which will be removed.
> The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> for event (data_read).
> "dmesg | grep -i perf" may provide additional information.
> 
> Error:
> Failure to open any events for recording.
> ```
> and here is what it would be with pr_debug:
> ```
> $ perf record -e data_read -a true
> Error:
> Failure to open any events for recording.
> ```
> I believe this last output is worse because:
> 1) If not all events fail to open there is no error reported unless I
> know to run with -v, which will also bring a bunch more noise with it,

I suggested to add a warning if any (not all) of events failed to open.

  "Removed some unsupported events, use -v for details."
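One way to read this suggestion, sketched with a hypothetical `report_removed()` helper (not a perf function) and a plain `verbose` integer standing in for perf's `-v` level: per-event details are printed only when verbose, a single summary warning is printed when some but not all events were removed, and removing every event remains a hard error. The function returns the number of messages it emitted.

```c
#include <stdio.h>

/*
 * failed:  names of evsels that failed to open.
 * n_total: number of evsels that were attempted.
 * verbose: stand-in for perf's -v level (assumption for this sketch).
 */
static int report_removed(FILE *out, const char *const *failed, int n_failed,
			  int n_total, int verbose)
{
	int msgs = 0;

	/* Per-event detail: the part proposed to move to pr_debug(). */
	if (verbose > 0) {
		for (int i = 0; i < n_failed; i++) {
			fprintf(out, "Failure to open event '%s' which will be removed.\n",
				failed[i]);
			msgs++;
		}
	}

	/* Nothing left to record: still a hard error. */
	if (n_failed > 0 && n_failed == n_total) {
		fprintf(out, "Error:\nFailure to open any events for recording.\n");
		return msgs + 1;
	}

	/* Some events were dropped: one summary line instead of one per evsel. */
	if (n_failed > 0 && verbose == 0) {
		fprintf(out, "Removed some unsupported events, use -v for details.\n");
		msgs++;
	}
	return msgs;
}
```

The trade-off debated here is visible in the non-verbose, all-failed case: only the generic "Failure to open any events" error is printed, without the PMU name or errno text.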


> 2) I don't see the PMU / event name and "Invalid argument" indicating
> what has gone wrong again unless I know to run with -v and get all the
> verbose noise with that.

I don't think single -v adds a lot of noise in the output.

> 
> Yes it is noisy on 1 platform for 1 event due to an ARM PMU event name
> bug that ARM should have long ago fixed. That should be fixed rather
> than hiding errors and making users think they are recording samples
> when silently they're not - or they need to search through verbose
> output to try to find out if something broke.

I'm not sure if it's a bug in the driver.  It happens because perf tool
changed the way it finds events - it used to look at the core PMUs only
if no PMU name was given, but now it searches every PMU, right?

> 
> > > 2) it only happens if trying to record on a PMU/event that doesn't
> > > support recording, something that is currently an error and so we're
> > > not motivated to change the behavior as no-one should be using it;
> >
> > It was caught by Linus, so we know at least one (very important) user.
> 
> If they care enough then specifying the PMU with the event will avoid
> any warning and has always been a fix for this issue. It was the first
> proposed workaround for Linus.

I guess that's what Linus called a regression.

> 
> > > 3) for the wildcard case the only offender is ARM's SLC PMU and the
> > > appropriate fix there has always been to make the CPU cycle's event
> > > name match the bus_cycles event name by calling it cpu_cycles -
> > > something that doesn't conflict with a core PMU event name, the thing
> > > that has introduced all these problems, patches, long email exchanges,
> > > unfixed inconsistencies, etc.. If the errors aren't noisy then there
> > > is little motivation for the ARM SLC PMU's event name to be fixed.
> >
> > I understand your concern but I'm not sure it's the best way to fix the
> > issue.
> 
> Right, I'm similarly concerned about hiding legitimate warning/error
> messages because of 1 event on 1 PMU on 1 architecture because of how
> perf gets driven by 1 user. Yes, when you break you can wade through
> the verbose output but imo the verbose output was never intended to be
> used in that way.

Well, the verbose output is to debug when something doesn't go well, no?

Thanks,
Namhyung




* Re: [PATCH v5 3/4] perf record: Skip don't fail for events that don't open
  2025-01-14 19:29           ` Namhyung Kim
@ 2025-01-14 23:55             ` Ian Rogers
  2025-01-15 22:14               ` Namhyung Kim
  0 siblings, 1 reply; 53+ messages in thread
From: Ian Rogers @ 2025-01-14 23:55 UTC
  To: Namhyung Kim
  Cc: James Clark, Leo Yan, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Adrian Hunter, Kan Liang, Ze Gao, Weilin Wang,
	Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, bpf, Aditya Bodkhe, Atish Patra

On Tue, Jan 14, 2025 at 11:29 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Fri, Jan 10, 2025 at 11:18:53AM -0800, Ian Rogers wrote:
> > On Fri, Jan 10, 2025 at 10:55 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > On Thu, Jan 09, 2025 at 08:44:38PM -0800, Ian Rogers wrote:
> > > > On Thu, Jan 9, 2025 at 5:25 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > >
> > > > > On Thu, Jan 09, 2025 at 02:21:08PM -0800, Ian Rogers wrote:
> > > > > > Whilst for many tools it is an expected behavior that failure to open
> > > > > > a perf event is a failure, ARM decided to name PMU events the same as
> > > > > > legacy events and then failed to rename such events on a server uncore
> > > > > > SLC PMU. As perf's default behavior when no PMU is specified is to
> > > > > > open the event on all PMUs that advertise/"have" the event, this
> > > > > > yielded failures when trying to make the priority of legacy and
> > > > > > sysfs/json events uniform - something requested by RISC-V and ARM. A
> > > > > > legacy event user on ARM hardware may find their event opened on an
> > > > > > uncore PMU which for perf record will fail. Arnaldo suggested skipping
> > > > > > such events which this patch implements. Rather than have the skipping
> > > > > > conditional on running on ARM, the skipping is done on all
> > > > > > architectures as such a fundamental behavioral difference could lead
> > > > > > to problems with tools built/depending on perf.
> > > > > >
> > > > > > An example of perf record failing to open events on x86 is:
> > > > > > ```
> > > > > > $ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
> > > > > > Error:
> > > > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
> > > > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > > > > "dmesg | grep -i perf" may provide additional information.
> > > > > >
> > > > > > Error:
> > > > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
> > > > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > > > > "dmesg | grep -i perf" may provide additional information.
> > > > > >
> > > > > > Error:
> > > > > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > > > > The LLC-prefetch-read event is not supported.
> > > > > > [ perf record: Woken up 1 times to write data ]
> > > > > > [ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]
> > > > >
> > > > > I'm afraid this can be too noisy.
> > > >
> > > > The intention is to be noisy:
> > > > 1) it matches the existing behavior, anything else is potentially a regression;
> > >
> > > Well.. I think you're changing the behavior. :)  Also currently it just
> > > fails on the first event so it won't be too noisy.
> > >
> > >   $ perf record -e data_read,data_write,LLC-prefetch-read -a sleep 0.1
> > >   event syntax error: 'data_read,data_write,LLC-prefetch-read'
> > >                        \___ Bad event name
> > >
> > >   Unable to find event on a PMU of 'data_read'
> > >   Run 'perf list' for a list of valid events
> > >
> > >    Usage: perf record [<options>] [<command>]
> > >       or: perf record [<options>] -- <command> [<options>]
> > >
> > >       -e, --event <event>   event selector. use 'perf list' to list available events
> >
> > Fwiw, this error is an event parsing error not an event opening error.
> > You need to select an uncore event, I was using data_read which exists
> > in the uncore_imc_free_running PMUs on Intel tigerlake. Here is the
> > existing error message:
> > ```
> > $ perf record -e data_read -a true
> > Error:
> > The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> > for event (data_read).
> > "dmesg | grep -i perf" may provide additional information.
> > ```
> > and here it is with the series:
> > ```
> > $ perf record -e data_read -a true
> > Error:
> > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0'
> > which will be removed.
> > The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> > for event (data_read).
> > "dmesg | grep -i perf" may provide additional information.
> >
> > Error:
> > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1'
> > which will be removed.
> > The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> > for event (data_read).
> > "dmesg | grep -i perf" may provide additional information.
> >
> > Error:
> > Failure to open any events for recording.
> > ```
> > and here is what it would be with pr_debug:
> > ```
> > $ perf record -e data_read -a true
> > Error:
> > Failure to open any events for recording.
> > ```
> > I believe this last output is worse because:
> > 1) If not all events fail to open there is no error reported unless I
> > know to run with -v, which will also bring a bunch more noise with it,
>
> I suggested to add a warning if any (not all) of events failed to open.
>
>   "Removed some unsupported events, use -v for details."
>
>
> > 2) I don't see the PMU / event name and "Invalid argument" indicating
> > what has gone wrong again unless I know to run with -v and get all the
> > verbose noise with that.
>
> I don't think single -v adds a lot of noise in the output.
>
> >
> > Yes it is noisy on 1 platform for 1 event due to an ARM PMU event name
> > bug that ARM should have long ago fixed. That should be fixed rather
> > than hiding errors and making users think they are recording samples
> > when silently they're not - or they need to search through verbose
> > output to try to find out if something broke.
>
> I'm not sure if it's a bug in the driver.  It happens because perf tool
> changed the way it finds events - it used to look at the core PMUs only
> if no PMU name was given, but now it searches every PMU, right?

So there is the ARM bug in the PMU driver that caused an issue with
the hybrid fixes done because of wanting to have metrics work for
hybrid. The bug is reported here:
https://lore.kernel.org/lkml/08f1f185-e259-4014-9ca4-6411d5c1bc65@marcan.st/
The events are apple_icestorm_pmu/cycles/ and
apple_firestorm_pmu/cycles/. The issue is that prior to fixing hybrid
the ARM PMUs looked like uncore PMUs and couldn't open a legacy event,
which was fine as they had sysfs events. When hybrid was fixed in the
tool, the tool would then try to open apple_icestorm_pmu/cycles/ and
apple_firestorm_pmu/cycles/ as legacy events - legacy having priority
over sysfs/json back then. The legacy mapping was broken in the PMU
driver. Now were everything working as intended, just the cycles event
would be specified on the command line and the event would be wildcard
opened on the apple_icestorm_pmu and apple_firestorm_pmu. I believe
this way would already use a legacy encoding and so to work around the
PMU driver bug people were specifying the PMU name to get the sysfs
encoding, but that only worked as the PMUs appeared to be uncore.
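To make the priority question concrete, here is a minimal C sketch of how an encoding might be picked for an event name on one PMU. It assumes a deliberately simplified model: `choose_encoding()`, `is_legacy_name()` and the three-value enum are hypothetical, the legacy name list is a small illustrative subset, and the real logic in perf's event parser is considerably more involved. `prefer_sysfs` selects between the old legacy-first priority and the sysfs/json-first priority this series proposes, and `core_pmu` models whether the tool treats the PMU as a core PMU (the Apple PMUs were effectively treated as uncore before the hybrid fixes).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum encoding { ENC_NONE, ENC_LEGACY, ENC_SYSFS_JSON };

/* Small subset of legacy hardware event names, for illustration only. */
static bool is_legacy_name(const char *name)
{
	static const char *const legacy[] = {
		"cycles", "cpu-cycles", "instructions", "branches", "bus-cycles",
	};

	assert(name != NULL);
	for (size_t i = 0; i < sizeof(legacy) / sizeof(legacy[0]); i++)
		if (strcmp(name, legacy[i]) == 0)
			return true;
	return false;
}

/*
 * pmu_has_event: the PMU's sysfs/json tables define an event with this name.
 * core_pmu:      legacy encodings are only attempted on core PMUs here.
 * prefer_sysfs:  true models the proposed priority, false the old one.
 */
static enum encoding choose_encoding(const char *name, bool pmu_has_event,
				     bool core_pmu, bool prefer_sysfs)
{
	bool legacy_ok = core_pmu && is_legacy_name(name);

	if (prefer_sysfs && pmu_has_event)
		return ENC_SYSFS_JSON;
	if (legacy_ok)
		return ENC_LEGACY;
	if (pmu_has_event)
		return ENC_SYSFS_JSON;
	return ENC_NONE;
}
```

In this model, with `prefer_sysfs` false, `cycles` on a core PMU that also advertises a sysfs `cycles` event resolves to the legacy encoding, which corresponds to the path that hit the broken legacy mapping described above; a PMU that looks like uncore falls through to the sysfs encoding either way.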

> >
> > > > 2) it only happens if trying to record on a PMU/event that doesn't
> > > > support recording, something that is currently an error and so we're
> > > > not motivated to change the behavior as no-one should be using it;
> > >
> > > It was caught by Linus, so we know at least one (very important) user.
> >
> > If they care enough then specifying the PMU with the event will avoid
> > any warning and has always been a fix for this issue. It was the first
> > proposed workaround for Linus.
>
> I guess that's what Linus called a regression.

But a regression where? The tool's behavior is pretty clear: give no PMU
and the event will be tried on every PMU; give it a PMU and the event will
only be tried on that PMU; give it a PMU without a suffix and the
event will be opened on all PMUs that match the name with different
suffixes. I dislike the idea of cpu-cycles implicitly being just for
core PMUs, but cpu_cycles being for all PMUs as the hyphen is a legacy
name and the underscore not. I dislike the idea of specifying a PMU
with uncore events as uncore events often already have a PMU within
their event name and it also breaks the universe. When trying to find
out what people mean by event names being implicitly associated with
PMUs I get told I'm throwing out "what ifs," when all I'm doing is
reading the code (that I wrote and I'm trying to fix) and trying to
figure out what behavior people want. What I don't want is
inconsistencies, events behaving differently in different scenarios
and the perf output's use of event names being inconsistent with the
parsing. RISC-V and ARM have wanted the sysfs/json over legacy
priority, so I'm trying to get that landed.
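The three matching rules listed above can be sketched as a single predicate. This is a simplified, hypothetical `pmu_matches()` rather than perf's implementation; it assumes an instance suffix is an underscore followed by decimal digits (e.g. `uncore_imc_free_running_0`), whereas the real tool's suffix rules differ in details.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * requested == NULL : no PMU given, the event is tried on every PMU.
 * exact name        : only that PMU matches.
 * name w/o suffix   : any PMU named "<requested>_<digits>" matches.
 */
static bool pmu_matches(const char *requested, const char *pmu_name)
{
	size_t len;
	const char *p;

	assert(pmu_name != NULL);
	if (requested == NULL)
		return true;
	if (strcmp(requested, pmu_name) == 0)
		return true;

	len = strlen(requested);
	/* strncmp stops at pmu_name's NUL, so pmu_name[len] is in bounds. */
	if (strncmp(requested, pmu_name, len) != 0 || pmu_name[len] != '_')
		return false;

	p = pmu_name + len + 1;
	if (*p == '\0')
		return false;		/* trailing '_' with no suffix */
	for (; *p; p++)
		if (!isdigit((unsigned char)*p))
			return false;	/* not a numeric instance suffix */
	return true;
}
```

Under these assumptions, `uncore_imc_free_running` matches both `_0` and `_1` instances, while `cpu` does not match `cpu_core` because `core` is not a numeric suffix.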

Ultimately the original regression comes back to the ARM SLC PMU
advertising a cycles event when it should have been named cpu_cycles,
if for no other reason than uniformity with the bus_cycles name on the
same PMU. The change in perf's wildcard behavior exposed the latent
bug; that doesn't make the SLC PMU's event name any less of a bug. The change
here is to make seeing that bug non-terminal to running the program.
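The skip-and-continue behavior of the patch (remove the evsels that failed to open, renumber the survivors densely, and fail only if nothing but dummy events is left) can be modeled on a plain array. This is a hedged sketch: `struct fake_evsel` and `remove_skipped()` are hypothetical stand-ins for perf's `struct evsel` and evlist handling, not the code in `builtin-record.c`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for perf's struct evsel. */
struct fake_evsel {
	const char *name;
	int idx;        /* dense index into per-event arrays (fds etc.) */
	bool skippable; /* set when the open failed */
	bool is_dummy;  /* tool-added or user-requested dummy event */
};

/*
 * Compact the array in place, dropping events that failed to open and
 * renumbering idx densely. Returns the new count, or -1 if events were
 * removed and nothing but dummy events remains.
 */
static int remove_skipped(struct fake_evsel *ev, int nr)
{
	int out = 0;
	bool skipped = false, only_dummies = true;

	assert(nr >= 0);
	for (int i = 0; i < nr; i++) {
		if (ev[i].skippable) {
			skipped = true;
			continue;
		}
		ev[out] = ev[i];
		ev[out].idx = out;	/* keep indices dense */
		if (!ev[out].is_dummy)
			only_dummies = false;
		out++;
	}
	/* Only check for "empty" when something was removed. */
	if (skipped && only_dummies)
		return -1;
	return out;
}
```

Checking for an empty list only when something was skipped keeps `perf record -e dummy true` usable as a permission check, as the comment in the patch intends.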

> >
> > > > 3) for the wildcard case the only offender is ARM's SLC PMU and the
> > > > appropriate fix there has always been to make the CPU cycle's event
> > > > name match the bus_cycles event name by calling it cpu_cycles -
> > > > something that doesn't conflict with a core PMU event name, the thing
> > > > that has introduced all these problems, patches, long email exchanges,
> > > > unfixed inconsistencies, etc.. If the errors aren't noisy then there
> > > > is little motivation for the ARM SLC PMU's event name to be fixed.
> > >
> > > I understand your concern but I'm not sure it's the best way to fix the
> > > issue.
> >
> > Right, I'm similarly concerned about hiding legitimate warning/error
> > messages because of 1 event on 1 PMU on 1 architecture because of how
> > perf gets driven by 1 user. Yes, when you break you can wade through
> > the verbose output but imo the verbose output was never intended to be
> > used in that way.
>
> Well, the verbose output is to debug when something doesn't go well, no?

The output isn't currently only enabled in verbose mode, so is this
wrong? You will only get extra warnings with this change if you do
anything wrong. For a hybrid system maybe you've gone from 1 warning
to 2; I fail to see why that is a big deal. Yes, if you try to do perf record on an
uncore server PMU with many instances you will potentially get many
warnings, but the behavior before and after is to fail and the user is
likely to figure out what the fix is in both cases, with more errors
they may appreciate better that the event was getting opened on many
PMUs. The trend for event parsing errors is to have more error
messages. We went from 1 to 2 in commit
a910e4666d61712840c78de33cc7f89de8affa78 and from 2 to many in commit
fd7b8e8fb20f51d60dfee7792806548f3c6a4c2c. The trend isn't to try to
move things into verbose only output and for things to silently (or
with little detail) fail for the user.

Thanks,
Ian


* Re: [PATCH v5 3/4] perf record: Skip don't fail for events that don't open
  2025-01-13 23:04               ` Ian Rogers
@ 2025-01-15 17:31                 ` Namhyung Kim
  2025-01-15 17:56                   ` Ian Rogers
  0 siblings, 1 reply; 53+ messages in thread
From: Namhyung Kim @ 2025-01-15 17:31 UTC
  To: Ian Rogers
  Cc: Arnaldo Carvalho de Melo, Linus Torvalds, Peter Zijlstra,
	Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, Kan Liang, James Clark, Ze Gao, Weilin Wang,
	Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, bpf, Aditya Bodkhe, Leo Yan,
	Atish Patra

On Mon, Jan 13, 2025 at 03:04:26PM -0800, Ian Rogers wrote:
> On Mon, Jan 13, 2025 at 12:51 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Hi Ian,
> >
> > On Fri, Jan 10, 2025 at 01:33:57PM -0800, Ian Rogers wrote:
> > > On Fri, Jan 10, 2025 at 11:26 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > On Fri, Jan 10, 2025 at 08:42:02AM -0800, Ian Rogers wrote:
> > > > > On Fri, Jan 10, 2025 at 6:18 AM Arnaldo Carvalho de Melo
> > > > > <acme@kernel.org> wrote:
> > > > > >
> > > > > > Adding Linus to the CC list as he participated in this discussion in the
> > > > > > past, so a heads up about changes in this area that are being further
> > > > > > discussed.
> > > > >
> > > > > Linus blocks my email so I'm not sure of the point.
> > > >
> > > > That's unfortunate, but he should be able to see others' replies.
> > > >
> > > > >
> > > > > > On Thu, Jan 09, 2025 at 05:25:03PM -0800, Namhyung Kim wrote:
> > > > > > > On Thu, Jan 09, 2025 at 02:21:08PM -0800, Ian Rogers wrote:
> > > > > > > > Whilst for many tools it is an expected behavior that failure to open
> > > > > > > > a perf event is a failure, ARM decided to name PMU events the same as
> > > > > > > > legacy events and then failed to rename such events on a server uncore
> > > > > > > > SLC PMU. As perf's default behavior when no PMU is specified is to
> > > > > > > > open the event on all PMUs that advertise/"have" the event, this
> > > > > > > > yielded failures when trying to make the priority of legacy and
> > > > > > > > sysfs/json events uniform - something requested by RISC-V and ARM. A
> > > > > > > > legacy event user on ARM hardware may find their event opened on an
> > > > > > > > uncore PMU which for perf record will fail. Arnaldo suggested skipping
> > > > > > > > such events which this patch implements. Rather than have the skipping
> > > > > > > > conditional on running on ARM, the skipping is done on all
> > > > > > > > architectures as such a fundamental behavioral difference could lead
> > > > > > > > to problems with tools built/depending on perf.
> > > > > > > >
> > > > > > > > An example of perf record failing to open events on x86 is:
> > > > > > > > ```
> > > > > > > > $ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
> > > > > > > > Error:
> > > > > > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
> > > > > > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > > > > > > "dmesg | grep -i perf" may provide additional information.
> > > > > > > >
> > > > > > > > Error:
> > > > > > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
> > > > > > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > > > > > > "dmesg | grep -i perf" may provide additional information.
> > > > > > > >
> > > > > > > > Error:
> > > > > > > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > > > > > > The LLC-prefetch-read event is not supported.
> > > > > > > > [ perf record: Woken up 1 times to write data ]
> > > > > > > > [ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]
> > > > > > >
> > > > > > > I'm afraid this can be too noisy.
> > > > > >
> > > > > > Agreed.
> > > > > >
> > > > > > > > $ perf report --stats
> > > > > > > > Aggregated stats:
> > > > > > > >                TOTAL events:      17255
> > > > > > > >                 MMAP events:        284  ( 1.6%)
> > > > > > > >                 COMM events:       1961  (11.4%)
> > > > > > > >                 EXIT events:          1  ( 0.0%)
> > > > > > > >                 FORK events:       1960  (11.4%)
> > > > > > > >               SAMPLE events:         87  ( 0.5%)
> > > > > > > >                MMAP2 events:      12836  (74.4%)
> > > > > > > >              KSYMBOL events:         83  ( 0.5%)
> > > > > > > >            BPF_EVENT events:         36  ( 0.2%)
> > > > > > > >       FINISHED_ROUND events:          2  ( 0.0%)
> > > > > > > >             ID_INDEX events:          1  ( 0.0%)
> > > > > > > >           THREAD_MAP events:          1  ( 0.0%)
> > > > > > > >              CPU_MAP events:          1  ( 0.0%)
> > > > > > > >            TIME_CONV events:          1  ( 0.0%)
> > > > > > > >        FINISHED_INIT events:          1  ( 0.0%)
> > > > > > > > cycles stats:
> > > > > > > >               SAMPLE events:         87
> > > > > > > > ```
> > > > > > > >
> > > > > > > > If all events fail to open then the perf record will fail:
> > > > > > > > ```
> > > > > > > > $ perf record -e LLC-prefetch-read true
> > > > > > > > Error:
> > > > > > > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > > > > > > The LLC-prefetch-read event is not supported.
> > > > > > > > Error:
> > > > > > > > Failure to open any events for recording
> > > > > > > > ```
> > > > > > > >
> > > > > > > > As an evlist may have dummy events that open when all command line
> > > > > > > > events fail we ignore dummy events when detecting if at least some
> > > > > > > > events open. This still permits the dummy event on its own to be used
> > > > > > > > as a permission check:
> > > > > > > > ```
> > > > > > > > $ perf record -e dummy true
> > > > > > > > [ perf record: Woken up 1 times to write data ]
> > > > > > > > [ perf record: Captured and wrote 0.046 MB perf.data ]
> > > > > > > > ```
> > > > > > > > but allows failure when a dummy event is implicitly inserted or when
> > > > > > > > there are insufficient permissions to open it:
> > > > > > > > ```
> > > > > > > > $ perf record -e LLC-prefetch-read -a true
> > > > > > > > Error:
> > > > > > > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > > > > > > The LLC-prefetch-read event is not supported.
> > > > > > > > Error:
> > > > > > > > Failure to open any events for recording
> > > > > > > > ```
> > > > > > > >
> > > > > > > > The issue with legacy events is that on RISC-V they want the driver to
> > > > > > > > not have mappings from legacy to non-legacy config encodings for each
> > > > > > > > vendor/model due to size, complexity and difficulty to update. It was
> > > > > > > > reported that on ARM Apple-M? CPUs the legacy mapping in the driver
> > > > > > > > was broken and the sysfs/json events should always take precedence,
> > > > > > > > however, it isn't clear this is still the case. It is the case that
> > > > > > > > without working around this issue a legacy event like cycles without a
> > > > > > > > PMU can encode differently than when specified with a PMU - the
> > > > > > > > non-PMU version favoring legacy encodings, the PMU one avoiding legacy
> > > > > > > > encodings.
> > > > > > > >
> > > > > > > > The patch removes events and then adjusts the idx value for each
> > > > > > > > evsel. This is done so that the dense xyarrays used for file
> > > > > > > > descriptors, etc. don't contain broken entries. As event opening
> > > > > > > > happens relatively late in the record process, use of the idx value
> > > > > > > > before the open will have become corrupted, so it is expected there
> > > > > > > > are latent bugs hidden behind this change - the change is best
> > > > > > > > effort. As the only vendor that has broken event names is ARM, this
> > > > > > > > will principally affect ARM users. They will also experience warning
> > > > > > > > messages like those above because of the uncore PMU advertising legacy
> > > > > > > > event names.
> > > > > > > >
> > > > > > > > Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
> > > > > > > > Signed-off-by: Ian Rogers <irogers@google.com>
> > > > > > > > Tested-by: James Clark <james.clark@linaro.org>
> > > > > > > > Tested-by: Leo Yan <leo.yan@arm.com>
> > > > > > > > Tested-by: Atish Patra <atishp@rivosinc.com>
> > > > > > > > ---
> > > > > > > >  tools/perf/builtin-record.c | 47 ++++++++++++++++++++++++++++++++-----
> > > > > > > >  1 file changed, 41 insertions(+), 6 deletions(-)
> > > > > > > >
> > > > > > > > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> > > > > > > > index 5db1aedf48df..c0b8249a3787 100644
> > > > > > > > --- a/tools/perf/builtin-record.c
> > > > > > > > +++ b/tools/perf/builtin-record.c
> > > > > > > > @@ -961,7 +961,6 @@ static int record__config_tracking_events(struct record *rec)
> > > > > > > >      */
> > > > > > > >     if (opts->target.initial_delay || target__has_cpu(&opts->target) ||
> > > > > > > >         perf_pmus__num_core_pmus() > 1) {
> > > > > > > > -
> > > > > > > >             /*
> > > > > > > >              * User space tasks can migrate between CPUs, so when tracing
> > > > > > > >              * selected CPUs, sideband for all CPUs is still needed.
> > > > > > > > @@ -1366,6 +1365,7 @@ static int record__open(struct record *rec)
> > > > > > > >     struct perf_session *session = rec->session;
> > > > > > > >     struct record_opts *opts = &rec->opts;
> > > > > > > >     int rc = 0;
> > > > > > > > +   bool skipped = false;
> > > > > > > >
> > > > > > > >     evlist__for_each_entry(evlist, pos) {
> > > > > > > >  try_again:
> > > > > > > > @@ -1381,15 +1381,50 @@ static int record__open(struct record *rec)
> > > > > > > >                             pos = evlist__reset_weak_group(evlist, pos, true);
> > > > > > > >                             goto try_again;
> > > > > > > >                     }
> > > > > > > > -                   rc = -errno;
> > > > > > > >                     evsel__open_strerror(pos, &opts->target, errno, msg, sizeof(msg));
> > > > > > > > -                   ui__error("%s\n", msg);
> > > > > > > > -                   goto out;
> > > > > > > > +                   ui__error("Failure to open event '%s' on PMU '%s' which will be removed.\n%s\n",
> > > > > > > > +                             evsel__name(pos), evsel__pmu_name(pos), msg);
> > > > > >
> > > > > > > How about changing it to pr_debug() and add below ...
> > > > > >
> > > > > > That sounds better.
> > > > > >
> > > > > > > > +                   pos->skippable = true;
> > > > > > > > +                   skipped = true;
> > > > > > > > +           } else {
> > > > > > > > +                   pos->supported = true;
> > > > > > > >             }
> > > > > > > > -
> > > > > > > > -           pos->supported = true;
> > > > > > > >     }
> > > > > > > >
> > > > > > > > +   if (skipped) {
> > > > > > > > +           struct evsel *tmp;
> > > > > > > > +           int idx = 0;
> > > > > > > > +           bool evlist_empty = true;
> > > > > > > > +
> > > > > > > > +           /* Remove evsels that failed to open and update indices. */
> > > > > > > > +           evlist__for_each_entry_safe(evlist, tmp, pos) {
> > > > > > > > +                   if (pos->skippable) {
> > > > > > > > +                           evlist__remove(evlist, pos);
> > > > > > > > +                           continue;
> > > > > > > > +                   }
> > > > > > > > +
> > > > > > > > +                   /*
> > > > > > > > +                    * Note, dummy events may be command line parsed or
> > > > > > > > +                    * added by the tool. We care about supporting `perf
> > > > > > > > +                    * record -e dummy` which may be used as a permission
> > > > > > > > +                    * check. Dummy events that are added to the command
> > > > > > > > +                    * line and opened along with other events that fail,
> > > > > > > > +                    * will still fail as if the dummy events were tool
> > > > > > > > +                    * added events for the sake of code simplicity.
> > > > > > > > +                    */
> > > > > > > > +                   if (!evsel__is_dummy_event(pos))
> > > > > > > > +                           evlist_empty = false;
> > > > > > > > +           }
> > > > > > > > +           evlist__for_each_entry(evlist, pos) {
> > > > > > > > +                   pos->core.idx = idx++;
> > > > > > > > +           }
> > > > > > > > +           /* If list is empty then fail. */
> > > > > > > > +           if (evlist_empty) {
> > > > > > > > +                   ui__error("Failure to open any events for recording.\n");
> > > > > > > > +                   rc = -1;
> > > > > > > > +                   goto out;
> > > > > > > > +           }
> > > > > >
> > > > > > > ... ?
> > > > > >
> > > > > > >               if (!verbose)
> > > > > > >                       ui__warning("Removed some unsupported events, use -v for details.\n");
> > > > > >
> > > > > > And even this one would be best left for cases where we can determine
> > > > > > that it's a new situation, i.e. one that should work and not the ones we
> > > > > > know that will not work already and thus so far didn't alarm the user
> > > > > > into thinking something is wrong.
> > > > > >
> > > > > > Having the ones we know will fail as pr_debug() seems enough, I'd say.
> > > > >
> > > > > This means that:
> > > > > ```
> > > > > $ perf record -e data_read,LLC-prefetch-read -a sleep 0.1
> > > > > ```
> > > > > will fail (as data_read is a memory controller event and the LLC
> > > > > doesn't support sampling) with something like:
> > > > > ```
> > > > > Error:
> > > > > Failure to open any events for recording
> > > > > ```
> > > > > Which feels a bit minimal. As I already mentioned, it is also a
> > > > > behavior change and so has the potential to break scripts dependent on
> > > > > the failure information.
> > > >
> > > > I don't think it's about failure behavior, the concern is the error
> > > > messages.  It can take too much screen space when users give a long list
> > > > of invalid events.  And unfortunately the current error message for
> > > > checking dmesg is not very helpful.
> > >
> > > Making the dmesg message more useful is a separate issue. The error
> >
> > Sure.
> >
> > > message only happens when things are broken and I think having an
> > > error message is better than none, or somehow having to know to wade
> > > through verbose output. I think this is very clear in:
> > > https://lore.kernel.org/lkml/CAP-5=fVr43v8gkqi8SXVaNKnkO+cooQVqx3xUFJ-BtgxGHX90g@mail.gmail.com/
> > >
> > > > Anyway you can add this line too: "Use -v to see the details."
> > >
> > > So silently failing and then expecting users to scrape verbose output
> > > is a fairly significant behavior change for the tool.
> >
> > I'm not saying I want silent failures.  It should say it fails to parse
> > or open some events.  But I think it needs to care about repeating
> > failure messages.
> >
> > >
> > > > >
> > > > > A patch lowering the priority of error messages should be independent
> > > > > of the 4 changes here. I'd be happy if someone follows this series
> > > > > with a patch doing it.
> > > >
> > > > I think the error behavior is a part of this change.
> > >
> > > I disagree with it, so I think you need to address my comments.
> >
> > You are changing the error behavior by skipping failed events then the
> > relevant error messages should be handled properly in this patchset.
> 
> I'm not sure what you are asking and I'm not sure why it matters?
> Previously you'd asked for all the output to be moved under verbose.
> 
> If I specify an event that doesn't work with perf record today then it
> fails. With this patch it fails too. If that event is a core PMU event
> then there will be an error message for each core PMU that doesn't
> support the event. So I get 2 error messages on hybrid. This doesn't
> feel egregious or warrant a new error message mechanism. I would like
> it so that evsels supported 1 or more PMUs, in which case this would
> be 1 error message.
> 
> If I specify perf record today on an uncore event then perf record
> fails and I get 1 error message for the uncore PMU. The new behavior
> will be to get 1 error message per uncore PMU. If I'm on a server with
> 10s of uncore PMUs then maybe the message is spammy, but the command
> fails today and will continue to fail with this series. I don't see a
> motivation to change or optimize for this case and again, evsels that
> support >1 PMU would be the most appropriate fix.
> 
> The only case where there is no message today but would be with this
> patch series is for cycles on ARM's neoverse. There will be one
> warning for the evsel on the SLC PMU. That's one warning and not many.
> 
> As I've said, if you want a more elaborate error reporting system then
> take these patches and add it to them. There's a larger refactor to
> make evsels support >1 PMU that would clean up the many events on
> server uncore PMUs issue, but that shouldn't be part of this series
> nor gate it. If you are trying to perf record on uncore PMUs then you
> already have problems, so I don't see why optimizing the error
> messages for that mistake matters.

What about with multiple events in the command line - one of them
failing with >1 PMUs and the command now succeeds?

Thanks,
Namhyung



* Re: [PATCH v5 3/4] perf record: Skip don't fail for events that don't open
  2025-01-15 17:31                 ` Namhyung Kim
@ 2025-01-15 17:56                   ` Ian Rogers
  2025-01-29 21:24                     ` Namhyung Kim
  0 siblings, 1 reply; 53+ messages in thread
From: Ian Rogers @ 2025-01-15 17:56 UTC
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Linus Torvalds, Peter Zijlstra,
	Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, Kan Liang, James Clark, Ze Gao, Weilin Wang,
	Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, bpf, Aditya Bodkhe, Leo Yan,
	Atish Patra

On Wed, Jan 15, 2025 at 9:31 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Mon, Jan 13, 2025 at 03:04:26PM -0800, Ian Rogers wrote:
> > On Mon, Jan 13, 2025 at 12:51 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > Hi Ian,
> > >
> > > On Fri, Jan 10, 2025 at 01:33:57PM -0800, Ian Rogers wrote:
> > > > On Fri, Jan 10, 2025 at 11:26 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > >
> > > > > On Fri, Jan 10, 2025 at 08:42:02AM -0800, Ian Rogers wrote:
> > > > > > On Fri, Jan 10, 2025 at 6:18 AM Arnaldo Carvalho de Melo
> > > > > > <acme@kernel.org> wrote:
> > > > > > >
> > > > > > > Adding Linus to the CC list as he participated in this discussion in the
> > > > > > > past, so a heads up about changes in this area that are being further
> > > > > > > discussed.
> > > > > >
> > > > > > Linus blocks my email so I'm not sure of the point.
> > > > >
> > > > > That's unfortunate, but he should be able to see others' reply.
> > > > >
> > > > > >
> > > > > > > On Thu, Jan 09, 2025 at 05:25:03PM -0800, Namhyung Kim wrote:
> > > > > > > > On Thu, Jan 09, 2025 at 02:21:08PM -0800, Ian Rogers wrote:
> > > > > > > > > Whilst for many tools it is an expected behavior that failure to open
> > > > > > > > > a perf event is a failure, ARM decided to name PMU events the same as
> > > > > > > > > legacy events and then failed to rename such events on a server uncore
> > > > > > > > > SLC PMU. As perf's default behavior when no PMU is specified is to
> > > > > > > > > open the event on all PMUs that advertise/"have" the event, this
> > > > > > > > > yielded failures when trying to make the priority of legacy and
> > > > > > > > > sysfs/json events uniform - something requested by RISC-V and ARM. A
> > > > > > > > > legacy event user on ARM hardware may find their event opened on an
> > > > > > > > > uncore PMU which for perf record will fail. Arnaldo suggested skipping
> > > > > > > > > such events which this patch implements. Rather than have the skipping
> > > > > > > > > conditional on running on ARM, the skipping is done on all
> > > > > > > > > architectures as such a fundamental behavioral difference could lead
> > > > > > > > > to problems with tools built/depending on perf.
> > > > > > > > >
> > > > > > > > > An example of perf record failing to open events on x86 is:
> > > > > > > > > ```
> > > > > > > > > $ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
> > > > > > > > > Error:
> > > > > > > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
> > > > > > > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > > > > > > > "dmesg | grep -i perf" may provide additional information.
> > > > > > > > >
> > > > > > > > > Error:
> > > > > > > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
> > > > > > > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > > > > > > > "dmesg | grep -i perf" may provide additional information.
> > > > > > > > >
> > > > > > > > > Error:
> > > > > > > > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > > > > > > > The LLC-prefetch-read event is not supported.
> > > > > > > > > [ perf record: Woken up 1 times to write data ]
> > > > > > > > > [ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]
> > > > > > > >
> > > > > > > > I'm afraid this can be too noisy.
> > > > > > >
> > > > > > > Agreed.
> > > > > > >
> > > > > > > > > $ perf report --stats
> > > > > > > > > Aggregated stats:
> > > > > > > > >                TOTAL events:      17255
> > > > > > > > >                 MMAP events:        284  ( 1.6%)
> > > > > > > > >                 COMM events:       1961  (11.4%)
> > > > > > > > >                 EXIT events:          1  ( 0.0%)
> > > > > > > > >                 FORK events:       1960  (11.4%)
> > > > > > > > >               SAMPLE events:         87  ( 0.5%)
> > > > > > > > >                MMAP2 events:      12836  (74.4%)
> > > > > > > > >              KSYMBOL events:         83  ( 0.5%)
> > > > > > > > >            BPF_EVENT events:         36  ( 0.2%)
> > > > > > > > >       FINISHED_ROUND events:          2  ( 0.0%)
> > > > > > > > >             ID_INDEX events:          1  ( 0.0%)
> > > > > > > > >           THREAD_MAP events:          1  ( 0.0%)
> > > > > > > > >              CPU_MAP events:          1  ( 0.0%)
> > > > > > > > >            TIME_CONV events:          1  ( 0.0%)
> > > > > > > > >        FINISHED_INIT events:          1  ( 0.0%)
> > > > > > > > > cycles stats:
> > > > > > > > >               SAMPLE events:         87
> > > > > > > > > ```
> > > > > > > > >
> > > > > > > > > If all events fail to open then the perf record will fail:
> > > > > > > > > ```
> > > > > > > > > $ perf record -e LLC-prefetch-read true
> > > > > > > > > Error:
> > > > > > > > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > > > > > > > The LLC-prefetch-read event is not supported.
> > > > > > > > > Error:
> > > > > > > > > Failure to open any events for recording
> > > > > > > > > ```
> > > > > > > > >
> > > > > > > > > As an evlist may have dummy events that open when all command line
> > > > > > > > > events fail we ignore dummy events when detecting if at least some
> > > > > > > > > events open. This still permits the dummy event on its own to be used
> > > > > > > > > as a permission check:
> > > > > > > > > ```
> > > > > > > > > $ perf record -e dummy true
> > > > > > > > > [ perf record: Woken up 1 times to write data ]
> > > > > > > > > [ perf record: Captured and wrote 0.046 MB perf.data ]
> > > > > > > > > ```
> > > > > > > > > but allows failure when a dummy event is implicitly inserted or when
> > > > > > > > > there are insufficient permissions to open it:
> > > > > > > > > ```
> > > > > > > > > $ perf record -e LLC-prefetch-read -a true
> > > > > > > > > Error:
> > > > > > > > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > > > > > > > The LLC-prefetch-read event is not supported.
> > > > > > > > > Error:
> > > > > > > > > Failure to open any events for recording
> > > > > > > > > ```
> > > > > > > > >
> > > > > > > > > The issue with legacy events is that on RISC-V they want the driver to
> > > > > > > > > not have mappings from legacy to non-legacy config encodings for each
> > > > > > > > > vendor/model due to size, complexity and difficulty to update. It was
> > > > > > > > > reported that on ARM Apple-M? CPUs the legacy mapping in the driver
> > > > > > > > > was broken and the sysfs/json events should always take precedence,
> > > > > > > > > however, it isn't clear this is still the case. It is the case that
> > > > > > > > > without working around this issue a legacy event like cycles without a
> > > > > > > > > PMU can encode differently than when specified with a PMU - the
> > > > > > > > > non-PMU version favoring legacy encodings, the PMU one avoiding legacy
> > > > > > > > > encodings.
> > > > > > > > >
> > > > > > > > > The patch removes events and then adjusts the idx value for each
> > > > > > > > > evsel. This is done so that the dense xyarrays used for file
> > > > > > > > > descriptors, etc. don't contain broken entries. As event opening
> > > > > > > > > happens relatively late in the record process, use of the idx value
> > > > > > > > > before the open will have become corrupted, so it is expected there
> > > > > > > > > are latent bugs hidden behind this change - the change is best
> > > > > > > > > effort. As the only vendor that has broken event names is ARM, this
> > > > > > > > > will principally affect ARM users. They will also experience warning
> > > > > > > > > messages like those above because of the uncore PMU advertising legacy
> > > > > > > > > event names.
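A minimal C sketch of the removal step described above, assuming a plain array in place of perf's linked evlist. `struct sketch_evsel` and `sketch_remove_skipped()` are hypothetical names, not perf APIs. Entries marked skippable are dropped, survivors are renumbered densely from 0 so anything indexed by idx has no holes, and the call fails only when something was skipped and nothing but dummy events remains, which keeps a lone `perf record -e dummy` usable as a permission check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for perf's struct evsel, reduced to what matters here. */
struct sketch_evsel {
	const char *name;
	bool skippable; /* sys_perf_event_open() failed for this event */
	bool is_dummy;  /* dummy events don't count as "something opened" */
	int idx;        /* dense index into per-event arrays (fds, ids, ...) */
};

/*
 * Compact evs[0..n) in place: drop skippable entries, renumber idx from 0
 * and return the new count. Fail (-1) only if something was skipped and
 * nothing but dummy events is left, so a lone dummy event still succeeds.
 */
static int sketch_remove_skipped(struct sketch_evsel *evs, size_t n)
{
	size_t out = 0;
	bool skipped = false;
	bool evlist_empty = true;

	for (size_t i = 0; i < n; i++) {
		if (evs[i].skippable) {
			skipped = true;
			continue;
		}
		if (!evs[i].is_dummy)
			evlist_empty = false;
		evs[out] = evs[i];
		evs[out].idx = (int)out;
		out++;
	}
	assert(out <= n);
	return (skipped && evlist_empty) ? -1 : (int)out;
}
```

Compacting in place is the array analogue of why the patch rewrites `pos->core.idx`: structures sized and indexed by idx would otherwise keep slots for events that were removed.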
> > > > > > > > >
> > > > > > > > > Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
> > > > > > > > > Signed-off-by: Ian Rogers <irogers@google.com>
> > > > > > > > > Tested-by: James Clark <james.clark@linaro.org>
> > > > > > > > > Tested-by: Leo Yan <leo.yan@arm.com>
> > > > > > > > > Tested-by: Atish Patra <atishp@rivosinc.com>
> > > > > > > > > ---
> > > > > > > > >  tools/perf/builtin-record.c | 47 ++++++++++++++++++++++++++++++++-----
> > > > > > > > >  1 file changed, 41 insertions(+), 6 deletions(-)
> > > > > > > > >
> > > > > > > > > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> > > > > > > > > index 5db1aedf48df..c0b8249a3787 100644
> > > > > > > > > --- a/tools/perf/builtin-record.c
> > > > > > > > > +++ b/tools/perf/builtin-record.c
> > > > > > > > > @@ -961,7 +961,6 @@ static int record__config_tracking_events(struct record *rec)
> > > > > > > > >      */
> > > > > > > > >     if (opts->target.initial_delay || target__has_cpu(&opts->target) ||
> > > > > > > > >         perf_pmus__num_core_pmus() > 1) {
> > > > > > > > > -
> > > > > > > > >             /*
> > > > > > > > >              * User space tasks can migrate between CPUs, so when tracing
> > > > > > > > >              * selected CPUs, sideband for all CPUs is still needed.
> > > > > > > > > @@ -1366,6 +1365,7 @@ static int record__open(struct record *rec)
> > > > > > > > >     struct perf_session *session = rec->session;
> > > > > > > > >     struct record_opts *opts = &rec->opts;
> > > > > > > > >     int rc = 0;
> > > > > > > > > +   bool skipped = false;
> > > > > > > > >
> > > > > > > > >     evlist__for_each_entry(evlist, pos) {
> > > > > > > > >  try_again:
> > > > > > > > > @@ -1381,15 +1381,50 @@ static int record__open(struct record *rec)
> > > > > > > > >                             pos = evlist__reset_weak_group(evlist, pos, true);
> > > > > > > > >                             goto try_again;
> > > > > > > > >                     }
> > > > > > > > > -                   rc = -errno;
> > > > > > > > >                     evsel__open_strerror(pos, &opts->target, errno, msg, sizeof(msg));
> > > > > > > > > -                   ui__error("%s\n", msg);
> > > > > > > > > -                   goto out;
> > > > > > > > > +                   ui__error("Failure to open event '%s' on PMU '%s' which will be removed.\n%s\n",
> > > > > > > > > +                             evsel__name(pos), evsel__pmu_name(pos), msg);
> > > > > > >
> > > > > > > > How about changing it to pr_debug() and add below ...
> > > > > > >
> > > > > > > That sounds better.
> > > > > > >
> > > > > > > > > +                   pos->skippable = true;
> > > > > > > > > +                   skipped = true;
> > > > > > > > > +           } else {
> > > > > > > > > +                   pos->supported = true;
> > > > > > > > >             }
> > > > > > > > > -
> > > > > > > > > -           pos->supported = true;
> > > > > > > > >     }
> > > > > > > > >
> > > > > > > > > +   if (skipped) {
> > > > > > > > > +           struct evsel *tmp;
> > > > > > > > > +           int idx = 0;
> > > > > > > > > +           bool evlist_empty = true;
> > > > > > > > > +
> > > > > > > > > +           /* Remove evsels that failed to open and update indices. */
> > > > > > > > > +           evlist__for_each_entry_safe(evlist, tmp, pos) {
> > > > > > > > > +                   if (pos->skippable) {
> > > > > > > > > +                           evlist__remove(evlist, pos);
> > > > > > > > > +                           continue;
> > > > > > > > > +                   }
> > > > > > > > > +
> > > > > > > > > +                   /*
> > > > > > > > > +                    * Note, dummy events may be command line parsed or
> > > > > > > > > +                    * added by the tool. We care about supporting `perf
> > > > > > > > > +                    * record -e dummy` which may be used as a permission
> > > > > > > > > +                    * check. Dummy events that are added to the command
> > > > > > > > > +                    * line and opened along with other events that fail,
> > > > > > > > > +                    * will still fail as if the dummy events were tool
> > > > > > > > > +                    * added events for the sake of code simplicity.
> > > > > > > > > +                    */
> > > > > > > > > +                   if (!evsel__is_dummy_event(pos))
> > > > > > > > > +                           evlist_empty = false;
> > > > > > > > > +           }
> > > > > > > > > +           evlist__for_each_entry(evlist, pos) {
> > > > > > > > > +                   pos->core.idx = idx++;
> > > > > > > > > +           }
> > > > > > > > > +           /* If list is empty then fail. */
> > > > > > > > > +           if (evlist_empty) {
> > > > > > > > > +                   ui__error("Failure to open any events for recording.\n");
> > > > > > > > > +                   rc = -1;
> > > > > > > > > +                   goto out;
> > > > > > > > > +           }
> > > > > > >
> > > > > > > > ... ?
> > > > > > >
> > > > > > > >               if (!verbose)
> > > > > > > >                       ui__warning("Removed some unsupported events, use -v for details.\n");
> > > > > > >
> > > > > > > And even this one would be best left for cases where we can determine
> > > > > > > that it's a new situation, i.e. one that should work and not the ones we
> > > > > > > know that will not work already and thus so far didn't alarm the user
> > > > > > > into thinking something is wrong.
> > > > > > >
> > > > > > > Having the ones we know will fail as pr_debug() seems enough, I'd say.
> > > > > >
> > > > > > This means that:
> > > > > > ```
> > > > > > $ perf record -e data_read,LLC-prefetch-read -a sleep 0.1
> > > > > > ```
> > > > > > will fail (as data_read is a memory controller event and the LLC
> > > > > > doesn't support sampling) with something like:
> > > > > > ```
> > > > > > Error:
> > > > > > Failure to open any events for recording
> > > > > > ```
> > > > > > Which feels a bit minimal. As I already mentioned, it is also a
> > > > > > behavior change and so has the potential to break scripts dependent on
> > > > > > the failure information.
> > > > >
> > > > > I don't think it's about failure behavior, the concern is the error
> > > > > messages.  It can take too much screen space when users give a long list
> > > > > of invalid events.  And unfortunately the current error message for
> > > > > checking dmesg is not very helpful.
> > > >
> > > > Making the dmesg message more useful is a separate issue. The error
> > >
> > > Sure.
> > >
> > > > message only happens when things are broken and I think having an
> > > > error message is better than none, or somehow having to know to wade
> > > > through verbose output. I think this is very clear in:
> > > > https://lore.kernel.org/lkml/CAP-5=fVr43v8gkqi8SXVaNKnkO+cooQVqx3xUFJ-BtgxGHX90g@mail.gmail.com/
> > > >
> > > > > Anyway you can add this line too: "Use -v to see the details."
> > > >
> > > > So silently failing and then expecting users to scrape verbose output
> > > > is a fairly significant behavior change for the tool.
> > >
> > > I'm not saying I want silent failures.  It should say it fails to parse
> > > or open some events.  But I think it needs to care about repeating
> > > failure messages.
> > >
> > > >
> > > > > >
> > > > > > A patch lowering the priority of error messages should be independent
> > > > > > of the 4 changes here. I'd be happy if someone follows this series
> > > > > > with a patch doing it.
> > > > >
> > > > > I think the error behavior is a part of this change.
> > > >
> > > > I disagree with it, so I think you need to address my comments.
> > >
> > > You are changing the error behavior by skipping failed events then the
> > > relevant error messages should be handled properly in this patchset.
> >
> > I'm not sure what you are asking and I'm not sure why it matters?
> > Previously you'd asked for all the output to be moved under verbose.
> >
> > If I specify an event that doesn't work with perf record today then it
> > fails. With this patch it fails too. If that event is a core PMU event
> > then there will be an error message for each core PMU that doesn't
> > support the event. So I get 2 error messages on hybrid. This doesn't
> > feel egregious or warrant a new error message mechanism. I would like
> > it so that evsels supported 1 or more PMUs, in which case this would
> > be 1 error message.
> >
> > If I specify perf record today on an uncore event then perf record
> > fails and I get 1 error message for the uncore PMU. The new behavior
> > will be to get 1 error message per uncore PMU. If I'm on a server with
> > 10s of uncore PMUs then maybe the message is spammy, but the command
> > fails today and will continue to fail with this series. I don't see a
> > motivation to change or optimize for this case and again, evsels that
> > support >1 PMU would be the most appropriate fix.
> >
> > The only case where there is no message today but would be with this
> > patch series is for cycles on ARM's neoverse. There will be one
> > warning for the evsel on the SLC PMU. That's one warning and not many.
> >
> > As I've said, if you want a more elaborate error reporting system then
> > take these patches and add it to them. There's a larger refactor to
> > make evsels support >1 PMU that would clean up the many events on
> > server uncore PMUs issue, but that shouldn't be part of this series
> > nor gate it. If you are trying to perf record on uncore PMUs then you
> > already have problems and optimizing the error messages for your
> > mistake, I don't get why it matters?
>
> What about with multiple events in the command line - one of them
> failing with >1 PMUs and the command now succeeds?

So this would be something like:
```
$ perf record -e cycles,instructions,data_read -a sleep 1
```
where data_read is an uncore PMU event. The current behavior is:
```
$ perf record -e cycles,instructions,data_read -a sleep 1
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument)
for event (data_read).
"dmesg | grep -i perf" may provide additional information.
```
The new behavior is:
```
$ perf record -e cycles,instructions,data_read -a sleep 1
Error:
Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0'
which will be removed.
The sys_perf_event_open() syscall returned with 22 (Invalid argument)
for event (data_read).
"dmesg | grep -i perf" may provide additional information.

Error:
Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1'
which will be removed.
The sys_perf_event_open() syscall returned with 22 (Invalid argument)
for event (data_read).
"dmesg | grep -i perf" may provide additional information.

[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 3.138 MB perf.data (11670 samples) ]
```

We know nobody does this, as the command currently fails. It succeeds
with this change, because that's the whole point of the change. I'm
not offended by seeing the event was being opened on >1 PMU. For the
only currently succeeding situation where this will now warn, the
cycles case on Neoverse because of the buggy event name in ARM's SLC
PMU, there will be 1 warning. For my example the appropriate fix is to
remove the data_read event. For the Neoverse case, specifying the PMU
resolves the issue until ARM fixes their driver.
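For comparison with the per-PMU messages above, here is a hypothetical C sketch of collapsing failures so that each event name is reported once, listing every PMU it failed on; roughly the single message that evsels spanning more than one PMU would allow. This is not what the patch does. `struct sketch_failure`, `sketch_append()` and `sketch_group_failures()` are made-up names and the message format is an assumption.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record of one failed open: event name plus the PMU it was on. */
struct sketch_failure {
	const char *event;
	const char *pmu;
};

/* Append formatted text at buf + *used; returns false if it would not fit. */
static bool sketch_append(char *buf, size_t len, size_t *used, const char *fmt, ...)
{
	va_list ap;
	int w;

	va_start(ap, fmt);
	w = vsnprintf(buf + *used, len - *used, fmt, ap);
	va_end(ap);
	if (w < 0 || (size_t)w >= len - *used)
		return false;
	*used += (size_t)w;
	return true;
}

/*
 * Write one line per distinct event name, listing every PMU it failed on,
 * in first-seen order. Returns the number of lines, or -1 if buf is too
 * small. Quadratic, which is fine for a handful of failures.
 */
static int sketch_group_failures(const struct sketch_failure *f, size_t n,
				 char *buf, size_t len)
{
	size_t used = 0;
	int lines = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		bool seen = false;

		for (size_t j = 0; j < i && !seen; j++)
			seen = strcmp(f[j].event, f[i].event) == 0;
		if (seen)
			continue; /* already reported with its first occurrence */
		if (!sketch_append(buf, len, &used, "Failed to open '%s' on PMU(s):", f[i].event))
			return -1;
		for (size_t j = i; j < n; j++) {
			if (strcmp(f[j].event, f[i].event) == 0 &&
			    !sketch_append(buf, len, &used, " %s", f[j].pmu))
				return -1;
		}
		if (!sketch_append(buf, len, &used, "\n"))
			return -1;
		lines++;
	}
	return lines;
}
```

With the data_read example above, the two uncore_imc_free_running_* failures would become one line instead of two error blocks.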

Thanks,
Ian


* Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-01-14  2:31           ` Ian Rogers
@ 2025-01-15 17:59             ` Namhyung Kim
  2025-01-15 21:20               ` Ian Rogers
  0 siblings, 1 reply; 53+ messages in thread
From: Namhyung Kim @ 2025-01-15 17:59 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	bpf, Aditya Bodkhe, Atish Patra, Leo Yan, Beeman Strong,
	Arnaldo Carvalho de Melo

On Mon, Jan 13, 2025 at 06:31:19PM -0800, Ian Rogers wrote:
> On Mon, Jan 13, 2025 at 2:51 PM Ian Rogers <irogers@google.com> wrote:
> >
> > On Mon, Jan 13, 2025 at 2:01 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > On Fri, Jan 10, 2025 at 02:15:18PM -0800, Ian Rogers wrote:
> > > > On Fri, Jan 10, 2025 at 11:40 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > >
> > > > > On Thu, Jan 09, 2025 at 02:21:09PM -0800, Ian Rogers wrote:
> > > > > > Originally posted and merged from:
> > > > > > https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
> > > > > > This reverts commit 4f1b067359ac8364cdb7f9fda41085fa85789d0f although
> > > > > > the patch is now smaller due to related fixes being applied in commit
> > > > > > 22a4db3c3603 ("perf evsel: Add alternate_hw_config and use in
> > > > > > evsel__match").
> > > > > > The original commit message was:
> > > > > >
> > > > > > It was requested that RISC-V be able to add events to the perf tool so
> > > > > > the PMU driver didn't need to map legacy events to config encodings:
> > > > > > https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/
> > > > > >
> > > > > > This change makes the priority of events specified without a PMU the
> > > > > > same as those specified with a PMU, namely sysfs and JSON events are
> > > > > > checked first before using the legacy encoding.
> > > > >
> > > > > I'm still not convinced why we need this change despite these
> > > > > troubles.  If it's because RISC-V cannot define the legacy hardware
> > > > > events in the kernel driver, why not use a different name in JSON and
> > > > > ask users to use the name specifically?  Something like:
> > > > >
> > > > >   $ perf record -e riscv-cycles ...
> > > >
> > > > So ARM and RISC-V are more than able to speak for themselves and have
> > > > their tags on the series, but let's recap why I'm motivated to do this
> > > > change:
> > > >
> > > > 1) perf supported legacy events;
> > > > 2) perf supported sysfs and json events, but at a lower priority than
> > > > legacy events;
> > > > 3) hybrid support was added but in a way where all the hybrid PMUs
> > > > needed to be known, assumptions about PMU were implicit and baked into
> > > > the tool;
> > > > 4) metric support for hybrid was going in a similar implicit direction
> > > > and I objected, what would cycles mean in a metric if the core PMU was
> > >
> > > If the legacy cycles event in a metric is a problem, can we change the
> > > metric to be more specific?
> > >
> > >
> > > > implicit? Rather than pursue this the hybrid code was overhauled, PMUs
> > > > became more of a thing and we added a notion of a "core" PMU which
> > > > would support legacy events;
> > > > 5) ARM core PMUs differ in naming, etc. from just about every other
> > > > platform. Their core events had been programmed as if they were
> > > > uncore events - i.e. without the legacy priority. Fixing hybrid, and
> > > > fixing ARM PMUs to know they supported legacy events, broke perf on
> > > > Apple-M? series due to a PMU driver issue with legacy events:
> > > > https://lore.kernel.org/lkml/08f1f185-e259-4014-9ca4-6411d5c1bc65@marcan.st/
> > > > "Perf broke on all Apple ARM64 systems (tested almost everything), and
> > > > according to maz also on Juno (so, probably all big.LITTLE) since
> > > > v6.5."
> > > > 6) sysfs/json events were made the priority over legacy to unbreak
> > > > perf on Apple-M? CPUs, but only if the PMU is specified:
> > > > https://lore.kernel.org/r/20231123042922.834425-1-irogers@google.com
> > > >    Reported-by: Hector Martin <marcan@marcan.st>
> > > >    Signed-off-by: Ian Rogers <irogers@google.com>
> > > >    Tested-by: Hector Martin <marcan@marcan.st>
> > > >    Tested-by: Marc Zyngier <maz@kernel.org>
> > > >    Acked-by: Mark Rutland <mark.rutland@arm.com>
> > >
> > > I think ARM/Apple-Mx is fine without this change, right?
> > >
> > > >
> > > > This gets us to the current code where I can trivially get an
> > > > inconsistency. Here on Intel with no PMU in the event name:
> > > > ```
> > > > $ perf stat -vv -e cpu-cycles true
> > > > Using CPUID GenuineIntel-6-8D-1
> > > > Control descriptor is not initialized
> > > > ------------------------------------------------------------
> > > > perf_event_attr:
> > > >   type                             0 (PERF_TYPE_HARDWARE)
> > > >   size                             136
> > > >   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
> > > >   sample_type                      IDENTIFIER
> > > >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > > >   disabled                         1
> > > >   inherit                          1
> > > >   enable_on_exec                   1
> > > >   exclude_guest                    1
> > > > ------------------------------------------------------------
> > > > sys_perf_event_open: pid 752915  cpu -1  group_fd -1  flags 0x8 = 3
> > > > cpu-cycles: -1: 1293076 273429 273429
> > > > cpu-cycles: 1293076 273429 273429
> > > >
> > > >  Performance counter stats for 'true':
> > > >
> > > >          1,293,076      cpu-cycles
> > > >
> > > >        0.000809752 seconds time elapsed
> > > >
> > > >        0.000841000 seconds user
> > > >        0.000000000 seconds sys
> > > > ```
> > > >
> > > > Here with a PMU event name:
> > > > ```
> > > > $ sudo perf stat -vv -e cpu/cpu-cycles/ true
> > > > Using CPUID GenuineIntel-6-8D-1
> > > > Attempt to add: cpu/cpu-cycles=0/
> > > > ..after resolving event: cpu/event=0x3c/
> > > > Control descriptor is not initialized
> > > > ------------------------------------------------------------
> > > > perf_event_attr:
> > > >   type                             4 (cpu)
> > > >   size                             136
> > > >   config                           0x3c (cpu-cycles)
> > > >   sample_type                      IDENTIFIER
> > > >   read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
> > > >   disabled                         1
> > > >   inherit                          1
> > > >   enable_on_exec                   1
> > > >   exclude_guest                    1
> > > > ------------------------------------------------------------
> > > > sys_perf_event_open: pid 752839  cpu -1  group_fd -1  flags 0x8 = 3
> > > > cpu/cpu-cycles/: -1: 1421235 531150 531150
> > > > cpu/cpu-cycles/: 1421235 531150 531150
> > > >
> > > >  Performance counter stats for 'true':
> > > >
> > > >          1,421,235      cpu/cpu-cycles/
> > > >
> > > >        0.001292908 seconds time elapsed
> > > >
> > > >        0.001340000 seconds user
> > > >        0.000000000 seconds sys
> > > > ```
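The two perf_event_attr dumps above can be reproduced with a small C sketch. It assumes the cpu PMU's type id (4 on that machine) has already been read from /sys/bus/event_source/devices/cpu/type, and that the cpu-cycles sysfs alias is "event=0x3c" with the event field in config:0-7, as the cpu/format/event file describes on Intel. `sketch_legacy_cycles()` and `sketch_sysfs_event()` are hypothetical helpers; real perf parses the format files generically, whereas this handles only a single `event=` term.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <linux/perf_event.h>

/* Legacy encoding, as opened for `-e cpu-cycles` (no PMU) above. */
static void sketch_legacy_cycles(struct perf_event_attr *attr)
{
	assert(attr != NULL);
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;         /* type 0 */
	attr->config = PERF_COUNT_HW_CPU_CYCLES; /* config 0 */
}

/*
 * sysfs encoding, as opened for `-e cpu/cpu-cycles/` above. Handles only a
 * single "event=<n>" term placed in config:0-7; returns -1 otherwise.
 */
static int sketch_sysfs_event(struct perf_event_attr *attr, uint32_t pmu_type,
			      const char *terms)
{
	static const char prefix[] = "event=";
	const char *num;
	char *end;
	unsigned long val;

	assert(attr != NULL && terms != NULL);
	if (strncmp(terms, prefix, sizeof(prefix) - 1) != 0)
		return -1;
	num = terms + sizeof(prefix) - 1;
	val = strtoul(num, &end, 0); /* base 0 accepts "0x3c" */
	if (end == num || *end != '\0' || val > 0xff)
		return -1;
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = pmu_type; /* e.g. 4, read from the PMU's sysfs type file */
	attr->config = val;    /* event field occupies config bits 0-7 */
	return 0;
}
```

Both attrs describe CPU cycles on the core PMU, yet they are different encodings handed to the kernel; whether they count identically is up to the driver.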
> > > >
> > > > That is, the no-PMU event is opened as type=0/config=0 (legacy) while
> > > > the PMU event is opened as type=4/config=0x3c (sysfs encoding). Now
> > >
> > > I'm not sure it's a problem.  I think it works as expected...?
> > >
> > >
> > > > let's cross our fingers and hope that in the driver they are really
> > > > the same thing. I object to the idea that there should be two
> > > > different priorities for sysfs/json and legacy depending on whether a
> > > > PMU is or isn't specified in the event name. The priority could be
> > > > legacy then sysfs/json, or it could be sysfs/json then legacy, but it
> > > > should be the same regardless of whether the PMU is put in the event
> > >
> > > Well, I think having PMU name in the event is a big difference.  Legacy
> > > events were there since Day 1, I guess it's natural to think that an
> > > event without PMU name means a legacy event and others should come with
> > > PMU names explicitly.
> >
> > So then we're breaking the event names by inserting a PMU name in
> > uniquify in the stat output:
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/stat-display.c?h=perf-tools-next#n932
> >
> > There was an explicit, and reviewed by Jiri and Arnaldo, intent with
> > the hybrid work that using a legacy event with a hybrid PMU, even
> > though the PMU doesn't advertise through json or sysfs the legacy
> > event, the perf tool supports it.

I thought legacy events on hybrid were converted to PMU events.

> >
> > Making it so that events without PMUs are only legacy events just
> > doesn't work. There are far too many existing uses of non-legacy
> > events without PMU, the metrics contain 100s of examples.

That's unfortunate.  It'd be nice if metrics were written with PMU
names.

I have a question.  What if an event name in a metric matches to
multiple unrelated PMUs?

> >
> > Prior to switching json/sysfs to being the priority when a PMU is
> > specified, it was the case that all encodings were the same, with or
> > without a PMU.
> >
> > I don't think there is anything natural about assuming things about
> > event names. Take cycles, cpu-cycles and cpu_cycles:
> >  - cycles on x86 is only encoded via a legacy event;
> >  - cpu-cycles on Intel exists as a sysfs event, but cpu-cycles is also
> > a legacy event name;
> >  - cpu_cycles exists as a sysfs event on ARM but doesn't have a
> > corresponding legacy event name.

I think the behavior should be:

  cycles -> PERF_COUNT_HW_CPU_CYCLES
  cpu-cycles -> PERF_COUNT_HW_CPU_CYCLES
  cpu_cycles -> no legacy -> sysfs or json
  cpu/cycles/ -> sysfs or json
  cpu/cpu-cycles/ -> sysfs or json
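To make the difference between the two orderings concrete, here is a hedged C sketch of the lookup. The name tables are hypothetical (legacy names cycles and cpu-cycles; a core PMU advertising cpu-cycles and cpu_cycles via sysfs/JSON, roughly like the Intel case discussed earlier), and `sketch_resolve()` is not perf's parser. With `uniform` false it models the list above: legacy first only when no PMU is given. With `uniform` true it models this patch: sysfs/JSON first, then legacy, with or without a PMU. As an assumption based on current tool behavior, a PMU-qualified legacy name the PMU does not advertise (cpu/cycles/ here) falls back to the legacy encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum sketch_encoding { ENC_NONE, ENC_LEGACY, ENC_SYSFS_JSON };

/* Hypothetical tables: legacy names, and events a core PMU advertises. */
static const char *const sketch_legacy[] = { "cycles", "cpu-cycles" };
static const char *const sketch_advertised[] = { "cpu-cycles", "cpu_cycles" };

#define SKETCH_LEN(a) (sizeof(a) / sizeof((a)[0]))

static bool sketch_in(const char *const *tbl, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(tbl[i], name) == 0)
			return true;
	return false;
}

/*
 * pmu_given: the event was written as pmu/name/.
 * uniform:   false = legacy first only when no PMU is given;
 *            true  = sysfs/JSON first in both cases (this patch).
 */
static enum sketch_encoding sketch_resolve(const char *name, bool pmu_given,
					   bool uniform)
{
	bool legacy, advertised;

	assert(name != NULL);
	legacy = sketch_in(sketch_legacy, SKETCH_LEN(sketch_legacy), name);
	advertised = sketch_in(sketch_advertised, SKETCH_LEN(sketch_advertised), name);
	if (!pmu_given && !uniform)
		return legacy ? ENC_LEGACY : advertised ? ENC_SYSFS_JSON : ENC_NONE;
	/* sysfs/JSON first, legacy as the fallback */
	return advertised ? ENC_SYSFS_JSON : legacy ? ENC_LEGACY : ENC_NONE;
}
```

Only names that are both legacy and advertised, such as cpu-cycles here (or cycles on the Neoverse SLC PMU), resolve differently between the two orderings.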

Thanks,
Namhyung

> >
> > The difference in meaning of an event name can be as subtle as the
> > difference between a hyphen and an underscore. Given that we can't
> > break everybody's `perf <command> -e <event name> ..` command name nor
> > should we break all the metrics, I think the most intuitive thing is
> > cycles behave the same with or without a PMU. For example, there may
> > be differences in accuracy between a fixed and generic counter and the
> > legacy event may only work with one counter because of this while the
> > sysfs/json event uses all the counters, or vice versa. As explained,
> > in output code the tool will or will not insert PMU names treating
> > them as not mattering. Currently they do matter as the parsing will
> > give different perf_event_attr and those can have differing kernel
> > behaviors. This patch fixes this.
> 
> An extra thought and I may be special. I specify event names without
> PMUs first (less typing*), I may then see multiple outputs in
> primarily perf stat or see it when adding --per-core or -A, if I care
> I can specify the event name with the PMU to reduce the perf stat
> output. Having it that the event encoding changes between those two
> executions I think is surprising and inconsistent behavior. I don't
> mind if the behavior is sysfs/json then legacy (current behavior) or
> legacy then sysfs/json (behavior before the ARM Apple-M fix), ARM and
> RISC-V prefer (or have preferred) the sysfs/json then legacy approach
> hence pursuing it here.
> 
> Thanks,
> Ian
> 
> * The bash completion of events:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/perf-completion.sh?h=perf-tools-next#n172
> also skips PMU names. I suspect it is only a minority of users who
> specify a PMU when specifying an event and it would be a pretty major
> behavior change for them to have to switch from say inst_retired.any
> to cpu/inst_retired.any/, listing all PMUs for hybrid, etc. Tbh, I'm
> not sure what consistent alternative is really being presented as
> things get mentioned that are either obviously breaking existing users
> (all non-legacy events needing a PMU..) or obviously confusing (like
> making the difference between a dash and underscore significant).


* Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-01-15 17:59             ` Namhyung Kim
@ 2025-01-15 21:20               ` Ian Rogers
  2025-01-29 21:55                 ` Namhyung Kim
  0 siblings, 1 reply; 53+ messages in thread
From: Ian Rogers @ 2025-01-15 21:20 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	bpf, Aditya Bodkhe, Atish Patra, Leo Yan, Beeman Strong,
	Arnaldo Carvalho de Melo

On Wed, Jan 15, 2025 at 9:59 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> > On Mon, Jan 13, 2025 at 2:51 PM Ian Rogers <irogers@google.com> wrote:
> > > There was an explicit, and reviewed by Jiri and Arnaldo, intent with
> > > the hybrid work that using a legacy event with a hybrid PMU, even
> > > though the PMU doesn't advertise through json or sysfs the legacy
> > > event, the perf tool supports it.
>
> I thought legacy events on hybrid were converted to PMU events.

No. When big.LITTLE was created nothing changed in perf events, but
when Intel did hybrid they wanted to make the hybrid CPUs (Atom and
performance cores) appear as if they were one type. The PMU event
encodings vary a lot between the core types on Intel, whereas ARM has
standards for the encoding. Intel extended the legacy format to take a
PMU type ID:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/include/uapi/linux/perf_event.h?h=perf-tools-next#n41
"EEEEEEEE: PMU type ID"
which is held in the top 32 bits of the config.
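Below is a minimal Python sketch of the extended legacy encoding described above. The constant values mirror `PERF_TYPE_HARDWARE`, `PERF_COUNT_HW_CPU_CYCLES`, `PERF_COUNT_HW_INSTRUCTIONS`, `PERF_PMU_TYPE_SHIFT` and `PERF_HW_EVENT_MASK` from the uapi `perf_event.h`. The PMU type ID of 4 is a hypothetical placeholder. Real IDs are assigned at runtime and exposed in `/sys/bus/event_source/devices/<pmu>/type`.

```python
# Values mirror include/uapi/linux/perf_event.h.
PERF_TYPE_HARDWARE = 0           # attr.type used by legacy hardware events
PERF_COUNT_HW_CPU_CYCLES = 0     # legacy "cycles" / "cpu-cycles"
PERF_COUNT_HW_INSTRUCTIONS = 1   # legacy "instructions"
PERF_PMU_TYPE_SHIFT = 32         # PMU type ID occupies config bits 63..32
PERF_HW_EVENT_MASK = 0xFFFFFFFF  # legacy event ID occupies config bits 31..0


def encode_hybrid_legacy(pmu_type: int, hw_event: int) -> tuple[int, int]:
    """Return (attr.type, attr.config) for a legacy event aimed at one PMU."""
    if not 0 <= pmu_type <= 0xFFFFFFFF:
        raise ValueError("PMU type ID must fit in 32 bits")
    if not 0 <= hw_event <= PERF_HW_EVENT_MASK:
        raise ValueError("legacy event ID must fit in 32 bits")
    return PERF_TYPE_HARDWARE, (pmu_type << PERF_PMU_TYPE_SHIFT) | hw_event


def decode_hybrid_legacy(config: int) -> tuple[int, int]:
    """Split an extended legacy config into (pmu_type, hw_event)."""
    return config >> PERF_PMU_TYPE_SHIFT, config & PERF_HW_EVENT_MASK


HYPOTHETICAL_CPU_CORE_TYPE = 4  # placeholder; the real value comes from sysfs
attr_type, config = encode_hybrid_legacy(HYPOTHETICAL_CPU_CORE_TYPE,
                                         PERF_COUNT_HW_CPU_CYCLES)
print(attr_type, hex(config))  # 0 0x400000000
```

attr.type stays PERF_TYPE_HARDWARE and only the config changes. That is how one legacy name can be directed at either hybrid PMU.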

> > >
> > > Making it so that events without PMUs are only legacy events just
> > > doesn't work. There are far too many existing uses of non-legacy
> > > events without PMU, the metrics contain 100s of examples.
>
> That's unfortunate.  It'd be nice if metrics were written with PMU
> names.

But then we'd end up with things like on Intel:
UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD
becoming:
uncore_cha/UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD/
or just:
cha/UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD/
As a user, the first works for me and has no ambiguity over PMUs, as
the event name already encodes the PMU. AMD similarly places the part
of the pipeline into event names. Were we to break everybody by
requiring the PMU we'd also need to explain which PMU to use. Sites
with event lists (like https://perfmon-events.intel.com/) don't
explain the PMU and it'd be messy as on Intel you have a CHA PMU for
server chips but a CBOX on client chips, etc.

> I have a question.  What if an event name in a metric matches to
> multiple unrelated PMUs?

The metric may break or we'd aggregate the unrelated counts together.
Take a metric like IPC as "instructions/cycles", that metric should
work on a hybrid system as they have instructions and cycles. If you
used an event for instructions like inst_retired.any then maybe the
metric will fail on one kind of core that didn't have that event. Now
if we have accelerators advertising instructions and cycles events, we
should be able to compute the metric for the accelerator. What could
happen today is that the accelerator has a cpumask of a single CPU,
so we could aggregate the accelerator counter into the CPU event for
that same CPU and end up with a weird quasi CPU and accelerator IPC
metric for that CPU. What should happen is that we
get an IPC for the accelerator and IPC for each hybrid core
independently, but the way we handle evsels, CPUs, PMUs is not really
set up for that. Hopefully getting a set of PMUs into the evsel will
clear that up. Assuming all of that is cleared up, is it wrong if the
IPC metric is computed for the accelerator if it was originally
written as a CPU metric? Not really. Could there be metrics where that
is the case? Probably, and specifying PMUs in the event names would be
a fix. There have also been proposals that we restrict the PMUs for
certain metrics. As event names are currently so distinct it isn't a
problem we've faced yet and it is not clear it is a problem other than
highlighting tech debt in areas of the tool like aggregation.
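The toy Python model below makes the aggregation hazard concrete. The PMU names (`cpu_core`, `cpu_atom`, `accel0`) and every count are hypothetical, chosen only to keep the arithmetic easy to follow. Summing by CPU alone folds the accelerator's counts into CPU 0. Keying on (PMU, CPU) keeps one IPC per PMU.

```python
from collections import defaultdict

# Hypothetical counter readings: (pmu, cpu, event) -> count.
# "accel0" stands in for an accelerator PMU whose cpumask is only CPU 0.
readings = {
    ("cpu_core", 0, "instructions"): 3000, ("cpu_core", 0, "cycles"): 1500,
    ("cpu_atom", 1, "instructions"): 1000, ("cpu_atom", 1, "cycles"): 1000,
    ("accel0", 0, "instructions"): 900,    ("accel0", 0, "cycles"): 100,
}


def ipc(readings, key_fn):
    """Sum instructions and cycles per aggregation key, then divide."""
    totals = defaultdict(lambda: {"instructions": 0, "cycles": 0})
    for (pmu, cpu, event), count in readings.items():
        totals[key_fn(pmu, cpu)][event] += count
    return {k: v["instructions"] / v["cycles"] for k, v in totals.items()}


# Aggregating by CPU only: accel0's counts are merged into CPU 0.
by_cpu = ipc(readings, lambda pmu, cpu: cpu)
# Aggregating by (PMU, CPU): each PMU keeps its own IPC.
by_pmu_cpu = ipc(readings, lambda pmu, cpu: (pmu, cpu))
print(by_cpu)      # {0: 2.4375, 1: 1.0}
print(by_pmu_cpu)  # {('cpu_core', 0): 2.0, ('cpu_atom', 1): 1.0, ('accel0', 0): 9.0}
```

Neither 2.0 (core) nor 9.0 (accelerator) appears in the per-CPU result. Instead, CPU 0 reports the quasi CPU/accelerator value 2.4375.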

> > >
> > > Prior to switching json/sysfs to being the priority when a PMU is
> > > specified, it was the case that all encodings were the same, with or
> > > without a PMU.
> > >
> > > I don't think there is anything natural about assuming things about
> > > event names. Take cycles, cpu-cycles and cpu_cycles:
> > >  - cycles on x86 is only encoded via a legacy event;
> > >  - cpu-cycles on Intel exists as a sysfs event, but cpu-cycles is also
> > > a legacy event name;
> > >  - cpu_cycles exists as a sysfs event on ARM but doesn't have a
> > > corresponding legacy event name.
>
> I think the behavior should be:
>
>   cycles -> PERF_COUNT_HW_CPU_CYCLES
>   cpu-cycles -> PERF_COUNT_HW_CPU_CYCLES
>   cpu_cycles -> no legacy -> sysfs or json
>   cpu/cycles/ -> sysfs or json
>   cpu/cpu-cycles/ -> sysfs or json

I disagree, as adding a PMU to an event name shouldn't change the
encoding:
1) This was historically perf's behavior.
2) Different event encodings can have different behaviors (broken in
some notable cases).
3) Intuitively, wildcarding tries to open "*/event/" where * is every
possible PMU name. Having different event encodings breaks that
intuition, and it could also break situations where you try to assert
equivalence based on type/config.
4) The legacy encodings were (are?) broken on ARM Apple M? CPUs;
that's why the priority was changed.
5) RISC-V would like the tool to tackle the legacy-to-config mapping
challenge, rather than the PMU driver, given the potential diversity
of hardware implementations.
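The consistency argument in point 3 can be checked with a toy resolver. This Python sketch is not perf's parser, and it rests on several assumptions:

- The PMU table, type IDs and config values are hypothetical.
- Hybrid extended types are ignored.
- A PMU-qualified name falls back to the legacy encoding only when that PMU has no sysfs/JSON event of that name.

With `legacy_first=True` (the behavior proposed in the quoted list), adding `cpu/` changes the encoding of `cpu-cycles`. With sysfs/JSON first, it does not.

```python
PERF_TYPE_HARDWARE = 0
LEGACY = {"cycles": 0, "cpu-cycles": 0, "instructions": 1}  # name -> hw id

# Hypothetical PMUs: name -> (sysfs type ID, is_core, {event: config}).
PMUS = {
    "cpu": (4, True, {"cpu-cycles": 0x3C}),
    "uncore_imc_free_running_0": (20, False, {"data_read": 0xFF}),
}


def resolve(event, pmu=None, legacy_first=True):
    """Return the set of (pmu, attr.type, attr.config) an event would open."""
    targets = [pmu] if pmu else list(PMUS)
    if legacy_first and pmu is None and event in LEGACY:
        # Bare legacy name: PERF_TYPE_HARDWARE on core PMUs, sysfs/JSON ignored.
        return {(p, PERF_TYPE_HARDWARE, LEGACY[event])
                for p in targets if PMUS[p][1]}
    found = set()
    for p in targets:
        type_id, is_core, events = PMUS[p]
        if event in events:
            found.add((p, type_id, events[event]))  # sysfs/JSON encoding
        elif is_core and event in LEGACY:
            found.add((p, PERF_TYPE_HARDWARE, LEGACY[event]))  # legacy fallback
    return found


print(resolve("cpu-cycles"))              # {('cpu', 0, 0)}
print(resolve("cpu-cycles", pmu="cpu"))   # {('cpu', 4, 60)}
print(resolve("cpu-cycles", legacy_first=False))  # {('cpu', 4, 60)}
```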

To this end we hosted RISC-V's perf people at Google and they
expressed that their preference was what this series does, and they
expressed this directly to you.

I don't think there would be an issue in this area if it wasn't for
Neoverse and Linus - that's why the revert happened. This change in
behavior was proposed by Arnaldo:
https://lore.kernel.org/lkml/ZlY0F_lmB37g10OK@x1/
and has tags from Intel, ARM and Rivos (RISC-V). I intend to carry it
in Google's tree.

Thanks,
Ian

* Re: [PATCH v5 3/4] perf record: Skip don't fail for events that don't open
  2025-01-14 23:55             ` Ian Rogers
@ 2025-01-15 22:14               ` Namhyung Kim
  2025-01-15 22:40                 ` Ian Rogers
  0 siblings, 1 reply; 53+ messages in thread
From: Namhyung Kim @ 2025-01-15 22:14 UTC (permalink / raw)
  To: Ian Rogers
  Cc: James Clark, Leo Yan, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Adrian Hunter, Kan Liang, Ze Gao, Weilin Wang,
	Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, bpf, Aditya Bodkhe, Atish Patra

On Tue, Jan 14, 2025 at 03:55:47PM -0800, Ian Rogers wrote:
> On Tue, Jan 14, 2025 at 11:29 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Fri, Jan 10, 2025 at 11:18:53AM -0800, Ian Rogers wrote:
> > > On Fri, Jan 10, 2025 at 10:55 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > On Thu, Jan 09, 2025 at 08:44:38PM -0800, Ian Rogers wrote:
> > > > > On Thu, Jan 9, 2025 at 5:25 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > >
> > > > > > On Thu, Jan 09, 2025 at 02:21:08PM -0800, Ian Rogers wrote:
> > > > > > > Whilst for many tools it is an expected behavior that failure to open
> > > > > > > a perf event is a failure, ARM decided to name PMU events the same as
> > > > > > > legacy events and then failed to rename such events on a server uncore
> > > > > > > SLC PMU. As perf's default behavior when no PMU is specified is to
> > > > > > > open the event on all PMUs that advertise/"have" the event, this
> > > > > > > yielded failures when trying to make the priority of legacy and
> > > > > > > sysfs/json events uniform - something requested by RISC-V and ARM. A
> > > > > > > legacy event user on ARM hardware may find their event opened on an
> > > > > > > uncore PMU which for perf record will fail. Arnaldo suggested skipping
> > > > > > > such events which this patch implements. Rather than have the skipping
> > > > > > > conditional on running on ARM, the skipping is done on all
> > > > > > > architectures as such a fundamental behavioral difference could lead
> > > > > > > to problems with tools built/depending on perf.
> > > > > > >
> > > > > > > An example of perf record failing to open events on x86 is:
> > > > > > > ```
> > > > > > > $ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
> > > > > > > Error:
> > > > > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
> > > > > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > > > > > "dmesg | grep -i perf" may provide additional information.
> > > > > > >
> > > > > > > Error:
> > > > > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
> > > > > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > > > > > "dmesg | grep -i perf" may provide additional information.
> > > > > > >
> > > > > > > Error:
> > > > > > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > > > > > The LLC-prefetch-read event is not supported.
> > > > > > > [ perf record: Woken up 1 times to write data ]
> > > > > > > [ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]
> > > > > >
> > > > > > I'm afraid this can be too noisy.
> > > > >
> > > > > The intention is to be noisy:
> > > > > 1) it matches the existing behavior, anything else is potentially a regression;
> > > >
> > > > Well.. I think you're changing the behavior. :)  Also currently it just
> > > > fails on the first event so it won't be too noisy.
> > > >
> > > >   $ perf record -e data_read,data_write,LLC-prefetch-read -a sleep 0.1
> > > >   event syntax error: 'data_read,data_write,LLC-prefetch-read'
> > > >                        \___ Bad event name
> > > >
> > > >   Unable to find event on a PMU of 'data_read'
> > > >   Run 'perf list' for a list of valid events
> > > >
> > > >    Usage: perf record [<options>] [<command>]
> > > >       or: perf record [<options>] -- <command> [<options>]
> > > >
> > > >       -e, --event <event>   event selector. use 'perf list' to list available events
> > >
> > > Fwiw, this error is an event parsing error not an event opening error.
> > > You need to select an uncore event, I was using data_read which exists
> > > in the uncore_imc_free_running PMUs on Intel tigerlake. Here is the
> > > existing error message:
> > > ```
> > > $ perf record -e data_read -a true
> > > Error:
> > > The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> > > for event (data_read).
> > > "dmesg | grep -i perf" may provide additional information.
> > > ```
> > > and here it is with the series:
> > > ```
> > > $ perf record -e data_read -a true
> > > Error:
> > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0'
> > > which will be removed.
> > > The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> > > for event (data_read).
> > > "dmesg | grep -i perf" may provide additional information.
> > >
> > > Error:
> > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1'
> > > which will be removed.
> > > The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> > > for event (data_read).
> > > "dmesg | grep -i perf" may provide additional information.
> > >
> > > Error:
> > > Failure to open any events for recording.
> > > ```
> > > and here is what it would be with pr_debug:
> > > ```
> > > $ perf record -e data_read -a true
> > > Error:
> > > Failure to open any events for recording.
> > > ```
> > > I believe this last output is worse because:
> > > 1) If not all events fail to open there is no error reported unless I
> > > know to run with -v, which will also bring a bunch more noise with it,
> >
> > I suggested to add a warning if any (not all) of events failed to open.
> >
> >   "Removed some unsupported events, use -v for details."
> >
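The small Python model below compares the two reporting styles being debated for patch 3/4. The first reports one error per failed PMU/event. The second reports a single summary line unless `-v` is given.

`try_open` stands in for `sys_perf_event_open()`. The `fake_open` policy that rejects every `uncore_` PMU is hypothetical. The message strings are copied from the thread, but the control flow is a sketch, not perf's code.

```python
import errno


def open_events(evsels, try_open, summarize=False, verbose=False):
    """Open each (pmu, event) pair, dropping failures instead of aborting.

    Returns (opened, messages). summarize=True models the single-warning
    style; otherwise every failure is reported individually.
    """
    opened, failures = [], []
    for pmu, event in evsels:
        err = try_open(pmu, event)  # 0 on success, else an errno value
        if err == 0:
            opened.append((pmu, event))
        else:
            failures.append(f"Failure to open event '{event}' on PMU '{pmu}' "
                            f"which will be removed ({errno.errorcode[err]}).")
    if summarize and not verbose:
        messages = (["Removed some unsupported events, use -v for details."]
                    if failures else [])
    else:
        messages = list(failures)
    if not opened:
        # Nothing survived: this remains a hard error in both styles.
        messages.append("Failure to open any events for recording.")
    return opened, messages


def fake_open(pmu, event):
    # Hypothetical: uncore PMUs reject sampling events with EINVAL.
    return errno.EINVAL if pmu.startswith("uncore_") else 0


evsels = [("uncore_imc_free_running_0", "data_read"),
          ("uncore_imc_free_running_1", "data_read"),
          ("cpu", "cycles")]
opened, per_event = open_events(evsels, fake_open)
_, summary = open_events(evsels, fake_open, summarize=True)
print(opened)     # [('cpu', 'cycles')]
print(summary)    # ['Removed some unsupported events, use -v for details.']
```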
> >
> > > 2) I don't see the PMU / event name and "Invalid argument" indicating
> > > what has gone wrong again unless I know to run with -v and get all the
> > > verbose noise with that.
> >
> > I don't think single -v adds a lot of noise in the output.
> >
> > >
> > > Yes it is noisy on 1 platform for 1 event due to an ARM PMU event name
> > > bug that ARM should have long ago fixed. That should be fixed rather
> > > than hiding errors and making users think they are recording samples
> > > when silently they're not - or they need to search through verbose
> > > output to try to find out if something broke.
> >
> > I'm not sure if it's a bug in the driver.  It happens because perf tool
> > changed the way it finds events - it used to look at the core PMUs only
> > if no PMU name was given, but now it searches every PMU, right?
> 
> So there is the ARM bug in the PMU driver that caused an issue with
> the hybrid fixes done because of wanting to have metrics work for
> hybrid. The bug is reported here:
> https://lore.kernel.org/lkml/08f1f185-e259-4014-9ca4-6411d5c1bc65@marcan.st/

I'm not sure it's agreed that this should be called a PMU bug.
My understanding is that it's the change in the perf tool that broke it.


> The events are apple_icestorm_pmu/cycles/ and
> apple_firestorm_pmu/cycles/. The issue is that prior to fixing hybrid
> the ARM PMUs looked like uncore PMUs and couldn't open a legacy event,
> which was fine as they had sysfs events. When hybrid was fixed in the
> tool, the tool would then try to open apple_icestorm_pmu/cycles/ and
> apple_firestorm_pmu/cycles/ as legacy events - legacy having priority
> over sysfs/json back then. The legacy mapping was broken in the PMU

I don't know why you want to use legacy events (PERF_TYPE_HARDWARE)
when the event name includes a PMU and that PMU has a different type
encoding.


> driver. Now were everything working as intended, just the cycles event
> would be specified on the command line and the event would be wildcard
> opened on the apple_icestorm_pmu and apple_firestorm_pmu. I believe
> this way would already use a legacy encoding and so to work around the
> PMU driver bug people were specifying the PMU name to get the sysfs
> encoding, but that only worked as the PMUs appeared to be uncore.
> 
> > >
> > > > > 2) it only happens if trying to record on a PMU/event that doesn't
> > > > > support recording, something that is currently an error and so we're
> > > > > not motivated to change the behavior as no-one should be using it;
> > > >
> > > > It was caught by Linus, so we know at least one (very important) user.
> > >
> > > If they care enough then specifying the PMU with the event will avoid
> > > any warning and has always been a fix for this issue. It was the first
> > > proposed workaround for Linus.
> >
> > I guess that's what Linus called a regression.
> 
> But a regression where? The tool's behavior is pretty clear, no PMU
> the event will be tried on every PMU, give it a PMU and the event will
> only be tried on that PMU, give it a PMU without a suffix and the
> event will be opened on all PMUs that match the name with different
> suffixes.

It may be clear to you but may not be to others.  When did the change
come in?  Before the change, people assumed it would only try core PMUs,
and they can still hold that idea if they haven't used any affected
events.  I guess many users use only legacy events.


> I dislike the idea of cpu-cycles implicitly being just for
> core PMUs, but cpu_cycles being for all PMUs as the hyphen is a legacy
> name and the underscore not.

That's because we specifically picked some names to be used as a legacy
event.  And it worked well.  If some PMU didn't use the name, it's their
fault and they should use PMU event with their name.


> I dislike the idea of specifying a PMU
> with uncore events as uncore events often already have a PMU within
> their event name and it also breaks the universe. 

Does the 'universe' mean 'metric'?

Having PMU name in the event name is their choice.  Do you see this in
sysfs or JSON?  Or both?

Actually I don't like the idea of trying every PMU if no PMU name is
given.  But you said reverting it would break metrics (I don't know if
there are other users who rely on this behavior).  Maybe we can handle
metrics differently?

I guess we can put JSON events and metrics without a PMU in a global
name space so that it can be searched (after legacy names) when users
don't specify PMUs on the command line.  Otherwise the event should have
a PMU name, and sysfs events (then JSON events with that PMU name) can
be searched.

Does that make sense?
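The lookup order proposed here could be sketched as follows. This is only one reading of the proposal, not existing perf behavior. The table names (`legacy`, `global_json`, `sysfs`, `pmu_json`) and their contents are hypothetical.

It is also an assumption that a PMU-qualified name gets no legacy fallback.

```python
def lookup(name, pmu, legacy, global_json, sysfs, pmu_json):
    """Return (source, encoding) for an event under the proposed order."""
    if pmu is None:
        # No PMU given: legacy names first, then the PMU-less JSON namespace.
        if name in legacy:
            return "legacy", legacy[name]
        if name in global_json:
            return "json", global_json[name]
    else:
        # PMU given: that PMU's sysfs events first, then its JSON events.
        if name in sysfs.get(pmu, {}):
            return "sysfs", sysfs[pmu][name]
        if name in pmu_json.get(pmu, {}):
            return "json", pmu_json[pmu][name]
    raise LookupError(f"event {name!r} not found for PMU {pmu!r}")


# Hypothetical tables; the encodings are placeholder strings.
legacy = {"cycles": "PERF_COUNT_HW_CPU_CYCLES"}
global_json = {"inst_retired.any": "json-enc-1"}
sysfs = {"cpu": {"cycles": "sysfs-enc-1"}}
pmu_json = {"cpu": {"inst_retired.any": "json-enc-2"}}

print(lookup("cycles", None, legacy, global_json, sysfs, pmu_json))
print(lookup("cycles", "cpu", legacy, global_json, sysfs, pmu_json))
```

In this model `cycles` and `cpu/cycles/` come from different sources, so their encodings can differ.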


> When trying to find
> out what people mean by event names being implicitly associated with
> PMUs I get told I'm throwing out "what ifs," when all I'm doing is
> reading the code (that I wrote and I'm trying to fix) and trying to
> figure out what behavior people want. What I don't want is
> inconsistencies, events behaving differently in different scenarios
> and the perf output's use of event names being inconsistent with the
> parsing. RISC-V and ARM have wanted the sysfs/json over legacy
> priority, so I'm trying to get that landed.

I'm not sure now that RISC-V and ARM want it.  Or it needs to be made
more specific what exactly they want.

> 
> Ultimately the original regression comes back to the ARM SLC PMU
> advertising a cycles event when it should have been named cpu_cycles,
> if for no other reason than uniformity with the bus_cycles name on the
> same PMU. The change in perf's wildcard behavior exposed the latent
> bug, that doesn't make the SLC PMU's event name not a bug. The change
> here is to make seeing that bug non-terminal to running the program.

I don't see how it's a bug if uncore PMUs have an event named 'cycles'
or whatever.  It's just because perf record wanted to use it, and that's
entirely the tool's choice.

Thanks,
Namhyung

> 
> > >
> > > > > 3) for the wildcard case the only offender is ARM's SLC PMU and the
> > > > > appropriate fix there has always been to make the CPU cycle's event
> > > > > name match the bus_cycles event name by calling it cpu_cycles -
> > > > > something that doesn't conflict with a core PMU event name, the thing
> > > > > that has introduced all these problems, patches, long email exchanges,
> > > > > unfixed inconsistencies, etc.. If the errors aren't noisy then there
> > > > > is little motivation for the ARM SLC PMU's event name to be fixed.
> > > >
> > > > I understand your concern but I'm not sure it's the best way to fix the
> > > > issue.
> > >
> > > Right, I'm similarly concerned about hiding legitimate warning/error
> > > messages because of 1 event on 1 PMU on 1 architecture because of how
> > > perf gets driven by 1 user. Yes, when you break you can wade through
> > > the verbose output but imo the verbose output was never intended to be
> > > used in that way.
> >
> > Well, the verbose output is to debug when something doesn't go well, no?
> 
> The output isn't currently only enabled in verbose mode, so is this
> wrong? You will only get extra warnings with this change if you do
> anything wrong. For a hybrid system maybe you've gone from 1 warning
> to 2, I fail to see a big deal. Yes if you try to do perf record on an
> uncore server PMU with many instances you will potentially get many
> warnings, but the behavior before and after is to fail and the user is
> likely to figure out what the fix is in both cases, with more errors
> they may appreciate better that the event was getting opened on many
> PMUs. The trend for event parsing errors is to have more error
> messages. We went from 1 to 2 in commit
> a910e4666d61712840c78de33cc7f89de8affa78 and from 2 to many in commit
> fd7b8e8fb20f51d60dfee7792806548f3c6a4c2c. The trend isn't to try to
> move things into verbose only output and for things to silently (or
> with little detail) fail for the user.
> 
> Thanks,
> Ian

* Re: [PATCH v5 3/4] perf record: Skip don't fail for events that don't open
  2025-01-15 22:14               ` Namhyung Kim
@ 2025-01-15 22:40                 ` Ian Rogers
  0 siblings, 0 replies; 53+ messages in thread
From: Ian Rogers @ 2025-01-15 22:40 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: James Clark, Leo Yan, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Adrian Hunter, Kan Liang, Ze Gao, Weilin Wang,
	Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, bpf, Aditya Bodkhe, Atish Patra

On Wed, Jan 15, 2025 at 2:14 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Tue, Jan 14, 2025 at 03:55:47PM -0800, Ian Rogers wrote:
> > On Tue, Jan 14, 2025 at 11:29 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > On Fri, Jan 10, 2025 at 11:18:53AM -0800, Ian Rogers wrote:
> > > > On Fri, Jan 10, 2025 at 10:55 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > >
> > > > > On Thu, Jan 09, 2025 at 08:44:38PM -0800, Ian Rogers wrote:
> > > > > > On Thu, Jan 9, 2025 at 5:25 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > > >
> > > > > > > On Thu, Jan 09, 2025 at 02:21:08PM -0800, Ian Rogers wrote:
> > > > > > > > Whilst for many tools it is an expected behavior that failure to open
> > > > > > > > a perf event is a failure, ARM decided to name PMU events the same as
> > > > > > > > legacy events and then failed to rename such events on a server uncore
> > > > > > > > SLC PMU. As perf's default behavior when no PMU is specified is to
> > > > > > > > open the event on all PMUs that advertise/"have" the event, this
> > > > > > > > yielded failures when trying to make the priority of legacy and
> > > > > > > > sysfs/json events uniform - something requested by RISC-V and ARM. A
> > > > > > > > legacy event user on ARM hardware may find their event opened on an
> > > > > > > > uncore PMU which for perf record will fail. Arnaldo suggested skipping
> > > > > > > > such events which this patch implements. Rather than have the skipping
> > > > > > > > conditional on running on ARM, the skipping is done on all
> > > > > > > > architectures as such a fundamental behavioral difference could lead
> > > > > > > > to problems with tools built/depending on perf.
> > > > > > > >
> > > > > > > > An example of perf record failing to open events on x86 is:
> > > > > > > > ```
> > > > > > > > $ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
> > > > > > > > Error:
> > > > > > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
> > > > > > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > > > > > > "dmesg | grep -i perf" may provide additional information.
> > > > > > > >
> > > > > > > > Error:
> > > > > > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
> > > > > > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > > > > > > > "dmesg | grep -i perf" may provide additional information.
> > > > > > > >
> > > > > > > > Error:
> > > > > > > > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > > > > > > > The LLC-prefetch-read event is not supported.
> > > > > > > > [ perf record: Woken up 1 times to write data ]
> > > > > > > > [ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]
> > > > > > >
> > > > > > > I'm afraid this can be too noisy.
> > > > > >
> > > > > > The intention is to be noisy:
> > > > > > 1) it matches the existing behavior, anything else is potentially a regression;
> > > > >
> > > > > Well.. I think you're changing the behavior. :)  Also currently it just
> > > > > fails on the first event so it won't be too noisy.
> > > > >
> > > > >   $ perf record -e data_read,data_write,LLC-prefetch-read -a sleep 0.1
> > > > >   event syntax error: 'data_read,data_write,LLC-prefetch-read'
> > > > >                        \___ Bad event name
> > > > >
> > > > >   Unable to find event on a PMU of 'data_read'
> > > > >   Run 'perf list' for a list of valid events
> > > > >
> > > > >    Usage: perf record [<options>] [<command>]
> > > > >       or: perf record [<options>] -- <command> [<options>]
> > > > >
> > > > >       -e, --event <event>   event selector. use 'perf list' to list available events
> > > >
> > > > Fwiw, this error is an event parsing error not an event opening error.
> > > > You need to select an uncore event, I was using data_read which exists
> > > > in the uncore_imc_free_running PMUs on Intel tigerlake. Here is the
> > > > existing error message:
> > > > ```
> > > > $ perf record -e data_read -a true
> > > > Error:
> > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> > > > for event (data_read).
> > > > "dmesg | grep -i perf" may provide additional information.
> > > > ```
> > > > and here it is with the series:
> > > > ```
> > > > $ perf record -e data_read -a true
> > > > Error:
> > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0'
> > > > which will be removed.
> > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> > > > for event (data_read).
> > > > "dmesg | grep -i perf" may provide additional information.
> > > >
> > > > Error:
> > > > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1'
> > > > which will be removed.
> > > > The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> > > > for event (data_read).
> > > > "dmesg | grep -i perf" may provide additional information.
> > > >
> > > > Error:
> > > > Failure to open any events for recording.
> > > > ```
> > > > and here is what it would be with pr_debug:
> > > > ```
> > > > $ perf record -e data_read -a true
> > > > Error:
> > > > Failure to open any events for recording.
> > > > ```
> > > > I believe this last output is worse because:
> > > > 1) If not all events fail to open there is no error reported unless I
> > > > know to run with -v, which will also bring a bunch more noise with it,
> > >
> > > I suggested to add a warning if any (not all) of events failed to open.
> > >
> > >   "Removed some unsupported events, use -v for details."
> > >
> > >
> > > > 2) I don't see the PMU / event name and "Invalid argument" indicating
> > > > what has gone wrong again unless I know to run with -v and get all the
> > > > verbose noise with that.
> > >
> > > I don't think single -v adds a lot of noise in the output.
> > >
> > > >
> > > > Yes it is noisy on 1 platform for 1 event due to an ARM PMU event name
> > > > bug that ARM should have long ago fixed. That should be fixed rather
> > > > than hiding errors and making users think they are recording samples
> > > > when silently they're not - or they need to search through verbose
> > > > output to try to find out if something broke.
> > >
> > > I'm not sure if it's a bug in the driver.  It happens because perf tool
> > > changed the way it finds events - it used to look at the core PMUs only
> > > if no PMU name was given, but now it searches every PMU, right?
> >
> > So there is the ARM bug in the PMU driver that caused an issue with
> > the hybrid fixes done because of wanting to have metrics work for
> > hybrid. The bug is reported here:
> > https://lore.kernel.org/lkml/08f1f185-e259-4014-9ca4-6411d5c1bc65@marcan.st/
>
> I'm not sure if it's agreed to be called a PMU bug.
> My understanding is it's the change in the perf tool that break it.
>
>
> > The events are apple_icestorm_pmu/cycles/ and
> > apple_firestorm_pmu/cycles/. The issue is that prior to fixing hybrid
> > the ARM PMUs looked like uncore PMUs and couldn't open a legacy event,
> > which was fine as they had sysfs events. When hybrid was fixed in the
> > tool, the tool would then try to open apple_icestorm_pmu/cycles/ and
> > apple_firestorm_pmu/cycles/ as legacy events - legacy having priority
> > over sysfs/json back then. The legacy mapping was broken in the PMU
>
> I don't know why you want to use legacy events (PERF_TYPE_HARDWARE)
> when it has PMU in the event name and the PMU has a different type
> enconding.

Historically we used legacy then sysfs/json. When Intel did the hybrid
work they kept this priority for legacy events when the hybrid PMU was
specified; this was an Intel change, reviewed by Arnaldo and Jiri. My
belief is that it was done this way because not every legacy event has
a sysfs/json encoding, and the priority just inherited the existing
behavior.

> > driver. Now were everything working as intended, just the cycles event
> > would be specified on the command line and the event would be wildcard
> > opened on the apple_icestorm_pmu and apple_firestorm_pmu. I believe
> > this way would already use a legacy encoding and so to work around the
> > PMU driver bug people were specifying the PMU name to get the sysfs
> > encoding, but that only worked as the PMUs appeared to be uncore.
> >
> > > >
> > > > > > 2) it only happens if trying to record on a PMU/event that doesn't
> > > > > > support recording, something that is currently an error and so we're
> > > > > > not motivated to change the behavior as no-one should be using it;
> > > > >
> > > > > It was caught by Linus, so we know at least one (very important) user.
> > > >
> > > > If they care enough then specifying the PMU with the event will avoid
> > > > any warning and has always been a fix for this issue. It was the first
> > > > proposed workaround for Linus.
> > >
> > > I guess that's what Linus called a regression.
> >
> > But a regression where? The tool's behavior is pretty clear, no PMU
> > the event will be tried on every PMU, give it a PMU and the event will
> > only be tried on that PMU, give it a PMU without a suffix and the
> > event will be opened on all PMUs that match the name with different
> > suffixes.
>
> It may be clear to you but may not be to others.  When did the change
> come in?  Before the change, people assume it would only try core PMU.
> And the people can still have the idea if they haven't used any affected
> events.  I guess many users would use legacy events only.

Who are these others? The only affected event is a cycles event on an
ARM SLC PMU. What gets fixed? Potentially Apple-M hardware where
legacy events have historically been broken. Rather than others
complaining, I think a much greater number of others will be happy
about the fix.

> > I dislike the idea of cpu-cycles implicitly being just for
> > core PMUs, but cpu_cycles being for all PMUs as the hyphen is a legacy
> > name and the underscore not.
>
> That's because we specifically picked some names to be used as a legacy
> event.  And it worked well.  If some PMU didn't use the name, it's their
> fault and they should use PMU event with their name.

I'm not sure this is the case. If you look at the legacy event names
it was typical they included something like a PMU before the first
hyphen. What does cpu- mean for hybrid? Why does LLC mean L2 when
typical LLCs these days are L3? I think ideally we'd delete the legacy
events and fix the missing events by explicitly putting them into
sysfs/json. I don't see that happening soon.

> > I dislike the idea of specifying a PMU
> > with uncore events as uncore events often already have a PMU within
> > their event name and it also breaks the universe.
>
> Does the 'universe' mean 'metric'?

No, I mean if I do:
$ perf stat -e data_read ...
it works today. Making all non-core events require a PMU means I need to type:
$ perf stat -e uncore_imc_free_running/data_read/ ...
This is true for all non-core events with your proposal. I've
previously advised you that the former behavior is what perf's command
line completion of event names assumes.
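The wildcard rules restated in this thread can be modelled as below:

- No PMU: try every PMU that has the event.
- A full PMU name: try only that PMU.
- A PMU name without its suffix: try every PMU sharing that base name.

This is a simplified Python sketch, not perf's matching code. It assumes instance suffixes are only `_<digits>`, and that a PMU "has" an event only if the event is in that PMU's advertised set. The PMU table is hypothetical apart from the `uncore_imc_free_running_*` names used earlier in the thread.

```python
import re

# Assumption: PMU instances differ only by a trailing "_<digits>" suffix.
_INSTANCE_SUFFIX = re.compile(r"_\d+$")


def pmu_matches(user_pmu: str, pmu_name: str) -> bool:
    """True if user_pmu names this PMU exactly or names it minus its suffix."""
    return pmu_name == user_pmu or _INSTANCE_SUFFIX.sub("", pmu_name) == user_pmu


def expand(event: str, user_pmu, pmus: dict) -> list:
    """Return the sorted PMUs an event would be opened on (user_pmu may be None)."""
    return sorted(
        name for name, events in pmus.items()
        if event in events and (user_pmu is None or pmu_matches(user_pmu, name))
    )


pmus = {  # hypothetical advertised events per PMU
    "cpu": {"cycles", "instructions"},
    "uncore_imc_free_running_0": {"data_read", "data_write"},
    "uncore_imc_free_running_1": {"data_read", "data_write"},
}
print(expand("data_read", None, pmus))                        # both IMC PMUs
print(expand("data_read", "uncore_imc_free_running", pmus))   # both IMC PMUs
print(expand("data_read", "uncore_imc_free_running_1", pmus)) # one IMC PMU
```

The bare `data_read` and `uncore_imc_free_running/data_read/` expand to the same PMUs. This is why the PMU prefix is redundant in the common case.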

> Having PMU name in the event name is their choice.  Do you see this in
> sysfs or JSON?  Or both?
>
> Actually I don't like the idea of trying every PMU if no PMU name is
> given.  But you said reverting it would break metrics (I don't know if
> there are other users rely on this behavior).  Maybe can we handle
> metrics differently?
>
> I guess we can put JSON events and metrics without PMU in a global name
> space so that it can be searched (after legacy name) when users don't
> specify PMUs in the command line.  Otherwise it should have PMU name
> and sysfs event (then JSON events with PMU name) can be searched.
>
> Does that make sense?

So my intention for metrics is that the events there work as events
would work for perf stat. Not least this simplifies testing and
creating metrics.

There are things we can do with the search order of events, I'd like
to make it so users can create their own events, but this is getting
off topic.

> > When trying to find
> > out what people mean by event names being implicitly associated with
> > PMUs I get told I'm throwing out "what ifs," when all I'm doing is
> > reading the code (that I wrote and I'm trying to fix) and trying to
> > figure out what behavior people want. What I don't want is
> > inconsistencies, events behaving differently in different scenarios
> > and the perf output's use of event names being inconsistent with the
> > parsing. RISC-V and ARM have wanted the sysfs/json over legacy
> > priority, so I'm trying to get that landed.
>
> I'm not sure now that RISC-V and ARM want it.  Or we need to be more
> specific about what they want exactly.

In which case let's make the priority be legacy then sysfs/json, I'm
happy with that. We can revert the changes that Mark Rutland and the
Apple-M folks pushed for. We can tell RISC-V they're not being
specific enough with their need. I don't think that's as good an
alternative as the changes here, but if it works for you...

> >
> > Ultimately the original regression comes back to the ARM SLC PMU
> > advertising a cycles event when it should have been named cpu_cycles,
> > if for no other reason than uniformity with the bus_cycles name on the
> > same PMU. The change in perf's wildcard behavior exposed the latent
> > bug, that doesn't make the SLC PMU's event name not a bug. The change
> > here is to make seeing that bug non-terminal to running the program.
>
> I don't see it's a bug if uncore PMUs have an event named 'cycles' or
> whatever.  It's just because perf record wanted to use it and that's
> entirely tool's choice.

It's a bug because a wildcard match */cycles/ will match against the
SLC PMU's event and if users don't want to specify a PMU it breaks
perf record when wildcarding cycles. These changes lower the failure
to a warning, implementing the behavior proposed by Arnaldo:
https://lore.kernel.org/lkml/ZlY0F_lmB37g10OK@x1/
and has tags from Intel, ARM and Rivos (RISC-V). I intend to carry it
in Google's tree. If you want to require uncore events have PMUs in
their names, maintain all the metrics, etc. I don't plan on carrying
that but you have complete freedom to do what you see best.

Thanks,
Ian

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 3/4] perf record: Skip don't fail for events that don't open
  2025-01-15 17:56                   ` Ian Rogers
@ 2025-01-29 21:24                     ` Namhyung Kim
  0 siblings, 0 replies; 53+ messages in thread
From: Namhyung Kim @ 2025-01-29 21:24 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Arnaldo Carvalho de Melo, Linus Torvalds, Peter Zijlstra,
	Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Adrian Hunter, Kan Liang, James Clark, Ze Gao, Weilin Wang,
	Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, bpf, Aditya Bodkhe, Leo Yan,
	Atish Patra

Hi Ian,

Sorry for the delay.

On Wed, Jan 15, 2025 at 09:56:59AM -0800, Ian Rogers wrote:
> On Wed, Jan 15, 2025 at 9:31 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Mon, Jan 13, 2025 at 03:04:26PM -0800, Ian Rogers wrote:
> > > On Mon, Jan 13, 2025 at 12:51 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > Hi Ian,
> > > >
> > > > On Fri, Jan 10, 2025 at 01:33:57PM -0800, Ian Rogers wrote:
> > > > > On Fri, Jan 10, 2025 at 11:26 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > >
> > > > > > On Fri, Jan 10, 2025 at 08:42:02AM -0800, Ian Rogers wrote:
[...]
> > > > > > > A patch lowering the priority of error messages should be independent
> > > > > > > of the 4 changes here. I'd be happy if someone follows this series
> > > > > > > with a patch doing it.
> > > > > >
> > > > > > I think the error behavior is a part of this change.
> > > > >
> > > > > I disagree with it, so I think you need to address my comments.
> > > >
> > > > You are changing the error behavior by skipping failed events then the
> > > > relevant error messages should be handled properly in this patchset.
> > >
> > > I'm not sure what you are asking and I'm not sure why it matters?
> > > Previously you'd asked for all the output to be moved under verbose.
> > >
> > > If I specify an event that doesn't work with perf record today then it
> > > fails. With this patch it fails too. If that event is a core PMU event
> > > then there will be an error message for each core PMU that doesn't
> > > support the event. So I get 2 error messages on hybrid. This doesn't
> > > feel egregious or warrant a new error message mechanism. I would like
> > > it so that evsels supported 1 or more PMUs, in which case this would
> > > be 1 error message.
> > >
> > > If I specify perf record today on an uncore event then perf record
> > > fails and I get 1 error message for the uncore PMU. The new behavior
> > > will be to get 1 error message per uncore PMU. If I'm on a server with
> > > 10s of uncore PMUs then maybe the message is spammy, but the command
> > > fails today and will continue to fail with this series. I don't see a
> > > motivation to change or optimize for this case and again, evsels that
> > > support >1 PMU would be the most appropriate fix.
> > >
> > > The only case where there is no message today but would be with this
> > > patch series is for cycles on ARM's Neoverse. There will be one
> > > warning for the evsel on the SLC PMU. That's one warning and not many.
> > >
> > > As I've said, if you want a more elaborate error reporting system then
> > > take these patches and add it to them. There's a larger refactor to
> > > make evsels support >1 PMU that would clean up the many events on
> > > server uncore PMUs issue, but that shouldn't be part of this series
> > > nor gate it. If you are trying to perf record on uncore PMUs then you
> > > already have problems and optimizing the error messages for your
> > > mistake, I don't get why it matters?
> >
> > What about with multiple events in the command line - one of them
> > failing with >1 PMUs and the command now succeeds?
> 
> So this would be something like:
> ```
> $ perf record -e cycles,instructions,data_read -a sleep 1
> ```
> where data_read is an uncore PMU event. The current behavior is:
> ```
> $ perf record -e cycles,instructions,data_read -a sleep 1
> Error:
> The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> for event (data_read).
> "dmesg | grep -i perf" may provide additional information.
> ```
> The new behavior is:
> ```
> $ perf record -e cycles,instructions,data_read -a sleep 1
> Error:
> Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0'
> which will be removed.
> The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> for event (data_read).
> "dmesg | grep -i perf" may provide additional information.
> 
> Error:
> Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1'
> which will be removed.
> The sys_perf_event_open() syscall returned with 22 (Invalid argument)
> for event (data_read).
> "dmesg | grep -i perf" may provide additional information.
> 
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 3.138 MB perf.data (11670 samples) ]
> ```
> 
> We know nobody does this, as the command currently fails. It succeeds
> with this change, because that's the whole point of the change.

Well, I think it's because it failed before.  New users can come anytime
and do whatever they want (or can).  They might pass 100 failing events
with 1 successful event and it will give a ton of warnings with this.
So it'd be better to ratelimit the message and make it optional (with -v).

But more importantly, I think we should agree on the patch 4 first.

Thanks,
Namhyung


> I'm not offended by seeing the event was being opened on >1 PMU. For the
> only currently succeeding situation where this will now warn, the
> cycles case on Neoverse because of the buggy event name in ARM's SLC
> PMU, there will be 1 warning. For my example the appropriate fix is to
> remove the data_read event. For the Neoverse case, specifying the PMU
> resolves the issue until ARM fixes their driver.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-01-15 21:20               ` Ian Rogers
@ 2025-01-29 21:55                 ` Namhyung Kim
  2025-01-30  1:16                   ` Ian Rogers
  2025-01-30  6:12                   ` Atish Kumar Patra
  0 siblings, 2 replies; 53+ messages in thread
From: Namhyung Kim @ 2025-01-29 21:55 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	bpf, Aditya Bodkhe, Atish Patra, Leo Yan, Beeman Strong,
	Arnaldo Carvalho de Melo

On Wed, Jan 15, 2025 at 01:20:32PM -0800, Ian Rogers wrote:
> On Wed, Jan 15, 2025 at 9:59 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > > On Mon, Jan 13, 2025 at 2:51 PM Ian Rogers <irogers@google.com> wrote:
> > > > There was an explicit, and reviewed by Jiri and Arnaldo, intent with
> > > > the hybrid work that using a legacy event with a hybrid PMU, even
> > > > though the PMU doesn't advertise through json or sysfs the legacy
> > > > event, the perf tool supports it.
> >
> > I thought legacy events on hybrid were converted to PMU events.
> 
> No, when big.LITTLE was created nothing changed in perf events, but
> when Intel did hybrid they wanted to make the hybrid CPUs (atom and
> performance) appear as if they were one type. The PMU event encodings
> vary a lot for this on Intel; ARM has standards for the encoding.
> Intel extended the legacy format to take a PMU type id:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/include/uapi/linux/perf_event.h?h=perf-tools-next#n41
> "EEEEEEEE: PMU type ID"
> that is in the top 32-bits of the config.

Oh right, I forgot the extended type thing.  Then we can keep the legacy
encoding with it on hybrid systems when users give well-known names (w/o
PMU) for legacy event.

> 
> > > >
> > > > Making it so that events without PMUs are only legacy events just
> > > > doesn't work. There are far too many existing uses of non-legacy
> > > > events without PMU, the metrics contain 100s of examples.
> >
> > That's unfortunate.  It'd be nice if metrics were written with PMU
> > names.
> 
> But then we'd end up with things like on Intel:
> UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD
> becoming:
> uncore_cha/UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD/
> or just:
> cha/UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD/
> As a user the first works for me and doesn't have any ambiguity over
> PMUs as the event name already encodes the PMU. AMD similarly place
> the part of a pipeline into event names. Were we to break everybody by
> requiring the PMU we'd also need to explain which PMU to use. Sites
> with event lists (like https://perfmon-events.intel.com/) don't
> explain the PMU and it'd be messy as on Intel you have a CHA PMU for
> server chips but a CBOX on client chips, etc.

While I prefer having PMU names in the JSON events/metrics, it may not
be practical to change them all.  Probably we can allow them without PMU
and hope that they have unique prefixes.

> 
> > I have a question.  What if an event name in a metric matches to
> > multiple unrelated PMUs?
> 
> The metric may break or we'd aggregate the unrelated counts together.

Ok, then they should use unique names.


> Take a metric like IPC as "instructions/cycles", that metric should
> work on a hybrid system as they have instructions and cycles. If you
> used an event for instructions like inst_retired.any then maybe the
> metric will fail on one kind of core that didn't have that event. Now

If the metric is for a specific CPU model, then the vendor should be
responsible for providing accurate metrics using appropriate PMUs/events,
IMHO.


> if we have accelerators advertising instructions and cycles events, we
> should be able to compute the metric for the accelerator. What could
> happen today is that the accelerator will have a cpumask of a single
> CPU; we could then aggregate the accelerator counter into the CPU event
> with the same CPU as the cpumask, and we'd end up with a weird quasi CPU
> and accelerator IPC metric for that CPU. What should happen is that we
> get an IPC for the accelerator and IPC for each hybrid core
> independently, but the way we handle evsels, CPUs, PMUs is not really
> set up for that. Hopefully getting a set of PMUs into the evsel will
> clear that up. Assuming all of that is cleared up, is it wrong if the
> IPC metric is computed for the accelerator if it was originally
> written as a CPU metric? Not really. Could there be metrics where that
> is the case?

Yes, I think there should be separate metrics for the accelerators.


> Probably, and specifying PMUs in the event names would be
> a fix. There have also been proposals that we restrict the PMUs for
> certain metrics. As event names are currently so distinct it isn't a
> problem we've faced yet and it is not clear it is a problem other than
> highlighting tech debt in areas of the tool like aggregation.
> 
> > > >
> > > > Prior to switching json/sysfs to being the priority when a PMU is
> > > > specified, it was the case that all encodings were the same, with or
> > > > without a PMU.
> > > >
> > > > I don't think there is anything natural about assuming things about
> > > > event names. Take cycles, cpu-cycles and cpu_cycles:
> > > >  - cycles on x86 is only encoded via a legacy event;
> > > >  - cpu-cycles on Intel exists as a sysfs event, but cpu-cycles is also
> > > > a legacy event name;
> > > >  - cpu_cycles exists as a sysfs event on ARM but doesn't have a
> > > > corresponding legacy event name.
> >
> > I think the behavior should be:
> >
> >   cycles -> PERF_COUNT_HW_CPU_CYCLES
> >   cpu-cycles -> PERF_COUNT_HW_CPU_CYCLES
> >   cpu_cycles -> no legacy -> sysfs or json
> >   cpu/cycles/ -> sysfs or json
> >   cpu/cpu-cycles/ -> sysfs or json
> 
> So I disagree as if you add a PMU to an event name the encoding
> shouldn't change:
> 1) This historically was perf's behavior.

Well.. I'm not sure about the history.  I believe the logic I said above
is the historic and (I think) right behavior.

> 2) Different event encodings can have different behaviors (broken in
> some notable cases).

Yep, let's make it clear.

> 3) Intuitively what wildcarding does is try to open "*/event/" where *
> is every possible PMU name. Having different event encodings
> breaks that intuition, and it could also break situations where you try
> to assert equivalence based on type/config.

While I don't like the wildcard matching, I think it doesn't matter as
long as we keep the above behavior.  If it can find a legacy name, then
go with it, done.  If not, try all PMUs as if it's given with PMU name
in the event.

> 4) The legacy encodings were (are?) broken on ARM Apple M? CPUs,
> that's why the priority was changed.

I guess that's why they use cpu_cycles.

> 5) RISC-V would like the tool to tackle the legacy-to-config mapping
> challenge, rather than the PMU driver, given the potential diversity
> of hardware implementations.

I hope they can find a better solution. :)

> 
> To this end we hosted RISC-V's perf people at Google and they
> expressed that their preference was what this series does, and they
> expressed this directly to you.
> 
> I don't think there would be an issue in this area if it wasn't for
> Neoverse and Linus - that's why the revert happened. This change in
> behavior was proposed by Arnaldo:
> https://lore.kernel.org/lkml/ZlY0F_lmB37g10OK@x1/
> and has tags from Intel, ARM and Rivos (RISC-V). I intend to carry it
> in Google's tree.

Maybe it's because of Linus.  But anyway it reminds me of behaviors that
need to be discussed.  And we can (and should) improve things always.

Thanks,
Namhyung


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 0/4] Prefer sysfs/JSON events also when no PMU is provided
  2025-01-09 22:21 [PATCH v5 0/4] Prefer sysfs/JSON events also when no PMU is provided Ian Rogers
                   ` (3 preceding siblings ...)
  2025-01-09 22:21 ` [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy" Ian Rogers
@ 2025-01-29 22:05 ` Namhyung Kim
  2025-01-30 17:46 ` Namhyung Kim
  5 siblings, 0 replies; 53+ messages in thread
From: Namhyung Kim @ 2025-01-29 22:05 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	bpf, Aditya Bodkhe

On Thu, Jan 09, 2025 at 02:21:05PM -0800, Ian Rogers wrote:
> At the RISC-V summit the topic of avoiding event data being in the
> RISC-V PMU kernel driver came up. There is a preference for sysfs/JSON
> events being the priority when no PMU is provided so that legacy
> events may be supported via json. Originally Mark Rutland also
> expressed at LPC 2023 that doing this would resolve bugs on ARM Apple
> M? processors, but James Clark more recently tested this and believes
> the driver issues there may not have existed or have been resolved. In
> any case, it is inconsistent that with a PMU, event names avoid legacy
> encodings, but when wildcarding PMUs (i.e. without a PMU in the event
> name) the legacy encodings have priority.
> 
> The patch doing this work was reverted in a v6.10 release candidate
> as, even though the patch was posted for weeks and had been on
> linux-next for weeks without issue, Linus was in the habit of using
> explicit legacy events with unsupported precision options on his
> Neoverse-N1. This machine has SLC PMU events for bus and CPU cycles
> where ARM decided to call the events bus_cycles and cycles, the latter
> being also a legacy event name. ARM haven't renamed the cycles event
> to a more consistent cpu_cycles and avoided the problem. With these
> changes the problematic event will now be skipped, a large warning
> produced, and perf record will continue for the other PMU events. This
> solution was proposed by Arnaldo.
> 
> Two minor changes have been added to help with the error message and
> to work around issues occurring with "perf stat metrics (shadow stat)
> test".
> 
> The patches have only been tested on my x86 non-hybrid laptop.
> 
> v5: Follow Namhyung's suggestion and ignore the case where command
>     line dummy events fail to open alongside other events that all
>     fail to open. Note, the Tested-by tags are left on the series as
>     v4 and v5 were changing an error case that doesn't occur in
>     testing but was manually tested by myself.
> 
> v4: Rework the no events opening change from v3 to make it handle
>     multiple dummy events. Sadly an evlist isn't empty if it just
>     contains dummy events as the dummy event may be used with "perf
>     record -e dummy .." as a way to determine whether permission
>     issues exist. Other software events like cpu-clock would suffice
>     for this, but the genie of using dummy is out of the bottle.
> 
>     Another problem is that we appear to have an excessive number of
>     dummy events added, for example, we can likely avoid a dummy event
>     and add sideband data to the original event. For auxtrace more
>     dummy events may be opened too. Anyway, this has led to the
>     approach taken in patch 3 where the number of dummy parsed events
>     is computed. If the number of removed/failing-to-open non-dummy
>     events matches the number of non-dummy events then we want to
>     fail, but only if there are no parsed dummy events or if there was
>     one then it must have opened. The math here is hard to read, but
>     passes my manual testing.
> 
> v3: Make no events opening for perf record a failure as suggested by
>     James Clark and Aditya Bodkhe <Aditya.Bodkhe1@ibm.com>. Also,
>     rebase.
> 
> v2: Rebase and add tested-by tags from James Clark, Leo Yan and Atish
>     Patra who have tested on RISC-V and ARM CPUs, including the
>     problem case from before.
> 
> Ian Rogers (4):
>   perf evsel: Add pmu_name helper
>   perf stat: Fix find_stat for mixed legacy/non-legacy events

I think the first two are quite independent.  I'll take them first.

Thanks,
Namhyung


>   perf record: Skip don't fail for events that don't open
>   perf parse-events: Reapply "Prefer sysfs/JSON hardware events over
>     legacy"
> 
>  tools/perf/builtin-record.c    | 47 ++++++++++++++++++---
>  tools/perf/util/evsel.c        | 10 +++++
>  tools/perf/util/evsel.h        |  1 +
>  tools/perf/util/parse-events.c | 26 +++++++++---
>  tools/perf/util/parse-events.l | 76 +++++++++++++++++-----------------
>  tools/perf/util/parse-events.y | 60 ++++++++++++++++++---------
>  tools/perf/util/pmus.c         | 20 +++++++--
>  tools/perf/util/stat-shadow.c  |  3 +-
>  8 files changed, 169 insertions(+), 74 deletions(-)
> 
> -- 
> 2.47.1.613.gc27f4b7a9f-goog
> 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-01-29 21:55                 ` Namhyung Kim
@ 2025-01-30  1:16                   ` Ian Rogers
  2025-01-30  5:16                     ` Namhyung Kim
  2025-01-30  6:12                   ` Atish Kumar Patra
  1 sibling, 1 reply; 53+ messages in thread
From: Ian Rogers @ 2025-01-30  1:16 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	bpf, Aditya Bodkhe, Atish Patra, Leo Yan, Beeman Strong,
	Arnaldo Carvalho de Melo

On Wed, Jan 29, 2025 at 1:55 PM Namhyung Kim <namhyung@kernel.org> wrote:
> On Wed, Jan 15, 2025 at 01:20:32PM -0800, Ian Rogers wrote:
> > On Wed, Jan 15, 2025 at 9:59 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > I think the behavior should be:
> > >
> > >   cycles -> PERF_COUNT_HW_CPU_CYCLES
> > >   cpu-cycles -> PERF_COUNT_HW_CPU_CYCLES
> > >   cpu_cycles -> no legacy -> sysfs or json
> > >   cpu/cycles/ -> sysfs or json
> > >   cpu/cpu-cycles/ -> sysfs or json
> >
> > So I disagree as if you add a PMU to an event name the encoding
> > shouldn't change:
> > 1) This historically was perf's behavior.
>
> Well.. I'm not sure about the history.  I believe the logic I said above
> is the historic and (I think) right behavior.

You're wrong as you are describing the behavior post:
https://lore.kernel.org/r/20231123042922.834425-1-irogers@google.com
commit a24d9d9dc096fc0d0bd85302c9a4fe4fe3b1107b from Nov 2022, but
somehow without legacy event fall backs which Intel added with a PMU
for hybrid.

The behavior in this patch series is best for RISC-V, presumably ARM
(particularly for Apple M? CPUs), carries ARM and Intel's tags,
implements the behavior Arnaldo asked for, and solves the
inconsistency that I think is fundamentally wrong in the tool that PMU
names shouldn't matter on an event name (an inconsistency my past
fixes introduced). It is also part of solving other problems:
https://lore.kernel.org/linux-perf-users/20250127-counter_delegation-v3-0-64894d7e16d5@rivosinc.com/

You've not pointed at anything wrong in the scheme that these patches
introduce, and are supported by vendors, except that it is a behavior
change. I can, and have, pointed at many issues with your proposal
above and the current behavior. The behavior change came about to work
around PMU bugs over 2 years ago but only partially did so. It makes
sense to remedy this and for the clean, consistent behavior this
series achieves. It is unfortunate that it is a behavior change, but
the first step for that was made 2 years ago. I think it also makes
sense that something self described as legacy is a lower priority and
of the past (wrt event naming moving forward).

Thanks,
Ian

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-01-30  1:16                   ` Ian Rogers
@ 2025-01-30  5:16                     ` Namhyung Kim
  2025-01-30  6:03                       ` Ian Rogers
  0 siblings, 1 reply; 53+ messages in thread
From: Namhyung Kim @ 2025-01-30  5:16 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	bpf, Aditya Bodkhe, Atish Patra, Leo Yan, Beeman Strong,
	Arnaldo Carvalho de Melo

On Wed, Jan 29, 2025 at 05:16:58PM -0800, Ian Rogers wrote:
> On Wed, Jan 29, 2025 at 1:55 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > On Wed, Jan 15, 2025 at 01:20:32PM -0800, Ian Rogers wrote:
> > > On Wed, Jan 15, 2025 at 9:59 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > I think the behavior should be:
> > > >
> > > >   cycles -> PERF_COUNT_HW_CPU_CYCLES
> > > >   cpu-cycles -> PERF_COUNT_HW_CPU_CYCLES
> > > >   cpu_cycles -> no legacy -> sysfs or json
> > > >   cpu/cycles/ -> sysfs or json
> > > >   cpu/cpu-cycles/ -> sysfs or json
> > >
> > > So I disagree as if you add a PMU to an event name the encoding
> > > shouldn't change:
> > > 1) This historically was perf's behavior.
> >
> > Well.. I'm not sure about the history.  I believe the logic I said above
> > is the historic and (I think) right behavior.
> 
> You're wrong as you are describing the behavior post:
> https://lore.kernel.org/r/20231123042922.834425-1-irogers@google.com
> commit a24d9d9dc096fc0d0bd85302c9a4fe4fe3b1107b from Nov 2022, but
> somehow without legacy event fall backs which Intel added with a PMU
> for hybrid.
> 
> The behavior in this patch series is best for RISC-V, presumably ARM
> (particularly for Apple M? CPUs), carries ARM and Intel's tags,
> implements the behavior Arnaldo asked for, and solves the
> inconsistency that I think is fundamentally wrong in the tool that PMU
> names shouldn't matter on an event name (an inconsistency my past
> fixes introduced). It is also part of solving other problems:
> https://lore.kernel.org/linux-perf-users/20250127-counter_delegation-v3-0-64894d7e16d5@rivosinc.com/

So you think the below behavior is preferred, right?

  cycles -> cpu/cycles/ (or whatever PMU name) -> sysfs or json

And there's no way to use legacy event encodings anymore?

> 
> You've not pointed at anything wrong in the scheme that these patches
> introduce, and are supported by vendors, except that it is a behavior
> change. I can, and have, pointed at many issues with your proposal
> above and the current behavior. The behavior change came about to work
> around PMU bugs over 2 years ago but only partially did so. It makes
> sense to remedy this and for the clean, consistent behavior this
> series achieves. It is unfortunate that it is a behavior change, but
> the first step for that was made 2 years ago. I think it also makes
> sense that something self described as legacy is a lower priority and
> of the past (wrt event naming moving forward).

I want to clarify the event parsing behavior and to find the right way
to deal with various cases.  I haven't followed the activities in this
area closely so I missed some changes in the past.  Maybe the problem
is that the behavior is complex and not clarified.  Hopefully we can
write it down in a doc.

Thanks,
Namhyung


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-01-30  5:16                     ` Namhyung Kim
@ 2025-01-30  6:03                       ` Ian Rogers
  2025-01-31 22:28                         ` Namhyung Kim
  0 siblings, 1 reply; 53+ messages in thread
From: Ian Rogers @ 2025-01-30  6:03 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	bpf, Aditya Bodkhe, Atish Patra, Leo Yan, Beeman Strong,
	Arnaldo Carvalho de Melo

On Wed, Jan 29, 2025 at 9:16 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Wed, Jan 29, 2025 at 05:16:58PM -0800, Ian Rogers wrote:
> > On Wed, Jan 29, 2025 at 1:55 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > On Wed, Jan 15, 2025 at 01:20:32PM -0800, Ian Rogers wrote:
> > > > On Wed, Jan 15, 2025 at 9:59 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > I think the behavior should be:
> > > > >
> > > > >   cycles -> PERF_COUNT_HW_CPU_CYCLES
> > > > >   cpu-cycles -> PERF_COUNT_HW_CPU_CYCLES
> > > > >   cpu_cycles -> no legacy -> sysfs or json
> > > > >   cpu/cycles/ -> sysfs or json
> > > > >   cpu/cpu-cycles/ -> sysfs or json
> > > >
> > > > So I disagree as if you add a PMU to an event name the encoding
> > > > shouldn't change:
> > > > 1) This historically was perf's behavior.
> > >
> > > Well.. I'm not sure about the history.  I believe the logic I said above
> > > is the historic and (I think) right behavior.
> >
> > You're wrong as you are describing the behavior post:
> > https://lore.kernel.org/r/20231123042922.834425-1-irogers@google.com
> > commit a24d9d9dc096fc0d0bd85302c9a4fe4fe3b1107b from Nov 2022, but
> > somehow without legacy event fall backs which Intel added with a PMU
> > for hybrid.
> >
> > The behavior in this patch series is best for RISC-V, presumably ARM
> > (particularly for Apple M? CPUs), carries ARM and Intel's tags,
> > implements the behavior Arnaldo asked for, and solves the
> > inconsistency that I think is fundamentally wrong in the tool that PMU
> > names shouldn't matter on an event name (an inconsistency my past
> > fixes introduced). It is also part of solving other problems:
> > https://lore.kernel.org/linux-perf-users/20250127-counter_delegation-v3-0-64894d7e16d5@rivosinc.com/
>
> So you think the below behavior is preferred, right?
>
>   cycles -> cpu/cycles/ (or whatever PMU name) -> sysfs or json
>
> And there's no way to use legacy event encodings anymore?

This is absolutely the right thing to do! If sysfs/json advertises an
event named cycles, because the driver knows better than the legacy
encoding, then perf should select it. Not doing this was the cause of the ARM Apple M?
breakage - because their PMUs looked uncore before hybrid fixes and so
weren't known previously to accept legacy events and always used the
sysfs/json encodings in preference. Why would having or not having the PMU in
the event name imply a different and sometimes known broken encoding?
And then in the perf stat uniquification we can rename the event to be
the version with a different encoding. It is madness to me.

If a user wants to force a legacy event, even though the driver is
typically saying it knows better, they can use a raw event encoding
or, in the case of cycles, its alias cpu-cycles. If there
really is a use-case for using legacy encodings, we could introduce
new legacy-cpu and legacy-cache PMUs that advertise the events, but
then the wildcard behavior would be weird.

To be clear, I do not know of a single use-case where the legacy
encodings are actually wanted when sysfs/json have an encoding. The
opposite is very much true: legacy encodings are not wanted, which is
why ARM originally wanted their priority lowered everywhere to fix
Apple M?, and RISC-V later wanted the same.
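A minimal sketch of the single priority argued for here, under a deliberately simplified model: each PMU either advertises an event name through sysfs/JSON or does not, and only core PMUs accept legacy events. The struct pmu type, resolve_event() and the trimmed legacy name list are hypothetical and are not perf's parse-events API. The point is that one function serves 'pmu/event/' and, by calling it once per PMU, a bare wildcarded 'event'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Which encoding the sketch picks for an event on one PMU. */
enum encoding { ENC_NONE, ENC_SYSFS_JSON, ENC_LEGACY };

/* Hypothetical, simplified PMU description (not perf's struct perf_pmu). */
struct pmu {
	const char *name;
	const char *const *aliases; /* NULL-terminated sysfs/JSON event names */
	bool is_core;               /* only core PMUs accept legacy events here */
};

static bool pmu_has_alias(const struct pmu *pmu, const char *event)
{
	for (const char *const *a = pmu->aliases; a && *a; a++)
		if (strcmp(*a, event) == 0)
			return true;
	return false;
}

/* A few legacy hardware event names, for illustration only. */
static bool is_legacy_name(const char *event)
{
	static const char *const legacy[] = { "cycles", "cpu-cycles", "instructions", NULL };

	for (const char *const *l = legacy; *l; l++)
		if (strcmp(*l, event) == 0)
			return true;
	return false;
}

/*
 * One priority for both "pmu/event/" and a bare "event" (wildcarded by
 * calling this for every PMU): a sysfs/JSON alias wins, and the legacy
 * encoding is only a fallback on core PMUs.
 */
static enum encoding resolve_event(const struct pmu *pmu, const char *event)
{
	assert(pmu != NULL && event != NULL);
	if (pmu_has_alias(pmu, event))
		return ENC_SYSFS_JSON;
	if (pmu->is_core && is_legacy_name(event))
		return ENC_LEGACY;
	return ENC_NONE;
}
```

With this rule, cycles on a PMU that advertises a cycles alias gets the sysfs/JSON encoding whether or not the PMU is named; the legacy encoding only applies when no alias exists.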

> >
> > You've not pointed at anything wrong in the scheme these patches
> > introduce, which vendors support, except that it is a behavior
> > change. I have pointed at many issues with both your proposal
> > above and the current behavior. The behavior change came about to work
> > around PMU bugs over 2 years ago but only partially did so. It makes
> > sense to remedy this and to get the clean, consistent behavior this
> > series achieves. It is unfortunate that it is a behavior change, but
> > the first step for that was made 2 years ago. I think it also makes
> > sense that something self described as legacy is a lower priority and
> > of the past (wrt event naming moving forward).
>
> I want to clarify the event parsing behavior and to find the right way
> to deal with various cases.  I haven't followed the activities in this
> area closely so I missed some changes in the past.  Maybe the problem
> is that the behavior is complex and not clarified.  Hopefully we can
> write it down in a doc.

I think what is typical in the kernel is that the source is the best
documentation. By simplifying event parsing, for example,
parse-events.y has been reduced from 952 lines (in v5.10) to 762 lines,
so it's about 20% smaller whilst being more correct (I've fixed all
the memory leaks, etc.) and avoiding expensive start-up costs (lazy
initialization, etc.).

Having a single priority for which events are preferred, legacy vs
sysfs/json, with or without a PMU, will further make the code base
simpler and easier to understand.

Thanks,
Ian


* Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-01-29 21:55                 ` Namhyung Kim
  2025-01-30  1:16                   ` Ian Rogers
@ 2025-01-30  6:12                   ` Atish Kumar Patra
  2025-01-31 22:42                     ` Namhyung Kim
  1 sibling, 1 reply; 53+ messages in thread
From: Atish Kumar Patra @ 2025-01-30  6:12 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ian Rogers, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	bpf, Aditya Bodkhe, Leo Yan, Beeman Strong,
	Arnaldo Carvalho de Melo

On Wed, Jan 29, 2025 at 1:55 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Wed, Jan 15, 2025 at 01:20:32PM -0800, Ian Rogers wrote:
> > On Wed, Jan 15, 2025 at 9:59 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > > On Mon, Jan 13, 2025 at 2:51 PM Ian Rogers <irogers@google.com> wrote:
> > > > > There was an explicit, and reviewed by Jiri and Arnaldo, intent with
> > > > > the hybrid work that using a legacy event with a hybrid PMU, even
> > > > > though the PMU doesn't advertise through json or sysfs the legacy
> > > > > event, the perf tool supports it.
> > >
> > > I thought legacy events on hybrid were converted to PMU events.
> >
> > No. When big.LITTLE was created nothing changed in perf events, but
> > when Intel did hybrid they wanted the hybrid CPUs (atom and
> > performance) to appear as if they were one type. The PMU event
> > encodings vary a lot between these on Intel, whereas ARM has standards
> > for the encoding.
> > Intel extended the legacy format to take a PMU type id:
> > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/include/uapi/linux/perf_event.h?h=perf-tools-next#n41
> > "EEEEEEEE: PMU type ID"
> > that is in the top 32-bits of the config.
>
> Oh right, I forgot the extended type thing.  Then we can keep the legacy
> encoding with it on hybrid systems when users give well-known names (w/o
> PMU) for legacy event.
>
> >
> > > > >
> > > > > Making it so that events without PMUs are only legacy events just
> > > > > doesn't work. There are far too many existing uses of non-legacy
> > > > > events without PMU, the metrics contain 100s of examples.
> > >
> > > That's unfortunate.  It'd be nice if metrics were written with PMU
> > > names.
> >
> > But then we'd end up with things like on Intel:
> > UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD
> > becoming:
> > uncore_cha/UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD/
> > or just:
> > cha/UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD/
> > As a user the first works for me and doesn't have any ambiguity over
> > PMUs as the event name already encodes the PMU. AMD similarly puts
> > the part of the pipeline into event names. Were we to break everybody by
> > requiring the PMU we'd also need to explain which PMU to use. Sites
> > with event lists (like https://perfmon-events.intel.com/) don't
> > explain the PMU and it'd be messy as on Intel you have a CHA PMU for
> > server chips but a CBOX on client chips, etc.
>
> While I prefer having PMU names in the JSON events/metrics, it may not
> be practical to change them all.  Probably we can allow them without PMU
> and hope that they have unique prefixes.
>
> >
> > > I have a question.  What if an event name in a metric matches
> > > multiple unrelated PMUs?
> >
> > The metric may break or we'd aggregate the unrelated counts together.
>
> Ok, then they should use unique names.
>
>
> > Take a metric like IPC, "instructions/cycles": that metric should
> > work on a hybrid system, as it has instructions and cycles. If you
> > used an event for instructions like inst_retired.any then maybe the
> > metric will fail on one kind of core that didn't have that event. Now
>
> If the metrics are for a specific CPU model, then the vendor should be
> responsible for providing accurate metrics using appropriate PMUs/events,
> IMHO.
>
>
> > if we have accelerators advertising instructions and cycles events, we
> > should be able to compute the metric for the accelerator. What could
> > happen today is that the accelerator has a cpumask of a single CPU,
> > so we could aggregate the accelerator counter into the CPU event for
> > that same CPU and end up with a weird quasi-CPU, quasi-accelerator IPC
> > metric for that CPU. What should happen is that we
> > get an IPC for the accelerator and IPC for each hybrid core
> > independently, but the way we handle evsels, CPUs, PMUs is not really
> > set up for that. Hopefully getting a set of PMUs into the evsel will
> > clear that up. Assuming all of that is cleared up, is it wrong if the
> > IPC metric is computed for the accelerator if it was originally
> > written as a CPU metric? Not really. Could there be metrics where that
> > is the case?
>
> Yes, I think there should be separate metrics for the accelerators.
>
>
> > Probably, and specifying PMUs in the event names would be
> > a fix. There have also been proposals that we restrict the PMUs for
> > certain metrics. As event names are currently so distinct it isn't a
> > problem we've faced yet and it is not clear it is a problem other than
> > highlighting tech debt in areas of the tool like aggregation.
> >
> > > > >
> > > > > Prior to switching json/sysfs to being the priority when a PMU is
> > > > > specified, it was the case that all encodings were the same, with or
> > > > > without a PMU.
> > > > >
> > > > > I don't think there is anything natural about assuming things about
> > > > > event names. Take cycles, cpu-cycles and cpu_cycles:
> > > > >  - cycles on x86 is only encoded via a legacy event;
> > > > >  - cpu-cycles on Intel exists as a sysfs event, but cpu-cycles is also
> > > > > a legacy event name;
> > > > >  - cpu_cycles exists as a sysfs event on ARM but doesn't have a
> > > > > corresponding legacy event name.
> > >
> > > I think the behavior should be:
> > >
> > >   cycles -> PERF_COUNT_HW_CPU_CYCLES
> > >   cpu-cycles -> PERF_COUNT_HW_CPU_CYCLES
> > >   cpu_cycles -> no legacy -> sysfs or json
> > >   cpu/cycles/ -> sysfs or json
> > >   cpu/cpu-cycles/ -> sysfs or json
> >
> > So I disagree, as adding a PMU to an event name shouldn't change
> > the encoding:
> > 1) This historically was perf's behavior.
>
> Well.. I'm not sure about the history.  I believe the logic I said above
> is the historic and (I think) right behavior.
>
> > 2) Different event encodings can have different behaviors (broken in
> > some notable cases).
>
> Yep, let's make it clear.
>
> > 3) Intuitively what wildcarding does is try to open "*/event/" where *
> > is every possible PMU name. Having different event encodings breaks
> > that intuition, and it could also break situations where you try
> > to assert equivalence based on type/config.
>
> While I don't like the wildcard matching, I think it doesn't matter as
> long as we keep the above behavior.  If it can find a legacy name, then
> go with it, done.  If not, try all PMUs as if it's given with PMU name
> in the event.
>
> > 4) The legacy encodings were (are?) broken on ARM Apple M? CPUs,
> > that's why the priority was changed.
>
> I guess that's why they use cpu_cycles.
>
> > 5) RISC-V would like the tool to tackle the legacy-to-config mapping
> > challenge, rather than the PMU driver, given the potential diversity
> > of hardware implementations.
>
> I hope they can find a better solution. :)
>
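For reference, the extended-type layout mentioned in this thread ("EEEEEEEE: PMU type ID" in the top 32 bits of a PERF_TYPE_HARDWARE config) can be sketched as follows. The shift and mask values mirror PERF_PMU_TYPE_SHIFT and PERF_HW_EVENT_MASK from include/uapi/linux/perf_event.h, redefined locally so the sketch builds without kernel headers; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors PERF_PMU_TYPE_SHIFT and PERF_HW_EVENT_MASK from the uapi header. */
#define SKETCH_PMU_TYPE_SHIFT 32
#define SKETCH_HW_EVENT_MASK 0xffffffffULL

/*
 * attr.config for a PERF_TYPE_HARDWARE event: hardware event ID in the
 * low 32 bits, PMU type ID in the top 32 bits. pmu_type is the number
 * read from /sys/bus/event_source/devices/<pmu>/type; passing 0 leaves
 * the upper bits clear, i.e. a plain legacy encoding.
 */
static uint64_t hw_config_for_pmu(uint32_t pmu_type, uint32_t hw_id)
{
	return ((uint64_t)pmu_type << SKETCH_PMU_TYPE_SHIFT) | hw_id;
}

/* Extract the PMU type ID from the upper 32 bits. */
static uint32_t config_pmu_type(uint64_t config)
{
	return (uint32_t)(config >> SKETCH_PMU_TYPE_SHIFT);
}

/* Extract the legacy hardware event ID from the lower 32 bits. */
static uint32_t config_hw_id(uint64_t config)
{
	return (uint32_t)(config & SKETCH_HW_EVENT_MASK);
}
```

For example, if a hypothetical P-core PMU's type file read 8, cycles (PERF_COUNT_HW_CPU_CYCLES, ID 0) on that PMU would be requested with config 0x800000000.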

Sorry for reposting. Gmail converted it to HTML for some reason.

I have posted the latest support here.
https://lore.kernel.org/kvm/20250127-counter_delegation-v3-12-64894d7e16d5@rivosinc.com/T/

As of now, we have adopted a hybrid approach where a vendor can decide
whether to encode the legacy events in the json or in the driver (if
this series is merged). Without that, every vendor has to define them
in the driver. We will deal with the fallout of the exploding driver
when the situation arises.

If a vendor chooses to define them in both places, the driver encoding
will take precedence. I have tried to describe the scheme in the cover
letter. Please let me know if I should clarify more.

> >
> > To this end we hosted RISC-V's perf people at Google and they
> > expressed that their preference was what this series does, and they
> > expressed this directly to you.
> >
> > I don't think there would be an issue in this area if it wasn't for
> > Neoverse and Linus - that's why the revert happened. This change in
> > behavior was proposed by Arnaldo:
> > https://lore.kernel.org/lkml/ZlY0F_lmB37g10OK@x1/
> > and has tags from Intel, ARM and Rivos (RISC-V). I intend to carry it
> > in Google's tree.
>
> Maybe it's because of Linus.  But anyway it reminds me of behaviors that
> need to be discussed.  And we can (and should) improve things always.
>
> Thanks,
> Namhyung
>


* Re: [PATCH v5 0/4] Prefer sysfs/JSON events also when no PMU is provided
  2025-01-09 22:21 [PATCH v5 0/4] Prefer sysfs/JSON events also when no PMU is provided Ian Rogers
                   ` (4 preceding siblings ...)
  2025-01-29 22:05 ` [PATCH v5 0/4] Prefer sysfs/JSON events also when no PMU is provided Namhyung Kim
@ 2025-01-30 17:46 ` Namhyung Kim
  5 siblings, 0 replies; 53+ messages in thread
From: Namhyung Kim @ 2025-01-30 17:46 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	bpf, Aditya Bodkhe, Ian Rogers

On Thu, 09 Jan 2025 14:21:05 -0800, Ian Rogers wrote:
> At the RISC-V summit the topic of avoiding event data being in the
> RISC-V PMU kernel driver came up. There is a preference for sysfs/JSON
> events being the priority when no PMU is provided so that legacy
> events may be supported via json. Originally Mark Rutland also
> expressed at LPC 2023 that doing this would resolve bugs on ARM Apple
> M? processors, but James Clark more recently tested this and believes
> the driver issues there may not have existed or have been resolved. In
> any case, it is inconsistent that with a PMU, event names avoid legacy
> encodings, but when wildcarding PMUs (i.e. without a PMU in the event
> name) the legacy encodings have priority.
> 
> [...]
Applied patches 1 and 2 to perf-tools-next, thanks!

Best regards,
Namhyung



* Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-01-30  6:03                       ` Ian Rogers
@ 2025-01-31 22:28                         ` Namhyung Kim
  0 siblings, 0 replies; 53+ messages in thread
From: Namhyung Kim @ 2025-01-31 22:28 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	bpf, Aditya Bodkhe, Atish Patra, Leo Yan, Beeman Strong,
	Arnaldo Carvalho de Melo

On Wed, Jan 29, 2025 at 10:03:03PM -0800, Ian Rogers wrote:
> On Wed, Jan 29, 2025 at 9:16 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Wed, Jan 29, 2025 at 05:16:58PM -0800, Ian Rogers wrote:
> > > On Wed, Jan 29, 2025 at 1:55 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > On Wed, Jan 15, 2025 at 01:20:32PM -0800, Ian Rogers wrote:
> > > > > On Wed, Jan 15, 2025 at 9:59 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > > I think the behavior should be:
> > > > > >
> > > > > >   cycles -> PERF_COUNT_HW_CPU_CYCLES
> > > > > >   cpu-cycles -> PERF_COUNT_HW_CPU_CYCLES
> > > > > >   cpu_cycles -> no legacy -> sysfs or json
> > > > > >   cpu/cycles/ -> sysfs or json
> > > > > >   cpu/cpu-cycles/ -> sysfs or json
> > > > >
> > > > > So I disagree, as adding a PMU to an event name shouldn't change
> > > > > the encoding:
> > > > > 1) This historically was perf's behavior.
> > > >
> > > > Well.. I'm not sure about the history.  I believe the logic I said above
> > > > is the historic and (I think) right behavior.
> > >
> > > You're wrong: you are describing the behavior after:
> > > https://lore.kernel.org/r/20231123042922.834425-1-irogers@google.com
> > > commit a24d9d9dc096fc0d0bd85302c9a4fe4fe3b1107b from Nov 2022, but
> > > somehow without the legacy event fallbacks that Intel added with a PMU
> > > for hybrid.
> > >
> > > The behavior in this patch series is best for RISC-V and presumably ARM
> > > (particularly for Apple M? CPUs), carries ARM's and Intel's tags,
> > > implements the behavior Arnaldo asked for, and resolves an
> > > inconsistency in the tool that I think is fundamentally wrong: whether
> > > or not a PMU name is given shouldn't change how an event name is
> > > encoded (an inconsistency my past fixes introduced). It is also part of
> > > solving other problems:
> > > https://lore.kernel.org/linux-perf-users/20250127-counter_delegation-v3-0-64894d7e16d5@rivosinc.com/
> >
> > So you think the below behavior is preferred, right?
> >
> >   cycles -> cpu/cycles/ (or whatever PMU name) -> sysfs or json
> >
> > And there's no way to use legacy event encodings anymore?
> 
> This is absolutely the right thing to do! If sysfs/json advertises an
> event named cycles, it knows better than the legacy encoding, and perf
> should select it. Not doing this was the cause of the ARM Apple M?
> breakage: their PMUs looked like uncore PMUs before the hybrid fixes, so
> they weren't previously known to accept legacy events and the
> sysfs/json encodings were always used in preference. Why would having
> or not having the PMU in the event name imply a different, and
> sometimes known-broken, encoding?

Because I think 'event' and 'pmu/event/' are different, and it's natural
for 'event' (w/o PMU) to use the legacy encoding.  Maybe ARM Apple M?
has a broken implementation, but then they should fix it.

IIRC what they wanted was to pick sysfs encoding when they use the
'pmu/event/' format.  And that's perfectly fine.
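A minimal sketch of the rule proposed here, where a bare legacy name keeps the legacy encoding and anything else (including 'pmu/event/') goes to sysfs/JSON. The names are hypothetical, the legacy list is trimmed, and whether an alias exists is passed in rather than looked up; it also ignores any legacy fallback a core PMU might still offer for 'pmu/event/'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum encoding { ENC_NONE, ENC_SYSFS_JSON, ENC_LEGACY };

/* Legacy hardware event names, trimmed for illustration. */
static bool is_legacy_name(const char *event)
{
	static const char *const legacy[] = { "cycles", "cpu-cycles", "instructions", NULL };

	for (const char *const *l = legacy; *l; l++)
		if (strcmp(*l, event) == 0)
			return true;
	return false;
}

/*
 * pmu_name is NULL for a bare "event". has_alias says whether the named
 * PMU (or, for a bare event, any PMU) advertises the name in sysfs/JSON;
 * a real implementation would look this up rather than take it as input.
 */
static enum encoding resolve_event(const char *pmu_name, const char *event, bool has_alias)
{
	assert(event != NULL);
	if (pmu_name == NULL && is_legacy_name(event))
		return ENC_LEGACY;      /* bare legacy name: legacy encoding, done */
	return has_alias ? ENC_SYSFS_JSON : ENC_NONE;
}
```

Under this rule, cycles and cpu/cycles/ can map to different encodings on the same PMU, which is exactly the difference being debated.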


> And then in the perf stat uniquification we can rename the event to be
> the version with a different encoding. It is madness to me.

Right, so it should use full 'pmu/event/' format.

> 
> If a user wants to force a legacy event, even though the driver is
> typically saying it knows better, they can use a raw event encoding
> or, in the case of cycles, its alias cpu-cycles. If there
> really is a use-case for using legacy encodings, we could introduce
> new legacy-cpu and legacy-cache PMUs that advertise the events, but
> then the wildcard behavior would be weird.

I don't think raw events accept legacy encodings.  Also I don't want the
additional legacy-* PMUs.  Maybe what I want is to remove surprises:
it'd be confusing if I have two events (w/o PMU) and one uses the legacy
encoding while the other uses sysfs because the (core?) PMU has an alias.

> 
> To be clear, I do not know of a single use-case where the legacy
> encodings are actually wanted when sysfs/json have an encoding. The
> opposite is very much true: legacy encodings are not wanted, which is
> why ARM originally wanted their priority lowered everywhere to fix
> Apple M?, and RISC-V later wanted the same.

Is Apple M* currently broken?  I'm not sure that we will never want the
legacy encoding.

> 
> > >
> > > You've not pointed at anything wrong in the scheme these patches
> > > introduce, which vendors support, except that it is a behavior
> > > change. I have pointed at many issues with both your proposal
> > > above and the current behavior. The behavior change came about to work
> > > around PMU bugs over 2 years ago but only partially did so. It makes
> > > sense to remedy this and to get the clean, consistent behavior this
> > > series achieves. It is unfortunate that it is a behavior change, but
> > > the first step for that was made 2 years ago. I think it also makes
> > > sense that something self described as legacy is a lower priority and
> > > of the past (wrt event naming moving forward).
> >
> > I want to clarify the event parsing behavior and to find the right way
> > to deal with various cases.  I haven't followed the activities in this
> > area closely so I missed some changes in the past.  Maybe the problem
> > is that the behavior is complex and not clarified.  Hopefully we can
> > write it down in a doc.
> 
> I think what is typical in the kernel is that the source is the best
> documentation. By simplifying event parsing, for example,
> parse-events.y has been reduced from 952 lines (in v5.10) to 762 lines,
> so it's about 20% smaller whilst being more correct (I've fixed all
> the memory leaks, etc.) and avoiding expensive start-up costs (lazy
> initialization, etc.).

That's great!

> 
> Having a single priority for which events are preferred, legacy vs
> sysfs/json, with or without a PMU, will further make the code base
> simpler and easier to understand.

I cannot agree.  I think 'event' and 'pmu/event/' are different.  And we
need to clarify how to handle 'event' case correctly.  And I hope we can
disable the wildcard behavior.
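Wildcarding, as described earlier in the thread, amounts to trying 'pmu/event/' for every PMU. A minimal sketch of the sysfs half of that lookup, assuming only the /sys/bus/event_source/devices/<pmu>/events/<event> layout; perf's compiled-in JSON events are not modeled, and count_pmus_with_event() is a hypothetical helper.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Count PMUs under root (normally "/sys/bus/event_source/devices") that
 * advertise event as a sysfs alias, i.e. the PMUs a wildcarded bare
 * event would be tried on. Returns -1 if root cannot be opened.
 */
static int count_pmus_with_event(const char *root, const char *event)
{
	char path[4096];
	struct dirent *ent;
	int count = 0;
	DIR *dir;

	assert(root != NULL && event != NULL);
	dir = opendir(root);
	if (!dir)
		return -1;
	while ((ent = readdir(dir)) != NULL) {
		if (ent->d_name[0] == '.')
			continue;       /* skip "." and ".." */
		int n = snprintf(path, sizeof(path), "%s/%s/events/%s",
				 root, ent->d_name, event);
		/* Only count paths that fit and exist. */
		if (n > 0 && (size_t)n < sizeof(path) && access(path, F_OK) == 0)
			count++;
	}
	closedir(dir);
	return count;
}
```

On a live Linux system, count_pmus_with_event("/sys/bus/event_source/devices", "cycles") would report how many PMUs a bare cycles could fan out to through sysfs aliases alone.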

Thanks,
Namhyung



* Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-01-30  6:12                   ` Atish Kumar Patra
@ 2025-01-31 22:42                     ` Namhyung Kim
  2025-02-01  8:45                       ` Ian Rogers
  2025-02-03  5:47                       ` Atish Kumar Patra
  0 siblings, 2 replies; 53+ messages in thread
From: Namhyung Kim @ 2025-01-31 22:42 UTC (permalink / raw)
  To: Atish Kumar Patra
  Cc: Ian Rogers, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	bpf, Aditya Bodkhe, Leo Yan, Beeman Strong,
	Arnaldo Carvalho de Melo

On Wed, Jan 29, 2025 at 10:12:14PM -0800, Atish Kumar Patra wrote:
> On Wed, Jan 29, 2025 at 1:55 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Wed, Jan 15, 2025 at 01:20:32PM -0800, Ian Rogers wrote:
> > > On Wed, Jan 15, 2025 at 9:59 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > > On Mon, Jan 13, 2025 at 2:51 PM Ian Rogers <irogers@google.com> wrote:
> > > > > > There was an explicit, and reviewed by Jiri and Arnaldo, intent with
> > > > > > the hybrid work that using a legacy event with a hybrid PMU, even
> > > > > > though the PMU doesn't advertise through json or sysfs the legacy
> > > > > > event, the perf tool supports it.
> > > >
> > > > I thought legacy events on hybrid were converted to PMU events.
> > >
> > > No. When big.LITTLE was created nothing changed in perf events, but
> > > when Intel did hybrid they wanted the hybrid CPUs (atom and
> > > performance) to appear as if they were one type. The PMU event
> > > encodings vary a lot between these on Intel, whereas ARM has standards
> > > for the encoding.
> > > Intel extended the legacy format to take a PMU type id:
> > > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/include/uapi/linux/perf_event.h?h=perf-tools-next#n41
> > > "EEEEEEEE: PMU type ID"
> > > that is in the top 32-bits of the config.
> >
> > Oh right, I forgot the extended type thing.  Then we can keep the legacy
> > encoding with it on hybrid systems when users give well-known names (w/o
> > PMU) for legacy event.
> >
> > >
> > > > > >
> > > > > > Making it so that events without PMUs are only legacy events just
> > > > > > doesn't work. There are far too many existing uses of non-legacy
> > > > > > events without PMU, the metrics contain 100s of examples.
> > > >
> > > > That's unfortunate.  It'd be nice if metrics were written with PMU
> > > > names.
> > >
> > > But then we'd end up with things like on Intel:
> > > UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD
> > > becoming:
> > > uncore_cha/UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD/
> > > or just:
> > > cha/UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD/
> > > As a user the first works for me and doesn't have any ambiguity over
> > > PMUs as the event name already encodes the PMU. AMD similarly puts
> > > the part of the pipeline into event names. Were we to break everybody by
> > > requiring the PMU we'd also need to explain which PMU to use. Sites
> > > with event lists (like https://perfmon-events.intel.com/) don't
> > > explain the PMU and it'd be messy as on Intel you have a CHA PMU for
> > > server chips but a CBOX on client chips, etc.
> >
> > While I prefer having PMU names in the JSON events/metrics, it may not
> > be practical to change them all.  Probably we can allow them without PMU
> > and hope that they have unique prefixes.
> >
> > >
> > > > I have a question.  What if an event name in a metric matches
> > > > multiple unrelated PMUs?
> > >
> > > The metric may break or we'd aggregate the unrelated counts together.
> >
> > Ok, then they should use unique names.
> >
> >
> > > Take a metric like IPC, "instructions/cycles": that metric should
> > > work on a hybrid system, as it has instructions and cycles. If you
> > > used an event for instructions like inst_retired.any then maybe the
> > > metric will fail on one kind of core that didn't have that event. Now
> >
> > If the metrics are for a specific CPU model, then the vendor should be
> > responsible for providing accurate metrics using appropriate PMUs/events,
> > IMHO.
> >
> >
> > > if we have accelerators advertising instructions and cycles events, we
> > > should be able to compute the metric for the accelerator. What could
> > > happen today is that the accelerator has a cpumask of a single CPU,
> > > so we could aggregate the accelerator counter into the CPU event for
> > > that same CPU and end up with a weird quasi-CPU, quasi-accelerator IPC
> > > metric for that CPU. What should happen is that we
> > > get an IPC for the accelerator and IPC for each hybrid core
> > > independently, but the way we handle evsels, CPUs, PMUs is not really
> > > set up for that. Hopefully getting a set of PMUs into the evsel will
> > > clear that up. Assuming all of that is cleared up, is it wrong if the
> > > IPC metric is computed for the accelerator if it was originally
> > > written as a CPU metric? Not really. Could there be metrics where that
> > > is the case?
> >
> > Yes, I think there should be separate metrics for the accelerators.
> >
> >
> > > Probably, and specifying PMUs in the event names would be
> > > a fix. There have also been proposals that we restrict the PMUs for
> > > certain metrics. As event names are currently so distinct it isn't a
> > > problem we've faced yet and it is not clear it is a problem other than
> > > highlighting tech debt in areas of the tool like aggregation.
> > >
> > > > > >
> > > > > > Prior to switching json/sysfs to being the priority when a PMU is
> > > > > > specified, it was the case that all encodings were the same, with or
> > > > > > without a PMU.
> > > > > >
> > > > > > I don't think there is anything natural about assuming things about
> > > > > > event names. Take cycles, cpu-cycles and cpu_cycles:
> > > > > >  - cycles on x86 is only encoded via a legacy event;
> > > > > >  - cpu-cycles on Intel exists as a sysfs event, but cpu-cycles is also
> > > > > > a legacy event name;
> > > > > >  - cpu_cycles exists as a sysfs event on ARM but doesn't have a
> > > > > > corresponding legacy event name.
> > > >
> > > > I think the behavior should be:
> > > >
> > > >   cycles -> PERF_COUNT_HW_CPU_CYCLES
> > > >   cpu-cycles -> PERF_COUNT_HW_CPU_CYCLES
> > > >   cpu_cycles -> no legacy -> sysfs or json
> > > >   cpu/cycles/ -> sysfs or json
> > > >   cpu/cpu-cycles/ -> sysfs or json
> > >
> > > So I disagree, as adding a PMU to an event name shouldn't change
> > > the encoding:
> > > 1) This historically was perf's behavior.
> >
> > Well.. I'm not sure about the history.  I believe the logic I said above
> > is the historic and (I think) right behavior.
> >
> > > 2) Different event encodings can have different behaviors (broken in
> > > some notable cases).
> >
> > Yep, let's make it clear.
> >
> > > 3) Intuitively what wildcarding does is try to open "*/event/" where *
> > > is every possible PMU name. Having different event encodings breaks
> > > that intuition, and it could also break situations where you try
> > > to assert equivalence based on type/config.
> >
> > While I don't like the wildcard matching, I think it doesn't matter as
> > long as we keep the above behavior.  If it can find a legacy name, then
> > go with it, done.  If not, try all PMUs as if it's given with PMU name
> > in the event.
> >
> > > 4) The legacy encodings were (are?) broken on ARM Apple M? CPUs,
> > > that's why the priority was changed.
> >
> > I guess that's why they use cpu_cycles.
> >
> > > 5) RISC-V would like the tool to tackle the legacy-to-config mapping
> > > challenge, rather than the PMU driver, given the potential diversity
> > > of hardware implementations.
> >
> > I hope they can find a better solution. :)
> >
> 
> Sorry for reposting. Gmail converted it to HTML for some reason.
> 
> I have posted the latest support here.
> https://lore.kernel.org/kvm/20250127-counter_delegation-v3-12-64894d7e16d5@rivosinc.com/T/
> 
> As of now, we have adopted a hybrid approach where a vendor can decide
> whether to encode the legacy events in the json or in the driver (if
> this series is merged). Without that, every vendor has to define them
> in the driver. We will deal with the fallout of the exploding driver
> when the situation arises.

I don't know how hard it'd be because I'm not familiar with RISC-V.  But
basically you only need to maintain 10 legacy encodings (PERF_COUNT_HW_*)
and a few dozen combinations of supported cache events
(PERF_COUNT_HW_CACHE_*) for each vendor.  All others can go to json anyway.

I think this is what all other archs (including x86) do.
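To give a sense of the size of that table, this sketch encodes a legacy cache event the way the perf_event_open(2) man page documents it (cache ID | op << 8 | result << 16). The maxima mirror the perf_hw_cache_* enums in include/uapi/linux/perf_event.h; the function names are hypothetical, and which combinations a given vendor's hardware actually supports is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Counts mirrored from the perf_hw_cache_* enums (L1D..NODE, READ..PREFETCH, ACCESS/MISS). */
enum { CACHE_ID_MAX = 7, CACHE_OP_MAX = 3, CACHE_RESULT_MAX = 2 };

/* attr.config for a PERF_TYPE_HW_CACHE event, as documented in perf_event_open(2). */
static uint64_t hw_cache_config(unsigned id, unsigned op, unsigned result)
{
	assert(id < CACHE_ID_MAX && op < CACHE_OP_MAX && result < CACHE_RESULT_MAX);
	return (uint64_t)id | ((uint64_t)op << 8) | ((uint64_t)result << 16);
}

/* Upper bound on distinct legacy cache events a vendor might have to map. */
static unsigned hw_cache_combinations(void)
{
	return CACHE_ID_MAX * CACHE_OP_MAX * CACHE_RESULT_MAX;
}
```

At most 7 * 3 * 2 = 42 cache combinations exist, in line with the "few dozen" estimate; for example, an L1D read miss is config 0x10000.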

Thanks,
Namhyung

> 
> If a vendor chooses to define them in both places, the driver encoding
> will take precedence. I have tried to describe the scheme in the cover
> letter. Please let me know if I should clarify more.
> 
> > >
> > > To this end we hosted RISC-V's perf people at Google and they
> > > expressed that their preference was what this series does, and they
> > > expressed this directly to you.
> > >
> > > I don't think there would be an issue in this area if it wasn't for
> > > Neoverse and Linus - that's why the revert happened. This change in
> > > behavior was proposed by Arnaldo:
> > > https://lore.kernel.org/lkml/ZlY0F_lmB37g10OK@x1/
> > > and has tags from Intel, ARM and Rivos (RISC-V). I intend to carry it
> > > in Google's tree.
> >
> > Maybe it's because of Linus.  But anyway it reminds me of behaviors that
> > need to be discussed.  And we can (and should) improve things always.
> >
> > Thanks,
> > Namhyung
> >


* Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-01-31 22:42                     ` Namhyung Kim
@ 2025-02-01  8:45                       ` Ian Rogers
  2025-02-04  0:15                         ` Namhyung Kim
  2025-02-03  5:47                       ` Atish Kumar Patra
  1 sibling, 1 reply; 53+ messages in thread
From: Ian Rogers @ 2025-02-01  8:45 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Atish Kumar Patra, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Adrian Hunter, Kan Liang, James Clark, Ze Gao,
	Weilin Wang, Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, bpf, Aditya Bodkhe, Leo Yan,
	Beeman Strong, Arnaldo Carvalho de Melo

On Fri, Jan 31, 2025 at 2:42 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Wed, Jan 29, 2025 at 10:12:14PM -0800, Atish Kumar Patra wrote:
> > On Wed, Jan 29, 2025 at 1:55 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > On Wed, Jan 15, 2025 at 01:20:32PM -0800, Ian Rogers wrote:
> > > > On Wed, Jan 15, 2025 at 9:59 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > >
> > > > > > On Mon, Jan 13, 2025 at 2:51 PM Ian Rogers <irogers@google.com> wrote:
> > > > > > > There was an explicit, and reviewed by Jiri and Arnaldo, intent with
> > > > > > > the hybrid work that using a legacy event with a hybrid PMU, even
> > > > > > > though the PMU doesn't advertise through json or sysfs the legacy
> > > > > > > event, the perf tool supports it.
> > > > >
> > > > > I thought legacy events on hybrid were converted to PMU events.
> > > >
> > > > No. When big.LITTLE was created nothing changed in perf events, but
> > > > when Intel did hybrid they wanted the hybrid CPUs (atom and
> > > > performance) to appear as if they were one type. The PMU event
> > > > encodings vary a lot between these on Intel, whereas ARM has standards
> > > > for the encoding.
> > > > Intel extended the legacy format to take a PMU type id:
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/include/uapi/linux/perf_event.h?h=perf-tools-next#n41
> > > > "EEEEEEEE: PMU type ID"
> > > > that is in the top 32-bits of the config.
> > >
> > > Oh right, I forgot the extended type thing.  Then we can keep the legacy
> > > encoding with it on hybrid systems when users give well-known names (w/o
> > > PMU) for legacy event.
> > >
> > > >
> > > > > > >
> > > > > > > Making it so that events without PMUs are only legacy events just
> > > > > > > doesn't work. There are far too many existing uses of non-legacy
> > > > > > > events without PMU, the metrics contain 100s of examples.
> > > > >
> > > > > That's unfortunate.  It'd be nice if metrics were written with PMU
> > > > > names.
> > > >
> > > > But then we'd end up with things like on Intel:
> > > > UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD
> > > > becoming:
> > > > uncore_cha/UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD/
> > > > or just:
> > > > cha/UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD/
> > > > As a user the first works for me and doesn't have any ambiguity over
> > > > PMUs as the event name already encodes the PMU. AMD similarly place
> > > > the part of a pipeline into event names. Were we to break everybody by
> > > > requiring the PMU we'd also need to explain which PMU to use. Sites
> > > > with event lists (like https://perfmon-events.intel.com/) don't
> > > > explain the PMU and it'd be messy as on Intel you have a CHA PMU for
> > > > server chips but a CBOX on client chips, etc.
> > >
> > > While I prefer having PMU names in the JSON events/metrics, it may not
> > > be practical to change them all.  Probably we can allow them without PMU
> > > and hope that they have unique prefixes.
> > >
> > > >
> > > > > I have a question.  What if an event name in a metric matches to
> > > > > multiple unrelated PMUs?
> > > >
> > > > The metric may break or we'd aggregate the unrelated counts together.
> > >
> > > Ok, then they should use unique names.
> > >
> > >
> > > > Take a metric like IPC as "instructions/cycles", that metric should
> > > > work on a hybrid system as they have instructions and cycles. If you
> > > > used an event for instructions like inst_retired.any then maybe the
> > > > metric will fail on one kind of core that didn't have that event. Now
> > >
> > > The metrics is for specific CPU model then the vendor should be
> > > responsible to provide accurate metrics using appropriate PMU/events
> > > IMHO.
> > >
> > >
> > > > if we have accelerators advertising instructions and cycles events, we
> > > > should be able to compute the metric for the accelerator. What could
> > > > happen today is that the accelerator will have a cpumask of a single
> > > > CPU, we could aggregate the accelerator counter into the CPU event
> > > > with the same CPU as the cpumask, we'd end up with a weird quasi CPU
> > > > and accelerator IPC metric for that CPU. What should happen is that we
> > > > get an IPC for the accelerator and IPC for each hybrid core
> > > > independently, but the way we handle evsels, CPUs, PMUs is not really
> > > > set up for that. Hopefully getting a set of PMUs into the evsel will
> > > > clear that up. Assuming all of that is cleared up, is it wrong if the
> > > > IPC metric is computed for the accelerator if it was originally
> > > > written as a CPU metric? Not really. Could there be metrics where that
> > > > is the case?
> > >
> > > Yes, I think there should be separate metrics for the accelerators.
> > >
> > >
> > > > Probably, and specifying PMUs in the event names would be
> > > > a fix. There have also been proposals that we restrict the PMUs for
> > > > certain metrics. As event names are currently so distinct it isn't a
> > > > problem we've faced yet and it is not clear it is a problem other than
> > > > highlighting tech debt in areas of the tool like aggregation.
> > > >
> > > > > > >
> > > > > > > Prior to switching json/sysfs to being the priority when a PMU is
> > > > > > > specified, it was the case that all encodings were the same, with or
> > > > > > > without a PMU.
> > > > > > >
> > > > > > > I don't think there is anything natural about assuming things about
> > > > > > > event names. Take cycles, cpu-cycles and cpu_cycles:
> > > > > > >  - cycles on x86 is only encoded via a legacy event;
> > > > > > >  - cpu-cycles on Intel exists as a sysfs event, but cpu-cycles is also
> > > > > > > a legacy event name;
> > > > > > >  - cpu_cycles exists as a sysfs event on ARM but doesn't have a
> > > > > > > corresponding legacy event name.
> > > > >
> > > > > I think the behavior should be:
> > > > >
> > > > >   cycles -> PERF_COUNT_HW_CPU_CYCLES
> > > > >   cpu-cycles -> PERF_COUNT_HW_CPU_CYCLES
> > > > >   cpu_cycles -> no legacy -> sysfs or json
> > > > >   cpu/cycles/ -> sysfs or json
> > > > >   cpu/cpu-cycles/ -> sysfs or json
> > > >
> > > > So I disagree as if you add a PMU to an event name the encoding
> > > > shouldn't change:
> > > > 1) This historically was perf's behavior.
> > >
> > > Well.. I'm not sure about the history.  I believe the logic I said above
> > > is the historic and (I think) right behavior.
> > >
> > > > 2) Different event encodings can have different behaviors (broken in
> > > > some notable cases).
> > >
> > > Yep, let's make it clear.
> > >
> > > > 3) Intuitively what wildcarding does is try to open "*/event/" where *
> > > > is every possible PMU name. Having different event encodings is
> > > > breaking that intuition it could also break situations where you try
> > > > to assert equivalence based on type/config.
> > >
> > > While I don't like the wildcard matching, I think it doesn't matter as
> > > long as we keep the above behavior.  If it can find a legacy name, then
> > > go with it, done.  If not, try all PMUs as if it's given with PMU name
> > > in the event.
> > >
> > > > 4) The legacy encodings were (are?) broken on ARM Apple M? CPUs,
> > > > that's why the priority was changed.
> > >
> > > I guess that why they use cpu_cycles.
> > >
> > > > 5) RISC-V would like the tool tackle the legacy to config mapping
> > > > challenge, rather than the PMU driver, given the potential diversity
> > > > of hardware implementations.
> > >
> > > I hope they can find a better solution. :)
> > >
> >
> > Sorry for reposting. Gmail converted it to html for some reason.
> >
> > I have posted the latest support here.
> > https://lore.kernel.org/kvm/20250127-counter_delegation-v3-12-64894d7e16d5@rivosinc.com/T/
> >
> > As of now, we have adopted a hybrid approach where a vendor can decide
> > whether to encode the legacy events
> > in the json or in the driver (if this series is merged). In absence of
> > that, every vendor has to define it in the driver.
> > We will deal with the fall out of the exploding driver when the
> > situation arrives.
>
> I don't know how hard it'd be cause I'm not familiar with RISC-V.  But
> basically you only need to maintain 9 legacy encodings (PERF_COUNT_HW_*)
> and a few dozen combinations of supported cache events (PERF_COUNT_HW_
> CACHE_*) for each vendor.  All others can go to json anyway.
>
> I think this is what all other archs (including x86) do.

This is well known to the people involved.

While the PMU driver needs to encode or avoid these event names, they
become special "legacy" names inside the perf tool. Magically a name
like cpu_cycles will wildcard match (match on >1 PMU) whilst a name
like cpu-cycles won't (only matching on core PMUs). This is completely
confusing to users. It is even more confusing when you are saying the
tool should intentionally use two different encodings.

The perf event enum types are limited but the tool recognizes more
event names and then uses legacy encodings. I have yet to hear a
sensible list of which event names are legacy: is cpu-cycles in there
or just cycles? Why on earth would you want to keep synonyms like LLC
meaning L2 cache?

The intention with "pmu syntax" for events is that the PMU clarifies
the type in the perf_event_attr. Previously it was assumed that the
PMU type would be raw (4), and the x86 PMUs even use that as their
type number. That assumption no longer holds now that we have hybrid
core PMUs and tens of uncore PMUs; supporting them meant reinventing
event parsing and encoding.

If you look at the matching as it is today:
cpu_cycles -> tries to match on all PMUs
*/cpu_cycles/ -> tries to match on all PMUs
arm*/cpu_cycles/ -> tries to match on all PMUs whose names start with arm
armv8_pmuv3/cpu_cycles/ -> matches only the armv8_pmuv3 PMU
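
To make the claimed equivalence concrete, here is a sketch of glob matching over PMU names using POSIX fnmatch(3), treating a missing PMU exactly like "*". The PMU names are hypothetical, the sketch ignores whether each PMU actually advertises the event, and it is not perf's implementation, only an illustration of the behavior listed above.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical PMU names, standing in for /sys/bus/event_source/devices. */
static const char *const pmus[] = {
	"armv8_pmuv3", "arm_cmn_0", "arm_dsu_0", "smmuv3_pmcg_100",
};

/* Count the PMUs a PMU pattern selects; no PMU (NULL) behaves like "*". */
static size_t count_matching_pmus(const char *pmu_pattern)
{
	const char *pat = pmu_pattern ? pmu_pattern : "*";
	size_t n = 0;

	assert(pat[0] != '\0'); /* an empty pattern is a caller bug */
	for (size_t i = 0; i < sizeof(pmus) / sizeof(pmus[0]); i++)
		if (fnmatch(pat, pmus[i], 0) == 0) /* 0 means "matched" */
			n++;
	return n;
}
```

With these names, no PMU and "*" both select all four PMUs, "arm*" selects three and "armv8_pmuv3" selects one, mirroring the four lines above.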

I don't see why it isn't obvious that the behavior of no PMU and the
PMU being * is expected to be exactly the same - it really is today
and that is what the code does, please try it. There just isn't a
notion of not having a PMU because even for legacy events we have to
reinvent the PMUs to inject the correct extended type information
otherwise we'd profile just a fraction of the cores. We add PMUs when
we display events to make the events more readable. There isn't a
notion of these events being legacy and not, they are just assumed to
be the same, PMU or not.

As I've explained to you, I plan to transition the metric code to use
event parsing and to union evlists rather than use strings and hash
tables. This is to fix tracepoints incorrectly appearing to always
have suffixes in the "metric-id" calculation. Recognizing modifiers
properly would end up reinventing event parsing, so let's just make
use of what we have and parse events early. When unioning evsels in an
evlist, it makes sense to do so based on the perf_event_attr; this
will allow Intel's slots and topdown.slots to be correctly detected as
aliases in metrics, something of a pain in formulas today. Why would
the behavior of an event like cycles be different in non-hybrid
metrics (where PMUs generally aren't specified) and in hybrid metrics
(where PMUs generally are specified)? Events may not be recognized as
aliases because ones without a PMU in the metric will get a legacy
encoding. In your change:
https://lore.kernel.org/r/20221018020227.85905-16-namhyung@kernel.org
you assume all events with the same name are in fact the same event,
but that is making wild assumptions about what is placed in the evsel
name and I am trying to fix it in:
https://lore.kernel.org/lkml/20250201074320.746259-1-irogers@google.com/
You did similar with your proposal for hwmon events and I rejected it.
The name term in an event configuration clobbers an evsel's name;
that is simply its intent, and the name was never supposed to carry
some sacred legacy or whatever meaning.
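
A sketch of why aliasing evsels by their perf_event_attr is more robust than aliasing by name. The struct is hypothetical and keeps only type and config (a real comparison would also need config1, config2 and other attr fields), and the encodings in the usage note are illustrative, not real Intel values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical evsel reduced to its name and core encoding. */
struct evsel_sketch {
	const char *name; /* free-form: a name= term can set it to anything */
	uint32_t type;    /* perf_event_attr.type */
	uint64_t config;  /* perf_event_attr.config */
};

/* Name-based aliasing: treats equal names as the same event. */
static bool same_by_name(const struct evsel_sketch *a,
			 const struct evsel_sketch *b)
{
	assert(a && b);
	return strcmp(a->name, b->name) == 0;
}

/* Encoding-based aliasing: compares what would actually be counted. */
static bool same_by_attr(const struct evsel_sketch *a,
			 const struct evsel_sketch *b)
{
	assert(a && b);
	return a->type == b->type && a->config == b->config;
}
```

With two evsels named "slots" and "topdown.slots" that share one encoding, same_by_attr() reports an alias while same_by_name() does not; two evsels both named "cycles", one with a legacy and one with a sysfs encoding, are the reverse.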

I still see no sense in:
perf stat -e cpu_cycles ...
meaning:
perf stat -e */cpu_cycles/ ...
and:
perf stat -e cpu-cycles ...
trying to mean close to:
perf stat -e cpu/cpu-cycles/ ...
Why is one implicitly * and the other a core PMU? It is the
definition of confusing. And in the latter cpu-cycles case you want
those two events to be encoded differently.

All of this is overlooking that we have 1 event that is a problem on 1
PMU on 1 architecture. If it weren't for that event we'd already have
this patch landed and consistent event encodings. By not taking the
patch it hurts Apple M, RISC-V users and my own work.

Please can you explain why keeping the current encoding is good and if
we like legacy events so much, can we revert the changes to prioritize
sysfs/json when a PMU name is present. I'm afraid what you are
explaining makes no sense to me, breaks existing platforms (Apple M)
and is a blockage to future work. Saying everyone should rewrite
everything, that's not a workable solution - not least because in some
situations (old PMU drivers on Apple M) we lack a time machine.

Thanks,
Ian

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-01-31 22:42                     ` Namhyung Kim
  2025-02-01  8:45                       ` Ian Rogers
@ 2025-02-03  5:47                       ` Atish Kumar Patra
  1 sibling, 0 replies; 53+ messages in thread
From: Atish Kumar Patra @ 2025-02-03  5:47 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ian Rogers, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	bpf, Aditya Bodkhe, Leo Yan, Beeman Strong,
	Arnaldo Carvalho de Melo

On Fri, Jan 31, 2025 at 2:42 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Wed, Jan 29, 2025 at 10:12:14PM -0800, Atish Kumar Patra wrote:
> > On Wed, Jan 29, 2025 at 1:55 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > On Wed, Jan 15, 2025 at 01:20:32PM -0800, Ian Rogers wrote:
> > > > On Wed, Jan 15, 2025 at 9:59 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > >
> > > > > > On Mon, Jan 13, 2025 at 2:51 PM Ian Rogers <irogers@google.com> wrote:
> > > > > > > There was an explicit, and reviewed by Jiri and Arnaldo, intent with
> > > > > > > the hybrid work that using a legacy event with a hybrid PMU, even
> > > > > > > though the PMU doesn't advertise through json or sysfs the legacy
> > > > > > > event, the perf tool supports it.
> > > > >
> > > > > I thought legacy events on hybrid were converted to PMU events.
> > > >
> > > > No, when BIG.little was created nothing changed in perf events but
> > > > when Intel did hybrid they wanted to make the hybrid CPUs (atom and
> > > > performance) appear as if they were one type. The PMU event encodings
> > > > vary a lot for this on Intel, ARM has standards for the encoding.
> > > > Intel extended the legacy format to take a PMU type id:
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/include/uapi/linux/perf_event.h?h=perf-tools-next#n41
> > > > "EEEEEEEE: PMU type ID"
> > > > that is in the top 32-bits of the config.
> > >
> > > Oh right, I forgot the extended type thing.  Then we can keep the legacy
> > > encoding with it on hybrid systems when users give well-known names (w/o
> > > PMU) for legacy event.
> > >
> > > >
> > > > > > >
> > > > > > > Making it so that events without PMUs are only legacy events just
> > > > > > > doesn't work. There are far too many existing uses of non-legacy
> > > > > > > events without PMU, the metrics contain 100s of examples.
> > > > >
> > > > > That's unfortunate.  It'd be nice if metrics were written with PMU
> > > > > names.
> > > >
> > > > But then we'd end up with things like on Intel:
> > > > UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD
> > > > becoming:
> > > > uncore_cha/UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD/
> > > > or just:
> > > > cha/UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD/
> > > > As a user the first works for me and doesn't have any ambiguity over
> > > > PMUs as the event name already encodes the PMU. AMD similarly place
> > > > the part of a pipeline into event names. Were we to break everybody by
> > > > requiring the PMU we'd also need to explain which PMU to use. Sites
> > > > with event lists (like https://perfmon-events.intel.com/) don't
> > > > explain the PMU and it'd be messy as on Intel you have a CHA PMU for
> > > > server chips but a CBOX on client chips, etc.
> > >
> > > While I prefer having PMU names in the JSON events/metrics, it may not
> > > be practical to change them all.  Probably we can allow them without PMU
> > > and hope that they have unique prefixes.
> > >
> > > >
> > > > > I have a question.  What if an event name in a metric matches to
> > > > > multiple unrelated PMUs?
> > > >
> > > > The metric may break or we'd aggregate the unrelated counts together.
> > >
> > > Ok, then they should use unique names.
> > >
> > >
> > > > Take a metric like IPC as "instructions/cycles", that metric should
> > > > work on a hybrid system as they have instructions and cycles. If you
> > > > used an event for instructions like inst_retired.any then maybe the
> > > > metric will fail on one kind of core that didn't have that event. Now
> > >
> > > The metrics is for specific CPU model then the vendor should be
> > > responsible to provide accurate metrics using appropriate PMU/events
> > > IMHO.
> > >
> > >
> > > > if we have accelerators advertising instructions and cycles events, we
> > > > should be able to compute the metric for the accelerator. What could
> > > > happen today is that the accelerator will have a cpumask of a single
> > > > CPU, we could aggregate the accelerator counter into the CPU event
> > > > with the same CPU as the cpumask, we'd end up with a weird quasi CPU
> > > > and accelerator IPC metric for that CPU. What should happen is that we
> > > > get an IPC for the accelerator and IPC for each hybrid core
> > > > independently, but the way we handle evsels, CPUs, PMUs is not really
> > > > set up for that. Hopefully getting a set of PMUs into the evsel will
> > > > clear that up. Assuming all of that is cleared up, is it wrong if the
> > > > IPC metric is computed for the accelerator if it was originally
> > > > written as a CPU metric? Not really. Could there be metrics where that
> > > > is the case?
> > >
> > > Yes, I think there should be separate metrics for the accelerators.
> > >
> > >
> > > > Probably, and specifying PMUs in the event names would be
> > > > a fix. There have also been proposals that we restrict the PMUs for
> > > > certain metrics. As event names are currently so distinct it isn't a
> > > > problem we've faced yet and it is not clear it is a problem other than
> > > > highlighting tech debt in areas of the tool like aggregation.
> > > >
> > > > > > >
> > > > > > > Prior to switching json/sysfs to being the priority when a PMU is
> > > > > > > specified, it was the case that all encodings were the same, with or
> > > > > > > without a PMU.
> > > > > > >
> > > > > > > I don't think there is anything natural about assuming things about
> > > > > > > event names. Take cycles, cpu-cycles and cpu_cycles:
> > > > > > >  - cycles on x86 is only encoded via a legacy event;
> > > > > > >  - cpu-cycles on Intel exists as a sysfs event, but cpu-cycles is also
> > > > > > > a legacy event name;
> > > > > > >  - cpu_cycles exists as a sysfs event on ARM but doesn't have a
> > > > > > > corresponding legacy event name.
> > > > >
> > > > > I think the behavior should be:
> > > > >
> > > > >   cycles -> PERF_COUNT_HW_CPU_CYCLES
> > > > >   cpu-cycles -> PERF_COUNT_HW_CPU_CYCLES
> > > > >   cpu_cycles -> no legacy -> sysfs or json
> > > > >   cpu/cycles/ -> sysfs or json
> > > > >   cpu/cpu-cycles/ -> sysfs or json
> > > >
> > > > So I disagree as if you add a PMU to an event name the encoding
> > > > shouldn't change:
> > > > 1) This historically was perf's behavior.
> > >
> > > Well.. I'm not sure about the history.  I believe the logic I said above
> > > is the historic and (I think) right behavior.
> > >
> > > > 2) Different event encodings can have different behaviors (broken in
> > > > some notable cases).
> > >
> > > Yep, let's make it clear.
> > >
> > > > 3) Intuitively what wildcarding does is try to open "*/event/" where *
> > > > is every possible PMU name. Having different event encodings is
> > > > breaking that intuition it could also break situations where you try
> > > > to assert equivalence based on type/config.
> > >
> > > While I don't like the wildcard matching, I think it doesn't matter as
> > > long as we keep the above behavior.  If it can find a legacy name, then
> > > go with it, done.  If not, try all PMUs as if it's given with PMU name
> > > in the event.
> > >
> > > > 4) The legacy encodings were (are?) broken on ARM Apple M? CPUs,
> > > > that's why the priority was changed.
> > >
> > > I guess that why they use cpu_cycles.
> > >
> > > > 5) RISC-V would like the tool tackle the legacy to config mapping
> > > > challenge, rather than the PMU driver, given the potential diversity
> > > > of hardware implementations.
> > >
> > > I hope they can find a better solution. :)
> > >
> >
> > Sorry for reposting. Gmail converted it to html for some reason.
> >
> > I have posted the latest support here.
> > https://lore.kernel.org/kvm/20250127-counter_delegation-v3-12-64894d7e16d5@rivosinc.com/T/
> >
> > As of now, we have adopted a hybrid approach where a vendor can decide
> > whether to encode the legacy events
> > in the json or in the driver (if this series is merged). In absence of
> > that, every vendor has to define it in the driver.
> > We will deal with the fall out of the exploding driver when the
> > situation arrives.
>
> I don't know how hard it'd be cause I'm not familiar with RISC-V.  But
> basically you only need to maintain 9 legacy encodings (PERF_COUNT_HW_*)
> and a few dozen combinations of supported cache events (PERF_COUNT_HW_
> CACHE_*) for each vendor.  All others can go to json anyway.
>
That's what I did in the series posted above.

The infrastructure:
https://lore.kernel.org/kvm/20250127-counter_delegation-v3-12-64894d7e16d5@rivosinc.com/T/#m2bcb3bbf267f9131c45cc55e3dffd45c31859f34

Example of qemu event encoding:
https://lore.kernel.org/kvm/20250127-counter_delegation-v3-12-64894d7e16d5@rivosinc.com/T/#m51985fb0c4e323bc037da32a19b7d85d92d0d864

Please let us know if you see any issues with this approach.

> I think this is what all other archs (including x86) do.
>
> Thanks,
> Namhyung
>
> >
> > If a vendor chooses to define in both places, driver encoding will
> > take precedence.
> > I have tried to describe the scheme in the cover letter. Please let me
> > know if I should clarify more.
> >
> > > >
> > > > To this end we hosted RISC-V's perf people at Google and they
> > > > expressed that their preference was what this series does, and they
> > > > expressed this directly to you.
> > > >
> > > > I don't think there would be an issue in this area if it wasn't for
> > > > Neoverse and Linus - that's why the revert happened. This change in
> > > > behavior was proposed by Arnaldo:
> > > > https://lore.kernel.org/lkml/ZlY0F_lmB37g10OK@x1/
> > > > and has tags from Intel, ARM and Rivos (RISC-V). I intend to carry it
> > > > in Google's tree.
> > >
> > > Maybe it's because of Linus.  But anyway it reminds me of behaviors that
> > > need to be discussed.  And we can (and should) improve things always.
> > >
> > > Thanks,
> > > Namhyung
> > >

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-02-01  8:45                       ` Ian Rogers
@ 2025-02-04  0:15                         ` Namhyung Kim
  2025-02-04  0:41                           ` Ian Rogers
  0 siblings, 1 reply; 53+ messages in thread
From: Namhyung Kim @ 2025-02-04  0:15 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Atish Kumar Patra, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Adrian Hunter, Kan Liang, James Clark, Ze Gao,
	Weilin Wang, Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, bpf, Aditya Bodkhe, Leo Yan,
	Beeman Strong, Arnaldo Carvalho de Melo

On Sat, Feb 01, 2025 at 12:45:04AM -0800, Ian Rogers wrote:
> On Fri, Jan 31, 2025 at 2:42 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Wed, Jan 29, 2025 at 10:12:14PM -0800, Atish Kumar Patra wrote:
> > > On Wed, Jan 29, 2025 at 1:55 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > On Wed, Jan 15, 2025 at 01:20:32PM -0800, Ian Rogers wrote:
> > > > > On Wed, Jan 15, 2025 at 9:59 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > >
> > > > > > > On Mon, Jan 13, 2025 at 2:51 PM Ian Rogers <irogers@google.com> wrote:
> > > > > > > > There was an explicit, and reviewed by Jiri and Arnaldo, intent with
> > > > > > > > the hybrid work that using a legacy event with a hybrid PMU, even
> > > > > > > > though the PMU doesn't advertise through json or sysfs the legacy
> > > > > > > > event, the perf tool supports it.
> > > > > >
> > > > > > I thought legacy events on hybrid were converted to PMU events.
> > > > >
> > > > > No, when BIG.little was created nothing changed in perf events but
> > > > > when Intel did hybrid they wanted to make the hybrid CPUs (atom and
> > > > > performance) appear as if they were one type. The PMU event encodings
> > > > > vary a lot for this on Intel, ARM has standards for the encoding.
> > > > > Intel extended the legacy format to take a PMU type id:
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/include/uapi/linux/perf_event.h?h=perf-tools-next#n41
> > > > > "EEEEEEEE: PMU type ID"
> > > > > that is in the top 32-bits of the config.
> > > >
> > > > Oh right, I forgot the extended type thing.  Then we can keep the legacy
> > > > encoding with it on hybrid systems when users give well-known names (w/o
> > > > PMU) for legacy event.
> > > >
> > > > >
> > > > > > > >
> > > > > > > > Making it so that events without PMUs are only legacy events just
> > > > > > > > doesn't work. There are far too many existing uses of non-legacy
> > > > > > > > events without PMU, the metrics contain 100s of examples.
> > > > > >
> > > > > > That's unfortunate.  It'd be nice if metrics were written with PMU
> > > > > > names.
> > > > >
> > > > > But then we'd end up with things like on Intel:
> > > > > UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD
> > > > > becoming:
> > > > > uncore_cha/UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD/
> > > > > or just:
> > > > > cha/UNC_CHA_TOR_OCCUPANCY.IA_MISS_DRD/
> > > > > As a user the first works for me and doesn't have any ambiguity over
> > > > > PMUs as the event name already encodes the PMU. AMD similarly place
> > > > > the part of a pipeline into event names. Were we to break everybody by
> > > > > requiring the PMU we'd also need to explain which PMU to use. Sites
> > > > > with event lists (like https://perfmon-events.intel.com/) don't
> > > > > explain the PMU and it'd be messy as on Intel you have a CHA PMU for
> > > > > server chips but a CBOX on client chips, etc.
> > > >
> > > > While I prefer having PMU names in the JSON events/metrics, it may not
> > > > be practical to change them all.  Probably we can allow them without PMU
> > > > and hope that they have unique prefixes.
> > > >
> > > > >
> > > > > > I have a question.  What if an event name in a metric matches to
> > > > > > multiple unrelated PMUs?
> > > > >
> > > > > The metric may break or we'd aggregate the unrelated counts together.
> > > >
> > > > Ok, then they should use unique names.
> > > >
> > > >
> > > > > Take a metric like IPC as "instructions/cycles", that metric should
> > > > > work on a hybrid system as they have instructions and cycles. If you
> > > > > used an event for instructions like inst_retired.any then maybe the
> > > > > metric will fail on one kind of core that didn't have that event. Now
> > > >
> > > > The metrics is for specific CPU model then the vendor should be
> > > > responsible to provide accurate metrics using appropriate PMU/events
> > > > IMHO.
> > > >
> > > >
> > > > > if we have accelerators advertising instructions and cycles events, we
> > > > > should be able to compute the metric for the accelerator. What could
> > > > > happen today is that the accelerator will have a cpumask of a single
> > > > > CPU, we could aggregate the accelerator counter into the CPU event
> > > > > with the same CPU as the cpumask, we'd end up with a weird quasi CPU
> > > > > and accelerator IPC metric for that CPU. What should happen is that we
> > > > > get an IPC for the accelerator and IPC for each hybrid core
> > > > > independently, but the way we handle evsels, CPUs, PMUs is not really
> > > > > set up for that. Hopefully getting a set of PMUs into the evsel will
> > > > > clear that up. Assuming all of that is cleared up, is it wrong if the
> > > > > IPC metric is computed for the accelerator if it was originally
> > > > > written as a CPU metric? Not really. Could there be metrics where that
> > > > > is the case?
> > > >
> > > > Yes, I think there should be separate metrics for the accelerators.
> > > >
> > > >
> > > > > Probably, and specifying PMUs in the event names would be
> > > > > a fix. There have also been proposals that we restrict the PMUs for
> > > > > certain metrics. As event names are currently so distinct it isn't a
> > > > > problem we've faced yet and it is not clear it is a problem other than
> > > > > highlighting tech debt in areas of the tool like aggregation.
> > > > >
> > > > > > > >
> > > > > > > > Prior to switching json/sysfs to being the priority when a PMU is
> > > > > > > > specified, it was the case that all encodings were the same, with or
> > > > > > > > without a PMU.
> > > > > > > >
> > > > > > > > I don't think there is anything natural about assuming things about
> > > > > > > > event names. Take cycles, cpu-cycles and cpu_cycles:
> > > > > > > >  - cycles on x86 is only encoded via a legacy event;
> > > > > > > >  - cpu-cycles on Intel exists as a sysfs event, but cpu-cycles is also
> > > > > > > > a legacy event name;
> > > > > > > >  - cpu_cycles exists as a sysfs event on ARM but doesn't have a
> > > > > > > > corresponding legacy event name.
> > > > > >
> > > > > > I think the behavior should be:
> > > > > >
> > > > > >   cycles -> PERF_COUNT_HW_CPU_CYCLES
> > > > > >   cpu-cycles -> PERF_COUNT_HW_CPU_CYCLES
> > > > > >   cpu_cycles -> no legacy -> sysfs or json
> > > > > >   cpu/cycles/ -> sysfs or json
> > > > > >   cpu/cpu-cycles/ -> sysfs or json
> > > > >
> > > > > So I disagree as if you add a PMU to an event name the encoding
> > > > > shouldn't change:
> > > > > 1) This historically was perf's behavior.
> > > >
> > > > Well.. I'm not sure about the history.  I believe the logic I said above
> > > > is the historic and (I think) right behavior.
> > > >
> > > > > 2) Different event encodings can have different behaviors (broken in
> > > > > some notable cases).
> > > >
> > > > Yep, let's make it clear.
> > > >
> > > > > 3) Intuitively what wildcarding does is try to open "*/event/" where *
> > > > > is every possible PMU name. Having different event encodings is
> > > > > breaking that intuition it could also break situations where you try
> > > > > to assert equivalence based on type/config.
> > > >
> > > > While I don't like the wildcard matching, I think it doesn't matter as
> > > > long as we keep the above behavior.  If it can find a legacy name, then
> > > > go with it, done.  If not, try all PMUs as if it's given with PMU name
> > > > in the event.
> > > >
> > > > > 4) The legacy encodings were (are?) broken on ARM Apple M? CPUs,
> > > > > that's why the priority was changed.
> > > >
> > > > I guess that why they use cpu_cycles.
> > > >
> > > > > 5) RISC-V would like the tool tackle the legacy to config mapping
> > > > > challenge, rather than the PMU driver, given the potential diversity
> > > > > of hardware implementations.
> > > >
> > > > I hope they can find a better solution. :)
> > > >
> > >
> > > Sorry for reposting. Gmail converted it to html for some reason.
> > >
> > > I have posted the latest support here.
> > > https://lore.kernel.org/kvm/20250127-counter_delegation-v3-12-64894d7e16d5@rivosinc.com/T/
> > >
> > > As of now, we have adopted a hybrid approach where a vendor can decide
> > > whether to encode the legacy events
> > > in the json or in the driver (if this series is merged). In absence of
> > > that, every vendor has to define it in the driver.
> > > We will deal with the fall out of the exploding driver when the
> > > situation arrives.
> >
> > I don't know how hard it'd be cause I'm not familiar with RISC-V.  But
> > basically you only need to maintain 9 legacy encodings (PERF_COUNT_HW_*)
> > and a few dozen combinations of supported cache events (PERF_COUNT_HW_
> > CACHE_*) for each vendor.  All others can go to json anyway.
> >
> > I think this is what all other archs (including x86) do.
> 
> This is well known to the people involved.
> 
> While the PMU driver needs to encode or avoid these event names, they
> become special "legacy" names inside the perf tool. Magically a name
> like cpu_cycles will wildcard match (match on >1 PMU) whilst a name
> like cpu-cycles won't (only matching on core PMUs). This is completely
> confusing to users. It is even more confusing when you are saying the
> tool should intentionally use two different encodings.

The legacy encoding is a part of the ABI, and it's natural to use it.
We historically used 'cycles' and 'cpu-cycles' as legacy events and it
should remain as is IMHO.  I'm not sure why ARM uses 'cpu_cycles', but
I guess they don't want to use the legacy encoding for some reason.
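For reference, a minimal C sketch of the legacy ABI encoding being discussed, assuming a Linux UAPI `<linux/perf_event.h>`: a legacy "cycles" event is `PERF_TYPE_HARDWARE` with config `PERF_COUNT_HW_CPU_CYCLES`, and on hybrid systems the target core PMU's type can be placed in the upper 32 bits of config (the "extended type" information referred to in this thread). The helper name `encode_legacy_cycles()` and the PMU type value used with it are hypothetical; the sketch only fills in the attribute and does not call perf_event_open().

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/* Older UAPI headers may lack this; 32 is the value in current kernels. */
#ifndef PERF_PMU_TYPE_SHIFT
#define PERF_PMU_TYPE_SHIFT 32
#endif

/*
 * Hypothetical helper: fill attr with the legacy "cycles" encoding.
 * pmu_type == 0 leaves the extended type empty (plain legacy event).
 * A non-zero pmu_type is the dynamic type of one core PMU, e.g. as read
 * from /sys/bus/event_source/devices/<pmu>/type, and goes in the upper
 * 32 bits of config so the event targets that PMU.
 */
static void encode_legacy_cycles(struct perf_event_attr *attr,
				 uint32_t pmu_type)
{
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = ((uint64_t)pmu_type << PERF_PMU_TYPE_SHIFT) |
		       PERF_COUNT_HW_CPU_CYCLES;
}
```

On a hybrid machine a tool would fill one such attribute per core PMU type, so that every core is counted rather than only the cores behind one PMU.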

> 
> The perf event enum types are limited but the tool recognizes more
> event names and then uses legacy encodings. I have yet to hear a
> sensible list of what the legacy event names are: is cpu-cycles in
> there, or just cycles? Why on earth would you want to keep synonyms
> like LLC meaning L2 cache?

I think it's clear what are legacy events: `perf list hw`.

In fact, it doesn't matter for tools what LLC means.  I think it's the
drivers' responsibility to match sensible events to legacy encodings.
We only need to use the events as they prepared them.

> 
> The intention with "pmu syntax" for events is that the PMU clarifies
> the type in the perf_event_attr. Previously it was assumed that the
> PMU type would be raw (4), and the x86 PMUs even use that as their
> type number. That ignores that these days we have hybrid core PMUs
> and tens of uncore PMUs; supporting them meant reinventing event
> parsing and encoding.
> 
> If you look at the matching as it is today:
> cpu_cycles -> tries to match on all PMUs
> */cpu_cycles/ -> tries to match on all PMUs
> arm*/cpu_cycles/ -> tries to match on all PMUs whose names start with arm
> armv8_pmuv3/cpu_cycles/ -> matches only the armv8_pmuv3 PMU

I didn't realize we can use '*'.  Then I guess we can disable the
default wildcard match.  Users can add '*/.../' easily if they really
want it, right?  I still think this whole problem comes from the
wildcard behavior.
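As a rough illustration of the wildcard matching listed above, here is a C sketch that selects PMUs by shell-style glob using POSIX fnmatch(), treating an event written without a PMU as if its PMU were "*". The PMU list and the function name are hypothetical, this is not perf's implementation, and it ignores whether a matched PMU actually has the requested event, which real event parsing would also have to check.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical PMU names; real ones come from /sys/bus/event_source/devices. */
static const char *const demo_pmus[] = {
	"armv8_pmuv3", "arm_cmn_0", "arm_spe_0", "hisi_sccl1_ddrc0",
};

/*
 * Count the PMUs selected by a PMU glob. A NULL glob models an event
 * given without any PMU, which behaves like "*" in this sketch.
 */
static size_t count_matching_pmus(const char *glob)
{
	size_t n = 0;

	if (glob == NULL)
		glob = "*";
	for (size_t i = 0; i < sizeof(demo_pmus) / sizeof(demo_pmus[0]); i++) {
		if (fnmatch(glob, demo_pmus[i], 0) == 0)
			n++;
	}
	return n;
}
```

With this list, NULL and "*" both select all four PMUs, "arm*" selects three, and "armv8_pmuv3" selects one, mirroring the four spellings of cpu_cycles above.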

Probably we need to do the following for events without a PMU name:
1. use legacy event if it's the well-known name, if not
2. check core PMU (cpu) for sysfs events, if not
3. search all JSON events (not to break metrics)
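The three-step order proposed above can be sketched in C as a first-match lookup. The tables are hypothetical stand-ins for perf's legacy names, core-PMU sysfs events and JSON events, and the names are illustrative only; real resolution would also have to expand each match to the right PMUs.

```c
#include <stddef.h>
#include <string.h>

/* Where an event spelled without a PMU ends up being resolved from. */
enum event_source { SRC_NONE, SRC_LEGACY, SRC_CORE_SYSFS, SRC_JSON };

/* Hypothetical tables standing in for perf's real event data. */
static const char *const legacy_names[] = { "cycles", "cpu-cycles", "instructions" };
static const char *const core_sysfs_names[] = { "cpu_cycles", "inst_retired" };
static const char *const json_names[] = { "unc_arb_coh_trk_requests.all", "inst_retired" };

#define TABLE_LEN(t) (sizeof(t) / sizeof((t)[0]))

static int table_has(const char *const *table, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(table[i], name) == 0)
			return 1;
	return 0;
}

/* Proposed order for an event given without a PMU: first match wins. */
static enum event_source resolve_without_pmu(const char *name)
{
	if (table_has(legacy_names, TABLE_LEN(legacy_names), name))
		return SRC_LEGACY;       /* 1. well-known legacy name */
	if (table_has(core_sysfs_names, TABLE_LEN(core_sysfs_names), name))
		return SRC_CORE_SYSFS;   /* 2. sysfs event on the core PMU */
	if (table_has(json_names, TABLE_LEN(json_names), name))
		return SRC_JSON;         /* 3. any JSON event, e.g. uncore */
	return SRC_NONE;
}
```

Moving the legacy check after the sysfs and JSON checks would instead approximate the ordering the patch series argues for.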

> 
> I don't see why it isn't obvious that the behavior of no PMU and the
> PMU being * is expected to be exactly the same - it really is today
> and that is what the code does, please try it. There just isn't a
> notion of not having a PMU because even for legacy events we have to
> reinvent the PMUs to inject the correct extended type information
> otherwise we'd profile just a fraction of the cores. We add PMUs when
> we display events to make the events more readable. There isn't a
> notion of these events being legacy and not, they are just assumed to
> be the same, PMU or not.

Yes, it's confusing.  So I think we'd better make cycles != */cycles/.

> 
> As I've explained to you, I plan to transition the metric code to use
> event parsing and to union evlists rather than use strings and hash
> tables. This is to fix tracepoints incorrectly appearing to always
> have suffixes in the "metric-id" calculation. Recognizing modifiers
> properly would end up reinventing event parsing, so let's just make
> use of what we have and parse events early. It makes sense when
> unioning evsels in an evlist to do it off of the perf_event_attr; this
> will allow Intel's slots and topdown.slots to be correctly detected as
> aliases in metrics, something of a pain in formulas today. Why would
> the behavior of an event like cycles be different in non-hybrid
> metrics (where PMUs generally aren't specified) and in hybrid metrics
> (where PMUs generally are specified)? Events may not be recognized as
> aliases because ones without a PMU in the metric will get a legacy
> encoding. In your change:
> https://lore.kernel.org/r/20221018020227.85905-16-namhyung@kernel.org
> you assume all events with the same name are in fact the same event,
> but that is making wild assumptions about what is placed in the evsel
> name and I am trying to fix it in:
> https://lore.kernel.org/lkml/20250201074320.746259-1-irogers@google.com/
> You did something similar with your proposal for hwmon events and I
> rejected it. The name term in an event configuration clobbers an
> evsel's name; that is just the intent of the thing, and the name was
> never supposed to have some sacred legacy or whatever meaning.

Thanks for the fix!

> 
> I still see no sense in:
> perf stat -e cpu_cycles ...
> meaning:
> perf stat -e */cpu_cycles/ ...
> and:
> perf stat -e cpu-cycles ...
> trying to mean close to:
> perf stat -e cpu/cpu-cycles/ ...
> Why is one implicitly a * and the other a core PMU? It is the
> definition of confusing. And in the latter cpu-cycles case you want
> those two events to be encoded differently.

Yep, I agree it's confusing.  So my opinion is to use legacy encoding
and no default wildcard. :)

> 
> All of this is overlooking that we have 1 event that is a problem on 1
> PMU on 1 architecture. If it weren't for that event we'd already have
> this patch landed and consistent event encodings. Not taking the
> patch hurts Apple M users, RISC-V users, and my own work.

Well, I'm not talking about the specific event or an architecture.  What
I'm focusing on is what the sensible behavior is.

> 
> Please can you explain why keeping the current encoding is good? And if
> we like legacy events so much, can we revert the changes to prioritize
> sysfs/json when a PMU name is present? I'm afraid what you are
> explaining makes no sense to me, breaks existing platforms (Apple M)
> and is a blockage to future work. Saying everyone should rewrite
> everything, that's not a workable solution - not least because in some
> situations (old PMU drivers on Apple M) we lack a time machine.

It's not clear to me if we have a problem on Apple M as of now.

And I don't have a problem with 'pmu/event/' case.  I hope to find a way
to support what I described without rewriting all metrics.

Thanks,
Namhyung


* Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-02-04  0:15                         ` Namhyung Kim
@ 2025-02-04  0:41                           ` Ian Rogers
  2025-02-05  1:57                             ` Namhyung Kim
  0 siblings, 1 reply; 53+ messages in thread
From: Ian Rogers @ 2025-02-04  0:41 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Atish Kumar Patra, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Adrian Hunter, Kan Liang, James Clark, Ze Gao,
	Weilin Wang, Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, bpf, Aditya Bodkhe, Leo Yan,
	Beeman Strong, Arnaldo Carvalho de Melo

On Mon, Feb 3, 2025 at 4:15 PM Namhyung Kim <namhyung@kernel.org> wrote:
[snip]
> Yep, I agree it's confusing.  So my opinion is to use legacy encoding
> and no default wildcard. :)

Making it so that all non-legacy, non-core PMU events require a PMU is
a breaking change and a regression for all users, command line event
name suggesting, any tool built off of perf, and so on. Breaking all
perf users and requiring all perf metrics be rewritten is well..
something..

Ian

* Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-02-04  0:41                           ` Ian Rogers
@ 2025-02-05  1:57                             ` Namhyung Kim
  2025-02-05  4:48                               ` Ian Rogers
  0 siblings, 1 reply; 53+ messages in thread
From: Namhyung Kim @ 2025-02-05  1:57 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Atish Kumar Patra, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Adrian Hunter, Kan Liang, James Clark, Ze Gao,
	Weilin Wang, Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, bpf, Aditya Bodkhe, Leo Yan,
	Beeman Strong, Arnaldo Carvalho de Melo

On Mon, Feb 03, 2025 at 04:41:11PM -0800, Ian Rogers wrote:
> On Mon, Feb 3, 2025 at 4:15 PM Namhyung Kim <namhyung@kernel.org> wrote:
> [snip]
> > Yep, I agree it's confusing.  So my opinion is to use legacy encoding
> > and no default wildcard. :)
> 
> Making it so that all non-legacy, non-core PMU events require a PMU is
> a breaking change and a regression for all users, command line event
> name suggesting, any tool built off of perf, and so on. Breaking all
> perf users and requiring all perf metrics be rewritten is well..
> something..

Well, I guess the majority of users don't use non-core PMU events.  And
we used to have PMU prefix on those events for years so old users should
not be affected.  Actually perf list shows them with PMU prefix so I
think new users are also expected to use the PMU name.

  $ perf list pmu
  ...
  cstate_pkg/c2-residency/                           [Kernel PMU event]
  ...
  i915/actual-frequency/                             [Kernel PMU event]
  i915/bcs0-busy/                                    [Kernel PMU event]
  ...
  msr/tsc/                                           [Kernel PMU event]
  ...
  power/energy-cores/                                [Kernel PMU event]
  ...
  uncore_clock/clockticks/                           [Kernel PMU event]
  uncore_imc_free_running/data_read/                 [Kernel PMU event]  
  ...

The exception is the JSON events like below.

  uncore interconnect:
    unc_arb_coh_trk_requests.all
         [UNC_ARB_COH_TRK_REQUESTS.ALL. Unit: uncore_arb]
 
which I had hoped would be 'uncore_arb/unc_arb_coh_trk_requests.all/' or even
'uncore_arb/coh_trk_requests.all/'.  But it would be hard to change all
the metric expressions now.  Also users can directly use them as they
are listed by `perf list`.  So we need to support that without PMUs.
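A minimal C sketch of the relationship described here, assuming a hypothetical table that pairs each JSON event name with its "Unit" (the PMU shown by perf list): a bare name can be mapped to the PMU-qualified 'unit/name/' spelling, so both spellings could refer to the same event. The struct, table and function names are illustrative only.

```c
#include <stdio.h>
#include <string.h>

/* Minimal stand-in for a JSON event entry; field names are hypothetical. */
struct json_event {
	const char *name;  /* event name as vendors publish it */
	const char *unit;  /* PMU the event lives on ("Unit:" in perf list) */
};

static const struct json_event arb_json_events[] = {
	{ "unc_arb_coh_trk_requests.all", "uncore_arb" },
};

/*
 * Write the PMU-qualified spelling "unit/name/" for a bare event name.
 * Returns 0 on success, -1 if the name is unknown or buf is too small.
 */
static int qualify_event(const char *name, char *buf, size_t len)
{
	size_t count = sizeof(arb_json_events) / sizeof(arb_json_events[0]);

	for (size_t i = 0; i < count; i++) {
		const struct json_event *ev = &arb_json_events[i];
		int n;

		if (strcmp(ev->name, name) != 0)
			continue;
		n = snprintf(buf, len, "%s/%s/", ev->unit, ev->name);
		return (n < 0 || (size_t)n >= len) ? -1 : 0;
	}
	return -1;
}
```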

Thanks,
Namhyung


* Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-02-05  1:57                             ` Namhyung Kim
@ 2025-02-05  4:48                               ` Ian Rogers
  2025-02-06  5:09                                 ` Namhyung Kim
  0 siblings, 1 reply; 53+ messages in thread
From: Ian Rogers @ 2025-02-05  4:48 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Atish Kumar Patra, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Adrian Hunter, Kan Liang, James Clark, Ze Gao,
	Weilin Wang, Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, bpf, Aditya Bodkhe, Leo Yan,
	Beeman Strong, Arnaldo Carvalho de Melo

On Tue, Feb 4, 2025 at 5:58 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Mon, Feb 03, 2025 at 04:41:11PM -0800, Ian Rogers wrote:
> > On Mon, Feb 3, 2025 at 4:15 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > [snip]
> > > Yep, I agree it's confusing.  So my opinion is to use legacy encoding
> > > and no default wildcard. :)
> >
> > Making it so that all non-legacy, non-core PMU events require a PMU is
> > a breaking change and a regression for all users, command line event
> > name suggesting, any tool built off of perf, and so on. Breaking all
> > perf users and requiring all perf metrics be rewritten is well..
> > something..
>
> Well, I guess the majority of users don't use non-core PMU events.  And
> we used to have PMU prefix on those events for years so old users should
> not be affected.  Actually perf list shows them with PMU prefix so I
> think new users are also expected to use the PMU name.
>
>   $ perf list pmu
>   ...
>   cstate_pkg/c2-residency/                           [Kernel PMU event]
>   ...
>   i915/actual-frequency/                             [Kernel PMU event]
>   i915/bcs0-busy/                                    [Kernel PMU event]
>   ...
>   msr/tsc/                                           [Kernel PMU event]
>   ...
>   power/energy-cores/                                [Kernel PMU event]
>   ...
>   uncore_clock/clockticks/                           [Kernel PMU event]
>   uncore_imc_free_running/data_read/                 [Kernel PMU event]
>   ...
>
> The exception is the JSON events like below.
>
>   uncore interconnect:
>     unc_arb_coh_trk_requests.all
>          [UNC_ARB_COH_TRK_REQUESTS.ALL. Unit: uncore_arb]
>
> which I had hoped would be 'uncore_arb/unc_arb_coh_trk_requests.all/' or even
> 'uncore_arb/coh_trk_requests.all/'.  But it would be hard to change all
> the metric expressions now.  Also users can directly use them as they
> are listed by `perf list`.  So we need to support that without PMUs.

So there's nothing wrong with your proposal except it breaks non-core
events. We can't agree to flip the default on a flag for perf top:
https://lore.kernel.org/lkml/20240516222159.3710131-1-irogers@google.com/
to make perf top behave as, you know, top does as it could be an
option people depend on. A behavior that matters if you do user
filtering as exited processes stay in perf top (both confusing and
un-top like). Fwiw, that reminds me of another patch series being
unreviewed:
https://lore.kernel.org/lkml/20250111190143.1029906-1-irogers@google.com/
Anyway, the perf top flag is one that no-one knows exists on a command
most people don't know exists - Julia Evans' zine of course loves it
and we love Julia's work and the zine. So, it would seem to me that
changing something as fundamental as how all non-core events behave
would be seen as a regression. Imagine the person going to
perfmon-events.intel.com, finding an event name and expecting to be
able to use it with perf. Now they need to grub around in perf list to
locate the PMU. What is appropriate for them to know about how
suffixes work and show in perf list..? Well that's assuming suffixes
work in the future as ARM will probably launch an a1000 CPU and the
PMU will look like a hex suffix and the whole naming convention
implodes.

Even with this what would be the behavior of core events? You want
legacy events to have priority over sysfs/json when there is no PMU.
You know, and have stated you don't care, that RISC-V wants something
different and that
it breaks Apple-M's PMUs for a fairly large range of kernel releases
including 1 LTS kernel - the only reason I'm writing patches in this
area in the 1st place. Software is soft and you can go fix software
anywhere in the stack. Listening to vendors and not breaking everyone
is the point-of-view these patches have been coming from. I find it
very hard to have a conversation where this is just forgotten about
and we're working on hypotheticals which seem to be both unwanted and
implausible.

I don't know why people (yourself, Linus) keep wanting to show me the
perf list output. It is arbitrary. I rewrote it and changed the
behavior of all uncore PMUs within it (we didn't used to deduplicate
based on the PMU suffix). It is nice that people think it reads like
some religious text. Why is the formatting different in perf list for
json specified events? Well it is because json events have
descriptions and the events you are showing with a PMU don't have a
description. I think because there is no description, an effort was
made to keep the output compact and put the PMU and event name
together. It wasn't trying to enter some kind of long lasting marriage
that the event name should only ever be used with the PMU. What
happens if an event is both in sysfs and json? Well the sysfs event
will get the description from the json and then I believe it won't
behave as you show. Did the event get broken, as perf list no longer
shows it with a PMU, by having a json description written? I think not
and I think having descriptions with events is a good thing.
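To make the two perf list layouts concrete, here is a hypothetical, simplified C formatter: an event without a description is printed compactly as "pmu/event/" with a type tag, while an event with a description gets the description and unit on a second line. The function name is invented for illustration, and the column widths only approximate the output quoted in this thread; they are not taken from perf's printing code.

```c
#include <stdio.h>

/*
 * Hypothetical perf-list-style formatter (not perf's code).
 * Without a description: "  pmu/event/<padding> [Kernel PMU event]".
 * With a description: the event name on one line, then
 * "[description. Unit: pmu]" indented on the next.
 * Returns the snprintf() result, i.e. the length of the full entry.
 */
static int format_list_entry(char *buf, size_t len, const char *pmu,
			     const char *event, const char *desc)
{
	if (desc == NULL) {
		char qualified[128];

		/* Long names are simply truncated in this sketch. */
		snprintf(qualified, sizeof(qualified), "%s/%s/", pmu, event);
		return snprintf(buf, len, "  %-50s [Kernel PMU event]", qualified);
	}
	return snprintf(buf, len, "  %s\n       [%s. Unit: %s]", event, desc, pmu);
}
```

In this layout the PMU name is part of the event column only for description-less events; once a description exists, the PMU moves into the "Unit:" text, which is purely a presentation choice.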

Thanks,
Ian

* Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-02-05  4:48                               ` Ian Rogers
@ 2025-02-06  5:09                                 ` Namhyung Kim
  2025-02-06  7:44                                   ` Ian Rogers
  0 siblings, 1 reply; 53+ messages in thread
From: Namhyung Kim @ 2025-02-06  5:09 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Atish Kumar Patra, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Adrian Hunter, Kan Liang, James Clark, Ze Gao,
	Weilin Wang, Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, bpf, Aditya Bodkhe, Leo Yan,
	Beeman Strong, Arnaldo Carvalho de Melo

On Tue, Feb 04, 2025 at 08:48:20PM -0800, Ian Rogers wrote:
> On Tue, Feb 4, 2025 at 5:58 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Mon, Feb 03, 2025 at 04:41:11PM -0800, Ian Rogers wrote:
> > > On Mon, Feb 3, 2025 at 4:15 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > [snip]
> > > > Yep, I agree it's confusing.  So my opinion is to use legacy encoding
> > > > and no default wildcard. :)
> > >
> > > Making it so that all non-legacy, non-core PMU events require a PMU is
> > > a breaking change and a regression for all users, command line event
> > > name suggesting, any tool built off of perf, and so on. Breaking all
> > > perf users and requiring all perf metrics be rewritten is well..
> > > something..
> >
> > Well, I guess the majority of users don't use non-core PMU events.  And
> > we used to have PMU prefix on those events for years so old users should
> > not be affected.  Actually perf list shows them with PMU prefix so I
> > think new users are also expected to use the PMU name.
> >
> >   $ perf list pmu
> >   ...
> >   cstate_pkg/c2-residency/                           [Kernel PMU event]
> >   ...
> >   i915/actual-frequency/                             [Kernel PMU event]
> >   i915/bcs0-busy/                                    [Kernel PMU event]
> >   ...
> >   msr/tsc/                                           [Kernel PMU event]
> >   ...
> >   power/energy-cores/                                [Kernel PMU event]
> >   ...
> >   uncore_clock/clockticks/                           [Kernel PMU event]
> >   uncore_imc_free_running/data_read/                 [Kernel PMU event]
> >   ...
> >
> > The exception is the JSON events like below.
> >
> >   uncore interconnect:
> >     unc_arb_coh_trk_requests.all
> >          [UNC_ARB_COH_TRK_REQUESTS.ALL. Unit: uncore_arb]
> >
> > which I had hoped would be 'uncore_arb/unc_arb_coh_trk_requests.all/' or even
> > 'uncore_arb/coh_trk_requests.all/'.  But it would be hard to change all
> > the metric expressions now.  Also users can directly use them as they
> > are listed by `perf list`.  So we need to support that without PMUs.
> 
> So there's nothing wrong with your proposal except it breaks non-core
> events. We can't agree to flip the default on a flag for perf top:
> https://lore.kernel.org/lkml/20240516222159.3710131-1-irogers@google.com/
> to make perf top behave as, you know, top does as it could be an
> option people depend on. A behavior that matters if you do user
> filtering as exited processes stay in perf top (both confusing and
> un-top like). Fwiw, that reminds me of another patch series being
> unreviewed:
> https://lore.kernel.org/lkml/20250111190143.1029906-1-irogers@google.com/

Ok, I'll review that later.  Sorry my review bandwidth is not very high.


> Anyway, the perf top flag is one that no-one knows exists on a command
> most people don't know exists - Julia Evans' zine of course loves it
> and we love Julia's work and the zine.

You mean the -z flag, which is documented in the man page and also in the
help message (perf top -h).  Anyone who reads the docs can know it's
there.  Of course, people would prefer reading zines to man pages. :)


> So, it would seem to me that
> changing something as fundamental as how all non-core events behave
> would be seen as a regression.

Yep, it'd be a regression.  And that's why we cannot simply change the
behavior.  But I guess not many users would be affected by that since
it's undocumented behavior.


> Imagine the person going to
> perfmon-events.intel.com, finding an event name and expecting to be
> able to use it with perf. Now they need to grub around in perf list to
> locate the PMU. What is appropriate for them to know about how
> suffixes work and show in perf list..? Well that's assuming suffixes
> work in the future as ARM will probably launch an a1000 CPU and the
> PMU will look like a hex suffix and the whole naming convention
> implodes.

Which suffix do you mean?

Anyway, a person who looked up the Intel webpage would be eager to learn
about performance-related things.  Can we also assume they want to
learn about the perf tool itself? :)

If it's not the case, we have this:

  $ perf record -e xxx
  event syntax error: 'xxx'
                       \___ Bad event name
  
  Unable to find event on a PMU of 'xxx'
  Run 'perf list' for a list of valid events
  
   Usage: perf record [<options>] [<command>]
      or: perf record [<options>] -- <command> [<options>]
  
      -e, --event <event>   event selector. use 'perf list' to list available events

So it says twice to run 'perf list' to see the events.  Then they can
run either:

  $ perf list | grep xxx

or

  $ perf list xxx

to see the actual name of the event available in the perf tool.

> 
> Even with this what would be the behavior of core events? You want
> legacy events to have priority over sysfs/json when there is no PMU.
> You know, and have stated you don't care, that RISC-V wants something
> different and that
> it breaks Apple-M's PMUs for a fairly large range of kernel releases
> including 1 LTS kernel - the only reason I'm writing patches in this
> area in the 1st place. Software is soft and you can go fix software
> anywhere in the stack. Listening to vendors and not breaking everyone
> is the point-of-view these patches have been coming from. I find it
> very hard to have a conversation where this is just forgotten about
> and we're working on hypotheticals which seem to be both unwanted and
> implausible.

Sorry I don't want to repeat that too.  Correct me if I'm wrong:

1. RISC-V is working on a solution with the current status and it's not
   absolutely necessary to change the current behavior.

2. Apple-M is fixed already.

> 
> I don't know why people (yourself, Linus) keep wanting to show me the
> perf list output. It is arbitrary. I rewrote it and changed the
> behavior of all uncore PMUs within it (we didn't used to deduplicate
> based on the PMU suffix). It is nice that people think it reads like
> some religious text.

I think it shows how we want users to use the events.


> Why is the formatting different in perf list for
> json specified events? Well it is because json events have
> descriptions and the events you are showing with a PMU don't have a
> description. I think because there is no description, an effort was
> made to keep the output compact and put the PMU and event name
> together. It wasn't trying to enter some kind of long lasting marriage
> that the event name should only ever be used with the PMU.

I like the description but I don't like the formatting.  I think I
understand why it looks like that but it could be different.  Anyway,
I don't think showing PMU name is related to having descriptions.


> What happens if an event is both in sysfs and json? Well the sysfs event
> will get the description from the json and then I believe it won't
> behave as you show. Did the event get broken, as perf list no longer
> shows it with a PMU, by having a json description written? I think not
> and I think having descriptions with events is a good thing.

That's bad.  Probably we should fix it so that it takes only one of the
sources, and change the JSON event so that it doesn't clash with sysfs.
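The merge behavior under discussion, where a sysfs event picks up the description of a JSON entry with the same name, might look like this in a simplified C sketch. The struct and function are hypothetical and do not mirror perf's internal alias handling; the point is only that, once a description is attached, a printer that uses the compact "pmu/event/" form solely for description-less events would list this event in the described form instead.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical event record; real perf keeps richer per-PMU alias data. */
struct event_alias {
	const char *name;
	const char *desc;   /* NULL when only sysfs knows the event */
};

/*
 * Attach a JSON description to a sysfs alias with the same name.
 * Returns 1 if a description was attached, 0 otherwise.
 */
static int merge_json_desc(struct event_alias *sysfs_ev,
			   const struct event_alias *json, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(sysfs_ev->name, json[i].name) == 0 && json[i].desc) {
			sysfs_ev->desc = json[i].desc;
			return 1;
		}
	}
	return 0;
}
```

Taking "only one of the sources", as suggested above, would amount to not calling such a merge step and resolving the name clash in the JSON data instead.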

Thanks,
Namhyung


* Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-02-06  5:09                                 ` Namhyung Kim
@ 2025-02-06  7:44                                   ` Ian Rogers
  2025-02-07  4:44                                     ` Namhyung Kim
  0 siblings, 1 reply; 53+ messages in thread
From: Ian Rogers @ 2025-02-06  7:44 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Atish Kumar Patra, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Adrian Hunter, Kan Liang, James Clark, Ze Gao,
	Weilin Wang, Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, bpf, Aditya Bodkhe, Leo Yan,
	Beeman Strong, Arnaldo Carvalho de Melo

On Wed, Feb 5, 2025 at 9:09 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Tue, Feb 04, 2025 at 08:48:20PM -0800, Ian Rogers wrote:
> > On Tue, Feb 4, 2025 at 5:58 PM Namhyung Kim <namhyung@kernel.org> wrote:
> You mean the -z flag, which is documented in the man page and also in the
> help message (perf top -h).  Anyone who reads the docs can know it's
> there.  Of course, people would prefer reading zines to man pages. :)

I linked to the patch. My point is that something as minor as making
"perf top" behave as "top" does was too big a (user command line)
regression to land - I strongly suspect nobody would notice. Your
proposal breaks all non-core events on every perf command that takes
PMU events. It is a bigger change.

> > So, it would seem to me that
> > changing something as fundamental as how all non-core events behave
> > would be seen as a regression.
>
> Yep, it'd be a regression.

Agreed, you are arguing for a regression.

> Which suffix do you mean?

It's off topic. ARM added hex suffixes to PMUs, representing the physical
memory addresses of memory controllers, but that makes cortex_a72 look
like it has a 3-character suffix. So perf only treats a run of hex digits
more than 4 characters long as a hex suffix, which of course misfires
for a1000 (which is also somewhat close to being an old Acorn
Archimedes machine number ;-) ).
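The suffix heuristic described here can be sketched in C as follows. The function name is hypothetical and this is not perf's code: the "more than 4 characters" threshold for hex suffixes follows the description above, while treating purely decimal suffixes such as "_0" as suffixes at any length is an added assumption. The a1000 case shows how the heuristic misfires.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical: length of a PMU name without a trailing "_<suffix>".
 * An all-hex suffix containing a-f only counts when it is more than 4
 * characters long (per the thread), so "cortex_a72" keeps "a72".
 * Assumption: purely decimal suffixes like "_0" count at any length.
 */
static size_t pmu_base_name_len(const char *name)
{
	size_t len = strlen(name), i = len;
	bool has_alpha_hex = false;

	/* Walk back over trailing hex digits. */
	while (i > 0 && isxdigit((unsigned char)name[i - 1])) {
		if (!isdigit((unsigned char)name[i - 1]))
			has_alpha_hex = true;
		i--;
	}
	/* A suffix needs at least one digit and a preceding underscore. */
	if (i == len || i == 0 || name[i - 1] != '_')
		return len;
	if (!has_alpha_hex || len - i > 4)
		return i - 1;
	return len;
}
```

For example, "uncore_imc_0" reduces to "uncore_imc" and "cortex_a72" is left intact, but a hypothetical "cortex_a1000" would be cut down to "cortex".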

> Anyway, a person who looked up the Intel webpage would be eager to learn
> about performance-related things.  Can we also assume they want to
> learn about the perf tool itself? :)

I'm not sure how turning data_read into
uncore_imc_free_running/data_read/ is in any way helping people
understand perf? They want an event name that matches the
documentation, manual, web site. It is what the vendors I've spoken to
want as they use the event names across tools (fwiw oprofile doesn't
even have a notion of a PMU). To my knowledge the PMU names are the
wild west, often illogical and never mentioned in any kind of
documentation. I have a hard time explaining how the suffixes work and
I believe there are more conventions in the works where there can be
multiple what we are currently calling suffixes.

> If it's not the case, we have this:
>
>   $ perf record -e xxx
>   event syntax error: 'xxx'
>                        \___ Bad event name
>
>   Unable to find event on a PMU of 'xxx'
>   Run 'perf list' for a list of valid events
>
>    Usage: perf record [<options>] [<command>]
>       or: perf record [<options>] -- <command> [<options>]
>
>       -e, --event <event>   event selector. use 'perf list' to list available events
>
> So it says twice to run 'perf list' to see the events.  Then they can
> run either:
>
>   $ perf list | grep xxx
>
> or
>
>   $ perf list xxx
>
> to see the actual name of the event available in the perf tool.

Why was adding a PMU to an event name, working around ARM's PMU bug,
such an insurmountable problem that the original change was reverted?
Because 1 person didn't want to have to write a PMU prefix and
considered it a monumental regression having to do so.

> >
> > Even with this what would be the behavior of core events? You want
> > legacy events to have priority over sysfs/json when there is no PMU.
> > You know, and have stated you don't care, that RISC-V wants something
> > different and that
> > it breaks Apple-M's PMUs for a fairly large range of kernel releases
> > including 1 LTS kernel - the only reason I'm writing patches in this
> > area in the 1st place. Software is soft and you can go fix software
> > anywhere in the stack. Listening to vendors and not breaking everyone
> > is the point-of-view these patches have been coming from. I find it
> > very hard to have a conversation where this is just forgotten about
> > and we're working on hypotheticals which seem to be both unwanted and
> > implausible.
>
> Sorry I don't want to repeat that too.  Correct me if I'm wrong:

You are wrong.

> 1. RISC-V is working on a solution with the current status and it's not
>    absolutely necessary to change the current behavior.

They told you directly that it was what they wanted; that's why I
reposted this change, and it is, and has always been, in the cover letter.
They've then followed up expressing their desire for this behavior but
having to have a plan b as the original change was reverted and you
are blocking this change landing.

> 2. Apple-M is fixed already.

No, James tried to repro the bug on a Juno board, not an Apple M, and
didn't succeed. I don't know what kernel he tried. I was told by Mark
Rutland (at LPC) that the tool fix was absolutely necessary and the
PMU driver wouldn't be fixed, hence the series flipping behavior that
I thought Intel would most likely block and wasn't keen to do in the
1st place (not least wade through all the test behavior changes and
the bug tail). All of this was premised on a threat of reverting all
of the hybrid support so that Apple M could be made to work again, and
I was trying to offer a less bad alternative.
https://lore.kernel.org/r/20231123042922.834425-1-irogers@google.com

> >
> > I don't know why people (yourself, Linus) keep wanting to show me the
> > perf list output. It is arbitrary. I rewrote it and changed the
> > behavior of all uncore PMUs within it (we didn't used to deduplicate
> > based on the PMU suffix). It is nice that people think it reads like
> > some religious text.
>
> I think it shows how we want users to use the events.

I don't understand what you are trying to say. I'm saying the behavior
of perf list in its output is arbitrary. We use the same printing code
for every kind of event. An aesthetic decision to put things on a line
does not imply that it is more valid to use or not use a PMU, it just
happens to be what the tool does. Did I break perf list? If you look
at old perf list output you see:
```
$ perf list
List of pre-defined events (to be used in -e or -M):

 duration_time                                      [Tool event]
...
```
while now you see:
```
$ perf list
List of pre-defined events (to be used in -e or -M):
...
tool:
 duration_time
      [Wall clock interval time in nanoseconds. Unit: tool]
...
```
I'm hoping people find it useful to have the unit documented.

> > Why is the formatting different in perf list for
> > json specified events? Well it is because json events have
> > descriptions and the events you are showing with a PMU don't have a
> > description. I think because there is no description, an effort was
> > made to keep the output compact and put the PMU and event name
> > together. It wasn't trying to enter some kind of long lasting marriage
> > that the event name should only ever be used with the PMU.
>
> I like the description but I don't like the formatting.  I think I
> understand why it looks like that but it could be different.  Anyway,
> I don't think showing PMU name is related to having descriptions.

No, it has more to do with how I was feeling about filling in two
string fields called name and alias when rewriting the perf list code.
I added aliases containing the PMU name just to add a little bit more
detail when there seemed to be little documentation with certain
events. I never intended placing the PMU names into any events to be a
commitment that all non-core PMU events would need a PMU prefix and to
break all such people using those events.

> > What happens if an event is both in sysfs and json? Well the sysfs event
> > will get the description from the json and then I believe it won't
> > behave as you show. Did the event get broken, as perf list no longer
> > shows it with a PMU, by having a json description written? I think not
> > and I think having descriptions with events is a good thing.
>
> That's bad.  Probably we should fix it so that it takes only one of
> the sources, and change the JSON event so it doesn't clash with sysfs.

No, you are talking about breaking everything already, let's not break
it yet further - not least as we lack a reasonable way to test it. I
think if you are serious about having such breaking changes then it is
best you add a new command line option, like with libpfm events.

Thanks,
Ian


* Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-02-06  7:44                                   ` Ian Rogers
@ 2025-02-07  4:44                                     ` Namhyung Kim
  2025-02-07  6:15                                       ` Ian Rogers
  0 siblings, 1 reply; 53+ messages in thread
From: Namhyung Kim @ 2025-02-07  4:44 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Atish Kumar Patra, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Adrian Hunter, Kan Liang, James Clark, Ze Gao,
	Weilin Wang, Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, bpf, Aditya Bodkhe, Leo Yan,
	Beeman Strong, Arnaldo Carvalho de Melo

On Wed, Feb 05, 2025 at 11:44:57PM -0800, Ian Rogers wrote:
> On Wed, Feb 5, 2025 at 9:09 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Tue, Feb 04, 2025 at 08:48:20PM -0800, Ian Rogers wrote:
> > > On Tue, Feb 4, 2025 at 5:58 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > You mean the -z flag which is documented in the man page and also in the
> > help message (perf top -h).  Anyone who reads the doc can know it's
> > there.  Of course, people would prefer reading zines to man pages. :)
> 
> I link to the patch. My point is that something as minor as making
> "perf top" behave as "top" does was too big a (user command line)
> regression to land - I strongly suspect nobody would notice. Your
> proposal breaks all non-core events on every perf command that takes
> PMU events. It is a bigger change.

I also suspect not many people are using non-core events without a PMU.
But I won't argue that since I don't have any data.

> 
> > > So, it would seem to me that
> > > changing something as fundamental as how all non-core events behave
> > > would be seen as a regression.
> >
> > Yep, it'd be a regression.
> 
> Agreed, you are arguing for a regression.

Right, but I thought it wouldn't affect many.  But who knows..
And yes, I don't want to create new troubles.

> 
> > Which suffix do you mean?
> 
> It's off topic. ARM added hex suffixes to PMUs representing physical
> memory addresses of memory controllers, but that makes cortex_a72
> look like it has a 3 character suffix. So perf only treats hex digits
> more than 4 characters long as a hex suffix, which of course would be
> wrong for a1000 (which is also somewhat close to being an old Acorn
> Archimedes machine number ;-) ).

ok.
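A minimal Python sketch of the suffix heuristic as described above, to show why a72 survives but a1000 would be mistaken for an address. It is a hypothetical model, not perf's C code: the function name strip_pmu_suffix, the min_hex_len parameter, and the rule that all-decimal suffixes (as in uncore_imc_0) are always stripped are assumptions for illustration, and the PMU names are illustrative.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def strip_pmu_suffix(name: str, min_hex_len: int = 5) -> str:
    """Drop a trailing '_<suffix>' that looks like an instance number.

    Hypothetical model of the rule described in the email:
    - an all-decimal suffix is always stripped (assumption),
    - a suffix with hex letters is stripped only if it is longer than
      4 characters, i.e. at least `min_hex_len` characters.
    """
    base, sep, suffix = name.rpartition("_")
    if not sep or not base or not suffix:
        return name  # no '_<suffix>' part at all
    if not all(c in HEX_DIGITS for c in suffix):
        return name  # e.g. "cortex" or "running": not a number
    if suffix.isdigit() or len(suffix) >= min_hex_len:
        return base
    return name  # short hex-looking suffix such as "a72" or "cf" is kept


for pmu in ("armv8_cortex_a72", "armv8_cortex_a1000",
            "arm_dmc620_10008c000", "uncore_imc_0"):
    print(pmu, "->", strip_pmu_suffix(pmu))
# armv8_cortex_a72 -> armv8_cortex_a72
# armv8_cortex_a1000 -> armv8_cortex   (the a1000 misfire)
# arm_dmc620_10008c000 -> arm_dmc620
# uncore_imc_0 -> uncore_imc
```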

> 
> > Anyway, the person who looked up the Intel webpage would be eager to learn
> > about performance-related things.  Can we also assume they want
> > to learn about the perf tool itself? :)
> 
> I'm not sure how turning data_read into
> uncore_imc_free_running/data_read/ is in any way helping people
> understand perf? They want an event name that matches the
> documentation, manual, web site. It is what the vendors I've spoken to
> want as they use the event names across tools (fwiw oprofile doesn't
> even have a notion of a PMU). To my knowledge the PMU names are the
> wild west, often illogical and never mentioned in any kind of
> documentation. I have a hard time explaining how the suffixes work and
> I believe there are more conventions in the works where there can be
> multiple what we are currently calling suffixes.

I mean if something doesn't work, they will look at 'perf list' and find
the event name it supports.  For me, the PMU name gives a tiny bit more
information about the 'data_read' event.  But a proper description for
the event is preferred.

> 
> > If it's not the case, we have this:
> >
> >   $ perf record -e xxx
> >   event syntax error: 'xxx'
> >                        \___ Bad event name
> >
> >   Unable to find event on a PMU of 'xxx'
> >   Run 'perf list' for a list of valid events
> >
> >    Usage: perf record [<options>] [<command>]
> >       or: perf record [<options>] -- <command> [<options>]
> >
> >       -e, --event <event>   event selector. use 'perf list' to list available events
> >
> > So it says twice to run 'perf list' to see the events.  Then they can
> > run either:
> >
> >   $ perf list | grep xxx
> >
> > or
> >
> >   $ perf list xxx
> >
> > to see the actual name of the event available in the perf tool.
> 
> Why was adding a PMU to an event name, working around ARM's PMU bug,
> such an insurmountable problem that the original change was reverted?
> Because 1 person didn't want to have to write a PMU prefix and
> considered it a monumental regression having to do so.

Because it's a legacy event 'cycles' and he didn't expect the wildcard
behavior?

> 
> > >
> > > Even with this what would be the behavior of core events? You want
> > > legacy events to have priority over sysfs/json when there is no PMU.
> > > You know, and have stated not caring, RISC-V wants something different and that
> > > it breaks Apple-M's PMUs for a fairly large range of kernel releases
> > > including 1 LTS kernel - the only reason I'm writing patches in this
> > > area in the 1st place. Software is soft and you can go fix software
> > > anywhere in the stack. Listening to vendors and not breaking everyone
> > > is the point-of-view these patches have been coming from. I find it
> > > very hard to have a conversation where this is just forgotten about
> > > and we're working on hypotheticals which seem to be both unwanted and
> > > implausible.
> >
> > Sorry I don't want to repeat that too.  Correct me if I'm wrong:
> 
> You are wrong.

Hmm.. ok.

> 
> > 1. RISC-V is working on a solution with the current status and it's not
> >    absolutely needed to change the current behavior.
> 
> They said to you directly it was what they wanted, that's why I
> reposted this change and it is, has always been, in the cover letter.
> They've then followed up expressing their desire for this behavior but
> having to have a plan b as the original change was reverted and you
> are blocking this change landing.

So they have the plan B.  But still prefer overriding legacy with JSON?

> 
> > 2. Apple-M is fixed already.
> 
> No, James tried to repro the bug on a Juno board, not an Apple M, and
> didn't succeed. I don't know what kernel he tried. I was told by Mark
> Rutland (at LPC) that the tool fix was absolutely necessary and the
> PMU driver wouldn't be fixed, hence the series flipping behavior that
> I thought Intel would most likely block and wasn't keen to do in the
> 1st place (not least wade through all the test behavior changes and
> the bug tail). All of this was premised on a threat of reverting all
> of the hybrid support so that Apple M could be made to work again, and
> I was trying to do a less worse alternative.
> https://lore.kernel.org/r/20231123042922.834425-1-irogers@google.com

Sorry, it's not clear to me what's the problem exactly.  Can you give me
an example command line?

> 
> > >
> > > I don't know why people (yourself, Linus) keep wanting to show me the
> > > perf list output. It is arbitrary. I rewrote it and changed the
> > > behavior of all uncore PMUs within it (we didn't use to deduplicate
> > > based on the PMU suffix). It is nice that people think it reads like
> > > some religious text.
> >
> > I think it's how we want users to learn to use the events.
> 
> I don't understand what you are trying to say. I'm saying the behavior
> of perf list in its output is arbitrary. We use the same printing code
> for every kind of event. An aesthetic decision to put things on a line
> does not imply that using a PMU is more or less valid; it just
> happens to be what the tool does. Did I break perf list? If you look
> at the old perf list output you see:
> ```
> $ perf list
> List of pre-defined events (to be used in -e or -M):
> 
>  duration_time                                      [Tool event]
> ...
> ```
> while now you see:
> ```
> $ perf list
> List of pre-defined events (to be used in -e or -M):
> ...
> tool:
>  duration_time
>       [Wall clock interval time in nanoseconds. Unit: tool]
> ...
> ```
> I'm hoping people find it useful to have the unit documented.

The most important information I think is the name of the event
(duration_time).  It'd be appropriate if you could call it
'tool/duration_time/' but I'm not sure if it's acceptable because
tool events are not real PMU events.  If so, maybe

 duration_time or tool/duration_time/

?

> 
> > > Why is the formatting different in perf list for
> > > json specified events? Well it is because json events have
> > > descriptions and the events you are showing with a PMU don't have a
> > > description. I think because there is no description, an effort was
> > > made to keep the output compact and put the PMU and event name
> > > together. It wasn't trying to enter some kind of long lasting marriage
> > > that the event name should only ever be used with the PMU.
> >
> > I like the description but I don't like the formatting.  I think I
> > understand why it looks like that but it could be different.  Anyway,
> > I don't think showing PMU name is related to having descriptions.
> 
> No, it has more to do with how I was feeling about filling in two
> string fields called name and alias when rewriting the perf list code.
> I added aliases containing the PMU name just to add a little bit more
> detail when there seemed to be little documentation with certain
> events. I never intended placing the PMU names into any events to be a
> commitment that all non-core PMU events would need a PMU prefix and to
> break all such people using those events.

I think people should use a PMU prefix before wildcard is enabled.

> 
> > > What happens if an event is both in sysfs and json? Well the sysfs event
> > > will get the description from the json and then I believe it won't
> > > behave as you show. Did the event get broken, as perf list no longer
> > > shows it with a PMU, by having a json description written? I think not
> > > and I think having descriptions with events is a good thing.
> >
> > That's bad.  Probably we should fix it so that it takes only one of
> > the sources, and change the JSON event so it doesn't clash with sysfs.
> 
> No, you are talking about breaking everything already, let's not break
> it yet further - not least as we lack a reasonable way to test it. I
> think if you are serious about having such breaking changes then it is
> best you add a new command line option, like with libpfm events.

I don't want to break things.  What's the intended behavior in that case?

Thanks,
Namhyung



* Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-02-07  4:44                                     ` Namhyung Kim
@ 2025-02-07  6:15                                       ` Ian Rogers
  2025-02-07 17:18                                         ` Atish Kumar Patra
  2025-02-19 23:22                                         ` Namhyung Kim
  0 siblings, 2 replies; 53+ messages in thread
From: Ian Rogers @ 2025-02-07  6:15 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Atish Kumar Patra, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Adrian Hunter, Kan Liang, James Clark, Ze Gao,
	Weilin Wang, Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, bpf, Aditya Bodkhe, Leo Yan,
	Beeman Strong, Arnaldo Carvalho de Melo

On Thu, Feb 6, 2025 at 8:44 PM Namhyung Kim <namhyung@kernel.org> wrote:
> On Wed, Feb 05, 2025 at 11:44:57PM -0800, Ian Rogers wrote:
> > Why was adding a PMU to an event name, working around ARM's PMU bug,
> > such an insurmountable problem that the original change was reverted?
> > Because 1 person didn't want to have to write a PMU prefix and
> > considered it a monumental regression having to do so.
>
> Because it's a legacy event 'cycles' and he didn't expect the wildcard
> behavior?

And someone who, say, with perf v6.14 can type `perf stat -e data_read
...` and then, with your proposal, has to type `perf stat -e
uncore_imc_free_running/data_read/ ...` because data_read isn't a core
event: is this expected behavior just because the error message
mentions perf list?
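To make the two behaviors concrete, a minimal Python sketch of how an event name given without a PMU might be matched against PMUs, today (wildcarding across every PMU that has the event) versus under a "core PMUs only" rule. The Pmu class, the pmus_for_event function and the PMU/event tables are hypothetical; perf's real matching lives in its C parse-events code.

```python
from dataclasses import dataclass, field


@dataclass
class Pmu:
    name: str
    is_core: bool
    events: set = field(default_factory=set)


def pmus_for_event(event: str, pmus: list, core_only: bool = False) -> list:
    """Names of the PMUs an event given without a PMU prefix would open on.

    core_only=False models wildcarding across every PMU that has the event;
    core_only=True models the proposal to match core PMUs only.
    """
    return [p.name for p in pmus
            if event in p.events and (p.is_core or not core_only)]


# Hypothetical machine: one core PMU and two uncore memory-controller PMUs.
pmus = [
    Pmu("cpu", True, {"cycles", "instructions"}),
    Pmu("uncore_imc_free_running_0", False, {"data_read", "data_write"}),
    Pmu("uncore_imc_free_running_1", False, {"data_read", "data_write"}),
]
print(pmus_for_event("data_read", pmus))                  # both uncore PMUs
print(pmus_for_event("data_read", pmus, core_only=True))  # [] -> "Bad event name"
```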

> > > 1. RISC-V is working on a solution with the current status and it's not
> > >    absolutely needed to change the current behavior.
> >
> > They said to you directly it was what they wanted, that's why I
> > reposted this change and it is, has always been, in the cover letter.
> > They've then followed up expressing their desire for this behavior but
> > having to have a plan b as the original change was reverted and you
> > are blocking this change landing.
>
> So they have the plan B.  But still prefer overriding legacy with JSON?

Yes.

> > > 2. Apple-M is fixed already.
> >
> > No, James tried to repro the bug on a Juno board, not an Apple M, and
> > didn't succeed. I don't know what kernel he tried. I was told by Mark
> > Rutland (at LPC) that the tool fix was absolutely necessary and the
> > PMU driver wouldn't be fixed, hence the series flipping behavior that
> > I thought Intel would most likely block and wasn't keen to do in the
> > 1st place (not least wade through all the test behavior changes and
> > the bug tail). All of this was premised on a threat of reverting all
> > of the hybrid support so that Apple M could be made to work again, and
> > I was trying to do a less worse alternative.
> > https://lore.kernel.org/r/20231123042922.834425-1-irogers@google.com
>
> Sorry, it's not clear to me what's the problem exactly.  Can you give me
> an example command line?

What broke: when ARM PMUs were recognized as core rather than uncore
PMUs, as part of fixing hybrid, we encoded legacy events on them. So
arm_blah/cycles/ became a type 0 config 0 event, with no extended type,
as PMU support for that is tested first. A type 0 config 0 event is
broken on the Apple-M PMUs: an event that doesn't count, or something
like that. Because they had a sysfs event arm_blah/cycles/, before
the change the broken legacy encoding was never used on the PMU; the
legacy encoding is what broke things.

Because they had this problem, the Apple-M users were used to using
arm_blah/cycles/ rather than cycles to avoid legacy events. This
change, not your proposal, is making it so that without a PMU they
also don't get legacy events, because it was expressed in no uncertain
terms that those weren't going to work. There was a lot of advocating
for removing all hybrid support from the tool.
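A minimal Python sketch of the encoding choice being argued over, assuming a single core PMU. PERF_TYPE_HARDWARE (0), PERF_COUNT_HW_CPU_CYCLES (0), PERF_COUNT_HW_INSTRUCTIONS (1) and the extended-type shift of 32 follow the uapi perf_event.h values; the CorePmu class, the encode function, the PMU type number 10 and the sysfs config 0x02 are hypothetical. prefer_sysfs_json=False models arm_blah/cycles/ after the hybrid change (legacy wins, giving type 0 config 0), while True models what is done today when a PMU is given, and what this patch also does without one. extended_type=False stands in for a PMU lacking extended-type support.

```python
from dataclasses import dataclass

# Values from the uapi <linux/perf_event.h>.
PERF_TYPE_HARDWARE = 0
PERF_PMU_TYPE_SHIFT = 32  # extended type: PMU type goes in config[63:32]
LEGACY_HW_EVENTS = {"cycles": 0, "instructions": 1}  # PERF_COUNT_HW_* ids


@dataclass
class CorePmu:
    name: str
    type: int               # as read from .../devices/<name>/type (hypothetical value here)
    sysfs_events: dict      # event name -> already-parsed config (hypothetical)
    extended_type: bool     # does the PMU accept the extended hardware type?


def encode(event: str, pmu: CorePmu, prefer_sysfs_json: bool) -> tuple:
    """Return a simplified (attr.type, attr.config) pair for `event` on `pmu`."""
    if prefer_sysfs_json and event in pmu.sysfs_events:
        return (pmu.type, pmu.sysfs_events[event])
    if event in LEGACY_HW_EVENTS:
        config = LEGACY_HW_EVENTS[event]
        if pmu.extended_type:
            config |= pmu.type << PERF_PMU_TYPE_SHIFT
        return (PERF_TYPE_HARDWARE, config)
    if event in pmu.sysfs_events:
        return (pmu.type, pmu.sysfs_events[event])
    raise KeyError(f"{event} not found on {pmu.name}")


arm_blah = CorePmu("arm_blah", type=10, sysfs_events={"cycles": 0x02},
                   extended_type=False)
print(encode("cycles", arm_blah, prefer_sysfs_json=False))  # (0, 0): the broken encoding
print(encode("cycles", arm_blah, prefer_sysfs_json=True))   # (10, 2): the sysfs event
```

Events with no sysfs/JSON entry, such as instructions here, fall back to the legacy encoding under either setting.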

> > I don't understand what you are trying to say. I'm saying the behavior
> > of perf list in its output is arbitrary. We use the same printing code
> > for every kind of event. An aesthetic decision to put things on a line
> > does not imply that using a PMU is more or less valid; it just
> > happens to be what the tool does. Did I break perf list? If you look
> > at the old perf list output you see:
> > ```
> > $ perf list
> > List of pre-defined events (to be used in -e or -M):
> >
> >  duration_time                                      [Tool event]
> > ...
> > ```
> > while now you see:
> > ```
> > $ perf list
> > List of pre-defined events (to be used in -e or -M):
> > ...
> > tool:
> >  duration_time
> >       [Wall clock interval time in nanoseconds. Unit: tool]
> > ...
> > ```
> > I'm hoping people find it useful to have the unit documented.
>
> The most important information I think is the name of the event
> (duration_time).  It'd be appropriate if you could call it
> 'tool/duration_time/' but I'm not sure if it's acceptable because
> tool events are not real PMU events.  If so, maybe
>
>  duration_time or tool/duration_time/
>
> ?

I don't mind either showing or not showing a PMU. duration_time isn't
a core event, so is it also allowed without a PMU prefix in your new
scheme? My point isn't to discuss duration_time; it is to point out
that `perf list` output isn't sacred and says different things over
time. Those things may or may not include a PMU, as there has never
been any rigor; it is a mush of strings that are printed.

In the perf list code we have an event and an alias. In my opinion, if
something is an alias of something else, then it implies having the
same perf_event_attr encoding. In your proposal this wouldn't be true
for legacy events, just as it isn't true today, which is why I have
always wanted to get this fixed.

> I think people should use a PMU prefix before wildcard is enabled.

I don't understand. You want to break uncore events without a PMU and
disable wildcarding, then enable wildcarding again. As I say, I
think it is better you work on this behavior under a non `-e` command
line option.

> > > > What happens if an event is both in sysfs and json? Well the sysfs event
> > > > will get the description from the json and then I believe it won't
> > > > behave as you show. Did the event get broken, as perf list no longer
> > > > shows it with a PMU, by having a json description written? I think not
> > > > and I think having descriptions with events is a good thing.
> > >
> > > That's bad.  Probably we should fix it so that it takes only one of
> > > the sources, and change the JSON event so it doesn't clash with sysfs.
> >
> > No, you are talking about breaking everything already, let's not break
> > it yet further - not least as we lack a reasonable way to test it. I
> > think if you are serious about having such breaking changes then it is
> > best you add a new command line option, like with libpfm events.
>
> I don't want to break things.  What's the intended behavior in that case?

The behavior is in pmu's update_event, but basically we prefer the
json data over the sysfs data:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n506
This allows the json/tool data to correct the sysfs data - as well as
to add information like descriptions and topic.
But my point isn't that I support your "let's have two events instead
of updating events" approach. I have maintained this behavior as it has
always been the behavior and I care about not breaking everything,
something I assumed was taken for granted, hence `perf top` still, by
default, showing samples for processes that have terminated.
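A minimal Python sketch of the "json data wins over sysfs data" merge described above, under stated assumptions: the sysfs event is modeled as a dict with name and encoding, the JSON entry uses the EventName/EventCode/BriefDescription keys found in perf's pmu-events JSON files, and the topic is passed in separately. The function name merge_json_into_sysfs is hypothetical; this is not the C update_event implementation.

```python
import json


def merge_json_into_sysfs(sysfs_event: dict, json_entry: dict,
                          topic=None) -> dict:
    """Overlay a JSON event entry onto an event first read from sysfs.

    Where the JSON provides a value it wins, so JSON can correct the
    sysfs encoding; anything the JSON omits keeps its sysfs value.
    """
    merged = dict(sysfs_event)
    if "EventCode" in json_entry:
        merged["encoding"] = "event=" + json_entry["EventCode"]
    if "BriefDescription" in json_entry:
        merged["desc"] = json_entry["BriefDescription"]
    if topic is not None:
        merged["topic"] = topic
    return merged


sysfs = {"name": "cycles", "encoding": "event=0x11"}  # hypothetical sysfs data
entry = json.loads('{"EventName": "cycles", "BriefDescription": "Core cycles"}')
print(merge_json_into_sysfs(sysfs, entry, topic="pipeline"))
# {'name': 'cycles', 'encoding': 'event=0x11', 'desc': 'Core cycles', 'topic': 'pipeline'}
```

The result is one updated event rather than two competing events, matching the behavior kept so far.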

Thanks,
Ian


* Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-02-07  6:15                                       ` Ian Rogers
@ 2025-02-07 17:18                                         ` Atish Kumar Patra
  2025-02-19 23:22                                         ` Namhyung Kim
  1 sibling, 0 replies; 53+ messages in thread
From: Atish Kumar Patra @ 2025-02-07 17:18 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Namhyung Kim, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Adrian Hunter, Kan Liang, James Clark, Ze Gao,
	Weilin Wang, Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, bpf, Aditya Bodkhe, Leo Yan,
	Beeman Strong, Arnaldo Carvalho de Melo

On Thu, Feb 6, 2025 at 10:15 PM Ian Rogers <irogers@google.com> wrote:
>
> On Thu, Feb 6, 2025 at 8:44 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > On Wed, Feb 05, 2025 at 11:44:57PM -0800, Ian Rogers wrote:
> > > Why was adding a PMU to an event name, working around ARM's PMU bug,
> > > such an insurmountable problem that the original change was reverted?
> > > Because 1 person didn't want to have to write a PMU prefix and
> > > considered it a monumental regression having to do so.
> >
> > Because it's a legacy event 'cycles' and he didn't expect the wildcard
> > behavior?
>
> And someone who, say, with perf v6.14 can type `perf stat -e data_read
> ...` and then, with your proposal, has to type `perf stat -e
> uncore_imc_free_running/data_read/ ...` because data_read isn't a core
> event: is this expected behavior just because the error message
> mentions perf list?
>
> > > > 1. RISC-V is working on a solution with the current status and it's not
> > > >    absolutely needed to change the current behavior.
> > >
> > > They said to you directly it was what they wanted, that's why I
> > > reposted this change and it is, has always been, in the cover letter.
> > > They've then followed up expressing their desire for this behavior but
> > > having to have a plan b as the original change was reverted and you
> > > are blocking this change landing.
> >
> > So they have the plan B.  But still prefer overriding legacy with JSON?
>
> Yes.
>
Even though the driver encoding was envisioned as plan B, I think we
have to keep it irrespective of whether overriding legacy events with
JSON is available, due to the reasons I outlined earlier
(e.g. direct legacy event usage and hypervisors) and some renewed
interest in standardizing event encodings in RISC-V [1].

If overriding legacy events with JSON is available, each future vendor
may just provide a JSON file instead of modifying the driver.
However, it will be a matter of convenience and a clutter-free future
rather than a necessity at this point.

[1] https://lists.riscv.org/g/sig-perf-analysis/topic/110906276#msg458

> > > > 2. Apple-M is fixed already.
> > >
> > > No, James tried to repro the bug on a Juno board, not an Apple M, and
> > > didn't succeed. I don't know what kernel he tried. I was told by Mark
> > > Rutland (at LPC) that the tool fix was absolutely necessary and the
> > > PMU driver wouldn't be fixed, hence the series flipping behavior that
> > > I thought Intel would most likely block and wasn't keen to do in the
> > > 1st place (not least wade through all the test behavior changes and
> > > the bug tail). All of this was premised on a threat of reverting all
> > > of the hybrid support so that Apple M could be made to work again, and
> > > I was trying to do a less worse alternative.
> > > https://lore.kernel.org/r/20231123042922.834425-1-irogers@google.com
> >
> > Sorry, it's not clear to me what's the problem exactly.  Can you give me
> > an example command line?
>
> What broke: when ARM PMUs were recognized as core rather than uncore
> PMUs, as part of fixing hybrid, we encoded legacy events on them. So
> arm_blah/cycles/ became a type 0 config 0 event, with no extended type,
> as PMU support for that is tested first. A type 0 config 0 event is
> broken on the Apple-M PMUs: an event that doesn't count, or something
> like that. Because they had a sysfs event arm_blah/cycles/, before
> the change the broken legacy encoding was never used on the PMU; the
> legacy encoding is what broke things.
>
> Because they had this problem, the Apple-M users were used to using
> arm_blah/cycles/ rather than cycles to avoid legacy events. This
> change, not your proposal, is making it so that without a PMU they
> also don't get legacy events, because it was expressed in no uncertain
> terms that those weren't going to work. There was a lot of advocating
> for removing all hybrid support from the tool.
>
> > > I don't understand what you are trying to say. I'm saying the behavior
> > > of perf list in its output is arbitrary. We use the same printing code
> > > for every kind of event. An aesthetic decision to put things on a line
> > > does not imply that using a PMU is more or less valid; it just
> > > happens to be what the tool does. Did I break perf list? If you look
> > > at the old perf list output you see:
> > > ```
> > > $ perf list
> > > List of pre-defined events (to be used in -e or -M):
> > >
> > >  duration_time                                      [Tool event]
> > > ...
> > > ```
> > > while now you see:
> > > ```
> > > $ perf list
> > > List of pre-defined events (to be used in -e or -M):
> > > ...
> > > tool:
> > >  duration_time
> > >       [Wall clock interval time in nanoseconds. Unit: tool]
> > > ...
> > > ```
> > > I'm hoping people find it useful to have the unit documented.
> >
> > The most important information I think is the name of the event
> > (duration_time).  It'd be appropriate if you could call it
> > 'tool/duration_time/' but I'm not sure if it's acceptable because
> > tool events are not real PMU events.  If so, maybe
> >
> >  duration_time or tool/duration_time/
> >
> > ?
>
> I don't mind either showing or not showing a PMU. duration_time isn't
> a core event, so is it also allowed without a PMU prefix in your new
> scheme? My point isn't to discuss duration_time; it is to point out
> that `perf list` output isn't sacred and says different things over
> time. Those things may or may not include a PMU, as there has never
> been any rigor; it is a mush of strings that are printed.
>
> In the perf list code we have an event and an alias. In my opinion, if
> something is an alias of something else, then it implies having the
> same perf_event_attr encoding. In your proposal this wouldn't be true
> for legacy events, just as it isn't true today, which is why I have
> always wanted to get this fixed.
>
> > I think people should use a PMU prefix before wildcard is enabled.
>
> I don't understand. You want to break uncore events without a PMU and
> disable wildcarding, then enable wildcarding again. As I say, I
> think it is better you work on this behavior under a non `-e` command
> line option.
>
> > > > > What happens if an event is both in sysfs and json? Well the sysfs event
> > > > > will get the description from the json and then I believe it won't
> > > > > behave as you show. Did the event get broken, as perf list no longer
> > > > > shows it with a PMU, by having a json description written? I think not
> > > > > and I think having descriptions with events is a good thing.
> > > >
> > > > That's bad.  Probably we should fix it so that it takes only one of
> > > > the sources, and change the JSON event so it doesn't clash with sysfs.
> > >
> > > No, you are talking about breaking everything already, let's not break
> > > it yet further - not least as we lack a reasonable way to test it. I
> > > think if you are serious about having such breaking changes then it is
> > > best you add a new command line option, like with libpfm events.
> >
> > I don't want to break things.  What's the intended behavior in that case?
>
> The behavior is in pmu's update_event, but basically we prefer the
> json data over the sysfs data:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n506
> This allows the json/tool data to correct the sysfs data - as well as
> to add information like descriptions and topic.
> But my point isn't that I support your "let's have two events instead
> of updating events" approach. I have maintained this behavior as it has
> always been the behavior and I care about not breaking everything,
> something I assumed was taken for granted, hence `perf top` still, by
> default, showing samples for processes that have terminated.
>
> Thanks,
> Ian


* Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-02-07  6:15                                       ` Ian Rogers
  2025-02-07 17:18                                         ` Atish Kumar Patra
@ 2025-02-19 23:22                                         ` Namhyung Kim
  2025-02-19 23:32                                           ` Ian Rogers
  1 sibling, 1 reply; 53+ messages in thread
From: Namhyung Kim @ 2025-02-19 23:22 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Atish Kumar Patra, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Adrian Hunter, Kan Liang, James Clark, Ze Gao,
	Weilin Wang, Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, bpf, Aditya Bodkhe, Leo Yan,
	Beeman Strong, Arnaldo Carvalho de Melo

On Thu, Feb 06, 2025 at 10:15:43PM -0800, Ian Rogers wrote:
> On Thu, Feb 6, 2025 at 8:44 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > On Wed, Feb 05, 2025 at 11:44:57PM -0800, Ian Rogers wrote:
> > > Why was adding a PMU to an event name, working around ARM's PMU bug,
> > > such an insurmountable problem that the original change was reverted?
> > > Because 1 person didn't want to have to write a PMU prefix and
> > > considered it a monumental regression having to do so.
> >
> > Because it's a legacy event 'cycles' and he didn't expect the wildcard
> > behavior?
> 
> And someone who, say, with perf v6.14 can type `perf stat -e data_read
> ...` and then, with your proposal, has to type `perf stat -e
> uncore_imc_free_running/data_read/ ...` because data_read isn't a core
> event: is this expected behavior just because the error message
> mentions perf list?

I still think it's better to have a PMU with events (and people do that),
but I feel like I have to drop my argument.  It's been there for a while
and I don't want to break things.

> 
> > > > 1. RISC-V is working on a solution with the current status and it's not
> > > >    absolutely needed to change the current behavior.
> > >
> > > They said to you directly it was what they wanted, that's why I
> > > reposted this change and it is, has always been, in the cover letter.
> > > They've then followed up expressing their desire for this behavior but
> > > having to have a plan b as the original change was reverted and you
> > > are blocking this change landing.
> >
> > So they have the plan B.  But still prefer overriding legacy with JSON?
> 
> Yes.
> 
> > > > 2. Apple-M is fixed already.
> > >
> > > No, James tried to repro the bug on a Juno board, not an Apple M, and
> > > didn't succeed. I don't know what kernel he tried. I was told by Mark
> > > Rutland (at LPC) that the tool fix was absolutely necessary and the
> > > PMU driver wouldn't be fixed, hence the series flipping behavior that
> > > I thought Intel would most likely block and wasn't keen to do in the
> > > 1st place (not least wade through all the test behavior changes and
> > > the bug tail). All of this was premised on a threat of reverting all
> > > of the hybrid support so that Apple M could be made to work again, and
> > > I was trying to do a less worse alternative.
> > > https://lore.kernel.org/r/20231123042922.834425-1-irogers@google.com
> >
> > Sorry, it's not clear to me what's the problem exactly.  Can you give me
> > an example command line?
> 
> What broke: when ARM PMUs were recognized as core rather than uncore
> PMUs, as part of fixing hybrid, we encoded legacy events on them. So
> arm_blah/cycles/ became a type 0 config 0 event, with no extended type,
> as PMU support for that is tested first. A type 0 config 0 event is
> broken on the Apple-M PMUs: an event that doesn't count, or something
> like that. Because they had a sysfs event arm_blah/cycles/, before
> the change the broken legacy encoding was never used on the PMU; the
> legacy encoding is what broke things.

I think it's the Apple-M PMU's problem that it was left broken.  And it
should be ok as long as it can use the sysfs encoding with arm_blah/cycles/.

> 
> Because they had this problem, the Apple-M users were used to using
> arm_blah/cycles/ rather than cycles to avoid legacy events. This
> change, not your proposal, is making it so that without a PMU they
> also don't get legacy events, because it was expressed in no uncertain
> terms that those weren't going to work. There was a lot of advocating
> for removing all hybrid support from the tool.

So that's, I believe, the expected behavior.  'cycles' should use the
legacy and arm_blah/cycles/ for sysfs.  Users on the platform knows the
legacy encoding is broken, and they use the sysfs.

But maybe I'm wrong and it's better to make the tool smarter so that it
can just work with the default event (cycles:P).

I'd like to hear others' opinion on this.
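The two positions can be summarized as a lookup order. This is a hypothetical sketch, not the parse-events code; it assumes the event name (e.g. cycles) is also a legacy event name, and the flag names are made up for illustration.

```c
#include <stdbool.h>

enum encoding { ENC_LEGACY, ENC_SYSFS_JSON };

/*
 * Hypothetical policy sketch for a name that is also a legacy event.
 *   with_pmu:          the user wrote pmu/cycles/ rather than cycles
 *   pmu_has_event:     the PMU advertises the name in sysfs or JSON
 *   prefer_sysfs_json: true for the behavior proposed in this series,
 *                      false for "legacy when no PMU is given"
 */
static enum encoding choose_encoding(bool with_pmu, bool pmu_has_event,
				     bool prefer_sysfs_json)
{
	if (!pmu_has_event)
		return ENC_LEGACY;	/* only the legacy encoding exists */
	if (with_pmu)
		return ENC_SYSFS_JSON;	/* both positions agree here */
	return prefer_sysfs_json ? ENC_SYSFS_JSON : ENC_LEGACY;
}
```

The only case where the two positions differ is the last line: a bare name that a PMU also advertises in sysfs or JSON.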

> 
> > > I don't understand what you are trying to say. I'm saying the behavior
> > > of perf list in its output is arbitrary. We use the same printing code
> > > for every kind of event. An aesthetic decision to put things on a line
> > > does not imply that it is more valid to use or not use a PMU; it just
> > > happens to be what the tool does. Did I break perf list? If you look
> > > at the old perf list output you see:
> > > ```
> > > $ perf list
> > > List of pre-defined events (to be used in -e or -M):
> > >
> > >  duration_time                                      [Tool event]
> > > ...
> > > ```
> > > while now you see:
> > > ```
> > > $ perf list
> > > List of pre-defined events (to be used in -e or -M):
> > > ...
> > > tool:
> > >  duration_time
> > >       [Wall clock interval time in nanoseconds. Unit: tool]
> > > ...
> > > ```
> > > I'm hoping people find it useful to have the unit documented.
> >
> > The most important information I think is the name of the event
> > (duration_time).  It'd be appropriate if you could call it
> > 'tool/duration_time/' but I'm not sure if it's acceptable because
> > tool events are not real PMU events.  If so, maybe
> >
> >  duration_time or tool/duration_time/
> >
> > ?
> 
> I don't mind whether or not a PMU is shown. duration_time isn't a
> core event; is it also allowed without a PMU prefix in your new
> scheme? My point isn't to discuss duration_time; it is to point out
> that `perf list` output isn't sacred and says different things over
> time. Those things may or may not include a PMU, as there has never
> been any rigor; it is a mush of strings that are printed.

Sorry for the distraction.  I meant users would learn the events from
`perf list` so it should guide them to use the event properly.  I
originally thought having PMU is the right thing, but for practical and
convenience reasons it'd be fine without PMUs.

> 
> In the perf list code we have an event and an alias. In my opinion if
> something is an alias of something else then it implies having the
> same perf_event_attr encoding. In your proposal this wouldn't be true
> for legacy events, just as it isn't true today, which has always been
> my point about wanting to get this fixed.

Hmm.. right.  This is a concern.  I don't know.. let's listen to others
first.
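The alias rule above can be stated as a comparison of encodings. A minimal sketch, with a hypothetical struct holding only the type and config fields of perf_event_attr:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant perf_event_attr fields. */
struct enc {
	uint32_t type;
	uint64_t config;
};

/*
 * Under the rule described above, two spellings of an event are aliases
 * only if they produce the same type and config.
 */
static bool is_true_alias(struct enc a, struct enc b)
{
	return a.type == b.type && a.config == b.config;
}
```

For instance, legacy cycles is type 0 config 0, while cpu/cpu-cycles/ on an Intel core PMU is typically type 4 with event=0x3c. Under this rule the two would not count as aliases, even though `perf list` can present them as one event (values illustrative).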

> 
> > I think people should use a PMU prefix before wildcard is enabled.
> 
> I don't understand. You want to break uncore events without a PMU and
> disable wildcarding, then enable wildcarding again. As I said, I
> think it is better that you work on this behavior under a non `-e`
> command line option.

Sorry, I meant people used to add a PMU prefix in the past.  But let's
move on and use wildcard. :)
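Wildcarding here means that an event named without a PMU is opened on every PMU that advertises it. A hypothetical sketch, where the in-memory PMU table stands in for what perf reads from sysfs and JSON:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a PMU and the event names it advertises. */
struct pmu_sketch {
	const char *name;
	const char *const *events;	/* NULL-terminated list */
};

static bool pmu_has_event(const struct pmu_sketch *pmu, const char *event)
{
	for (const char *const *e = pmu->events; *e != NULL; e++)
		if (strcmp(*e, event) == 0)
			return true;
	return false;
}

/*
 * Store the indices of all PMUs advertising the event in out[] (at most
 * max_out of them) and return how many were found.
 */
static size_t wildcard_match(const struct pmu_sketch *pmus, size_t npmus,
			     const char *event, size_t *out, size_t max_out)
{
	size_t n = 0;

	for (size_t i = 0; i < npmus && n < max_out; i++)
		if (pmu_has_event(&pmus[i], event))
			out[n++] = i;
	return n;
}
```

With example PMUs cpu_core and cpu_atom both listing cycles, and uncore_imc_0 listing cas_count_read, a bare cycles matches the two core PMUs and a bare cas_count_read matches only the uncore one.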

> 
> > > > > What happens if an event is both in sysfs and json? Well the sysfs event
> > > > > will get the description from the json and then I believe it won't
> > > > > behave as you show. Did the event get broken, as perf list no longer
> > > > > shows it with a PMU, by having a json description written? I think not
> > > > > and I think having descriptions with events is a good thing.
> > > >
> > > > That's bad.  Probably we should fix it so that it takes only one of
> > > > the sources, and change the JSON event so it doesn't clash with sysfs.
> > >
> > > No, you are talking about breaking everything already, let's not break
> > > it yet further - not least as we lack a reasonable way to test it. I
> > > think if you are serious about having such breaking changes then it is
> > > best you add a new command line option, like with libpfm events.
> >
> > I don't want to break things.  What's the intended behavior in that case?
> 
> The behavior is in pmu's update_event, but basically we prefer the
> json data over the sysfs data:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n506
> This allows the json/tool data to correct the sysfs data - as well as
> to add information like descriptions and topic.
> But my point isn't that I support your "let's have two events instead
> of updating events" idea. I have maintained this behavior as it has
> always been the behavior and I care about not breaking everything,
> something that I assumed was taken for granted, hence making `perf top`
> behave in a way where it is showing samples for processes that have
> terminated by default.

I'm ok with preferring JSON over sysfs.  In general I think they don't
have the same event names unless you want to override one.
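The precedence described above (JSON data correcting and augmenting the sysfs data for the same event name) can be modelled as a field-by-field merge. This is a simplified, hypothetical sketch, not the actual update_event() code in tools/perf/util/pmu.c; the struct and its fields are assumptions.

```c
#include <stddef.h>

/* Hypothetical per-event record; NULL means "not provided by this source". */
struct event_info {
	const char *encoding;	/* e.g. "event=0x11" */
	const char *desc;
	const char *topic;
};

/* Start from the sysfs data and let any field present in JSON override it. */
static struct event_info merge_event(struct event_info sysfs,
				     const struct event_info *json)
{
	struct event_info out = sysfs;

	if (json == NULL)
		return out;
	if (json->encoding != NULL)
		out.encoding = json->encoding;
	if (json->desc != NULL)
		out.desc = json->desc;
	if (json->topic != NULL)
		out.topic = json->topic;
	return out;
}
```

A JSON entry that only supplies a description and topic leaves the sysfs encoding in place, which is how a sysfs event picks up a description from JSON.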

Thanks,
Namhyung



* Re: [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-02-19 23:22                                         ` Namhyung Kim
@ 2025-02-19 23:32                                           ` Ian Rogers
  0 siblings, 0 replies; 53+ messages in thread
From: Ian Rogers @ 2025-02-19 23:32 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Atish Kumar Patra, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Adrian Hunter, Kan Liang, James Clark, Ze Gao,
	Weilin Wang, Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, bpf, Aditya Bodkhe, Leo Yan,
	Beeman Strong, Arnaldo Carvalho de Melo

On Wed, Feb 19, 2025 at 3:22 PM Namhyung Kim <namhyung@kernel.org> wrote:
> I'm ok with preferring JSON over sysfs.  In general I think they don't
> have the same event names unless you want to override one.

Thanks, Namhyung. I'm not clear on what the plan is for this patch series.
I know the clean up parts of it were applied. Are there any actions on
me? Are there people to solicit feedback from?

Thanks,
Ian



Thread overview: 53+ messages
2025-01-09 22:21 [PATCH v5 0/4] Prefer sysfs/JSON events also when no PMU is provided Ian Rogers
2025-01-09 22:21 ` [PATCH v5 1/4] perf evsel: Add pmu_name helper Ian Rogers
2025-01-09 22:21 ` [PATCH v5 2/4] perf stat: Fix find_stat for mixed legacy/non-legacy events Ian Rogers
2025-01-09 22:21 ` [PATCH v5 3/4] perf record: Skip don't fail for events that don't open Ian Rogers
2025-01-10  1:25   ` Namhyung Kim
2025-01-10  4:44     ` Ian Rogers
2025-01-10 18:55       ` Namhyung Kim
2025-01-10 19:18         ` Ian Rogers
2025-01-14 19:29           ` Namhyung Kim
2025-01-14 23:55             ` Ian Rogers
2025-01-15 22:14               ` Namhyung Kim
2025-01-15 22:40                 ` Ian Rogers
2025-01-10 14:18     ` Arnaldo Carvalho de Melo
2025-01-10 16:42       ` Ian Rogers
2025-01-10 19:26         ` Namhyung Kim
2025-01-10 21:33           ` Ian Rogers
2025-01-13 20:51             ` Namhyung Kim
2025-01-13 23:04               ` Ian Rogers
2025-01-15 17:31                 ` Namhyung Kim
2025-01-15 17:56                   ` Ian Rogers
2025-01-29 21:24                     ` Namhyung Kim
2025-01-09 22:21 ` [PATCH v5 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy" Ian Rogers
2025-01-10 19:40   ` Namhyung Kim
2025-01-10 19:52     ` Atish Kumar Patra
2025-01-13 20:56       ` Namhyung Kim
2025-01-10 22:15     ` Ian Rogers
2025-01-13 22:01       ` Namhyung Kim
2025-01-13 22:51         ` Ian Rogers
2025-01-14  2:31           ` Ian Rogers
2025-01-15 17:59             ` Namhyung Kim
2025-01-15 21:20               ` Ian Rogers
2025-01-29 21:55                 ` Namhyung Kim
2025-01-30  1:16                   ` Ian Rogers
2025-01-30  5:16                     ` Namhyung Kim
2025-01-30  6:03                       ` Ian Rogers
2025-01-31 22:28                         ` Namhyung Kim
2025-01-30  6:12                   ` Atish Kumar Patra
2025-01-31 22:42                     ` Namhyung Kim
2025-02-01  8:45                       ` Ian Rogers
2025-02-04  0:15                         ` Namhyung Kim
2025-02-04  0:41                           ` Ian Rogers
2025-02-05  1:57                             ` Namhyung Kim
2025-02-05  4:48                               ` Ian Rogers
2025-02-06  5:09                                 ` Namhyung Kim
2025-02-06  7:44                                   ` Ian Rogers
2025-02-07  4:44                                     ` Namhyung Kim
2025-02-07  6:15                                       ` Ian Rogers
2025-02-07 17:18                                         ` Atish Kumar Patra
2025-02-19 23:22                                         ` Namhyung Kim
2025-02-19 23:32                                           ` Ian Rogers
2025-02-03  5:47                       ` Atish Kumar Patra
2025-01-29 22:05 ` [PATCH v5 0/4] Prefer sysfs/JSON events also when no PMU is provided Namhyung Kim
2025-01-30 17:46 ` Namhyung Kim
