linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v8 0/4] Prefer sysfs/JSON events also when no PMU is provided
@ 2025-04-16  4:51 Ian Rogers
  2025-04-16  4:51 ` [PATCH v8 1/4] perf record: Skip don't fail for events that don't open Ian Rogers
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Ian Rogers @ 2025-04-16  4:51 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, James Clark, Ze Gao,
	Weilin Wang, Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, Aditya Bodkhe, Leo Yan,
	Thomas Falcon, Atish Patra

At the RISC-V summit the topic of avoiding event data being in the
RISC-V PMU kernel driver came up. There is a preference for sysfs/JSON
events being the priority when no PMU is provided so that legacy
events maybe supported via json. Originally Mark Rutland also
expressed at LPC 2023 that doing this would resolve bugs on ARM Apple
M? processors, but James Clark more recently tested this and believes
the driver issues there may not have existed or have been resolved. In
any case, it is inconsistent that with a PMU event names avoid legacy
encodings, but when wildcarding PMUs (ie without a PMU with the event
name) the legacy encodings have priority.

The situation is further inconsistent as legacy events are case
sensitive, so on Intel that provides a sysfs instructions event, the
instructions event without a PMU and lowercase is legacy while with
uppercase letters it matches with sysfs which is case insensitive. Are
there legacy events with upper case letters? Yes there are, the cache
ones mix case freely:

L1-dcache|l1-d|l1d|L1-data|L1-icache|l1-i|l1i|L1-instruction|LLC|L2|dTLB|d-tlb|Data-TLB|iTLB|i-tlb|Instruction-TLB|branch|branches|bpu|btb|bpc|node

meaning LLC that means L2 (which is wrong) both match as part of a
legacy cache name but llc and l2 would only match sysfs/json
events. The whole thing just points at the ridiculous nature of legacy
events and why we'd want them to be preffered I don't know. Why should
case of a letter or having a PMU prefix impact the encoding in the
perf_event_attr?

The patch doing this work was reverted in a v6.10 release candidate
as, even though the patch was posted for weeks and had been on
linux-next for weeks without issue, Linus was in the habit of using
explicit legacy events with unsupported precision options on his
Neoverse-N1. This machine has SLC PMU events for bus and CPU cycles
where ARM decided to call the events bus_cycles and cycles, the latter
being also a legacy event name. ARM haven't renamed the cycles event
to a more consistent cpu_cycles and avoided the problem. With these
changes the problematic event will now be skipped, a large warning
produced, and perf record will continue for the other PMU events. This
solution was proposed by Arnaldo.

v8: Change removing of failed to open events that are tracking so that
    the tracking moves to the next event. Make software events able to
    specified with a PMU. Change the perf_api_probe to not load all
    PMUs through scanning, specify a PMU when parsing events.

v7: Expand cover letter, fix a missed core_ok check in the v6
    rebase. Note, as with v6 there is an alternate series that
    prioritizes legacy events but that is silly and I'd prefer we
    didn't do it.

v6: Rebase of v5 (dropping already merged patches):
    https://lore.kernel.org/lkml/20250109222109.567031-1-irogers@google.com/
    that unusually had an RFC posted for it:
    https://lore.kernel.org/lkml/Z7Z5kv75BMML2A1q@google.com/
    Note, this patch conflicts/contradicts:
    https://lore.kernel.org/lkml/20250312211623.2495798-1-irogers@google.com/
    that I posted so that we could either consistently prioritize
    sysfs/json (these patches) or legacy events (the other
    patches). That lack of event printing and encoding inconsistency
    is most prominent in the encoding of events like "instructions"
    which on hybrid are reported as "cpu_core/instructions/" but
    "instructions" before these patches gets a legacy encoding while
    "cpu_core/instructions/" gets a sysfs/json encoding. These patches
    make "instructions" always get a sysfs/json encoding while the
    alternate patches make it always get a legacy encoding.

v5: Follow Namhyung's suggestion and ignore the case where command
    line dummy events fail to open alongside other events that all
    fail to open. Note, the Tested-by tags are left on the series as
    v4 and v5 were changing an error case that doesn't occur in
    testing but was manually tested by myself.

v4: Rework the no events opening change from v3 to make it handle
    multiple dummy events. Sadly an evlist isn't empty if it just
    contains dummy events as the dummy event may be used with "perf
    record -e dummy .." as a way to determine whether permission
    issues exist. Other software events like cpu-clock would suffice
    for this, but the using dummy genie has left the bottle.

    Another problem is that we appear to have an excessive number of
    dummy events added, for example, we can likely avoid a dummy event
    and add sideband data to the original event. For auxtrace more
    dummy events may be opened too. Anyway, this has led to the
    approach taken in patch 3 where the number of dummy parsed events
    is computed. If the number of removed/failing-to-open non-dummy
    events matches the number of non-dummy events then we want to
    fail, but only if there are no parsed dummy events or if there was
    one then it must have opened. The math here is hard to read, but
    passes my manual testing.

v3: Make no events opening for perf record a failure as suggested by
    James Clark and Aditya Bodkhe <Aditya.Bodkhe1@ibm.com>. Also,
    rebase.

v2: Rebase and add tested-by tags from James Clark, Leo Yan and Atish
    Patra who have tested on RISC-V and ARM CPUs, including the
    problem case from before.

Ian Rogers (4):
  perf record: Skip don't fail for events that don't open
  perf parse-events: Reapply "Prefer sysfs/JSON hardware events over
    legacy"
  perf parse-events: Allow software events to be terms
  perf perf_api_probe: Avoid scanning all PMUs, try software PMU first

 tools/perf/builtin-record.c      | 63 +++++++++++++++++++---
 tools/perf/util/parse-events.c   | 47 +++++++++++++----
 tools/perf/util/parse-events.h   |  3 +-
 tools/perf/util/parse-events.l   | 90 ++++++++++++++++++--------------
 tools/perf/util/parse-events.y   | 85 ++++++++++++++++++++++--------
 tools/perf/util/perf_api_probe.c | 27 +++++++---
 tools/perf/util/pmu.c            |  9 ++--
 7 files changed, 235 insertions(+), 89 deletions(-)

-- 
2.49.0.777.g153de2bbd5-goog


^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH v8 1/4] perf record: Skip don't fail for events that don't open
  2025-04-16  4:51 [PATCH v8 0/4] Prefer sysfs/JSON events also when no PMU is provided Ian Rogers
@ 2025-04-16  4:51 ` Ian Rogers
  2025-04-16  4:51 ` [PATCH v8 2/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy" Ian Rogers
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Ian Rogers @ 2025-04-16  4:51 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, James Clark, Ze Gao,
	Weilin Wang, Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, Aditya Bodkhe, Leo Yan,
	Thomas Falcon, Atish Patra

Whilst for many tools it is an expected behavior that failure to open
a perf event is a failure, ARM decided to name PMU events the same as
legacy events and then failed to rename such events on a server uncore
SLC PMU. As perf's default behavior when no PMU is specified is to
open the event on all PMUs that advertise/"have" the event, this
yielded failures when trying to make the priority of legacy and
sysfs/json events uniform - something requested by RISC-V and ARM. A
legacy event user on ARM hardware may find their event opened on an
uncore PMU which for perf record will fail. Arnaldo suggested skipping
such events which this patch implements. Rather than have the skipping
conditional on running on ARM, the skipping is done on all
architectures as such a fundamental behavioral difference could lead
to problems with tools built/depending on perf.

An example of perf record failing to open events on x86 is:
```
$ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
Error:
Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
"dmesg | grep -i perf" may provide additional information.

Error:
Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
"dmesg | grep -i perf" may provide additional information.

Error:
Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
The LLC-prefetch-read event is not supported.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]

$ perf report --stats
Aggregated stats:
               TOTAL events:      17255
                MMAP events:        284  ( 1.6%)
                COMM events:       1961  (11.4%)
                EXIT events:          1  ( 0.0%)
                FORK events:       1960  (11.4%)
              SAMPLE events:         87  ( 0.5%)
               MMAP2 events:      12836  (74.4%)
             KSYMBOL events:         83  ( 0.5%)
           BPF_EVENT events:         36  ( 0.2%)
      FINISHED_ROUND events:          2  ( 0.0%)
            ID_INDEX events:          1  ( 0.0%)
          THREAD_MAP events:          1  ( 0.0%)
             CPU_MAP events:          1  ( 0.0%)
           TIME_CONV events:          1  ( 0.0%)
       FINISHED_INIT events:          1  ( 0.0%)
cycles stats:
              SAMPLE events:         87
```

If all events fail to open then the perf record will fail:
```
$ perf record -e LLC-prefetch-read true
Error:
Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
The LLC-prefetch-read event is not supported.
Error:
Failure to open any events for recording
```

As an evlist may have dummy events that open when all command line
events fail we ignore dummy events when detecting if at least some
events open. This still permits the dummy event on its own to be used
as a permission check:
```
$ perf record -e dummy true
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.046 MB perf.data ]
```
but allows failure when a dummy event is implicilty inserted or when
there are insufficient permissions to open it:
```
$ perf record -e LLC-prefetch-read -a true
Error:
Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
The LLC-prefetch-read event is not supported.
Error:
Failure to open any events for recording
```

As the first parsed event in an evlist is marked as tracking, removing
this event can remove tracking from the evlist, removing mmap events
and breaking symbolization. To avoid this, if a tracking event is
removed then the next event has tracking added.

The issue with legacy events is that on RISC-V they want the driver to
not have mappings from legacy to non-legacy config encodings for each
vendor/model due to size, complexity and difficulty to update. It was
reported that on ARM Apple-M? CPUs the legacy mapping in the driver
was broken and the sysfs/json events should always take precedent,
however, it isn't clear this is still the case. It is the case that
without working around this issue a legacy event like cycles without a
PMU can encode differently than when specified with a PMU - the
non-PMU version favoring legacy encodings, the PMU one avoiding legacy
encodings. Legacy events are also case sensitive while sysfs/json
events are not.

The patch removes events and then adjusts the idx value for each
evsel. This is done so that the dense xyarrays used for file
descriptors, etc. don't contain broken entries. As event opening
happens relatively late in the record process, use of the idx value
before the open will have become corrupted, so it is expected there
are latent bugs hidden behind this change - the change is best
effort. As the only vendor that has broken event names is ARM, this
will principally effect ARM users. They will also experience warning
messages like those above because of the uncore PMU advertising legacy
event names.

Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Prior versions without adding the tracking data was:
Tested-by: James Clark <james.clark@linaro.org>
Tested-by: Leo Yan <leo.yan@arm.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
---
 tools/perf/builtin-record.c | 63 +++++++++++++++++++++++++++++++++----
 1 file changed, 57 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index ba20bf7c011d..6f59a419ec5d 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -962,7 +962,6 @@ static int record__config_tracking_events(struct record *rec)
 	 */
 	if (opts->target.initial_delay || target__has_cpu(&opts->target) ||
 	    perf_pmus__num_core_pmus() > 1) {
-
 		/*
 		 * User space tasks can migrate between CPUs, so when tracing
 		 * selected CPUs, sideband for all CPUs is still needed.
@@ -1367,8 +1366,23 @@ static int record__open(struct record *rec)
 	struct perf_session *session = rec->session;
 	struct record_opts *opts = &rec->opts;
 	int rc = 0;
+	bool skipped = false;
+	bool removed_tracking = false;
 
 	evlist__for_each_entry(evlist, pos) {
+		if (removed_tracking) {
+			/*
+			 * Normally the head of the list has tracking enabled
+			 * for sideband data like mmaps. If this event is
+			 * removed, make sure to add tracking to the next
+			 * processed event.
+			 */
+			if (!pos->tracking) {
+				pos->tracking = true;
+				evsel__config(pos, opts, &callchain_param);
+			}
+			removed_tracking = false;
+		}
 try_again:
 		if (evsel__open(pos, pos->core.cpus, pos->core.threads) < 0) {
 			if (evsel__fallback(pos, &opts->target, errno, msg, sizeof(msg))) {
@@ -1382,15 +1396,52 @@ static int record__open(struct record *rec)
 			        pos = evlist__reset_weak_group(evlist, pos, true);
 				goto try_again;
 			}
-			rc = -errno;
 			evsel__open_strerror(pos, &opts->target, errno, msg, sizeof(msg));
-			ui__error("%s\n", msg);
-			goto out;
+			ui__error("Failure to open event '%s' on PMU '%s' which will be removed.\n%s\n",
+				  evsel__name(pos), evsel__pmu_name(pos), msg);
+			if (pos->tracking)
+				removed_tracking = true;
+			pos->skippable = true;
+			skipped = true;
+		} else {
+			pos->supported = true;
 		}
-
-		pos->supported = true;
 	}
 
+	if (skipped) {
+		struct evsel *tmp;
+		int idx = 0;
+		bool evlist_empty = true;
+
+		/* Remove evsels that failed to open and update indices. */
+		evlist__for_each_entry_safe(evlist, tmp, pos) {
+			if (pos->skippable) {
+				evlist__remove(evlist, pos);
+				continue;
+			}
+
+			/*
+			 * Note, dummy events may be command line parsed or
+			 * added by the tool. We care about supporting `perf
+			 * record -e dummy` which may be used as a permission
+			 * check. Dummy events that are added to the command
+			 * line and opened along with other events that fail,
+			 * will still fail as if the dummy events were tool
+			 * added events for the sake of code simplicity.
+			 */
+			if (!evsel__is_dummy_event(pos))
+				evlist_empty = false;
+		}
+		evlist__for_each_entry(evlist, pos) {
+			pos->core.idx = idx++;
+		}
+		/* If list is empty then fail. */
+		if (evlist_empty) {
+			ui__error("Failure to open any events for recording.\n");
+			rc = -1;
+			goto out;
+		}
+	}
 	if (symbol_conf.kptr_restrict && !evlist__exclude_kernel(evlist)) {
 		pr_warning(
 "WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,\n"
-- 
2.49.0.777.g153de2bbd5-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v8 2/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2025-04-16  4:51 [PATCH v8 0/4] Prefer sysfs/JSON events also when no PMU is provided Ian Rogers
  2025-04-16  4:51 ` [PATCH v8 1/4] perf record: Skip don't fail for events that don't open Ian Rogers
@ 2025-04-16  4:51 ` Ian Rogers
  2025-04-16  4:51 ` [PATCH v8 3/4] perf parse-events: Allow software events to be terms Ian Rogers
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 14+ messages in thread
From: Ian Rogers @ 2025-04-16  4:51 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, James Clark, Ze Gao,
	Weilin Wang, Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, Aditya Bodkhe, Leo Yan,
	Thomas Falcon, Atish Patra
  Cc: Beeman Strong, Arnaldo Carvalho de Melo

Originally posted and merged from:
https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
This reverts commit 4f1b067359ac8364cdb7f9fda41085fa85789d0f although
the patch is now smaller due to related fixes being applied in commit
22a4db3c3603 ("perf evsel: Add alternate_hw_config and use in
evsel__match").
The original commit message was:

It was requested that RISC-V be able to add events to the perf tool so
the PMU driver didn't need to map legacy events to config encodings:
https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/

This change makes the priority of events specified without a PMU the
same as those specified with a PMU, namely sysfs and JSON events are
checked first before using the legacy encoding.

The hw_term is made more generic as a hardware_event that encodes a
pair of string and int value, allowing parse_events_multi_pmu_add to
fall back on a known encoding when the sysfs/JSON adding fails for
core events. As this covers PE_VALUE_SYM_HW, that token is removed and
related code simplified.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Tested-by: James Clark <james.clark@linaro.org>
Tested-by: Leo Yan <leo.yan@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Beeman Strong <beeman@rivosinc.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/parse-events.c | 27 +++++++++---
 tools/perf/util/parse-events.l | 76 +++++++++++++++++-----------------
 tools/perf/util/parse-events.y | 60 ++++++++++++++++++---------
 3 files changed, 99 insertions(+), 64 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 5152fd5a6ead..f4236570aa4c 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1551,8 +1551,8 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
 	struct list_head *list = NULL;
 	struct perf_pmu *pmu = NULL;
 	YYLTYPE *loc = loc_;
-	int ok = 0;
-	const char *config;
+	int ok = 0, core_ok = 0;
+	const char *tmp;
 	struct parse_events_terms parsed_terms;
 
 	*listp = NULL;
@@ -1565,15 +1565,15 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
 			return ret;
 	}
 
-	config = strdup(event_name);
-	if (!config)
+	tmp = strdup(event_name);
+	if (!tmp)
 		goto out_err;
 
 	if (parse_events_term__num(&term,
 				   PARSE_EVENTS__TERM_TYPE_USER,
-				   config, /*num=*/1, /*novalue=*/true,
+				   tmp, /*num=*/1, /*novalue=*/true,
 				   loc, /*loc_val=*/NULL) < 0) {
-		zfree(&config);
+		zfree(&tmp);
 		goto out_err;
 	}
 	list_add_tail(&term->list, &parsed_terms.terms);
@@ -1604,6 +1604,8 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
 			pr_debug("%s -> %s/%s/\n", event_name, pmu->name, sb.buf);
 			strbuf_release(&sb);
 			ok++;
+			if (pmu->is_core)
+				core_ok++;
 		}
 	}
 
@@ -1617,9 +1619,22 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
 			pr_debug("%s -> fake/%s/\n", event_name, sb.buf);
 			strbuf_release(&sb);
 			ok++;
+			core_ok++;
 		}
 	}
 
+	if (hw_config != PERF_COUNT_HW_MAX && !core_ok) {
+		/*
+		 * The event wasn't found on core PMUs but it has a hardware
+		 * config version to try.
+		 */
+		if (!parse_events_add_numeric(parse_state, list,
+						PERF_TYPE_HARDWARE, hw_config,
+						const_parsed_terms,
+						/*wildcard=*/true))
+			ok++;
+	}
+
 out_err:
 	parse_events_terms__exit(&parsed_terms);
 	if (ok)
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 7ed86e3e34e3..324b7dc8a0d3 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -117,12 +117,12 @@ do {								\
 	yyless(0);						\
 } while (0)
 
-static int sym(yyscan_t scanner, int type, int config)
+static int sym(yyscan_t scanner, int config)
 {
 	YYSTYPE *yylval = parse_events_get_lval(scanner);
 
-	yylval->num = (type << 16) + config;
-	return type == PERF_TYPE_HARDWARE ? PE_VALUE_SYM_HW : PE_VALUE_SYM_SW;
+	yylval->num = config;
+	return PE_VALUE_SYM_SW;
 }
 
 static int term(yyscan_t scanner, enum parse_events__term_type type)
@@ -133,13 +133,13 @@ static int term(yyscan_t scanner, enum parse_events__term_type type)
 	return PE_TERM;
 }
 
-static int hw_term(yyscan_t scanner, int config)
+static int hw(yyscan_t scanner, int config)
 {
 	YYSTYPE *yylval = parse_events_get_lval(scanner);
 	char *text = parse_events_get_text(scanner);
 
-	yylval->hardware_term.str = strdup(text);
-	yylval->hardware_term.num = PERF_TYPE_HARDWARE + config;
+	yylval->hardware_event.str = strdup(text);
+	yylval->hardware_event.num = config;
 	return PE_TERM_HW;
 }
 
@@ -335,16 +335,16 @@ aux-output		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT); }
 aux-action		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_ACTION); }
 aux-sample-size		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE); }
 metric-id		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_METRIC_ID); }
-cpu-cycles|cycles				{ return hw_term(yyscanner, PERF_COUNT_HW_CPU_CYCLES); }
-stalled-cycles-frontend|idle-cycles-frontend	{ return hw_term(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); }
-stalled-cycles-backend|idle-cycles-backend	{ return hw_term(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); }
-instructions					{ return hw_term(yyscanner, PERF_COUNT_HW_INSTRUCTIONS); }
-cache-references				{ return hw_term(yyscanner, PERF_COUNT_HW_CACHE_REFERENCES); }
-cache-misses					{ return hw_term(yyscanner, PERF_COUNT_HW_CACHE_MISSES); }
-branch-instructions|branches			{ return hw_term(yyscanner, PERF_COUNT_HW_BRANCH_INSTRUCTIONS); }
-branch-misses					{ return hw_term(yyscanner, PERF_COUNT_HW_BRANCH_MISSES); }
-bus-cycles					{ return hw_term(yyscanner, PERF_COUNT_HW_BUS_CYCLES); }
-ref-cycles					{ return hw_term(yyscanner, PERF_COUNT_HW_REF_CPU_CYCLES); }
+cpu-cycles|cycles				{ return hw(yyscanner, PERF_COUNT_HW_CPU_CYCLES); }
+stalled-cycles-frontend|idle-cycles-frontend	{ return hw(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); }
+stalled-cycles-backend|idle-cycles-backend	{ return hw(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); }
+instructions					{ return hw(yyscanner, PERF_COUNT_HW_INSTRUCTIONS); }
+cache-references				{ return hw(yyscanner, PERF_COUNT_HW_CACHE_REFERENCES); }
+cache-misses					{ return hw(yyscanner, PERF_COUNT_HW_CACHE_MISSES); }
+branch-instructions|branches			{ return hw(yyscanner, PERF_COUNT_HW_BRANCH_INSTRUCTIONS); }
+branch-misses					{ return hw(yyscanner, PERF_COUNT_HW_BRANCH_MISSES); }
+bus-cycles					{ return hw(yyscanner, PERF_COUNT_HW_BUS_CYCLES); }
+ref-cycles					{ return hw(yyscanner, PERF_COUNT_HW_REF_CPU_CYCLES); }
 r{num_raw_hex}		{ return str(yyscanner, PE_RAW); }
 r0x{num_raw_hex}	{ return str(yyscanner, PE_RAW); }
 ,			{ return ','; }
@@ -390,28 +390,28 @@ r0x{num_raw_hex}	{ return str(yyscanner, PE_RAW); }
 <<EOF>>			{ BEGIN(INITIAL); }
 }
 
-cpu-cycles|cycles				{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES); }
-stalled-cycles-frontend|idle-cycles-frontend	{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); }
-stalled-cycles-backend|idle-cycles-backend	{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); }
-instructions					{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS); }
-cache-references				{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_REFERENCES); }
-cache-misses					{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_MISSES); }
-branch-instructions|branches			{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_BRANCH_INSTRUCTIONS); }
-branch-misses					{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_BRANCH_MISSES); }
-bus-cycles					{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_BUS_CYCLES); }
-ref-cycles					{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_REF_CPU_CYCLES); }
-cpu-clock					{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CPU_CLOCK); }
-task-clock					{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_TASK_CLOCK); }
-page-faults|faults				{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_PAGE_FAULTS); }
-minor-faults					{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_PAGE_FAULTS_MIN); }
-major-faults					{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_PAGE_FAULTS_MAJ); }
-context-switches|cs				{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CONTEXT_SWITCHES); }
-cpu-migrations|migrations			{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CPU_MIGRATIONS); }
-alignment-faults				{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_ALIGNMENT_FAULTS); }
-emulation-faults				{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_EMULATION_FAULTS); }
-dummy						{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_DUMMY); }
-bpf-output					{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_BPF_OUTPUT); }
-cgroup-switches					{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CGROUP_SWITCHES); }
+cpu-cycles|cycles				{ return hw(yyscanner, PERF_COUNT_HW_CPU_CYCLES); }
+stalled-cycles-frontend|idle-cycles-frontend	{ return hw(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); }
+stalled-cycles-backend|idle-cycles-backend	{ return hw(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); }
+instructions					{ return hw(yyscanner, PERF_COUNT_HW_INSTRUCTIONS); }
+cache-references				{ return hw(yyscanner, PERF_COUNT_HW_CACHE_REFERENCES); }
+cache-misses					{ return hw(yyscanner, PERF_COUNT_HW_CACHE_MISSES); }
+branch-instructions|branches			{ return hw(yyscanner, PERF_COUNT_HW_BRANCH_INSTRUCTIONS); }
+branch-misses					{ return hw(yyscanner, PERF_COUNT_HW_BRANCH_MISSES); }
+bus-cycles					{ return hw(yyscanner, PERF_COUNT_HW_BUS_CYCLES); }
+ref-cycles					{ return hw(yyscanner, PERF_COUNT_HW_REF_CPU_CYCLES); }
+cpu-clock					{ return sym(yyscanner, PERF_COUNT_SW_CPU_CLOCK); }
+task-clock					{ return sym(yyscanner, PERF_COUNT_SW_TASK_CLOCK); }
+page-faults|faults				{ return sym(yyscanner, PERF_COUNT_SW_PAGE_FAULTS); }
+minor-faults					{ return sym(yyscanner, PERF_COUNT_SW_PAGE_FAULTS_MIN); }
+major-faults					{ return sym(yyscanner, PERF_COUNT_SW_PAGE_FAULTS_MAJ); }
+context-switches|cs				{ return sym(yyscanner, PERF_COUNT_SW_CONTEXT_SWITCHES); }
+cpu-migrations|migrations			{ return sym(yyscanner, PERF_COUNT_SW_CPU_MIGRATIONS); }
+alignment-faults				{ return sym(yyscanner, PERF_COUNT_SW_ALIGNMENT_FAULTS); }
+emulation-faults				{ return sym(yyscanner, PERF_COUNT_SW_EMULATION_FAULTS); }
+dummy						{ return sym(yyscanner, PERF_COUNT_SW_DUMMY); }
+bpf-output					{ return sym(yyscanner, PERF_COUNT_SW_BPF_OUTPUT); }
+cgroup-switches					{ return sym(yyscanner, PERF_COUNT_SW_CGROUP_SWITCHES); }
 
 {lc_type}			{ return str(yyscanner, PE_LEGACY_CACHE); }
 {lc_type}-{lc_op_result}	{ return str(yyscanner, PE_LEGACY_CACHE); }
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index f888cbb076d6..d2ef1890007e 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -55,7 +55,7 @@ static void free_list_evsel(struct list_head* list_evsel)
 %}
 
 %token PE_START_EVENTS PE_START_TERMS
-%token PE_VALUE PE_VALUE_SYM_HW PE_VALUE_SYM_SW PE_TERM
+%token PE_VALUE PE_VALUE_SYM_SW PE_TERM
 %token PE_EVENT_NAME
 %token PE_RAW PE_NAME
 %token PE_MODIFIER_EVENT PE_MODIFIER_BP PE_BP_COLON PE_BP_SLASH
@@ -65,11 +65,9 @@ static void free_list_evsel(struct list_head* list_evsel)
 %token PE_DRV_CFG_TERM
 %token PE_TERM_HW
 %type <num> PE_VALUE
-%type <num> PE_VALUE_SYM_HW
 %type <num> PE_VALUE_SYM_SW
 %type <mod> PE_MODIFIER_EVENT
 %type <term_type> PE_TERM
-%type <num> value_sym
 %type <str> PE_RAW
 %type <str> PE_NAME
 %type <str> PE_LEGACY_CACHE
@@ -85,6 +83,7 @@ static void free_list_evsel(struct list_head* list_evsel)
 %type <list_terms> opt_pmu_config
 %destructor { parse_events_terms__delete ($$); } <list_terms>
 %type <list_evsel> event_pmu
+%type <list_evsel> event_legacy_hardware
 %type <list_evsel> event_legacy_symbol
 %type <list_evsel> event_legacy_cache
 %type <list_evsel> event_legacy_mem
@@ -102,8 +101,8 @@ static void free_list_evsel(struct list_head* list_evsel)
 %destructor { free_list_evsel ($$); } <list_evsel>
 %type <tracepoint_name> tracepoint_name
 %destructor { free ($$.sys); free ($$.event); } <tracepoint_name>
-%type <hardware_term> PE_TERM_HW
-%destructor { free ($$.str); } <hardware_term>
+%type <hardware_event> PE_TERM_HW
+%destructor { free ($$.str); } <hardware_event>
 
 %union
 {
@@ -118,10 +117,10 @@ static void free_list_evsel(struct list_head* list_evsel)
 		char *sys;
 		char *event;
 	} tracepoint_name;
-	struct hardware_term {
+	struct hardware_event {
 		char *str;
 		u64 num;
-	} hardware_term;
+	} hardware_event;
 }
 %%
 
@@ -264,6 +263,7 @@ PE_EVENT_NAME event_def
 event_def
 
 event_def: event_pmu |
+	   event_legacy_hardware |
 	   event_legacy_symbol |
 	   event_legacy_cache sep_dc |
 	   event_legacy_mem sep_dc |
@@ -306,24 +306,45 @@ PE_NAME sep_dc
 	$$ = list;
 }
 
-value_sym:
-PE_VALUE_SYM_HW
+event_legacy_hardware:
+PE_TERM_HW opt_pmu_config
+{
+	/* List of created evsels. */
+	struct list_head *list = NULL;
+	int err = parse_events_multi_pmu_add(_parse_state, $1.str, $1.num, $2, &list, &@1);
+
+	free($1.str);
+	parse_events_terms__delete($2);
+	if (err)
+		PE_ABORT(err);
+
+	$$ = list;
+}
 |
-PE_VALUE_SYM_SW
+PE_TERM_HW sep_dc
+{
+	struct list_head *list;
+	int err;
+
+	err = parse_events_multi_pmu_add(_parse_state, $1.str, $1.num, NULL, &list, &@1);
+	free($1.str);
+	if (err)
+		PE_ABORT(err);
+	$$ = list;
+}
 
 event_legacy_symbol:
-value_sym '/' event_config '/'
+PE_VALUE_SYM_SW '/' event_config '/'
 {
 	struct list_head *list;
-	int type = $1 >> 16;
-	int config = $1 & 255;
 	int err;
-	bool wildcard = (type == PERF_TYPE_HARDWARE || type == PERF_TYPE_HW_CACHE);
 
 	list = alloc_list();
 	if (!list)
 		YYNOMEM;
-	err = parse_events_add_numeric(_parse_state, list, type, config, $3, wildcard);
+	err = parse_events_add_numeric(_parse_state, list,
+				/*type=*/PERF_TYPE_SOFTWARE, /*config=*/$1,
+				$3, /*wildcard=*/false);
 	parse_events_terms__delete($3);
 	if (err) {
 		free_list_evsel(list);
@@ -332,18 +353,17 @@ value_sym '/' event_config '/'
 	$$ = list;
 }
 |
-value_sym sep_slash_slash_dc
+PE_VALUE_SYM_SW sep_slash_slash_dc
 {
 	struct list_head *list;
-	int type = $1 >> 16;
-	int config = $1 & 255;
-	bool wildcard = (type == PERF_TYPE_HARDWARE || type == PERF_TYPE_HW_CACHE);
 	int err;
 
 	list = alloc_list();
 	if (!list)
 		YYNOMEM;
-	err = parse_events_add_numeric(_parse_state, list, type, config, /*head_config=*/NULL, wildcard);
+	err = parse_events_add_numeric(_parse_state, list,
+				/*type=*/PERF_TYPE_SOFTWARE, /*config=*/$1,
+				/*head_config=*/NULL, /*wildcard=*/false);
 	if (err)
 		PE_ABORT(err);
 	$$ = list;
-- 
2.49.0.777.g153de2bbd5-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v8 3/4] perf parse-events: Allow software events to be terms
  2025-04-16  4:51 [PATCH v8 0/4] Prefer sysfs/JSON events also when no PMU is provided Ian Rogers
  2025-04-16  4:51 ` [PATCH v8 1/4] perf record: Skip don't fail for events that don't open Ian Rogers
  2025-04-16  4:51 ` [PATCH v8 2/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy" Ian Rogers
@ 2025-04-16  4:51 ` Ian Rogers
  2025-04-16  4:51 ` [PATCH v8 4/4] perf perf_api_probe: Avoid scanning all PMUs, try software PMU first Ian Rogers
  2025-05-27 20:50 ` [PATCH v8 0/4] Prefer sysfs/JSON events also when no PMU is provided Ian Rogers
  4 siblings, 0 replies; 14+ messages in thread
From: Ian Rogers @ 2025-04-16  4:51 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, James Clark, Ze Gao,
	Weilin Wang, Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, Aditya Bodkhe, Leo Yan,
	Thomas Falcon, Atish Patra

Allow legacy software events to be specified as, for example,
software/cpu-clock/u for consistency in parsing with hardware
events. Rename the type for hardware_event to legacy_event given its
new dual role.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/parse-events.c | 20 +++++++++++---
 tools/perf/util/parse-events.h |  3 ++-
 tools/perf/util/parse-events.l | 48 ++++++++++++++++++++++------------
 tools/perf/util/parse-events.y | 43 ++++++++++++++++++++++--------
 tools/perf/util/pmu.c          |  9 ++++---
 5 files changed, 86 insertions(+), 37 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index f4236570aa4c..21506fcf0bfa 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -805,6 +805,7 @@ const char *parse_events__term_type_str(enum parse_events__term_type term_type)
 		[PARSE_EVENTS__TERM_TYPE_RAW]                   = "raw",
 		[PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE]          = "legacy-cache",
 		[PARSE_EVENTS__TERM_TYPE_HARDWARE]              = "hardware",
+		[PARSE_EVENTS__TERM_TYPE_SOFTWARE]              = "software",
 	};
 	if ((unsigned int)term_type >= __PARSE_EVENTS__TERM_TYPE_NR)
 		return "unknown term";
@@ -854,6 +855,7 @@ config_term_avail(enum parse_events__term_type term_type, struct parse_events_er
 	case PARSE_EVENTS__TERM_TYPE_RAW:
 	case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
 	case PARSE_EVENTS__TERM_TYPE_HARDWARE:
+	case PARSE_EVENTS__TERM_TYPE_SOFTWARE:
 	default:
 		if (!err)
 			return false;
@@ -985,6 +987,7 @@ do {									   \
 	case PARSE_EVENTS__TERM_TYPE_USER:
 	case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
 	case PARSE_EVENTS__TERM_TYPE_HARDWARE:
+	case PARSE_EVENTS__TERM_TYPE_SOFTWARE:
 	default:
 		parse_events_error__handle(err, term->err_term,
 					strdup(parse_events__term_type_str(term->type_term)),
@@ -1037,7 +1040,8 @@ static int config_term_pmu(struct perf_event_attr *attr,
 			term->no_value = true;
 		}
 	}
-	if (term->type_term == PARSE_EVENTS__TERM_TYPE_HARDWARE) {
+	if (term->type_term == PARSE_EVENTS__TERM_TYPE_HARDWARE ||
+	    term->type_term == PARSE_EVENTS__TERM_TYPE_SOFTWARE) {
 		struct perf_pmu *pmu = perf_pmus__find_by_type(attr->type);
 
 		if (!pmu) {
@@ -1057,10 +1061,15 @@ static int config_term_pmu(struct perf_event_attr *attr,
 			term->no_value = true;
 			term->alternate_hw_config = true;
 		} else {
-			attr->type = PERF_TYPE_HARDWARE;
 			attr->config = term->val.num;
-			if (perf_pmus__supports_extended_type())
-				attr->config |= (__u64)pmu->type << PERF_PMU_TYPE_SHIFT;
+			if (term->type_term == PARSE_EVENTS__TERM_TYPE_HARDWARE) {
+				attr->type = PERF_TYPE_HARDWARE;
+				if (perf_pmus__supports_extended_type())
+					attr->config |= (__u64)pmu->type << PERF_PMU_TYPE_SHIFT;
+
+			} else {
+				attr->type = PERF_TYPE_SOFTWARE;
+			}
 		}
 		return 0;
 	}
@@ -1108,6 +1117,7 @@ static int config_term_tracepoint(struct perf_event_attr *attr,
 	case PARSE_EVENTS__TERM_TYPE_RAW:
 	case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
 	case PARSE_EVENTS__TERM_TYPE_HARDWARE:
+	case PARSE_EVENTS__TERM_TYPE_SOFTWARE:
 	default:
 		if (err) {
 			parse_events_error__handle(err, term->err_term,
@@ -1242,6 +1252,7 @@ do {								\
 		case PARSE_EVENTS__TERM_TYPE_RAW:
 		case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
 		case PARSE_EVENTS__TERM_TYPE_HARDWARE:
+		case PARSE_EVENTS__TERM_TYPE_SOFTWARE:
 		default:
 			break;
 		}
@@ -1296,6 +1307,7 @@ static int get_config_chgs(struct perf_pmu *pmu, struct parse_events_terms *head
 		case PARSE_EVENTS__TERM_TYPE_RAW:
 		case PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE:
 		case PARSE_EVENTS__TERM_TYPE_HARDWARE:
+		case PARSE_EVENTS__TERM_TYPE_SOFTWARE:
 		default:
 			break;
 		}
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index e176a34ab088..c0a594827f4f 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -80,7 +80,8 @@ enum parse_events__term_type {
 	PARSE_EVENTS__TERM_TYPE_RAW,
 	PARSE_EVENTS__TERM_TYPE_LEGACY_CACHE,
 	PARSE_EVENTS__TERM_TYPE_HARDWARE,
-#define	__PARSE_EVENTS__TERM_TYPE_NR (PARSE_EVENTS__TERM_TYPE_HARDWARE + 1)
+	PARSE_EVENTS__TERM_TYPE_SOFTWARE,
+#define	__PARSE_EVENTS__TERM_TYPE_NR (PARSE_EVENTS__TERM_TYPE_SOFTWARE + 1)
 };
 
 struct parse_events_term {
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 324b7dc8a0d3..a670d23ea9cc 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -117,12 +117,14 @@ do {								\
 	yyless(0);						\
 } while (0)
 
-static int sym(yyscan_t scanner, int config)
+static int sw(yyscan_t scanner, int config)
 {
 	YYSTYPE *yylval = parse_events_get_lval(scanner);
+	char *text = parse_events_get_text(scanner);
 
-	yylval->num = config;
-	return PE_VALUE_SYM_SW;
+	yylval->legacy_event.str = strdup(text);
+	yylval->legacy_event.num = config;
+	return PE_TERM_SW;
 }
 
 static int term(yyscan_t scanner, enum parse_events__term_type type)
@@ -138,8 +140,8 @@ static int hw(yyscan_t scanner, int config)
 	YYSTYPE *yylval = parse_events_get_lval(scanner);
 	char *text = parse_events_get_text(scanner);
 
-	yylval->hardware_event.str = strdup(text);
-	yylval->hardware_event.num = config;
+	yylval->legacy_event.str = strdup(text);
+	yylval->legacy_event.num = config;
 	return PE_TERM_HW;
 }
 
@@ -345,6 +347,18 @@ branch-instructions|branches			{ return hw(yyscanner, PERF_COUNT_HW_BRANCH_INSTR
 branch-misses					{ return hw(yyscanner, PERF_COUNT_HW_BRANCH_MISSES); }
 bus-cycles					{ return hw(yyscanner, PERF_COUNT_HW_BUS_CYCLES); }
 ref-cycles					{ return hw(yyscanner, PERF_COUNT_HW_REF_CPU_CYCLES); }
+cpu-clock					{ return sw(yyscanner, PERF_COUNT_SW_CPU_CLOCK); }
+task-clock					{ return sw(yyscanner, PERF_COUNT_SW_TASK_CLOCK); }
+page-faults|faults				{ return sw(yyscanner, PERF_COUNT_SW_PAGE_FAULTS); }
+minor-faults					{ return sw(yyscanner, PERF_COUNT_SW_PAGE_FAULTS_MIN); }
+major-faults					{ return sw(yyscanner, PERF_COUNT_SW_PAGE_FAULTS_MAJ); }
+context-switches|cs				{ return sw(yyscanner, PERF_COUNT_SW_CONTEXT_SWITCHES); }
+cpu-migrations|migrations			{ return sw(yyscanner, PERF_COUNT_SW_CPU_MIGRATIONS); }
+alignment-faults				{ return sw(yyscanner, PERF_COUNT_SW_ALIGNMENT_FAULTS); }
+emulation-faults				{ return sw(yyscanner, PERF_COUNT_SW_EMULATION_FAULTS); }
+dummy						{ return sw(yyscanner, PERF_COUNT_SW_DUMMY); }
+bpf-output					{ return sw(yyscanner, PERF_COUNT_SW_BPF_OUTPUT); }
+cgroup-switches					{ return sw(yyscanner, PERF_COUNT_SW_CGROUP_SWITCHES); }
 r{num_raw_hex}		{ return str(yyscanner, PE_RAW); }
 r0x{num_raw_hex}	{ return str(yyscanner, PE_RAW); }
 ,			{ return ','; }
@@ -400,18 +414,18 @@ branch-instructions|branches			{ return hw(yyscanner, PERF_COUNT_HW_BRANCH_INSTR
 branch-misses					{ return hw(yyscanner, PERF_COUNT_HW_BRANCH_MISSES); }
 bus-cycles					{ return hw(yyscanner, PERF_COUNT_HW_BUS_CYCLES); }
 ref-cycles					{ return hw(yyscanner, PERF_COUNT_HW_REF_CPU_CYCLES); }
-cpu-clock					{ return sym(yyscanner, PERF_COUNT_SW_CPU_CLOCK); }
-task-clock					{ return sym(yyscanner, PERF_COUNT_SW_TASK_CLOCK); }
-page-faults|faults				{ return sym(yyscanner, PERF_COUNT_SW_PAGE_FAULTS); }
-minor-faults					{ return sym(yyscanner, PERF_COUNT_SW_PAGE_FAULTS_MIN); }
-major-faults					{ return sym(yyscanner, PERF_COUNT_SW_PAGE_FAULTS_MAJ); }
-context-switches|cs				{ return sym(yyscanner, PERF_COUNT_SW_CONTEXT_SWITCHES); }
-cpu-migrations|migrations			{ return sym(yyscanner, PERF_COUNT_SW_CPU_MIGRATIONS); }
-alignment-faults				{ return sym(yyscanner, PERF_COUNT_SW_ALIGNMENT_FAULTS); }
-emulation-faults				{ return sym(yyscanner, PERF_COUNT_SW_EMULATION_FAULTS); }
-dummy						{ return sym(yyscanner, PERF_COUNT_SW_DUMMY); }
-bpf-output					{ return sym(yyscanner, PERF_COUNT_SW_BPF_OUTPUT); }
-cgroup-switches					{ return sym(yyscanner, PERF_COUNT_SW_CGROUP_SWITCHES); }
+cpu-clock					{ return sw(yyscanner, PERF_COUNT_SW_CPU_CLOCK); }
+task-clock					{ return sw(yyscanner, PERF_COUNT_SW_TASK_CLOCK); }
+page-faults|faults				{ return sw(yyscanner, PERF_COUNT_SW_PAGE_FAULTS); }
+minor-faults					{ return sw(yyscanner, PERF_COUNT_SW_PAGE_FAULTS_MIN); }
+major-faults					{ return sw(yyscanner, PERF_COUNT_SW_PAGE_FAULTS_MAJ); }
+context-switches|cs				{ return sw(yyscanner, PERF_COUNT_SW_CONTEXT_SWITCHES); }
+cpu-migrations|migrations			{ return sw(yyscanner, PERF_COUNT_SW_CPU_MIGRATIONS); }
+alignment-faults				{ return sw(yyscanner, PERF_COUNT_SW_ALIGNMENT_FAULTS); }
+emulation-faults				{ return sw(yyscanner, PERF_COUNT_SW_EMULATION_FAULTS); }
+dummy						{ return sw(yyscanner, PERF_COUNT_SW_DUMMY); }
+bpf-output					{ return sw(yyscanner, PERF_COUNT_SW_BPF_OUTPUT); }
+cgroup-switches					{ return sw(yyscanner, PERF_COUNT_SW_CGROUP_SWITCHES); }
 
 {lc_type}			{ return str(yyscanner, PE_LEGACY_CACHE); }
 {lc_type}-{lc_op_result}	{ return str(yyscanner, PE_LEGACY_CACHE); }
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index d2ef1890007e..4992a5bf3c44 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -55,7 +55,7 @@ static void free_list_evsel(struct list_head* list_evsel)
 %}
 
 %token PE_START_EVENTS PE_START_TERMS
-%token PE_VALUE PE_VALUE_SYM_SW PE_TERM
+%token PE_VALUE PE_TERM
 %token PE_EVENT_NAME
 %token PE_RAW PE_NAME
 %token PE_MODIFIER_EVENT PE_MODIFIER_BP PE_BP_COLON PE_BP_SLASH
@@ -63,9 +63,8 @@ static void free_list_evsel(struct list_head* list_evsel)
 %token PE_PREFIX_MEM
 %token PE_ERROR
 %token PE_DRV_CFG_TERM
-%token PE_TERM_HW
+%token PE_TERM_HW PE_TERM_SW
 %type <num> PE_VALUE
-%type <num> PE_VALUE_SYM_SW
 %type <mod> PE_MODIFIER_EVENT
 %type <term_type> PE_TERM
 %type <str> PE_RAW
@@ -101,8 +100,9 @@ static void free_list_evsel(struct list_head* list_evsel)
 %destructor { free_list_evsel ($$); } <list_evsel>
 %type <tracepoint_name> tracepoint_name
 %destructor { free ($$.sys); free ($$.event); } <tracepoint_name>
-%type <hardware_event> PE_TERM_HW
-%destructor { free ($$.str); } <hardware_event>
+%type <legacy_event> PE_TERM_HW
+%type <legacy_event> PE_TERM_SW
+%destructor { free ($$.str); } <legacy_event>
 
 %union
 {
@@ -117,10 +117,10 @@ static void free_list_evsel(struct list_head* list_evsel)
 		char *sys;
 		char *event;
 	} tracepoint_name;
-	struct hardware_event {
+	struct legacy_event {
 		char *str;
 		u64 num;
-	} hardware_event;
+	} legacy_event;
 }
 %%
 
@@ -334,16 +334,17 @@ PE_TERM_HW sep_dc
 }
 
 event_legacy_symbol:
-PE_VALUE_SYM_SW '/' event_config '/'
+PE_TERM_SW '/' event_config '/'
 {
 	struct list_head *list;
 	int err;
 
+	free($1.str);
 	list = alloc_list();
 	if (!list)
 		YYNOMEM;
 	err = parse_events_add_numeric(_parse_state, list,
-				/*type=*/PERF_TYPE_SOFTWARE, /*config=*/$1,
+				/*type=*/PERF_TYPE_SOFTWARE, /*config=*/$1.num,
 				$3, /*wildcard=*/false);
 	parse_events_terms__delete($3);
 	if (err) {
@@ -353,16 +354,17 @@ PE_VALUE_SYM_SW '/' event_config '/'
 	$$ = list;
 }
 |
-PE_VALUE_SYM_SW sep_slash_slash_dc
+PE_TERM_SW sep_slash_slash_dc
 {
 	struct list_head *list;
 	int err;
 
+	free($1.str);
 	list = alloc_list();
 	if (!list)
 		YYNOMEM;
 	err = parse_events_add_numeric(_parse_state, list,
-				/*type=*/PERF_TYPE_SOFTWARE, /*config=*/$1,
+				/*type=*/PERF_TYPE_SOFTWARE, /*config=*/$1.num,
 				/*head_config=*/NULL, /*wildcard=*/false);
 	if (err)
 		PE_ABORT(err);
@@ -615,6 +617,11 @@ PE_TERM_HW
 {
 	$$ = $1.str;
 }
+|
+PE_TERM_SW
+{
+	$$ = $1.str;
+}
 
 event_term:
 PE_RAW
@@ -696,6 +703,20 @@ PE_TERM_HW
 	$$ = term;
 }
 |
+PE_TERM_SW
+{
+	struct parse_events_term *term;
+	int err = parse_events_term__num(&term, PARSE_EVENTS__TERM_TYPE_SOFTWARE,
+					 $1.str, $1.num & 255, /*novalue=*/false,
+					 &@1, /*loc_val=*/NULL);
+
+	if (err) {
+		free($1.str);
+		PE_ABORT(err);
+	}
+	$$ = term;
+}
+|
 PE_TERM '=' name_or_raw
 {
 	struct parse_events_term *term;
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index b7ebac5ab1d1..ac1149658d9b 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -1429,7 +1429,7 @@ static int pmu_config_term(const struct perf_pmu *pmu,
 			break;
 		case PARSE_EVENTS__TERM_TYPE_USER: /* Not hardcoded. */
 			return -EINVAL;
-		case PARSE_EVENTS__TERM_TYPE_NAME ... PARSE_EVENTS__TERM_TYPE_HARDWARE:
+		case PARSE_EVENTS__TERM_TYPE_NAME ... PARSE_EVENTS__TERM_TYPE_SOFTWARE:
 			/* Skip non-config terms. */
 			break;
 		default:
@@ -1810,10 +1810,11 @@ int perf_pmu__for_each_format(struct perf_pmu *pmu, void *state, pmu_format_call
 
 	/*
 	 * max-events and driver-config are missing above as are the internal
-	 * types user, metric-id, raw, legacy cache and hardware. Assert against
-	 * the enum parse_events__term_type so they are kept in sync.
+	 * types user, metric-id, raw, legacy cache, hardware and
+	 * software. Assert against the enum parse_events__term_type so they are
+	 * kept in sync.
 	 */
-	_Static_assert(ARRAY_SIZE(terms) == __PARSE_EVENTS__TERM_TYPE_NR - 6,
+	_Static_assert(ARRAY_SIZE(terms) == __PARSE_EVENTS__TERM_TYPE_NR - 7,
 		       "perf_pmu__for_each_format()'s terms must be kept in sync with enum parse_events__term_type");
 	list_for_each_entry(format, &pmu->format, list) {
 		perf_pmu_format__load(pmu, format);
-- 
2.49.0.777.g153de2bbd5-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH v8 4/4] perf perf_api_probe: Avoid scanning all PMUs, try software PMU first
  2025-04-16  4:51 [PATCH v8 0/4] Prefer sysfs/JSON events also when no PMU is provided Ian Rogers
                   ` (2 preceding siblings ...)
  2025-04-16  4:51 ` [PATCH v8 3/4] perf parse-events: Allow software events to be terms Ian Rogers
@ 2025-04-16  4:51 ` Ian Rogers
  2025-05-27 20:50 ` [PATCH v8 0/4] Prefer sysfs/JSON events also when no PMU is provided Ian Rogers
  4 siblings, 0 replies; 14+ messages in thread
From: Ian Rogers @ 2025-04-16  4:51 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, James Clark, Ze Gao,
	Weilin Wang, Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, Aditya Bodkhe, Leo Yan,
	Thomas Falcon, Atish Patra

Add PMU names when geting events to probe. The first is only ever used
but by not specifying a PMU name perf_api_probe will load all the
PMUs.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/perf_api_probe.c | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

diff --git a/tools/perf/util/perf_api_probe.c b/tools/perf/util/perf_api_probe.c
index 1de3b69cdf4a..a910cde4d78b 100644
--- a/tools/perf/util/perf_api_probe.c
+++ b/tools/perf/util/perf_api_probe.c
@@ -59,10 +59,10 @@ static int perf_do_probe_api(setup_probe_fn_t fn, struct perf_cpu cpu, const cha
 
 static bool perf_probe_api(setup_probe_fn_t fn)
 {
-	const char *try[] = {"cycles:u", "instructions:u", "cpu-clock:u", NULL};
+	struct perf_pmu *pmu;
 	struct perf_cpu_map *cpus;
 	struct perf_cpu cpu;
-	int ret, i = 0;
+	int ret = 0;
 
 	cpus = perf_cpu_map__new_online_cpus();
 	if (!cpus)
@@ -70,12 +70,23 @@ static bool perf_probe_api(setup_probe_fn_t fn)
 	cpu = perf_cpu_map__cpu(cpus, 0);
 	perf_cpu_map__put(cpus);
 
-	do {
-		ret = perf_do_probe_api(fn, cpu, try[i++]);
-		if (!ret)
-			return true;
-	} while (ret == -EAGAIN && try[i]);
-
+	ret = perf_do_probe_api(fn, cpu, "software/cpu-clock/u");
+	if (!ret)
+		return true;
+
+	pmu = perf_pmus__scan_core(/*pmu=*/NULL);
+	if (pmu) {
+		const char *try[] = {"cycles", "instructions", NULL};
+		char buf[256];
+		int i = 0;
+
+		 while (ret == -EAGAIN && try[i]) {
+			snprintf(buf, sizeof(buf), "%s/%s/u", pmu->name, try[i++]);
+			ret = perf_do_probe_api(fn, cpu, buf);
+			if (!ret)
+				return true;
+		}
+	}
 	return false;
 }
 
-- 
2.49.0.777.g153de2bbd5-goog


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 0/4] Prefer sysfs/JSON events also when no PMU is provided
  2025-04-16  4:51 [PATCH v8 0/4] Prefer sysfs/JSON events also when no PMU is provided Ian Rogers
                   ` (3 preceding siblings ...)
  2025-04-16  4:51 ` [PATCH v8 4/4] perf perf_api_probe: Avoid scanning all PMUs, try software PMU first Ian Rogers
@ 2025-05-27 20:50 ` Ian Rogers
  2025-06-03  4:23   ` Namhyung Kim
  4 siblings, 1 reply; 14+ messages in thread
From: Ian Rogers @ 2025-05-27 20:50 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Ian Rogers, Adrian Hunter, Kan Liang, James Clark, Ze Gao,
	Weilin Wang, Dominique Martinet, Jean-Philippe Romain, Junhao He,
	linux-perf-users, linux-kernel, Aditya Bodkhe, Leo Yan,
	Thomas Falcon, Atish Patra

On Tue, Apr 15, 2025 at 9:51 PM Ian Rogers <irogers@google.com> wrote:
>
> At the RISC-V summit the topic of avoiding event data being in the
> RISC-V PMU kernel driver came up. There is a preference for sysfs/JSON
> events being the priority when no PMU is provided so that legacy
> events maybe supported via json. Originally Mark Rutland also
> expressed at LPC 2023 that doing this would resolve bugs on ARM Apple
> M? processors, but James Clark more recently tested this and believes
> the driver issues there may not have existed or have been resolved. In
> any case, it is inconsistent that with a PMU event names avoid legacy
> encodings, but when wildcarding PMUs (ie without a PMU with the event
> name) the legacy encodings have priority.
>
> The situation is further inconsistent as legacy events are case
> sensitive, so on Intel that provides a sysfs instructions event, the
> instructions event without a PMU and lowercase is legacy while with
> uppercase letters it matches with sysfs which is case insensitive. Are
> there legacy events with upper case letters? Yes there are, the cache
> ones mix case freely:
>
> L1-dcache|l1-d|l1d|L1-data|L1-icache|l1-i|l1i|L1-instruction|LLC|L2|dTLB|d-tlb|Data-TLB|iTLB|i-tlb|Instruction-TLB|branch|branches|bpu|btb|bpc|node
>
> meaning LLC that means L2 (which is wrong) both match as part of a
> legacy cache name but llc and l2 would only match sysfs/json
> events. The whole thing just points at the ridiculous nature of legacy
> events and why we'd want them to be preffered I don't know. Why should
> case of a letter or having a PMU prefix impact the encoding in the
> perf_event_attr?
>
> The patch doing this work was reverted in a v6.10 release candidate
> as, even though the patch was posted for weeks and had been on
> linux-next for weeks without issue, Linus was in the habit of using
> explicit legacy events with unsupported precision options on his
> Neoverse-N1. This machine has SLC PMU events for bus and CPU cycles
> where ARM decided to call the events bus_cycles and cycles, the latter
> being also a legacy event name. ARM haven't renamed the cycles event
> to a more consistent cpu_cycles and avoided the problem. With these
> changes the problematic event will now be skipped, a large warning
> produced, and perf record will continue for the other PMU events. This
> solution was proposed by Arnaldo.
>
> v8: Change removing of failed to open events that are tracking so that
>     the tracking moves to the next event. Make software events able to
>     specified with a PMU. Change the perf_api_probe to not load all
>     PMUs through scanning, specify a PMU when parsing events.
>
> v7: Expand cover letter, fix a missed core_ok check in the v6
>     rebase. Note, as with v6 there is an alternate series that
>     prioritizes legacy events but that is silly and I'd prefer we
>     didn't do it.
>
> v6: Rebase of v5 (dropping already merged patches):
>     https://lore.kernel.org/lkml/20250109222109.567031-1-irogers@google.com/
>     that unusually had an RFC posted for it:
>     https://lore.kernel.org/lkml/Z7Z5kv75BMML2A1q@google.com/
>     Note, this patch conflicts/contradicts:
>     https://lore.kernel.org/lkml/20250312211623.2495798-1-irogers@google.com/
>     that I posted so that we could either consistently prioritize
>     sysfs/json (these patches) or legacy events (the other
>     patches). That lack of event printing and encoding inconsistency
>     is most prominent in the encoding of events like "instructions"
>     which on hybrid are reported as "cpu_core/instructions/" but
>     "instructions" before these patches gets a legacy encoding while
>     "cpu_core/instructions/" gets a sysfs/json encoding. These patches
>     make "instructions" always get a sysfs/json encoding while the
>     alternate patches make it always get a legacy encoding.
>
> v5: Follow Namhyung's suggestion and ignore the case where command
>     line dummy events fail to open alongside other events that all
>     fail to open. Note, the Tested-by tags are left on the series as
>     v4 and v5 were changing an error case that doesn't occur in
>     testing but was manually tested by myself.
>
> v4: Rework the no events opening change from v3 to make it handle
>     multiple dummy events. Sadly an evlist isn't empty if it just
>     contains dummy events as the dummy event may be used with "perf
>     record -e dummy .." as a way to determine whether permission
>     issues exist. Other software events like cpu-clock would suffice
>     for this, but the using dummy genie has left the bottle.
>
>     Another problem is that we appear to have an excessive number of
>     dummy events added, for example, we can likely avoid a dummy event
>     and add sideband data to the original event. For auxtrace more
>     dummy events may be opened too. Anyway, this has led to the
>     approach taken in patch 3 where the number of dummy parsed events
>     is computed. If the number of removed/failing-to-open non-dummy
>     events matches the number of non-dummy events then we want to
>     fail, but only if there are no parsed dummy events or if there was
>     one then it must have opened. The math here is hard to read, but
>     passes my manual testing.
>
> v3: Make no events opening for perf record a failure as suggested by
>     James Clark and Aditya Bodkhe <Aditya.Bodkhe1@ibm.com>. Also,
>     rebase.
>
> v2: Rebase and add tested-by tags from James Clark, Leo Yan and Atish
>     Patra who have tested on RISC-V and ARM CPUs, including the
>     problem case from before.

Ping. Thanks,
Ian

> Ian Rogers (4):
>   perf record: Skip don't fail for events that don't open
>   perf parse-events: Reapply "Prefer sysfs/JSON hardware events over
>     legacy"
>   perf parse-events: Allow software events to be terms
>   perf perf_api_probe: Avoid scanning all PMUs, try software PMU first
>
>  tools/perf/builtin-record.c      | 63 +++++++++++++++++++---
>  tools/perf/util/parse-events.c   | 47 +++++++++++++----
>  tools/perf/util/parse-events.h   |  3 +-
>  tools/perf/util/parse-events.l   | 90 ++++++++++++++++++--------------
>  tools/perf/util/parse-events.y   | 85 ++++++++++++++++++++++--------
>  tools/perf/util/perf_api_probe.c | 27 +++++++---
>  tools/perf/util/pmu.c            |  9 ++--
>  7 files changed, 235 insertions(+), 89 deletions(-)
>
> --
> 2.49.0.777.g153de2bbd5-goog
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 0/4] Prefer sysfs/JSON events also when no PMU is provided
  2025-05-27 20:50 ` [PATCH v8 0/4] Prefer sysfs/JSON events also when no PMU is provided Ian Rogers
@ 2025-06-03  4:23   ` Namhyung Kim
  2025-06-03  6:08     ` Ian Rogers
  0 siblings, 1 reply; 14+ messages in thread
From: Namhyung Kim @ 2025-06-03  4:23 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	Aditya Bodkhe, Leo Yan, Thomas Falcon, Atish Patra

Hi Ian,

On Tue, May 27, 2025 at 01:50:32PM -0700, Ian Rogers wrote:
> On Tue, Apr 15, 2025 at 9:51 PM Ian Rogers <irogers@google.com> wrote:
> >
> > At the RISC-V summit the topic of avoiding event data being in the
> > RISC-V PMU kernel driver came up. There is a preference for sysfs/JSON
> > events being the priority when no PMU is provided so that legacy
> > events maybe supported via json. Originally Mark Rutland also
> > expressed at LPC 2023 that doing this would resolve bugs on ARM Apple
> > M? processors, but James Clark more recently tested this and believes
> > the driver issues there may not have existed or have been resolved. In
> > any case, it is inconsistent that with a PMU event names avoid legacy
> > encodings, but when wildcarding PMUs (ie without a PMU with the event
> > name) the legacy encodings have priority.
> >
> > The situation is further inconsistent as legacy events are case
> > sensitive, so on Intel that provides a sysfs instructions event, the
> > instructions event without a PMU and lowercase is legacy while with
> > uppercase letters it matches with sysfs which is case insensitive. Are
> > there legacy events with upper case letters? Yes there are, the cache
> > ones mix case freely:
> >
> > L1-dcache|l1-d|l1d|L1-data|L1-icache|l1-i|l1i|L1-instruction|LLC|L2|dTLB|d-tlb|Data-TLB|iTLB|i-tlb|Instruction-TLB|branch|branches|bpu|btb|bpc|node
> >
> > meaning LLC that means L2 (which is wrong) both match as part of a
> > legacy cache name but llc and l2 would only match sysfs/json
> > events. The whole thing just points at the ridiculous nature of legacy
> > events and why we'd want them to be preffered I don't know. Why should
> > case of a letter or having a PMU prefix impact the encoding in the
> > perf_event_attr?
> >
> > The patch doing this work was reverted in a v6.10 release candidate
> > as, even though the patch was posted for weeks and had been on
> > linux-next for weeks without issue, Linus was in the habit of using
> > explicit legacy events with unsupported precision options on his
> > Neoverse-N1. This machine has SLC PMU events for bus and CPU cycles
> > where ARM decided to call the events bus_cycles and cycles, the latter
> > being also a legacy event name. ARM haven't renamed the cycles event
> > to a more consistent cpu_cycles and avoided the problem. With these
> > changes the problematic event will now be skipped, a large warning
> > produced, and perf record will continue for the other PMU events. This
> > solution was proposed by Arnaldo.
> >
> > v8: Change removing of failed to open events that are tracking so that
> >     the tracking moves to the next event. Make software events able to
> >     specified with a PMU. Change the perf_api_probe to not load all
> >     PMUs through scanning, specify a PMU when parsing events.
> >
> > v7: Expand cover letter, fix a missed core_ok check in the v6
> >     rebase. Note, as with v6 there is an alternate series that
> >     prioritizes legacy events but that is silly and I'd prefer we
> >     didn't do it.
> >
> > v6: Rebase of v5 (dropping already merged patches):
> >     https://lore.kernel.org/lkml/20250109222109.567031-1-irogers@google.com/
> >     that unusually had an RFC posted for it:
> >     https://lore.kernel.org/lkml/Z7Z5kv75BMML2A1q@google.com/
> >     Note, this patch conflicts/contradicts:
> >     https://lore.kernel.org/lkml/20250312211623.2495798-1-irogers@google.com/
> >     that I posted so that we could either consistently prioritize
> >     sysfs/json (these patches) or legacy events (the other
> >     patches). That lack of event printing and encoding inconsistency
> >     is most prominent in the encoding of events like "instructions"
> >     which on hybrid are reported as "cpu_core/instructions/" but
> >     "instructions" before these patches gets a legacy encoding while
> >     "cpu_core/instructions/" gets a sysfs/json encoding. These patches
> >     make "instructions" always get a sysfs/json encoding while the
> >     alternate patches make it always get a legacy encoding.
> >
> > v5: Follow Namhyung's suggestion and ignore the case where command
> >     line dummy events fail to open alongside other events that all
> >     fail to open. Note, the Tested-by tags are left on the series as
> >     v4 and v5 were changing an error case that doesn't occur in
> >     testing but was manually tested by myself.
> >
> > v4: Rework the no events opening change from v3 to make it handle
> >     multiple dummy events. Sadly an evlist isn't empty if it just
> >     contains dummy events as the dummy event may be used with "perf
> >     record -e dummy .." as a way to determine whether permission
> >     issues exist. Other software events like cpu-clock would suffice
> >     for this, but the using dummy genie has left the bottle.
> >
> >     Another problem is that we appear to have an excessive number of
> >     dummy events added, for example, we can likely avoid a dummy event
> >     and add sideband data to the original event. For auxtrace more
> >     dummy events may be opened too. Anyway, this has led to the
> >     approach taken in patch 3 where the number of dummy parsed events
> >     is computed. If the number of removed/failing-to-open non-dummy
> >     events matches the number of non-dummy events then we want to
> >     fail, but only if there are no parsed dummy events or if there was
> >     one then it must have opened. The math here is hard to read, but
> >     passes my manual testing.
> >
> > v3: Make no events opening for perf record a failure as suggested by
> >     James Clark and Aditya Bodkhe <Aditya.Bodkhe1@ibm.com>. Also,
> >     rebase.
> >
> > v2: Rebase and add tested-by tags from James Clark, Leo Yan and Atish
> >     Patra who have tested on RISC-V and ARM CPUs, including the
> >     problem case from before.
> 
> Ping. Thanks,
> Ian
> 
> > Ian Rogers (4):
> >   perf record: Skip don't fail for events that don't open
> >   perf parse-events: Reapply "Prefer sysfs/JSON hardware events over
> >     legacy"
> >   perf parse-events: Allow software events to be terms
> >   perf perf_api_probe: Avoid scanning all PMUs, try software PMU first

Sorry for the delay.  But I think we wanted to move to this instead:

https://lore.kernel.org/linux-perf-users/20250312211623.2495798-1-irogers@google.com/

Thanks,
Namhyung

> >
> >  tools/perf/builtin-record.c      | 63 +++++++++++++++++++---
> >  tools/perf/util/parse-events.c   | 47 +++++++++++++----
> >  tools/perf/util/parse-events.h   |  3 +-
> >  tools/perf/util/parse-events.l   | 90 ++++++++++++++++++--------------
> >  tools/perf/util/parse-events.y   | 85 ++++++++++++++++++++++--------
> >  tools/perf/util/perf_api_probe.c | 27 +++++++---
> >  tools/perf/util/pmu.c            |  9 ++--
> >  7 files changed, 235 insertions(+), 89 deletions(-)
> >
> > --
> > 2.49.0.777.g153de2bbd5-goog
> >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 0/4] Prefer sysfs/JSON events also when no PMU is provided
  2025-06-03  4:23   ` Namhyung Kim
@ 2025-06-03  6:08     ` Ian Rogers
  2025-06-03 22:50       ` Namhyung Kim
  0 siblings, 1 reply; 14+ messages in thread
From: Ian Rogers @ 2025-06-03  6:08 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	Aditya Bodkhe, Leo Yan, Thomas Falcon, Atish Patra

On Mon, Jun 2, 2025 at 9:23 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hi Ian,
>
> On Tue, May 27, 2025 at 01:50:32PM -0700, Ian Rogers wrote:
> > On Tue, Apr 15, 2025 at 9:51 PM Ian Rogers <irogers@google.com> wrote:
> > >
> > > At the RISC-V summit the topic of avoiding event data being in the
> > > RISC-V PMU kernel driver came up. There is a preference for sysfs/JSON
> > > events being the priority when no PMU is provided so that legacy
> > > events maybe supported via json. Originally Mark Rutland also
> > > expressed at LPC 2023 that doing this would resolve bugs on ARM Apple
> > > M? processors, but James Clark more recently tested this and believes
> > > the driver issues there may not have existed or have been resolved. In
> > > any case, it is inconsistent that with a PMU event names avoid legacy
> > > encodings, but when wildcarding PMUs (ie without a PMU with the event
> > > name) the legacy encodings have priority.
> > >
> > > The situation is further inconsistent as legacy events are case
> > > sensitive, so on Intel that provides a sysfs instructions event, the
> > > instructions event without a PMU and lowercase is legacy while with
> > > uppercase letters it matches with sysfs which is case insensitive. Are
> > > there legacy events with upper case letters? Yes there are, the cache
> > > ones mix case freely:
> > >
> > > L1-dcache|l1-d|l1d|L1-data|L1-icache|l1-i|l1i|L1-instruction|LLC|L2|dTLB|d-tlb|Data-TLB|iTLB|i-tlb|Instruction-TLB|branch|branches|bpu|btb|bpc|node
> > >
> > > meaning LLC that means L2 (which is wrong) both match as part of a
> > > legacy cache name but llc and l2 would only match sysfs/json
> > > events. The whole thing just points at the ridiculous nature of legacy
> > > events and why we'd want them to be preffered I don't know. Why should
> > > case of a letter or having a PMU prefix impact the encoding in the
> > > perf_event_attr?
> > >
> > > The patch doing this work was reverted in a v6.10 release candidate
> > > as, even though the patch was posted for weeks and had been on
> > > linux-next for weeks without issue, Linus was in the habit of using
> > > explicit legacy events with unsupported precision options on his
> > > Neoverse-N1. This machine has SLC PMU events for bus and CPU cycles
> > > where ARM decided to call the events bus_cycles and cycles, the latter
> > > being also a legacy event name. ARM haven't renamed the cycles event
> > > to a more consistent cpu_cycles and avoided the problem. With these
> > > changes the problematic event will now be skipped, a large warning
> > > produced, and perf record will continue for the other PMU events. This
> > > solution was proposed by Arnaldo.
> > >
> > > v8: Change removing of failed to open events that are tracking so that
> > >     the tracking moves to the next event. Make software events able to
> > >     specified with a PMU. Change the perf_api_probe to not load all
> > >     PMUs through scanning, specify a PMU when parsing events.
> > >
> > > v7: Expand cover letter, fix a missed core_ok check in the v6
> > >     rebase. Note, as with v6 there is an alternate series that
> > >     prioritizes legacy events but that is silly and I'd prefer we
> > >     didn't do it.
> > >
> > > v6: Rebase of v5 (dropping already merged patches):
> > >     https://lore.kernel.org/lkml/20250109222109.567031-1-irogers@google.com/
> > >     that unusually had an RFC posted for it:
> > >     https://lore.kernel.org/lkml/Z7Z5kv75BMML2A1q@google.com/
> > >     Note, this patch conflicts/contradicts:
> > >     https://lore.kernel.org/lkml/20250312211623.2495798-1-irogers@google.com/
> > >     that I posted so that we could either consistently prioritize
> > >     sysfs/json (these patches) or legacy events (the other
> > >     patches). That lack of event printing and encoding inconsistency
> > >     is most prominent in the encoding of events like "instructions"
> > >     which on hybrid are reported as "cpu_core/instructions/" but
> > >     "instructions" before these patches gets a legacy encoding while
> > >     "cpu_core/instructions/" gets a sysfs/json encoding. These patches
> > >     make "instructions" always get a sysfs/json encoding while the
> > >     alternate patches make it always get a legacy encoding.
> > >
> > > v5: Follow Namhyung's suggestion and ignore the case where command
> > >     line dummy events fail to open alongside other events that all
> > >     fail to open. Note, the Tested-by tags are left on the series as
> > >     v4 and v5 were changing an error case that doesn't occur in
> > >     testing but was manually tested by myself.
> > >
> > > v4: Rework the no events opening change from v3 to make it handle
> > >     multiple dummy events. Sadly an evlist isn't empty if it just
> > >     contains dummy events as the dummy event may be used with "perf
> > >     record -e dummy .." as a way to determine whether permission
> > >     issues exist. Other software events like cpu-clock would suffice
> > >     for this, but the using dummy genie has left the bottle.
> > >
> > >     Another problem is that we appear to have an excessive number of
> > >     dummy events added, for example, we can likely avoid a dummy event
> > >     and add sideband data to the original event. For auxtrace more
> > >     dummy events may be opened too. Anyway, this has led to the
> > >     approach taken in patch 3 where the number of dummy parsed events
> > >     is computed. If the number of removed/failing-to-open non-dummy
> > >     events matches the number of non-dummy events then we want to
> > >     fail, but only if there are no parsed dummy events or if there was
> > >     one then it must have opened. The math here is hard to read, but
> > >     passes my manual testing.
> > >
> > > v3: Make no events opening for perf record a failure as suggested by
> > >     James Clark and Aditya Bodkhe <Aditya.Bodkhe1@ibm.com>. Also,
> > >     rebase.
> > >
> > > v2: Rebase and add tested-by tags from James Clark, Leo Yan and Atish
> > >     Patra who have tested on RISC-V and ARM CPUs, including the
> > >     problem case from before.
> >
> > Ping. Thanks,
> > Ian
> >
> > > Ian Rogers (4):
> > >   perf record: Skip don't fail for events that don't open
> > >   perf parse-events: Reapply "Prefer sysfs/JSON hardware events over
> > >     legacy"
> > >   perf parse-events: Allow software events to be terms
> > >   perf perf_api_probe: Avoid scanning all PMUs, try software PMU first
>
> Sorry for the delay.  But I think we wanted to move to this instead:
>
> https://lore.kernel.org/linux-perf-users/20250312211623.2495798-1-irogers@google.com/

Hi Namhyung,

The preference for sysfs/json over legacy was done as a bug fix and
because ARM (Mark Rutland) argued strongly that it was the most
sensible priority. Intel (Kan Liang) approved the change in priority.
RISC-V have wanted this behavior as it enables the migration of event
mappings from the driver to the tool. As the primary maintainer of the
event parsing and metric code I prefer the priority as legacy events
are weird, for example they aren't case insensitive in their naming.
For example, on Intel with legacy events as the priority cpu-cycles
would be a legacy event, but cpu-Cyles a sysfs one. On ARM cpu_cycles
would be a sysfs event, but cpu-cycles a legacy one. A minor character
difference and very different and imo surprising event encodings.

On your RFC thread Arnaldo and James said that legacy events somehow
enabled a form of drill down. As event parsing is mapping a name to a
perf_event_attr I completely don't see this as the mapping is opaque.

I strongly believe we need consistency. If `perf stat -e cycles .. `
prints cpu_core/cycles/ as the event name on a hybrid Intel, then
`perf stat -e cpu_core/cycles/ .. ` should have the same
perf_event_attr. Both patch series achieve this but this one does it
with consistency, and from what I see it, the support of 3 vendors.

Thanks,
Ian

> Thanks,
> Namhyung
>
> > >
> > >  tools/perf/builtin-record.c      | 63 +++++++++++++++++++---
> > >  tools/perf/util/parse-events.c   | 47 +++++++++++++----
> > >  tools/perf/util/parse-events.h   |  3 +-
> > >  tools/perf/util/parse-events.l   | 90 ++++++++++++++++++--------------
> > >  tools/perf/util/parse-events.y   | 85 ++++++++++++++++++++++--------
> > >  tools/perf/util/perf_api_probe.c | 27 +++++++---
> > >  tools/perf/util/pmu.c            |  9 ++--
> > >  7 files changed, 235 insertions(+), 89 deletions(-)
> > >
> > > --
> > > 2.49.0.777.g153de2bbd5-goog
> > >

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 0/4] Prefer sysfs/JSON events also when no PMU is provided
  2025-06-03  6:08     ` Ian Rogers
@ 2025-06-03 22:50       ` Namhyung Kim
  2025-06-03 23:36         ` Ian Rogers
  0 siblings, 1 reply; 14+ messages in thread
From: Namhyung Kim @ 2025-06-03 22:50 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	Aditya Bodkhe, Leo Yan, Thomas Falcon, Atish Patra

On Mon, Jun 02, 2025 at 11:08:34PM -0700, Ian Rogers wrote:
> On Mon, Jun 2, 2025 at 9:23 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > Hi Ian,
> >
> > On Tue, May 27, 2025 at 01:50:32PM -0700, Ian Rogers wrote:
> > > On Tue, Apr 15, 2025 at 9:51 PM Ian Rogers <irogers@google.com> wrote:
> > > >
> > > > At the RISC-V summit the topic of avoiding event data being in the
> > > > RISC-V PMU kernel driver came up. There is a preference for sysfs/JSON
> > > > events being the priority when no PMU is provided so that legacy
> > > > events maybe supported via json. Originally Mark Rutland also
> > > > expressed at LPC 2023 that doing this would resolve bugs on ARM Apple
> > > > M? processors, but James Clark more recently tested this and believes
> > > > the driver issues there may not have existed or have been resolved. In
> > > > any case, it is inconsistent that with a PMU event names avoid legacy
> > > > encodings, but when wildcarding PMUs (ie without a PMU with the event
> > > > name) the legacy encodings have priority.
> > > >
> > > > The situation is further inconsistent as legacy events are case
> > > > sensitive, so on Intel that provides a sysfs instructions event, the
> > > > instructions event without a PMU and lowercase is legacy while with
> > > > uppercase letters it matches with sysfs which is case insensitive. Are
> > > > there legacy events with upper case letters? Yes there are, the cache
> > > > ones mix case freely:
> > > >
> > > > L1-dcache|l1-d|l1d|L1-data|L1-icache|l1-i|l1i|L1-instruction|LLC|L2|dTLB|d-tlb|Data-TLB|iTLB|i-tlb|Instruction-TLB|branch|branches|bpu|btb|bpc|node
> > > >
> > > > meaning LLC that means L2 (which is wrong) both match as part of a
> > > > legacy cache name but llc and l2 would only match sysfs/json
> > > > events. The whole thing just points at the ridiculous nature of legacy
> > > > events and why we'd want them to be preffered I don't know. Why should
> > > > case of a letter or having a PMU prefix impact the encoding in the
> > > > perf_event_attr?
> > > >
> > > > The patch doing this work was reverted in a v6.10 release candidate
> > > > as, even though the patch was posted for weeks and had been on
> > > > linux-next for weeks without issue, Linus was in the habit of using
> > > > explicit legacy events with unsupported precision options on his
> > > > Neoverse-N1. This machine has SLC PMU events for bus and CPU cycles
> > > > where ARM decided to call the events bus_cycles and cycles, the latter
> > > > being also a legacy event name. ARM haven't renamed the cycles event
> > > > to a more consistent cpu_cycles and avoided the problem. With these
> > > > changes the problematic event will now be skipped, a large warning
> > > > produced, and perf record will continue for the other PMU events. This
> > > > solution was proposed by Arnaldo.
> > > >
> > > > v8: Change removing of failed to open events that are tracking so that
> > > >     the tracking moves to the next event. Make software events able to
> > > >     specified with a PMU. Change the perf_api_probe to not load all
> > > >     PMUs through scanning, specify a PMU when parsing events.
> > > >
> > > > v7: Expand cover letter, fix a missed core_ok check in the v6
> > > >     rebase. Note, as with v6 there is an alternate series that
> > > >     prioritizes legacy events but that is silly and I'd prefer we
> > > >     didn't do it.
> > > >
> > > > v6: Rebase of v5 (dropping already merged patches):
> > > >     https://lore.kernel.org/lkml/20250109222109.567031-1-irogers@google.com/
> > > >     that unusually had an RFC posted for it:
> > > >     https://lore.kernel.org/lkml/Z7Z5kv75BMML2A1q@google.com/
> > > >     Note, this patch conflicts/contradicts:
> > > >     https://lore.kernel.org/lkml/20250312211623.2495798-1-irogers@google.com/
> > > >     that I posted so that we could either consistently prioritize
> > > >     sysfs/json (these patches) or legacy events (the other
> > > >     patches). That lack of event printing and encoding inconsistency
> > > >     is most prominent in the encoding of events like "instructions"
> > > >     which on hybrid are reported as "cpu_core/instructions/" but
> > > >     "instructions" before these patches gets a legacy encoding while
> > > >     "cpu_core/instructions/" gets a sysfs/json encoding. These patches
> > > >     make "instructions" always get a sysfs/json encoding while the
> > > >     alternate patches make it always get a legacy encoding.
> > > >
> > > > v5: Follow Namhyung's suggestion and ignore the case where command
> > > >     line dummy events fail to open alongside other events that all
> > > >     fail to open. Note, the Tested-by tags are left on the series as
> > > >     v4 and v5 were changing an error case that doesn't occur in
> > > >     testing but was manually tested by myself.
> > > >
> > > > v4: Rework the no events opening change from v3 to make it handle
> > > >     multiple dummy events. Sadly an evlist isn't empty if it just
> > > >     contains dummy events as the dummy event may be used with "perf
> > > >     record -e dummy .." as a way to determine whether permission
> > > >     issues exist. Other software events like cpu-clock would suffice
> > > >     for this, but the using dummy genie has left the bottle.
> > > >
> > > >     Another problem is that we appear to have an excessive number of
> > > >     dummy events added, for example, we can likely avoid a dummy event
> > > >     and add sideband data to the original event. For auxtrace more
> > > >     dummy events may be opened too. Anyway, this has led to the
> > > >     approach taken in patch 3 where the number of dummy parsed events
> > > >     is computed. If the number of removed/failing-to-open non-dummy
> > > >     events matches the number of non-dummy events then we want to
> > > >     fail, but only if there are no parsed dummy events or if there was
> > > >     one then it must have opened. The math here is hard to read, but
> > > >     passes my manual testing.
> > > >
> > > > v3: Make no events opening for perf record a failure as suggested by
> > > >     James Clark and Aditya Bodkhe <Aditya.Bodkhe1@ibm.com>. Also,
> > > >     rebase.
> > > >
> > > > v2: Rebase and add tested-by tags from James Clark, Leo Yan and Atish
> > > >     Patra who have tested on RISC-V and ARM CPUs, including the
> > > >     problem case from before.
> > >
> > > Ping. Thanks,
> > > Ian
> > >
> > > > Ian Rogers (4):
> > > >   perf record: Skip don't fail for events that don't open
> > > >   perf parse-events: Reapply "Prefer sysfs/JSON hardware events over
> > > >     legacy"
> > > >   perf parse-events: Allow software events to be terms
> > > >   perf perf_api_probe: Avoid scanning all PMUs, try software PMU first
> >
> > Sorry for the delay.  But I think we wanted to move to this instead:
> >
> > https://lore.kernel.org/linux-perf-users/20250312211623.2495798-1-irogers@google.com/
> 
> Hi Namhyung,
> 
> The preference for sysfs/json over legacy was done as a bug fix and
> because ARM (Mark Rutland) argued strongly that it was the most
> sensible priority. Intel (Kan Liang) approved the change in priority.
> RISC-V have wanted this behavior as it enables the migration of event
> mappings from the driver to the tool. As the primary maintainer of the
> event parsing and metric code I prefer the priority as legacy events
> are weird, for example they aren't case insensitive in their naming.
> For example, on Intel with legacy events as the priority cpu-cycles
> would be a legacy event, but cpu-Cyles a sysfs one. On ARM cpu_cycles
> would be a sysfs event, but cpu-cycles a legacy one. A minor character
> difference and very different and imo surprising event encodings.

Yeah, but it has worked like that for a long time.

> 
> On your RFC thread Arnaldo and James said that legacy events somehow
> enabled a form of drill down. As event parsing is mapping a name to a
> perf_event_attr I completely don't see this as the mapping is opaque.

Is it opaque?  (I'd say it standard event rather than legacy event.)  I
think the mapping for the standard events are clearly defined.

> 
> I strongly believe we need consistency. If `perf stat -e cycles .. `
> prints cpu_core/cycles/ as the event name on a hybrid Intel, then
> `perf stat -e cpu_core/cycles/ .. ` should have the same
> perf_event_attr. Both patch series achieve this but this one does it
> with consistency, and from what I see it, the support of 3 vendors.

Right, it's not consistent.  Maybe we need a different uniq event name
for extended (standard) events.  How about "cycles(cpu_core)"?  I guess
we don't want to add a space between the PMU and event names to avoid
potential user impact when they parse the output.

Thanks,
Namhyung


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 0/4] Prefer sysfs/JSON events also when no PMU is provided
  2025-06-03 22:50       ` Namhyung Kim
@ 2025-06-03 23:36         ` Ian Rogers
  2025-06-03 23:59           ` Namhyung Kim
  0 siblings, 1 reply; 14+ messages in thread
From: Ian Rogers @ 2025-06-03 23:36 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	Aditya Bodkhe, Leo Yan, Thomas Falcon, Atish Patra

On Tue, Jun 3, 2025 at 3:50 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Mon, Jun 02, 2025 at 11:08:34PM -0700, Ian Rogers wrote:
> > On Mon, Jun 2, 2025 at 9:23 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > Hi Ian,
> > >
> > > On Tue, May 27, 2025 at 01:50:32PM -0700, Ian Rogers wrote:
> > > > On Tue, Apr 15, 2025 at 9:51 PM Ian Rogers <irogers@google.com> wrote:
> > > > >
> > > > > At the RISC-V summit the topic of avoiding event data being in the
> > > > > RISC-V PMU kernel driver came up. There is a preference for sysfs/JSON
> > > > > events being the priority when no PMU is provided so that legacy
> > > > > events maybe supported via json. Originally Mark Rutland also
> > > > > expressed at LPC 2023 that doing this would resolve bugs on ARM Apple
> > > > > M? processors, but James Clark more recently tested this and believes
> > > > > the driver issues there may not have existed or have been resolved. In
> > > > > any case, it is inconsistent that with a PMU event names avoid legacy
> > > > > encodings, but when wildcarding PMUs (ie without a PMU with the event
> > > > > name) the legacy encodings have priority.
> > > > >
> > > > > The situation is further inconsistent as legacy events are case
> > > > > sensitive, so on Intel that provides a sysfs instructions event, the
> > > > > instructions event without a PMU and lowercase is legacy while with
> > > > > uppercase letters it matches with sysfs which is case insensitive. Are
> > > > > there legacy events with upper case letters? Yes there are, the cache
> > > > > ones mix case freely:
> > > > >
> > > > > L1-dcache|l1-d|l1d|L1-data|L1-icache|l1-i|l1i|L1-instruction|LLC|L2|dTLB|d-tlb|Data-TLB|iTLB|i-tlb|Instruction-TLB|branch|branches|bpu|btb|bpc|node
> > > > >
> > > > > meaning LLC that means L2 (which is wrong) both match as part of a
> > > > > legacy cache name but llc and l2 would only match sysfs/json
> > > > > events. The whole thing just points at the ridiculous nature of legacy
> > > > > events and why we'd want them to be preffered I don't know. Why should
> > > > > case of a letter or having a PMU prefix impact the encoding in the
> > > > > perf_event_attr?
> > > > >
> > > > > The patch doing this work was reverted in a v6.10 release candidate
> > > > > as, even though the patch was posted for weeks and had been on
> > > > > linux-next for weeks without issue, Linus was in the habit of using
> > > > > explicit legacy events with unsupported precision options on his
> > > > > Neoverse-N1. This machine has SLC PMU events for bus and CPU cycles
> > > > > where ARM decided to call the events bus_cycles and cycles, the latter
> > > > > being also a legacy event name. ARM haven't renamed the cycles event
> > > > > to a more consistent cpu_cycles and avoided the problem. With these
> > > > > changes the problematic event will now be skipped, a large warning
> > > > > produced, and perf record will continue for the other PMU events. This
> > > > > solution was proposed by Arnaldo.
> > > > >
> > > > > v8: Change removing of failed to open events that are tracking so that
> > > > >     the tracking moves to the next event. Make software events able to
> > > > >     specified with a PMU. Change the perf_api_probe to not load all
> > > > >     PMUs through scanning, specify a PMU when parsing events.
> > > > >
> > > > > v7: Expand cover letter, fix a missed core_ok check in the v6
> > > > >     rebase. Note, as with v6 there is an alternate series that
> > > > >     prioritizes legacy events but that is silly and I'd prefer we
> > > > >     didn't do it.
> > > > >
> > > > > v6: Rebase of v5 (dropping already merged patches):
> > > > >     https://lore.kernel.org/lkml/20250109222109.567031-1-irogers@google.com/
> > > > >     that unusually had an RFC posted for it:
> > > > >     https://lore.kernel.org/lkml/Z7Z5kv75BMML2A1q@google.com/
> > > > >     Note, this patch conflicts/contradicts:
> > > > >     https://lore.kernel.org/lkml/20250312211623.2495798-1-irogers@google.com/
> > > > >     that I posted so that we could either consistently prioritize
> > > > >     sysfs/json (these patches) or legacy events (the other
> > > > >     patches). That lack of event printing and encoding inconsistency
> > > > >     is most prominent in the encoding of events like "instructions"
> > > > >     which on hybrid are reported as "cpu_core/instructions/" but
> > > > >     "instructions" before these patches gets a legacy encoding while
> > > > >     "cpu_core/instructions/" gets a sysfs/json encoding. These patches
> > > > >     make "instructions" always get a sysfs/json encoding while the
> > > > >     alternate patches make it always get a legacy encoding.
> > > > >
> > > > > v5: Follow Namhyung's suggestion and ignore the case where command
> > > > >     line dummy events fail to open alongside other events that all
> > > > >     fail to open. Note, the Tested-by tags are left on the series as
> > > > >     v4 and v5 were changing an error case that doesn't occur in
> > > > >     testing but was manually tested by myself.
> > > > >
> > > > > v4: Rework the no events opening change from v3 to make it handle
> > > > >     multiple dummy events. Sadly an evlist isn't empty if it just
> > > > >     contains dummy events as the dummy event may be used with "perf
> > > > >     record -e dummy .." as a way to determine whether permission
> > > > >     issues exist. Other software events like cpu-clock would suffice
> > > > >     for this, but the using dummy genie has left the bottle.
> > > > >
> > > > >     Another problem is that we appear to have an excessive number of
> > > > >     dummy events added, for example, we can likely avoid a dummy event
> > > > >     and add sideband data to the original event. For auxtrace more
> > > > >     dummy events may be opened too. Anyway, this has led to the
> > > > >     approach taken in patch 3 where the number of dummy parsed events
> > > > >     is computed. If the number of removed/failing-to-open non-dummy
> > > > >     events matches the number of non-dummy events then we want to
> > > > >     fail, but only if there are no parsed dummy events or if there was
> > > > >     one then it must have opened. The math here is hard to read, but
> > > > >     passes my manual testing.
> > > > >
> > > > > v3: Make no events opening for perf record a failure as suggested by
> > > > >     James Clark and Aditya Bodkhe <Aditya.Bodkhe1@ibm.com>. Also,
> > > > >     rebase.
> > > > >
> > > > > v2: Rebase and add tested-by tags from James Clark, Leo Yan and Atish
> > > > >     Patra who have tested on RISC-V and ARM CPUs, including the
> > > > >     problem case from before.
> > > >
> > > > Ping. Thanks,
> > > > Ian
> > > >
> > > > > Ian Rogers (4):
> > > > >   perf record: Skip don't fail for events that don't open
> > > > >   perf parse-events: Reapply "Prefer sysfs/JSON hardware events over
> > > > >     legacy"
> > > > >   perf parse-events: Allow software events to be terms
> > > > >   perf perf_api_probe: Avoid scanning all PMUs, try software PMU first
> > >
> > > Sorry for the delay.  But I think we wanted to move to this instead:
> > >
> > > https://lore.kernel.org/linux-perf-users/20250312211623.2495798-1-irogers@google.com/
> >
> > Hi Namhyung,
> >
> > The preference for sysfs/json over legacy was done as a bug fix and
> > because ARM (Mark Rutland) argued strongly that it was the most
> > sensible priority. Intel (Kan Liang) approved the change in priority.
> > RISC-V have wanted this behavior as it enables the migration of event
> > mappings from the driver to the tool. As the primary maintainer of the
> > event parsing and metric code I prefer the priority as legacy events
> > are weird, for example they aren't case insensitive in their naming.
> > For example, on Intel with legacy events as the priority cpu-cycles
> > would be a legacy event, but cpu-Cyles a sysfs one. On ARM cpu_cycles
> > would be a sysfs event, but cpu-cycles a legacy one. A minor character
> > difference and very different and imo surprising event encodings.
>
> Yeah, but it has worked like that for a long time.
>
> >
> > On your RFC thread Arnaldo and James said that legacy events somehow
> > enabled a form of drill down. As event parsing is mapping a name to a
> > perf_event_attr I completely don't see this as the mapping is opaque.
>
> Is it opaque?  (I'd say it standard event rather than legacy event.)  I
> think the mapping for the standard events are clearly defined.

Which standard events? Going through them (abbreviated to avoid repetition):
 - PERF_COUNT_HW_CPU_CYCLES, ok.
 - PERF_COUNT_HW_INSTRUCTIONS, well does that include speculatively
executed instructions or not?
 - ...
 - PERF_COUNT_HW_STALLED_CYCLES_FRONTEND, what does this count on an
in order CPU?
 - ...

The hardware cache events are far worse as things like LLC mean the L2
cache, however, it is far more typical for this to mean L3 these days.
Standard and clearly defined, sorry absolutely not. They are a
minefield of well intentioned event name components waiting to explode
when a vendor inadvertently creates a combination that happens to
match a combination perf thinks is significant.

There was a similar attempt for raw events where you can go r123 for
the hex 123 event config, it was missed that rEAD is a valid hex raw
event as well as a useful event name. The event parsing now has a lot
of special handling to avoid exploding on this - and yes the priority
is that sysfs/json has priority over the raw event encoding.

> >
> > I strongly believe we need consistency. If `perf stat -e cycles .. `
> > prints cpu_core/cycles/ as the event name on a hybrid Intel, then
> > `perf stat -e cpu_core/cycles/ .. ` should have the same
> > perf_event_attr. Both patch series achieve this but this one does it
> > with consistency, and from what I see it, the support of 3 vendors.
>
> Right, it's not consistent.  Maybe we need a different uniq event name
> for extended (standard) events.  How about "cycles(cpu_core)"?  I guess
> we don't want to add a space between the PMU and event names to avoid
> potential user impact when they parse the output.

We could and it would very likely break tooling. The intent is that
cpu-cycles matches cpu_core/cpu-cycles/ and cpu_atom/cpu-cycles/ and
they are expected to all be the same event. Currently with the PMU
they are sysfs encoded but without a PMU they are legacy encoded but
printed (uniquified) as if they were with a PMU and sysfs encoded.
This is misleading.

Thanks,
Ian

> Thanks,
> Namhyung
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 0/4] Prefer sysfs/JSON events also when no PMU is provided
  2025-06-03 23:36         ` Ian Rogers
@ 2025-06-03 23:59           ` Namhyung Kim
  2025-06-04  0:26             ` Ian Rogers
  0 siblings, 1 reply; 14+ messages in thread
From: Namhyung Kim @ 2025-06-03 23:59 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	Aditya Bodkhe, Leo Yan, Thomas Falcon, Atish Patra

On Tue, Jun 03, 2025 at 04:36:34PM -0700, Ian Rogers wrote:
> On Tue, Jun 3, 2025 at 3:50 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Mon, Jun 02, 2025 at 11:08:34PM -0700, Ian Rogers wrote:
> > > On Mon, Jun 2, 2025 at 9:23 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > Hi Ian,
> > > >
> > > > On Tue, May 27, 2025 at 01:50:32PM -0700, Ian Rogers wrote:
> > > > > On Tue, Apr 15, 2025 at 9:51 PM Ian Rogers <irogers@google.com> wrote:
> > > > > >
> > > > > > At the RISC-V summit the topic of avoiding event data being in the
> > > > > > RISC-V PMU kernel driver came up. There is a preference for sysfs/JSON
> > > > > > events being the priority when no PMU is provided so that legacy
> > > > > > events maybe supported via json. Originally Mark Rutland also
> > > > > > expressed at LPC 2023 that doing this would resolve bugs on ARM Apple
> > > > > > M? processors, but James Clark more recently tested this and believes
> > > > > > the driver issues there may not have existed or have been resolved. In
> > > > > > any case, it is inconsistent that with a PMU event names avoid legacy
> > > > > > encodings, but when wildcarding PMUs (ie without a PMU with the event
> > > > > > name) the legacy encodings have priority.
> > > > > >
> > > > > > The situation is further inconsistent as legacy events are case
> > > > > > sensitive, so on Intel that provides a sysfs instructions event, the
> > > > > > instructions event without a PMU and lowercase is legacy while with
> > > > > > uppercase letters it matches with sysfs which is case insensitive. Are
> > > > > > there legacy events with upper case letters? Yes there are, the cache
> > > > > > ones mix case freely:
> > > > > >
> > > > > > L1-dcache|l1-d|l1d|L1-data|L1-icache|l1-i|l1i|L1-instruction|LLC|L2|dTLB|d-tlb|Data-TLB|iTLB|i-tlb|Instruction-TLB|branch|branches|bpu|btb|bpc|node
> > > > > >
> > > > > > meaning LLC that means L2 (which is wrong) both match as part of a
> > > > > > legacy cache name but llc and l2 would only match sysfs/json
> > > > > > events. The whole thing just points at the ridiculous nature of legacy
> > > > > > events and why we'd want them to be preffered I don't know. Why should
> > > > > > case of a letter or having a PMU prefix impact the encoding in the
> > > > > > perf_event_attr?
> > > > > >
> > > > > > The patch doing this work was reverted in a v6.10 release candidate
> > > > > > as, even though the patch was posted for weeks and had been on
> > > > > > linux-next for weeks without issue, Linus was in the habit of using
> > > > > > explicit legacy events with unsupported precision options on his
> > > > > > Neoverse-N1. This machine has SLC PMU events for bus and CPU cycles
> > > > > > where ARM decided to call the events bus_cycles and cycles, the latter
> > > > > > being also a legacy event name. ARM haven't renamed the cycles event
> > > > > > to a more consistent cpu_cycles and avoided the problem. With these
> > > > > > changes the problematic event will now be skipped, a large warning
> > > > > > produced, and perf record will continue for the other PMU events. This
> > > > > > solution was proposed by Arnaldo.
> > > > > >
> > > > > > v8: Change removing of failed to open events that are tracking so that
> > > > > >     the tracking moves to the next event. Make software events able to
> > > > > >     specified with a PMU. Change the perf_api_probe to not load all
> > > > > >     PMUs through scanning, specify a PMU when parsing events.
> > > > > >
> > > > > > v7: Expand cover letter, fix a missed core_ok check in the v6
> > > > > >     rebase. Note, as with v6 there is an alternate series that
> > > > > >     prioritizes legacy events but that is silly and I'd prefer we
> > > > > >     didn't do it.
> > > > > >
> > > > > > v6: Rebase of v5 (dropping already merged patches):
> > > > > >     https://lore.kernel.org/lkml/20250109222109.567031-1-irogers@google.com/
> > > > > >     that unusually had an RFC posted for it:
> > > > > >     https://lore.kernel.org/lkml/Z7Z5kv75BMML2A1q@google.com/
> > > > > >     Note, this patch conflicts/contradicts:
> > > > > >     https://lore.kernel.org/lkml/20250312211623.2495798-1-irogers@google.com/
> > > > > >     that I posted so that we could either consistently prioritize
> > > > > >     sysfs/json (these patches) or legacy events (the other
> > > > > >     patches). That lack of event printing and encoding inconsistency
> > > > > >     is most prominent in the encoding of events like "instructions"
> > > > > >     which on hybrid are reported as "cpu_core/instructions/" but
> > > > > >     "instructions" before these patches gets a legacy encoding while
> > > > > >     "cpu_core/instructions/" gets a sysfs/json encoding. These patches
> > > > > >     make "instructions" always get a sysfs/json encoding while the
> > > > > >     alternate patches make it always get a legacy encoding.
> > > > > >
> > > > > > v5: Follow Namhyung's suggestion and ignore the case where command
> > > > > >     line dummy events fail to open alongside other events that all
> > > > > >     fail to open. Note, the Tested-by tags are left on the series as
> > > > > >     v4 and v5 were changing an error case that doesn't occur in
> > > > > >     testing but was manually tested by myself.
> > > > > >
> > > > > > v4: Rework the no events opening change from v3 to make it handle
> > > > > >     multiple dummy events. Sadly an evlist isn't empty if it just
> > > > > >     contains dummy events as the dummy event may be used with "perf
> > > > > >     record -e dummy .." as a way to determine whether permission
> > > > > >     issues exist. Other software events like cpu-clock would suffice
> > > > > >     for this, but the using dummy genie has left the bottle.
> > > > > >
> > > > > >     Another problem is that we appear to have an excessive number of
> > > > > >     dummy events added, for example, we can likely avoid a dummy event
> > > > > >     and add sideband data to the original event. For auxtrace more
> > > > > >     dummy events may be opened too. Anyway, this has led to the
> > > > > >     approach taken in patch 3 where the number of dummy parsed events
> > > > > >     is computed. If the number of removed/failing-to-open non-dummy
> > > > > >     events matches the number of non-dummy events then we want to
> > > > > >     fail, but only if there are no parsed dummy events or if there was
> > > > > >     one then it must have opened. The math here is hard to read, but
> > > > > >     passes my manual testing.
> > > > > >
> > > > > > v3: Make no events opening for perf record a failure as suggested by
> > > > > >     James Clark and Aditya Bodkhe <Aditya.Bodkhe1@ibm.com>. Also,
> > > > > >     rebase.
> > > > > >
> > > > > > v2: Rebase and add tested-by tags from James Clark, Leo Yan and Atish
> > > > > >     Patra who have tested on RISC-V and ARM CPUs, including the
> > > > > >     problem case from before.
> > > > >
> > > > > Ping. Thanks,
> > > > > Ian
> > > > >
> > > > > > Ian Rogers (4):
> > > > > >   perf record: Skip don't fail for events that don't open
> > > > > >   perf parse-events: Reapply "Prefer sysfs/JSON hardware events over
> > > > > >     legacy"
> > > > > >   perf parse-events: Allow software events to be terms
> > > > > >   perf perf_api_probe: Avoid scanning all PMUs, try software PMU first
> > > >
> > > > Sorry for the delay.  But I think we wanted to move to this instead:
> > > >
> > > > https://lore.kernel.org/linux-perf-users/20250312211623.2495798-1-irogers@google.com/
> > >
> > > Hi Namhyung,
> > >
> > > The preference for sysfs/json over legacy was done as a bug fix and
> > > because ARM (Mark Rutland) argued strongly that it was the most
> > > sensible priority. Intel (Kan Liang) approved the change in priority.
> > > RISC-V have wanted this behavior as it enables the migration of event
> > > mappings from the driver to the tool. As the primary maintainer of the
> > > event parsing and metric code I prefer the priority as legacy events
> > > are weird, for example they aren't case insensitive in their naming.
> > > For example, on Intel with legacy events as the priority cpu-cycles
> > > would be a legacy event, but cpu-Cyles a sysfs one. On ARM cpu_cycles
> > > would be a sysfs event, but cpu-cycles a legacy one. A minor character
> > > difference and very different and imo surprising event encodings.
> >
> > Yeah, but it has worked like that for a long time.
> >
> > >
> > > On your RFC thread Arnaldo and James said that legacy events somehow
> > > enabled a form of drill down. As event parsing is mapping a name to a
> > > perf_event_attr I completely don't see this as the mapping is opaque.
> >
> > Is it opaque?  (I'd say it standard event rather than legacy event.)  I
> > think the mapping for the standard events are clearly defined.
> 
> Which standard events? Going through them (abbreviated to avoid repetition):
>  - PERF_COUNT_HW_CPU_CYCLES, ok.
>  - PERF_COUNT_HW_INSTRUCTIONS, well does that include speculatively
> executed instructions or not?
>  - ...
>  - PERF_COUNT_HW_STALLED_CYCLES_FRONTEND, what does this count on an
> in order CPU?
>  - ...

I mean the mapping from event name to event encoding (PERF_COUNT_HW_...).  
I think the internal event mapping is the driver's business.

> 
> The hardware cache events are far worse as things like LLC mean the L2
> cache, however, it is far more typical for this to mean L3 these days.
> Standard and clearly defined, sorry absolutely not. They are a
> minefield of well intentioned event name components waiting to explode
> when a vendor inadvertently creates a combination that happens to
> match a combination perf thinks is significant.

Again, it belongs to the driver.

> 
> There was a similar attempt for raw events where you can go r123 for
> the hex 123 event config, it was missed that rEAD is a valid hex raw
> event as well as a useful event name. The event parsing now has a lot
> of special handling to avoid exploding on this - and yes the priority
> is that sysfs/json has priority over the raw event encoding.

Agreed, the raw encoding can be a problem and it makes sense the sysfs/
JSON has the priority.

> 
> > >
> > > I strongly believe we need consistency. If `perf stat -e cycles .. `
> > > prints cpu_core/cycles/ as the event name on a hybrid Intel, then
> > > `perf stat -e cpu_core/cycles/ .. ` should have the same
> > > perf_event_attr. Both patch series achieve this but this one does it
> > > with consistency, and from what I see it, the support of 3 vendors.
> >
> > Right, it's not consistent.  Maybe we need a different uniq event name
> > for extended (standard) events.  How about "cycles(cpu_core)"?  I guess
> > we don't want to add a space between the PMU and event names to avoid
> > potential user impact when they parse the output.
> 
> We could and it would very likely break tooling. The intent is that
> cpu-cycles matches cpu_core/cpu-cycles/ and cpu_atom/cpu-cycles/ and
> they are expected to all be the same event. Currently with the PMU
> they are sysfs encoded but without a PMU they are legacy encoded but
> printed (uniquified) as if they were with a PMU and sysfs encoded.
> This is misleading.

Hmm.. I don't know what's the correct way to handle this.  Can we
change it not to use extended standard events and to convert to sysfs
events then?

Thanks,
Namhyung


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 0/4] Prefer sysfs/JSON events also when no PMU is provided
  2025-06-03 23:59           ` Namhyung Kim
@ 2025-06-04  0:26             ` Ian Rogers
  2025-06-04 21:08               ` Namhyung Kim
  0 siblings, 1 reply; 14+ messages in thread
From: Ian Rogers @ 2025-06-04  0:26 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	Aditya Bodkhe, Leo Yan, Thomas Falcon, Atish Patra

On Tue, Jun 3, 2025 at 4:59 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Tue, Jun 03, 2025 at 04:36:34PM -0700, Ian Rogers wrote:
> > On Tue, Jun 3, 2025 at 3:50 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > On Mon, Jun 02, 2025 at 11:08:34PM -0700, Ian Rogers wrote:
> > > > On Mon, Jun 2, 2025 at 9:23 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > >
> > > > > Hi Ian,
> > > > >
> > > > > On Tue, May 27, 2025 at 01:50:32PM -0700, Ian Rogers wrote:
> > > > > > On Tue, Apr 15, 2025 at 9:51 PM Ian Rogers <irogers@google.com> wrote:
> > > > > > >
> > > > > > > At the RISC-V summit the topic of avoiding event data being in the
> > > > > > > RISC-V PMU kernel driver came up. There is a preference for sysfs/JSON
> > > > > > > events being the priority when no PMU is provided so that legacy
> > > > > > > events maybe supported via json. Originally Mark Rutland also
> > > > > > > expressed at LPC 2023 that doing this would resolve bugs on ARM Apple
> > > > > > > M? processors, but James Clark more recently tested this and believes
> > > > > > > the driver issues there may not have existed or have been resolved. In
> > > > > > > any case, it is inconsistent that with a PMU event names avoid legacy
> > > > > > > encodings, but when wildcarding PMUs (ie without a PMU with the event
> > > > > > > name) the legacy encodings have priority.
> > > > > > >
> > > > > > > The situation is further inconsistent as legacy events are case
> > > > > > > sensitive, so on Intel that provides a sysfs instructions event, the
> > > > > > > instructions event without a PMU and lowercase is legacy while with
> > > > > > > uppercase letters it matches with sysfs which is case insensitive. Are
> > > > > > > there legacy events with upper case letters? Yes there are, the cache
> > > > > > > ones mix case freely:
> > > > > > >
> > > > > > > L1-dcache|l1-d|l1d|L1-data|L1-icache|l1-i|l1i|L1-instruction|LLC|L2|dTLB|d-tlb|Data-TLB|iTLB|i-tlb|Instruction-TLB|branch|branches|bpu|btb|bpc|node
> > > > > > >
> > > > > > > meaning LLC that means L2 (which is wrong) both match as part of a
> > > > > > > legacy cache name but llc and l2 would only match sysfs/json
> > > > > > > events. The whole thing just points at the ridiculous nature of legacy
> > > > > > > events and why we'd want them to be preffered I don't know. Why should
> > > > > > > case of a letter or having a PMU prefix impact the encoding in the
> > > > > > > perf_event_attr?
> > > > > > >
> > > > > > > The patch doing this work was reverted in a v6.10 release candidate
> > > > > > > as, even though the patch was posted for weeks and had been on
> > > > > > > linux-next for weeks without issue, Linus was in the habit of using
> > > > > > > explicit legacy events with unsupported precision options on his
> > > > > > > Neoverse-N1. This machine has SLC PMU events for bus and CPU cycles
> > > > > > > where ARM decided to call the events bus_cycles and cycles, the latter
> > > > > > > being also a legacy event name. ARM haven't renamed the cycles event
> > > > > > > to a more consistent cpu_cycles and avoided the problem. With these
> > > > > > > changes the problematic event will now be skipped, a large warning
> > > > > > > produced, and perf record will continue for the other PMU events. This
> > > > > > > solution was proposed by Arnaldo.
> > > > > > >
> > > > > > > v8: Change removing of failed to open events that are tracking so that
> > > > > > >     the tracking moves to the next event. Make software events able to
> > > > > > >     specified with a PMU. Change the perf_api_probe to not load all
> > > > > > >     PMUs through scanning, specify a PMU when parsing events.
> > > > > > >
> > > > > > > v7: Expand cover letter, fix a missed core_ok check in the v6
> > > > > > >     rebase. Note, as with v6 there is an alternate series that
> > > > > > >     prioritizes legacy events but that is silly and I'd prefer we
> > > > > > >     didn't do it.
> > > > > > >
> > > > > > > v6: Rebase of v5 (dropping already merged patches):
> > > > > > >     https://lore.kernel.org/lkml/20250109222109.567031-1-irogers@google.com/
> > > > > > >     that unusually had an RFC posted for it:
> > > > > > >     https://lore.kernel.org/lkml/Z7Z5kv75BMML2A1q@google.com/
> > > > > > >     Note, this patch conflicts/contradicts:
> > > > > > >     https://lore.kernel.org/lkml/20250312211623.2495798-1-irogers@google.com/
> > > > > > >     that I posted so that we could either consistently prioritize
> > > > > > >     sysfs/json (these patches) or legacy events (the other
> > > > > > >     patches). That lack of event printing and encoding inconsistency
> > > > > > >     is most prominent in the encoding of events like "instructions"
> > > > > > >     which on hybrid are reported as "cpu_core/instructions/" but
> > > > > > >     "instructions" before these patches gets a legacy encoding while
> > > > > > >     "cpu_core/instructions/" gets a sysfs/json encoding. These patches
> > > > > > >     make "instructions" always get a sysfs/json encoding while the
> > > > > > >     alternate patches make it always get a legacy encoding.
> > > > > > >
> > > > > > > v5: Follow Namhyung's suggestion and ignore the case where command
> > > > > > >     line dummy events fail to open alongside other events that all
> > > > > > >     fail to open. Note, the Tested-by tags are left on the series as
> > > > > > >     v4 and v5 were changing an error case that doesn't occur in
> > > > > > >     testing but was manually tested by myself.
> > > > > > >
> > > > > > > v4: Rework the no events opening change from v3 to make it handle
> > > > > > >     multiple dummy events. Sadly an evlist isn't empty if it just
> > > > > > >     contains dummy events as the dummy event may be used with "perf
> > > > > > >     record -e dummy .." as a way to determine whether permission
> > > > > > >     issues exist. Other software events like cpu-clock would suffice
> > > > > > >     for this, but the using dummy genie has left the bottle.
> > > > > > >
> > > > > > >     Another problem is that we appear to have an excessive number of
> > > > > > >     dummy events added, for example, we can likely avoid a dummy event
> > > > > > >     and add sideband data to the original event. For auxtrace more
> > > > > > >     dummy events may be opened too. Anyway, this has led to the
> > > > > > >     approach taken in patch 3 where the number of dummy parsed events
> > > > > > >     is computed. If the number of removed/failing-to-open non-dummy
> > > > > > >     events matches the number of non-dummy events then we want to
> > > > > > >     fail, but only if there are no parsed dummy events or if there was
> > > > > > >     one then it must have opened. The math here is hard to read, but
> > > > > > >     passes my manual testing.
> > > > > > >
> > > > > > > v3: Make no events opening for perf record a failure as suggested by
> > > > > > >     James Clark and Aditya Bodkhe <Aditya.Bodkhe1@ibm.com>. Also,
> > > > > > >     rebase.
> > > > > > >
> > > > > > > v2: Rebase and add tested-by tags from James Clark, Leo Yan and Atish
> > > > > > >     Patra who have tested on RISC-V and ARM CPUs, including the
> > > > > > >     problem case from before.
> > > > > >
> > > > > > Ping. Thanks,
> > > > > > Ian
> > > > > >
> > > > > > > Ian Rogers (4):
> > > > > > >   perf record: Skip don't fail for events that don't open
> > > > > > >   perf parse-events: Reapply "Prefer sysfs/JSON hardware events over
> > > > > > >     legacy"
> > > > > > >   perf parse-events: Allow software events to be terms
> > > > > > >   perf perf_api_probe: Avoid scanning all PMUs, try software PMU first
> > > > >
> > > > > Sorry for the delay.  But I think we wanted to move to this instead:
> > > > >
> > > > > https://lore.kernel.org/linux-perf-users/20250312211623.2495798-1-irogers@google.com/
> > > >
> > > > Hi Namhyung,
> > > >
> > > > The preference for sysfs/json over legacy was done as a bug fix and
> > > > because ARM (Mark Rutland) argued strongly that it was the most
> > > > sensible priority. Intel (Kan Liang) approved the change in priority.
> > > > RISC-V have wanted this behavior as it enables the migration of event
> > > > mappings from the driver to the tool. As the primary maintainer of the
> > > > event parsing and metric code I prefer the priority as legacy events
> > > > are weird, for example they aren't case insensitive in their naming.
> > > > For example, on Intel with legacy events as the priority cpu-cycles
> > > > would be a legacy event, but cpu-Cyles a sysfs one. On ARM cpu_cycles
> > > > would be a sysfs event, but cpu-cycles a legacy one. A minor character
> > > > difference and very different and imo surprising event encodings.
> > >
> > > Yeah, but it has worked like that for a long time.
> > >
> > > >
> > > > On your RFC thread Arnaldo and James said that legacy events somehow
> > > > enabled a form of drill down. As event parsing is mapping a name to a
> > > > perf_event_attr I completely don't see this as the mapping is opaque.
> > >
> > > Is it opaque?  (I'd say it standard event rather than legacy event.)  I
> > > think the mapping for the standard events are clearly defined.
> >
> > Which standard events? Going through them (abbreviated to avoid repetition):
> >  - PERF_COUNT_HW_CPU_CYCLES, ok.
> >  - PERF_COUNT_HW_INSTRUCTIONS, well does that include speculatively
> > executed instructions or not?
> >  - ...
> >  - PERF_COUNT_HW_STALLED_CYCLES_FRONTEND, what does this count on an
> > in order CPU?
> >  - ...
>
> I mean the mapping from event name to event encoding (PERF_COUNT_HW_...).
> I think the internal event mapping is the driver's business.
>
> >
> > The hardware cache events are far worse as things like LLC mean the L2
> > cache, however, it is far more typical for this to mean L3 these days.
> > Standard and clearly defined, sorry absolutely not. They are a
> > minefield of well intentioned event name components waiting to explode
> > when a vendor inadvertently creates a combination that happens to
> > match a combination perf thinks is significant.
>
> Again, it belongs to the driver.

But we know there are broken drivers. That is why this started. The
Apple-M core PMU had broken mappings and was reliant on being treated
as an uncore PMU so that legacy encodings weren't used. By forcing the
use of legacy encodings you'd break new perf tools on v6.6 kernels on
Apple-M IIRC.

When an event is encoded in json it can have a description. The
description can identify what the exact behaviors wrt speculation,
what LLC means, etc. are. The legacy event names are a minefield and
ambiguous.

Why are they a minefield? We import metrics from sources like the TMA
spreadsheet:
https://github.com/intel/perfmon/blob/main/TMA_Metrics-full.csv
The events named in here are intended for use across a range of tools
like vtune and emon. When we import event names in the perf build we
assume they are case insensitive and match them by lower-casing them:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/jevents.py?h=perf-tools-next#n326
Legacy cache events are made up of components:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/parse-events.l?h=perf-tools-next#n258
```
lc_type (L1-dcache|l1-d|l1d|L1-data|L1-icache|l1-i|l1i|L1-instruction|LLC|L2|dTLB|d-tlb|Data-TLB|iTLB|i-tlb|Instruction-TLB|branch|branches|bpu|btb|bpc|node)
lc_op_result (load|loads|read|store|stores|write|prefetch|prefetches|speculative-read|speculative-load|refs|Reference|ops|access|misses|miss)
```
If legacy events have the priority then an event that accidentally
combines these components can switch from using the event json that
comes with the metric to using a legacy event that has a different
behavior. It may also result in parse errors.

We are avoiding stepping on mines by luck currently and the approach
of using legacy first is essentially error prone. (It also breaks
Apple-M and is the opposite of what RISC-V have requested)

> >
> > There was a similar attempt for raw events where you can go r123 for
> > the hex 123 event config, it was missed that rEAD is a valid hex raw
> > event as well as a useful event name. The event parsing now has a lot
> > of special handling to avoid exploding on this - and yes the priority
> > is that sysfs/json has priority over the raw event encoding.
>
> Agreed, the raw encoding can be a problem and it makes sense the sysfs/
> JSON has the priority.
>
> >
> > > >
> > > > I strongly believe we need consistency. If `perf stat -e cycles .. `
> > > > prints cpu_core/cycles/ as the event name on a hybrid Intel, then
> > > > `perf stat -e cpu_core/cycles/ .. ` should have the same
> > > > perf_event_attr. Both patch series achieve this but this one does it
> > > > with consistency, and from what I see it, the support of 3 vendors.
> > >
> > > Right, it's not consistent.  Maybe we need a different uniq event name
> > > for extended (standard) events.  How about "cycles(cpu_core)"?  I guess
> > > we don't want to add a space between the PMU and event names to avoid
> > > potential user impact when they parse the output.
> >
> > We could and it would very likely break tooling. The intent is that
> > cpu-cycles matches cpu_core/cpu-cycles/ and cpu_atom/cpu-cycles/ and
> > they are expected to all be the same event. Currently with the PMU
> > they are sysfs encoded but without a PMU they are legacy encoded but
> > printed (uniquified) as if they were with a PMU and sysfs encoded.
> > This is misleading.
>
> Hmm.. I don't know what's the correct way to handle this.  Can we
> change it not to use extended standard events and to convert to sysfs
> events then?

Tbh, I think a lot of this is coming down to 1 event, cycles. It is
the only unambiguous legacy event. If we make cycles only mean core
PMU cycles (ie cpu_core, cpu_atom on hybrid, cpu on non-hybrid) then I
think this is matching what you are expecting to happen. I mean we
only match cycles as legacy first and everything else is sysfs/json
first and legacy last. I think other legacy events cpu-cycles aren't
special and don't get legacy first treatment. Having just 1 special
legacy only event cycles is a lot less offensive to me than having say
branch-prefetch-read also be legacy. We accept cycles is a minefield
and special, but it is unique in this.

Thanks,
Ian

> Thanks,
> Namhyung
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 0/4] Prefer sysfs/JSON events also when no PMU is provided
  2025-06-04  0:26             ` Ian Rogers
@ 2025-06-04 21:08               ` Namhyung Kim
  2025-06-04 21:40                 ` Ian Rogers
  0 siblings, 1 reply; 14+ messages in thread
From: Namhyung Kim @ 2025-06-04 21:08 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	Aditya Bodkhe, Leo Yan, Thomas Falcon, Atish Patra

On Tue, Jun 03, 2025 at 05:26:37PM -0700, Ian Rogers wrote:
> On Tue, Jun 3, 2025 at 4:59 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Tue, Jun 03, 2025 at 04:36:34PM -0700, Ian Rogers wrote:
> > > On Tue, Jun 3, 2025 at 3:50 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > On Mon, Jun 02, 2025 at 11:08:34PM -0700, Ian Rogers wrote:
> > > > > On Mon, Jun 2, 2025 at 9:23 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > >
> > > > > > Hi Ian,
> > > > > >
> > > > > > On Tue, May 27, 2025 at 01:50:32PM -0700, Ian Rogers wrote:
> > > > > > > On Tue, Apr 15, 2025 at 9:51 PM Ian Rogers <irogers@google.com> wrote:
> > > > > > > >
> > > > > > > > At the RISC-V summit the topic of avoiding event data being in the
> > > > > > > > RISC-V PMU kernel driver came up. There is a preference for sysfs/JSON
> > > > > > > > events being the priority when no PMU is provided so that legacy
> > > > > > > > events maybe supported via json. Originally Mark Rutland also
> > > > > > > > expressed at LPC 2023 that doing this would resolve bugs on ARM Apple
> > > > > > > > M? processors, but James Clark more recently tested this and believes
> > > > > > > > the driver issues there may not have existed or have been resolved. In
> > > > > > > > any case, it is inconsistent that with a PMU event names avoid legacy
> > > > > > > > encodings, but when wildcarding PMUs (ie without a PMU with the event
> > > > > > > > name) the legacy encodings have priority.
> > > > > > > >
> > > > > > > > The situation is further inconsistent as legacy events are case
> > > > > > > > sensitive, so on Intel that provides a sysfs instructions event, the
> > > > > > > > instructions event without a PMU and lowercase is legacy while with
> > > > > > > > uppercase letters it matches with sysfs which is case insensitive. Are
> > > > > > > > there legacy events with upper case letters? Yes there are, the cache
> > > > > > > > ones mix case freely:
> > > > > > > >
> > > > > > > > L1-dcache|l1-d|l1d|L1-data|L1-icache|l1-i|l1i|L1-instruction|LLC|L2|dTLB|d-tlb|Data-TLB|iTLB|i-tlb|Instruction-TLB|branch|branches|bpu|btb|bpc|node
> > > > > > > >
> > > > > > > > meaning LLC that means L2 (which is wrong) both match as part of a
> > > > > > > > legacy cache name but llc and l2 would only match sysfs/json
> > > > > > > > events. The whole thing just points at the ridiculous nature of legacy
> > > > > > > > events and why we'd want them to be preffered I don't know. Why should
> > > > > > > > case of a letter or having a PMU prefix impact the encoding in the
> > > > > > > > perf_event_attr?
> > > > > > > >
> > > > > > > > The patch doing this work was reverted in a v6.10 release candidate
> > > > > > > > as, even though the patch was posted for weeks and had been on
> > > > > > > > linux-next for weeks without issue, Linus was in the habit of using
> > > > > > > > explicit legacy events with unsupported precision options on his
> > > > > > > > Neoverse-N1. This machine has SLC PMU events for bus and CPU cycles
> > > > > > > > where ARM decided to call the events bus_cycles and cycles, the latter
> > > > > > > > being also a legacy event name. ARM haven't renamed the cycles event
> > > > > > > > to a more consistent cpu_cycles and avoided the problem. With these
> > > > > > > > changes the problematic event will now be skipped, a large warning
> > > > > > > > produced, and perf record will continue for the other PMU events. This
> > > > > > > > solution was proposed by Arnaldo.
> > > > > > > >
> > > > > > > > v8: Change removing of failed to open events that are tracking so that
> > > > > > > >     the tracking moves to the next event. Make software events able to
> > > > > > > >     specified with a PMU. Change the perf_api_probe to not load all
> > > > > > > >     PMUs through scanning, specify a PMU when parsing events.
> > > > > > > >
> > > > > > > > v7: Expand cover letter, fix a missed core_ok check in the v6
> > > > > > > >     rebase. Note, as with v6 there is an alternate series that
> > > > > > > >     prioritizes legacy events but that is silly and I'd prefer we
> > > > > > > >     didn't do it.
> > > > > > > >
> > > > > > > > v6: Rebase of v5 (dropping already merged patches):
> > > > > > > >     https://lore.kernel.org/lkml/20250109222109.567031-1-irogers@google.com/
> > > > > > > >     that unusually had an RFC posted for it:
> > > > > > > >     https://lore.kernel.org/lkml/Z7Z5kv75BMML2A1q@google.com/
> > > > > > > >     Note, this patch conflicts/contradicts:
> > > > > > > >     https://lore.kernel.org/lkml/20250312211623.2495798-1-irogers@google.com/
> > > > > > > >     that I posted so that we could either consistently prioritize
> > > > > > > >     sysfs/json (these patches) or legacy events (the other
> > > > > > > >     patches). That lack of event printing and encoding inconsistency
> > > > > > > >     is most prominent in the encoding of events like "instructions"
> > > > > > > >     which on hybrid are reported as "cpu_core/instructions/" but
> > > > > > > >     "instructions" before these patches gets a legacy encoding while
> > > > > > > >     "cpu_core/instructions/" gets a sysfs/json encoding. These patches
> > > > > > > >     make "instructions" always get a sysfs/json encoding while the
> > > > > > > >     alternate patches make it always get a legacy encoding.
> > > > > > > >
> > > > > > > > v5: Follow Namhyung's suggestion and ignore the case where command
> > > > > > > >     line dummy events fail to open alongside other events that all
> > > > > > > >     fail to open. Note, the Tested-by tags are left on the series as
> > > > > > > >     v4 and v5 were changing an error case that doesn't occur in
> > > > > > > >     testing but was manually tested by myself.
> > > > > > > >
> > > > > > > > v4: Rework the no events opening change from v3 to make it handle
> > > > > > > >     multiple dummy events. Sadly an evlist isn't empty if it just
> > > > > > > >     contains dummy events as the dummy event may be used with "perf
> > > > > > > >     record -e dummy .." as a way to determine whether permission
> > > > > > > >     issues exist. Other software events like cpu-clock would suffice
> > > > > > > >     for this, but the using dummy genie has left the bottle.
> > > > > > > >
> > > > > > > >     Another problem is that we appear to have an excessive number of
> > > > > > > >     dummy events added, for example, we can likely avoid a dummy event
> > > > > > > >     and add sideband data to the original event. For auxtrace more
> > > > > > > >     dummy events may be opened too. Anyway, this has led to the
> > > > > > > >     approach taken in patch 3 where the number of dummy parsed events
> > > > > > > >     is computed. If the number of removed/failing-to-open non-dummy
> > > > > > > >     events matches the number of non-dummy events then we want to
> > > > > > > >     fail, but only if there are no parsed dummy events or if there was
> > > > > > > >     one then it must have opened. The math here is hard to read, but
> > > > > > > >     passes my manual testing.
> > > > > > > >
> > > > > > > > v3: Make no events opening for perf record a failure as suggested by
> > > > > > > >     James Clark and Aditya Bodkhe <Aditya.Bodkhe1@ibm.com>. Also,
> > > > > > > >     rebase.
> > > > > > > >
> > > > > > > > v2: Rebase and add tested-by tags from James Clark, Leo Yan and Atish
> > > > > > > >     Patra who have tested on RISC-V and ARM CPUs, including the
> > > > > > > >     problem case from before.
> > > > > > >
> > > > > > > Ping. Thanks,
> > > > > > > Ian
> > > > > > >
> > > > > > > > Ian Rogers (4):
> > > > > > > >   perf record: Skip don't fail for events that don't open
> > > > > > > >   perf parse-events: Reapply "Prefer sysfs/JSON hardware events over
> > > > > > > >     legacy"
> > > > > > > >   perf parse-events: Allow software events to be terms
> > > > > > > >   perf perf_api_probe: Avoid scanning all PMUs, try software PMU first
> > > > > >
> > > > > > Sorry for the delay.  But I think we wanted to move to this instead:
> > > > > >
> > > > > > https://lore.kernel.org/linux-perf-users/20250312211623.2495798-1-irogers@google.com/
> > > > >
> > > > > Hi Namhyung,
> > > > >
> > > > > The preference for sysfs/json over legacy was done as a bug fix and
> > > > > because ARM (Mark Rutland) argued strongly that it was the most
> > > > > sensible priority. Intel (Kan Liang) approved the change in priority.
> > > > > RISC-V have wanted this behavior as it enables the migration of event
> > > > > mappings from the driver to the tool. As the primary maintainer of the
> > > > > event parsing and metric code I prefer the priority as legacy events
> > > > > are weird, for example they aren't case insensitive in their naming.
> > > > > For example, on Intel with legacy events as the priority cpu-cycles
> > > > > would be a legacy event, but cpu-Cyles a sysfs one. On ARM cpu_cycles
> > > > > would be a sysfs event, but cpu-cycles a legacy one. A minor character
> > > > > difference and very different and imo surprising event encodings.
> > > >
> > > > Yeah, but it has worked like that for a long time.
> > > >
> > > > >
> > > > > On your RFC thread Arnaldo and James said that legacy events somehow
> > > > > enabled a form of drill down. As event parsing is mapping a name to a
> > > > > perf_event_attr I completely don't see this as the mapping is opaque.
> > > >
> > > > Is it opaque?  (I'd say it standard event rather than legacy event.)  I
> > > > think the mapping for the standard events are clearly defined.
> > >
> > > Which standard events? Going through them (abbreviated to avoid repetition):
> > >  - PERF_COUNT_HW_CPU_CYCLES, ok.
> > >  - PERF_COUNT_HW_INSTRUCTIONS, well does that include speculatively
> > > executed instructions or not?
> > >  - ...
> > >  - PERF_COUNT_HW_STALLED_CYCLES_FRONTEND, what does this count on an
> > > in order CPU?
> > >  - ...
> >
> > I mean the mapping from event name to event encoding (PERF_COUNT_HW_...).
> > I think the internal event mapping is the driver's business.
> >
> > >
> > > The hardware cache events are far worse as things like LLC mean the L2
> > > cache, however, it is far more typical for this to mean L3 these days.
> > > Standard and clearly defined, sorry absolutely not. They are a
> > > minefield of well intentioned event name components waiting to explode
> > > when a vendor inadvertently creates a combination that happens to
> > > match a combination perf thinks is significant.
> >
> > Again, it belongs to the driver.
> 
> But we know there are broken drivers. That is why this started. The
> Apple-M core PMU had broken mappings and was reliant on being treated
> as an uncore PMU so that legacy encodings weren't used. By forcing the
> use of legacy encodings you'd break new perf tools on v6.6 kernels on
> Apple-M IIRC.

Didn't they use sysfs events (i.e. '-e PMU/EVENT/' form) always?

> 
> When an event is encoded in json it can have a description. The
> description can identify what the exact behaviors wrt speculation,
> what LLC means, etc. are. The legacy event names are a minefield and
> ambiguous.

Right, sometimes it's not 100% clear and descriptions would be nice.

> 
> Why are they a minefield? We import metrics from sources like the TMA
> spreadsheet:
> https://github.com/intel/perfmon/blob/main/TMA_Metrics-full.csv
> The events named in here are intended for use across a range of tools
> like vtune and emon. When we import event names in the perf build we
> assume they are case insensitive and match them by lower-casing them:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/jevents.py?h=perf-tools-next#n326
> Legacy cache events are made up of components:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/parse-events.l?h=perf-tools-next#n258
> ```
> lc_type (L1-dcache|l1-d|l1d|L1-data|L1-icache|l1-i|l1i|L1-instruction|LLC|L2|dTLB|d-tlb|Data-TLB|iTLB|i-tlb|Instruction-TLB|branch|branches|bpu|btb|bpc|node)
> lc_op_result (load|loads|read|store|stores|write|prefetch|prefetches|speculative-read|speculative-load|refs|Reference|ops|access|misses|miss)
> ```

Oh, looks like it's too permissive.  Do we have speculative-load in the
standard cache events?


> If legacy events have the priority then an event that accidentally
> combines these components can switch from using the event json that
> comes with the metric to using a legacy event that has a different
> behavior. It may also result in parse errors.
> 
> We are avoiding stepping on mines by luck currently and the approach
> of using legacy first is essentially error prone. (It also breaks
> Apple-M and is the opposite of what RISC-V have requested)

Hmm.. ok.  I feel like cache events should have lower priority at least.
If it's all about metrics, maybe we can add something like a new parse
option like parse_evens_option_args.prefer_json to have different
priority settings and use it for metrics?

> 
> > >
> > > There was a similar attempt for raw events where you can go r123 for
> > > the hex 123 event config, it was missed that rEAD is a valid hex raw
> > > event as well as a useful event name. The event parsing now has a lot
> > > of special handling to avoid exploding on this - and yes the priority
> > > is that sysfs/json has priority over the raw event encoding.
> >
> > Agreed, the raw encoding can be a problem and it makes sense the sysfs/
> > JSON has the priority.
> >
> > >
> > > > >
> > > > > I strongly believe we need consistency. If `perf stat -e cycles .. `
> > > > > prints cpu_core/cycles/ as the event name on a hybrid Intel, then
> > > > > `perf stat -e cpu_core/cycles/ .. ` should have the same
> > > > > perf_event_attr. Both patch series achieve this but this one does it
> > > > > with consistency, and from what I see it, the support of 3 vendors.
> > > >
> > > > Right, it's not consistent.  Maybe we need a different uniq event name
> > > > for extended (standard) events.  How about "cycles(cpu_core)"?  I guess
> > > > we don't want to add a space between the PMU and event names to avoid
> > > > potential user impact when they parse the output.
> > >
> > > We could and it would very likely break tooling. The intent is that
> > > cpu-cycles matches cpu_core/cpu-cycles/ and cpu_atom/cpu-cycles/ and
> > > they are expected to all be the same event. Currently with the PMU
> > > they are sysfs encoded but without a PMU they are legacy encoded but
> > > printed (uniquified) as if they were with a PMU and sysfs encoded.
> > > This is misleading.
> >
> > Hmm.. I don't know what's the correct way to handle this.  Can we
> > change it not to use extended standard events and to convert to sysfs
> > events then?
> 
> Tbh, I think a lot of this is coming down to 1 event, cycles. It is
> the only unambiguous legacy event. If we make cycles only mean core
> PMU cycles (ie cpu_core, cpu_atom on hybrid, cpu on non-hybrid) then I
> think this is matching what you are expecting to happen. I mean we
> only match cycles as legacy first and everything else is sysfs/json
> first and legacy last. I think other legacy events cpu-cycles aren't
> special and don't get legacy first treatment. Having just 1 special
> legacy only event cycles is a lot less offensive to me than having say
> branch-prefetch-read also be legacy. We accept cycles is a minefield
> and special, but it is unique in this.

Well.. I think cycles is an alias to cpu-cycles so they are the same.
So it's not about special casing 'cycles'.  Any PMU may add conflicting
event names to sysfs (which is discouraging though).

Thanks,
Namhyung


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH v8 0/4] Prefer sysfs/JSON events also when no PMU is provided
  2025-06-04 21:08               ` Namhyung Kim
@ 2025-06-04 21:40                 ` Ian Rogers
  0 siblings, 0 replies; 14+ messages in thread
From: Ian Rogers @ 2025-06-04 21:40 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Dominique Martinet,
	Jean-Philippe Romain, Junhao He, linux-perf-users, linux-kernel,
	Aditya Bodkhe, Leo Yan, Thomas Falcon, Atish Patra

On Wed, Jun 4, 2025 at 2:08 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Tue, Jun 03, 2025 at 05:26:37PM -0700, Ian Rogers wrote:
> > On Tue, Jun 3, 2025 at 4:59 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > On Tue, Jun 03, 2025 at 04:36:34PM -0700, Ian Rogers wrote:
> > > > On Tue, Jun 3, 2025 at 3:50 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > >
> > > > > On Mon, Jun 02, 2025 at 11:08:34PM -0700, Ian Rogers wrote:
> > > > > > On Mon, Jun 2, 2025 at 9:23 PM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > > >
> > > > > > > Hi Ian,
> > > > > > >
> > > > > > > On Tue, May 27, 2025 at 01:50:32PM -0700, Ian Rogers wrote:
> > > > > > > > On Tue, Apr 15, 2025 at 9:51 PM Ian Rogers <irogers@google.com> wrote:
> > > > > > > > >
> > > > > > > > > At the RISC-V summit the topic of avoiding event data being in the
> > > > > > > > > RISC-V PMU kernel driver came up. There is a preference for sysfs/JSON
> > > > > > > > > events being the priority when no PMU is provided so that legacy
> > > > > > > > > events maybe supported via json. Originally Mark Rutland also
> > > > > > > > > expressed at LPC 2023 that doing this would resolve bugs on ARM Apple
> > > > > > > > > M? processors, but James Clark more recently tested this and believes
> > > > > > > > > the driver issues there may not have existed or have been resolved. In
> > > > > > > > > any case, it is inconsistent that with a PMU event names avoid legacy
> > > > > > > > > encodings, but when wildcarding PMUs (ie without a PMU with the event
> > > > > > > > > name) the legacy encodings have priority.
> > > > > > > > >
> > > > > > > > > The situation is further inconsistent as legacy events are case
> > > > > > > > > sensitive, so on Intel that provides a sysfs instructions event, the
> > > > > > > > > instructions event without a PMU and lowercase is legacy while with
> > > > > > > > > uppercase letters it matches with sysfs which is case insensitive. Are
> > > > > > > > > there legacy events with upper case letters? Yes there are, the cache
> > > > > > > > > ones mix case freely:
> > > > > > > > >
> > > > > > > > > L1-dcache|l1-d|l1d|L1-data|L1-icache|l1-i|l1i|L1-instruction|LLC|L2|dTLB|d-tlb|Data-TLB|iTLB|i-tlb|Instruction-TLB|branch|branches|bpu|btb|bpc|node
> > > > > > > > >
> > > > > > > > > meaning LLC that means L2 (which is wrong) both match as part of a
> > > > > > > > > legacy cache name but llc and l2 would only match sysfs/json
> > > > > > > > > events. The whole thing just points at the ridiculous nature of legacy
> > > > > > > > > events and why we'd want them to be preffered I don't know. Why should
> > > > > > > > > case of a letter or having a PMU prefix impact the encoding in the
> > > > > > > > > perf_event_attr?
> > > > > > > > >
> > > > > > > > > The patch doing this work was reverted in a v6.10 release candidate
> > > > > > > > > as, even though the patch was posted for weeks and had been on
> > > > > > > > > linux-next for weeks without issue, Linus was in the habit of using
> > > > > > > > > explicit legacy events with unsupported precision options on his
> > > > > > > > > Neoverse-N1. This machine has SLC PMU events for bus and CPU cycles
> > > > > > > > > where ARM decided to call the events bus_cycles and cycles, the latter
> > > > > > > > > being also a legacy event name. ARM haven't renamed the cycles event
> > > > > > > > > to a more consistent cpu_cycles and avoided the problem. With these
> > > > > > > > > changes the problematic event will now be skipped, a large warning
> > > > > > > > > produced, and perf record will continue for the other PMU events. This
> > > > > > > > > solution was proposed by Arnaldo.
> > > > > > > > >
> > > > > > > > > v8: Change removing of failed to open events that are tracking so that
> > > > > > > > >     the tracking moves to the next event. Make software events able to
> > > > > > > > >     specified with a PMU. Change the perf_api_probe to not load all
> > > > > > > > >     PMUs through scanning, specify a PMU when parsing events.
> > > > > > > > >
> > > > > > > > > v7: Expand cover letter, fix a missed core_ok check in the v6
> > > > > > > > >     rebase. Note, as with v6 there is an alternate series that
> > > > > > > > >     prioritizes legacy events but that is silly and I'd prefer we
> > > > > > > > >     didn't do it.
> > > > > > > > >
> > > > > > > > > v6: Rebase of v5 (dropping already merged patches):
> > > > > > > > >     https://lore.kernel.org/lkml/20250109222109.567031-1-irogers@google.com/
> > > > > > > > >     that unusually had an RFC posted for it:
> > > > > > > > >     https://lore.kernel.org/lkml/Z7Z5kv75BMML2A1q@google.com/
> > > > > > > > >     Note, this patch conflicts/contradicts:
> > > > > > > > >     https://lore.kernel.org/lkml/20250312211623.2495798-1-irogers@google.com/
> > > > > > > > >     that I posted so that we could either consistently prioritize
> > > > > > > > >     sysfs/json (these patches) or legacy events (the other
> > > > > > > > >     patches). That lack of event printing and encoding inconsistency
> > > > > > > > >     is most prominent in the encoding of events like "instructions"
> > > > > > > > >     which on hybrid are reported as "cpu_core/instructions/" but
> > > > > > > > >     "instructions" before these patches gets a legacy encoding while
> > > > > > > > >     "cpu_core/instructions/" gets a sysfs/json encoding. These patches
> > > > > > > > >     make "instructions" always get a sysfs/json encoding while the
> > > > > > > > >     alternate patches make it always get a legacy encoding.
> > > > > > > > >
> > > > > > > > > v5: Follow Namhyung's suggestion and ignore the case where command
> > > > > > > > >     line dummy events fail to open alongside other events that all
> > > > > > > > >     fail to open. Note, the Tested-by tags are left on the series as
> > > > > > > > >     v4 and v5 were changing an error case that doesn't occur in
> > > > > > > > >     testing but was manually tested by myself.
> > > > > > > > >
> > > > > > > > > v4: Rework the no events opening change from v3 to make it handle
> > > > > > > > >     multiple dummy events. Sadly an evlist isn't empty if it just
> > > > > > > > >     contains dummy events as the dummy event may be used with "perf
> > > > > > > > >     record -e dummy .." as a way to determine whether permission
> > > > > > > > >     issues exist. Other software events like cpu-clock would suffice
> > > > > > > > >     for this, but the using dummy genie has left the bottle.
> > > > > > > > >
> > > > > > > > >     Another problem is that we appear to have an excessive number of
> > > > > > > > >     dummy events added, for example, we can likely avoid a dummy event
> > > > > > > > >     and add sideband data to the original event. For auxtrace more
> > > > > > > > >     dummy events may be opened too. Anyway, this has led to the
> > > > > > > > >     approach taken in patch 3 where the number of dummy parsed events
> > > > > > > > >     is computed. If the number of removed/failing-to-open non-dummy
> > > > > > > > >     events matches the number of non-dummy events then we want to
> > > > > > > > >     fail, but only if there are no parsed dummy events or if there was
> > > > > > > > >     one then it must have opened. The math here is hard to read, but
> > > > > > > > >     passes my manual testing.
> > > > > > > > >
> > > > > > > > > v3: Make no events opening for perf record a failure as suggested by
> > > > > > > > >     James Clark and Aditya Bodkhe <Aditya.Bodkhe1@ibm.com>. Also,
> > > > > > > > >     rebase.
> > > > > > > > >
> > > > > > > > > v2: Rebase and add tested-by tags from James Clark, Leo Yan and Atish
> > > > > > > > >     Patra who have tested on RISC-V and ARM CPUs, including the
> > > > > > > > >     problem case from before.
> > > > > > > >
> > > > > > > > Ping. Thanks,
> > > > > > > > Ian
> > > > > > > >
> > > > > > > > > Ian Rogers (4):
> > > > > > > > >   perf record: Skip don't fail for events that don't open
> > > > > > > > >   perf parse-events: Reapply "Prefer sysfs/JSON hardware events over
> > > > > > > > >     legacy"
> > > > > > > > >   perf parse-events: Allow software events to be terms
> > > > > > > > >   perf perf_api_probe: Avoid scanning all PMUs, try software PMU first
> > > > > > >
> > > > > > > Sorry for the delay.  But I think we wanted to move to this instead:
> > > > > > >
> > > > > > > https://lore.kernel.org/linux-perf-users/20250312211623.2495798-1-irogers@google.com/
> > > > > >
> > > > > > Hi Namhyung,
> > > > > >
> > > > > > The preference for sysfs/json over legacy was done as a bug fix and
> > > > > > because ARM (Mark Rutland) argued strongly that it was the most
> > > > > > sensible priority. Intel (Kan Liang) approved the change in priority.
> > > > > > RISC-V have wanted this behavior as it enables the migration of event
> > > > > > mappings from the driver to the tool. As the primary maintainer of the
> > > > > > event parsing and metric code I prefer the priority as legacy events
> > > > > > are weird, for example they aren't case insensitive in their naming.
> > > > > > For example, on Intel with legacy events as the priority cpu-cycles
> > > > > > would be a legacy event, but cpu-Cyles a sysfs one. On ARM cpu_cycles
> > > > > > would be a sysfs event, but cpu-cycles a legacy one. A minor character
> > > > > > difference and very different and imo surprising event encodings.
> > > > >
> > > > > Yeah, but it has worked like that for a long time.
> > > > >
> > > > > >
> > > > > > On your RFC thread Arnaldo and James said that legacy events somehow
> > > > > > enabled a form of drill down. As event parsing is mapping a name to a
> > > > > > perf_event_attr I completely don't see this as the mapping is opaque.
> > > > >
> > > > > Is it opaque?  (I'd say it standard event rather than legacy event.)  I
> > > > > think the mapping for the standard events are clearly defined.
> > > >
> > > > Which standard events? Going through them (abbreviated to avoid repetition):
> > > >  - PERF_COUNT_HW_CPU_CYCLES, ok.
> > > >  - PERF_COUNT_HW_INSTRUCTIONS, well does that include speculatively
> > > > executed instructions or not?
> > > >  - ...
> > > >  - PERF_COUNT_HW_STALLED_CYCLES_FRONTEND, what does this count on an
> > > > in order CPU?
> > > >  - ...
> > >
> > > I mean the mapping from event name to event encoding (PERF_COUNT_HW_...).
> > > I think the internal event mapping is the driver's business.
> > >
> > > >
> > > > The hardware cache events are far worse as things like LLC mean the L2
> > > > cache, however, it is far more typical for this to mean L3 these days.
> > > > Standard and clearly defined, sorry absolutely not. They are a
> > > > minefield of well intentioned event name components waiting to explode
> > > > when a vendor inadvertently creates a combination that happens to
> > > > match a combination perf thinks is significant.
> > >
> > > Again, it belongs to the driver.
> >
> > But we know there are broken drivers. That is why this started. The
> > Apple-M core PMU had broken mappings and was reliant on being treated
> > as an uncore PMU so that legacy encodings weren't used. By forcing the
> > use of legacy encodings you'd break new perf tools on v6.6 kernels on
> > Apple-M IIRC.
>
> Didn't they use sysfs events (i.e. '-e PMU/EVENT/' form) always?
>
> >
> > When an event is encoded in json it can have a description. The
> > description can identify what the exact behaviors wrt speculation,
> > what LLC means, etc. are. The legacy event names are a minefield and
> > ambiguous.
>
> Right, sometimes it's not 100% clear and descriptions would be nice.
>
> >
> > Why are they a minefield? We import metrics from sources like the TMA
> > spreadsheet:
> > https://github.com/intel/perfmon/blob/main/TMA_Metrics-full.csv
> > The events named in here are intended for use across a range of tools
> > like vtune and emon. When we import event names in the perf build we
> > assume they are case insensitive and match them by lower-casing them:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/pmu-events/jevents.py?h=perf-tools-next#n326
> > Legacy cache events are made up of components:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/parse-events.l?h=perf-tools-next#n258
> > ```
> > lc_type (L1-dcache|l1-d|l1d|L1-data|L1-icache|l1-i|l1i|L1-instruction|LLC|L2|dTLB|d-tlb|Data-TLB|iTLB|i-tlb|Instruction-TLB|branch|branches|bpu|btb|bpc|node)
> > lc_op_result (load|loads|read|store|stores|write|prefetch|prefetches|speculative-read|speculative-load|refs|Reference|ops|access|misses|miss)
> > ```
>
> Oh, looks like it's too permissive.  Do we have speculative-load in the
> standard cache events?
>
>
> > If legacy events have the priority then an event that accidentally
> > combines these components can switch from using the event json that
> > comes with the metric to using a legacy event that has a different
> > behavior. It may also result in parse errors.
> >
> > We are avoiding stepping on mines by luck currently and the approach
> > of using legacy first is essentially error prone. (It also breaks
> > Apple-M and is the opposite of what RISC-V have requested)
>
> Hmm.. ok.  I feel like cache events should have lower priority at least.
> If it's all about metrics, maybe we can add something like a new parse
> option like parse_evens_option_args.prefer_json to have different
> priority settings and use it for metrics?

There are only 10 legacy non-cache events, but after cycles the
meaning is ambiguous and likely often just not implemented. Saying the
meaning is known because of the driver, I'm not aware of any driver
giving descriptions.

For a flag, the issue is that we prefer json when there is a PMU but
not when it is wildcarded. So when is the flag used or not?

My eventual hope is that we can standardize on sysfs encoding. So we
translate json to sysfs, then compress the files/directories and
access them built into the perf tool this way. Then, as with say
addr2line or objdump, we have a priority list like:
"sysfs(/sys),sysfs(inbuilt),legacy". Then we could allow
"sysfs(/home/ian/myevents),sysfs(/sys),sysfs(inbuilt),legacy" so users
can manually add events and metrics. A priority list for the behavior
today would be something like
"legacy(no-pmu),sysfs(/sys),sysfs(inbuilt),legacy(with-pmu)". If we
have hardware cache events at a low priority may it is
"legacy(no-pmu+type_hardware),sysfs(/sys),sysfs(inbuilt),legacy(with-pmu+type_hw_cache)"
what this series is trying to achieve is just
"sysfs(/sys),sysfs(inbuilt),legacy" and the alternate series is
"legacy,sysfs(/sys),sysfs(inbuilt)".

On x86 events are 70% of the binary size but just compressing the json
shows that this could be halved.

Making the lookup use a list and user supplied directories may allow
us to write tests that create a hybrid looking sysfs to test on
non-hybrid, etc.

> >
> > > >
> > > > There was a similar attempt for raw events where you can go r123 for
> > > > the hex 123 event config, it was missed that rEAD is a valid hex raw
> > > > event as well as a useful event name. The event parsing now has a lot
> > > > of special handling to avoid exploding on this - and yes the priority
> > > > is that sysfs/json has priority over the raw event encoding.
> > >
> > > Agreed, the raw encoding can be a problem and it makes sense the sysfs/
> > > JSON has the priority.
> > >
> > > >
> > > > > >
> > > > > > I strongly believe we need consistency. If `perf stat -e cycles .. `
> > > > > > prints cpu_core/cycles/ as the event name on a hybrid Intel, then
> > > > > > `perf stat -e cpu_core/cycles/ .. ` should have the same
> > > > > > perf_event_attr. Both patch series achieve this but this one does it
> > > > > > with consistency, and from what I see it, the support of 3 vendors.
> > > > >
> > > > > Right, it's not consistent.  Maybe we need a different uniq event name
> > > > > for extended (standard) events.  How about "cycles(cpu_core)"?  I guess
> > > > > we don't want to add a space between the PMU and event names to avoid
> > > > > potential user impact when they parse the output.
> > > >
> > > > We could and it would very likely break tooling. The intent is that
> > > > cpu-cycles matches cpu_core/cpu-cycles/ and cpu_atom/cpu-cycles/ and
> > > > they are expected to all be the same event. Currently with the PMU
> > > > they are sysfs encoded but without a PMU they are legacy encoded but
> > > > printed (uniquified) as if they were with a PMU and sysfs encoded.
> > > > This is misleading.
> > >
> > > Hmm.. I don't know what's the correct way to handle this.  Can we
> > > change it not to use extended standard events and to convert to sysfs
> > > events then?
> >
> > Tbh, I think a lot of this is coming down to 1 event, cycles. It is
> > the only unambiguous legacy event. If we make cycles only mean core
> > PMU cycles (ie cpu_core, cpu_atom on hybrid, cpu on non-hybrid) then I
> > think this is matching what you are expecting to happen. I mean we
> > only match cycles as legacy first and everything else is sysfs/json
> > first and legacy last. I think other legacy events cpu-cycles aren't
> > special and don't get legacy first treatment. Having just 1 special
> > legacy only event cycles is a lot less offensive to me than having say
> > branch-prefetch-read also be legacy. We accept cycles is a minefield
> > and special, but it is unique in this.
>
> Well.. I think cycles is an alias to cpu-cycles so they are the same.
> So it's not about special casing 'cycles'.  Any PMU may add conflicting
> event names to sysfs (which is discouraging though).

So it has been standard for Intel to have a cpu-cycles sysfs event:
```
$ cat /sys/bus/event_source/devices/cpu/events/cpu-cycles
event=0x3c
```
I'm only aware of ARM using cycles as an event name and they also use
cpu_cycles, I wish they'd pick a lane (preferably not the one that
conflicts with a legacy event name).
I'm not particularly keen on having cycles not be wildcard (other than
for core PMUs) and meaning the legacy encoding. The issue is that the
legacy encoding will break broken PMUs like on older Apple-M linux,
and what to do in the RISC-V case?
I can understand not wanting to change what users see and I think that
may be the only expedient way to do it. I don't think anyone cares
about the cpu-cycles event name and on Intel it seems they'd prefer
the sysfs encoding, hence advertising the event.

Thanks,
Ian

> Thanks,
> Namhyung
>

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2025-06-04 21:40 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-16  4:51 [PATCH v8 0/4] Prefer sysfs/JSON events also when no PMU is provided Ian Rogers
2025-04-16  4:51 ` [PATCH v8 1/4] perf record: Skip don't fail for events that don't open Ian Rogers
2025-04-16  4:51 ` [PATCH v8 2/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy" Ian Rogers
2025-04-16  4:51 ` [PATCH v8 3/4] perf parse-events: Allow software events to be terms Ian Rogers
2025-04-16  4:51 ` [PATCH v8 4/4] perf perf_api_probe: Avoid scanning all PMUs, try software PMU first Ian Rogers
2025-05-27 20:50 ` [PATCH v8 0/4] Prefer sysfs/JSON events also when no PMU is provided Ian Rogers
2025-06-03  4:23   ` Namhyung Kim
2025-06-03  6:08     ` Ian Rogers
2025-06-03 22:50       ` Namhyung Kim
2025-06-03 23:36         ` Ian Rogers
2025-06-03 23:59           ` Namhyung Kim
2025-06-04  0:26             ` Ian Rogers
2025-06-04 21:08               ` Namhyung Kim
2025-06-04 21:40                 ` Ian Rogers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).