linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 0/4] Prefer sysfs/JSON events also when no PMU is provided
@ 2024-10-26 12:17 Ian Rogers
  2024-10-26 12:17 ` [PATCH v1 1/4] perf evsel: Add pmu_name helper Ian Rogers
                   ` (5 more replies)
  0 siblings, 6 replies; 24+ messages in thread
From: Ian Rogers @ 2024-10-26 12:17 UTC (permalink / raw)
  To: Atish Patra, linux-riscv, beeman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Ben Gainey,
	Dominique Martinet, Junhao He, linux-perf-users, linux-kernel

At the RISC-V summit the topic of avoiding event data being in the
RISC-V PMU kernel driver came up. There is a preference for sysfs/JSON
events being the priority when no PMU is provided so that legacy
events maybe supported via json. Originally Mark Rutland also
expressed at LPC 2023 that doing this would resolve bugs on ARM Apple
M? processors, but James Clark more recently tested this and believes
the driver issues there may not have existed or have been resolved. In
any case, it is inconsistent that with a PMU event names avoid legacy
encodings, but when wildcarding PMUs (ie without a PMU with the event
name) the legacy encodings have priority.

The patch doing this work was reverted in a v6.10 release candidate
as, even though the patch was posted for weeks and had been on
linux-next for weeks without issue, Linus was in the habit of using
explicit legacy events with unsupported precision options on his
Neoverse-N1. This machine has SLC PMU events for bus and CPU cycles
where ARM decided to call the events bus_cycles and cycles, the latter
being also a legacy event name. ARM haven't renamed the cycles event
to a more consistent cpu_cycles and avoided the problem. With these
changes the problematic event will now be skipped, a large warning
produced, and perf record will continue for the other PMU events. This
solution was proposed by Arnaldo.

Two minor changes have been added to help with the error message and
to work around issues occurring with "perf stat metrics (shadow stat)
test".

The patches have only been tested on my x86 non-hybrid laptop.

Ian Rogers (4):
  perf evsel: Add pmu_name helper
  perf stat: Fix find_stat for mixed legacy/non-legacy events
  perf record: Skip don't fail for events that don't open
  perf parse-events: Reapply "Prefer sysfs/JSON hardware events over
    legacy"

 tools/perf/builtin-record.c    | 22 +++++++---
 tools/perf/util/evsel.c        | 10 +++++
 tools/perf/util/evsel.h        |  1 +
 tools/perf/util/parse-events.c | 26 +++++++++---
 tools/perf/util/parse-events.l | 76 +++++++++++++++++-----------------
 tools/perf/util/parse-events.y | 60 ++++++++++++++++++---------
 tools/perf/util/pmus.c         | 20 +++++++--
 tools/perf/util/stat-shadow.c  |  3 +-
 8 files changed, 145 insertions(+), 73 deletions(-)

-- 
2.47.0.163.g1226f6d8fa-goog


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH v1 1/4] perf evsel: Add pmu_name helper
  2024-10-26 12:17 [PATCH v1 0/4] Prefer sysfs/JSON events also when no PMU is provided Ian Rogers
@ 2024-10-26 12:17 ` Ian Rogers
  2024-10-26 12:17 ` [PATCH v1 2/4] perf stat: Fix find_stat for mixed legacy/non-legacy events Ian Rogers
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 24+ messages in thread
From: Ian Rogers @ 2024-10-26 12:17 UTC (permalink / raw)
  To: Atish Patra, linux-riscv, beeman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Ben Gainey,
	Dominique Martinet, Junhao He, linux-perf-users, linux-kernel

Add helper to get the name of the evsel's PMU. This handles the case
where there's no sysfs PMU via parse_events event_type helper.

Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/evsel.c | 10 ++++++++++
 tools/perf/util/evsel.h |  1 +
 2 files changed, 11 insertions(+)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index f780e30aa259..87ed417df133 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -236,6 +236,16 @@ int evsel__object_config(size_t object_size, int (*init)(struct evsel *evsel),
 	return 0;
 }
 
+const char *evsel__pmu_name(const struct evsel *evsel)
+{
+	struct perf_pmu *pmu = evsel__find_pmu(evsel);
+
+	if (pmu)
+		return pmu->name;
+
+	return event_type(evsel->core.attr.type);
+}
+
 #define FD(e, x, y) (*(int *)xyarray__entry(e->core.fd, x, y))
 
 int __evsel__sample_size(u64 sample_type)
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 04934a7af174..5774a9d4d725 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -232,6 +232,7 @@ int evsel__object_config(size_t object_size,
 			 void (*fini)(struct evsel *evsel));
 
 struct perf_pmu *evsel__find_pmu(const struct evsel *evsel);
+const char *evsel__pmu_name(const struct evsel *evsel);
 bool evsel__is_aux_event(const struct evsel *evsel);
 
 struct evsel *evsel__new_idx(struct perf_event_attr *attr, int idx);
-- 
2.47.0.163.g1226f6d8fa-goog


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v1 2/4] perf stat: Fix find_stat for mixed legacy/non-legacy events
  2024-10-26 12:17 [PATCH v1 0/4] Prefer sysfs/JSON events also when no PMU is provided Ian Rogers
  2024-10-26 12:17 ` [PATCH v1 1/4] perf evsel: Add pmu_name helper Ian Rogers
@ 2024-10-26 12:17 ` Ian Rogers
  2024-10-26 12:17 ` [PATCH v1 3/4] perf record: Skip don't fail for events that don't open Ian Rogers
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 24+ messages in thread
From: Ian Rogers @ 2024-10-26 12:17 UTC (permalink / raw)
  To: Atish Patra, linux-riscv, beeman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Ben Gainey,
	Dominique Martinet, Junhao He, linux-perf-users, linux-kernel

Legacy events typically don't have a PMU when added leading to
mismatched legacy/non-legacy cases in find_stat. Use evsel__find_pmu
to make sure the evsel PMU is looked up. Update the evsel__find_pmu
code to look for the PMU using the extended config type or, for legacy
hardware/hw_cache events on non-hybrid systems, just use the core PMU.

Before:
```
$ perf stat -e cycles,cpu/instructions/ -a sleep 1
 Performance counter stats for 'system wide':

       215,309,764      cycles
        44,326,491      cpu/instructions/

       1.002555314 seconds time elapsed
```
After:
```
$ perf stat -e cycles,cpu/instructions/ -a sleep 1

 Performance counter stats for 'system wide':

       990,676,332      cycles
     1,235,762,487      cpu/instructions/                #    1.25  insn per cycle

       1.002667198 seconds time elapsed
```

Fixes: 3612ca8e2935 ("perf stat: Fix the hard-coded metrics
calculation on the hybrid")
Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/util/pmus.c        | 20 +++++++++++++++++---
 tools/perf/util/stat-shadow.c |  3 ++-
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
index 107de86c2637..8d8eeeb28868 100644
--- a/tools/perf/util/pmus.c
+++ b/tools/perf/util/pmus.c
@@ -708,11 +708,25 @@ char *perf_pmus__default_pmu_name(void)
 struct perf_pmu *evsel__find_pmu(const struct evsel *evsel)
 {
 	struct perf_pmu *pmu = evsel->pmu;
+	bool legacy_core_type;
 
-	if (!pmu) {
-		pmu = perf_pmus__find_by_type(evsel->core.attr.type);
-		((struct evsel *)evsel)->pmu = pmu;
+	if (pmu)
+		return pmu;
+
+	pmu = perf_pmus__find_by_type(evsel->core.attr.type);
+	legacy_core_type =
+		evsel->core.attr.type == PERF_TYPE_HARDWARE ||
+		evsel->core.attr.type == PERF_TYPE_HW_CACHE;
+	if (!pmu && legacy_core_type) {
+		if (perf_pmus__supports_extended_type()) {
+			u32 type = evsel->core.attr.config >> PERF_PMU_TYPE_SHIFT;
+
+			pmu = perf_pmus__find_by_type(type);
+		} else {
+			pmu = perf_pmus__find_core_pmu();
+		}
 	}
+	((struct evsel *)evsel)->pmu = pmu;
 	return pmu;
 }
 
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index ada787947e16..f6d34e12e65c 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -151,6 +151,7 @@ static double find_stat(const struct evsel *evsel, int aggr_idx, enum stat_type
 {
 	struct evsel *cur;
 	int evsel_ctx = evsel_context(evsel);
+	struct perf_pmu *evsel_pmu = evsel__find_pmu(evsel);
 
 	evlist__for_each_entry(evsel->evlist, cur) {
 		struct perf_stat_aggr *aggr;
@@ -177,7 +178,7 @@ static double find_stat(const struct evsel *evsel, int aggr_idx, enum stat_type
 		 * Except the SW CLOCK events,
 		 * ignore if not the PMU we're looking for.
 		 */
-		if ((type != STAT_NSECS) && (evsel->pmu != cur->pmu))
+		if ((type != STAT_NSECS) && (evsel_pmu != evsel__find_pmu(cur)))
 			continue;
 
 		aggr = &cur->stats->aggr[aggr_idx];
-- 
2.47.0.163.g1226f6d8fa-goog


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v1 3/4] perf record: Skip don't fail for events that don't open
  2024-10-26 12:17 [PATCH v1 0/4] Prefer sysfs/JSON events also when no PMU is provided Ian Rogers
  2024-10-26 12:17 ` [PATCH v1 1/4] perf evsel: Add pmu_name helper Ian Rogers
  2024-10-26 12:17 ` [PATCH v1 2/4] perf stat: Fix find_stat for mixed legacy/non-legacy events Ian Rogers
@ 2024-10-26 12:17 ` Ian Rogers
  2024-11-11 15:49   ` James Clark
  2024-11-12 19:53   ` Leo Yan
  2024-10-26 12:17 ` [PATCH v1 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy" Ian Rogers
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 24+ messages in thread
From: Ian Rogers @ 2024-10-26 12:17 UTC (permalink / raw)
  To: Atish Patra, linux-riscv, beeman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Ben Gainey,
	Dominique Martinet, Junhao He, linux-perf-users, linux-kernel

Whilst for many tools it is an expected behavior that failure to open
a perf event is a failure, ARM decided to name PMU events the same as
legacy events and then failed to rename such events on a server uncore
SLC PMU. As perf's default behavior when no PMU is specified is to
open the event on all PMUs that advertise/"have" the event, this
yielded failures when trying to make the priority of legacy and
sysfs/json events uniform - something requested by RISC-V and ARM. A
legacy event user on ARM hardware may find their event opened on an
uncore PMU which for perf record will fail. Arnaldo suggested skipping
such events which this patch implements. Rather than have the skipping
conditional on running on ARM, the skipping is done on all
architectures as such a fundamental behavioral difference could lead
to problems with tools built/depending on perf.

An example of perf record failing to open events on x86 is:
```
$ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
Error:
Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
"dmesg | grep -i perf" may provide additional information.

Error:
Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
"dmesg | grep -i perf" may provide additional information.

Error:
Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
The LLC-prefetch-read event is not supported.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]

$ perf report --stats
Aggregated stats:
               TOTAL events:      17255
                MMAP events:        284  ( 1.6%)
                COMM events:       1961  (11.4%)
                EXIT events:          1  ( 0.0%)
                FORK events:       1960  (11.4%)
              SAMPLE events:         87  ( 0.5%)
               MMAP2 events:      12836  (74.4%)
             KSYMBOL events:         83  ( 0.5%)
           BPF_EVENT events:         36  ( 0.2%)
      FINISHED_ROUND events:          2  ( 0.0%)
            ID_INDEX events:          1  ( 0.0%)
          THREAD_MAP events:          1  ( 0.0%)
             CPU_MAP events:          1  ( 0.0%)
           TIME_CONV events:          1  ( 0.0%)
       FINISHED_INIT events:          1  ( 0.0%)
cycles stats:
              SAMPLE events:         87
```

Note, if all events fail to open then the data file will contain no
samples. This is deliberate as at the point the events are opened
there are other events, such as the dummy event for sideband data, and
these events will succeed in opening even if the user specified ones
don't. Having a mix of open and broken events leads to a problem of
identifying different sources of events.

The issue with legacy events is that on RISC-V they want the driver to
not have mappings from legacy to non-legacy config encodings for each
vendor/model due to size, complexity and difficulty to update. It was
reported that on ARM Apple-M? CPUs the legacy mapping in the driver
was broken and the sysfs/json events should always take precedent,
however, it isn't clear this is still the case. It is the case that
without working around this issue a legacy event like cycles without a
PMU can encode differently than when specified with a PMU - the
non-PMU version favoring legacy encodings, the PMU one avoiding legacy
encodings.

The patch removes events and then adjusts the idx value for each
evsel. This is done so that the dense xyarrays used for file
descriptors, etc. don't contain broken entries. As event opening
happens relatively late in the record process, use of the idx value
before the open will have become corrupted, so it is expected there
are latent bugs hidden behind this change - the change is best
effort. As the only vendor that has broken event names is ARM, this
will principally effect ARM users. They will also experience warning
messages like those above because of the uncore PMU advertising legacy
event names.

Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
---
 tools/perf/builtin-record.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index f83252472921..7e99743f7e42 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -1364,6 +1364,7 @@ static int record__open(struct record *rec)
 	struct perf_session *session = rec->session;
 	struct record_opts *opts = &rec->opts;
 	int rc = 0;
+	bool skipped = false;
 
 	evlist__for_each_entry(evlist, pos) {
 try_again:
@@ -1379,15 +1380,26 @@ static int record__open(struct record *rec)
 			        pos = evlist__reset_weak_group(evlist, pos, true);
 				goto try_again;
 			}
-			rc = -errno;
 			evsel__open_strerror(pos, &opts->target, errno, msg, sizeof(msg));
-			ui__error("%s\n", msg);
-			goto out;
+			ui__error("Failure to open event '%s' on PMU '%s' which will be removed.\n%s\n",
+				  evsel__name(pos), evsel__pmu_name(pos), msg);
+			pos->skippable = true;
+			skipped = true;
+		} else {
+			pos->supported = true;
 		}
-
-		pos->supported = true;
 	}
 
+	if (skipped) {
+		struct evsel *tmp;
+		int idx = 0;
+
+		evlist__for_each_entry_safe(evlist, tmp, pos) {
+			if (pos->skippable)
+				evlist__remove(evlist, pos);
+			pos->core.idx = idx++;
+		}
+	}
 	if (symbol_conf.kptr_restrict && !evlist__exclude_kernel(evlist)) {
 		pr_warning(
 "WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,\n"
-- 
2.47.0.163.g1226f6d8fa-goog


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH v1 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy"
  2024-10-26 12:17 [PATCH v1 0/4] Prefer sysfs/JSON events also when no PMU is provided Ian Rogers
                   ` (2 preceding siblings ...)
  2024-10-26 12:17 ` [PATCH v1 3/4] perf record: Skip don't fail for events that don't open Ian Rogers
@ 2024-10-26 12:17 ` Ian Rogers
  2024-11-07 18:51 ` [PATCH v1 0/4] Prefer sysfs/JSON events also when no PMU is provided Ian Rogers
  2024-11-11 23:23 ` Atish Kumar Patra
  5 siblings, 0 replies; 24+ messages in thread
From: Ian Rogers @ 2024-10-26 12:17 UTC (permalink / raw)
  To: Atish Patra, linux-riscv, beeman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
	Kan Liang, James Clark, Ze Gao, Weilin Wang, Ben Gainey,
	Dominique Martinet, Junhao He, linux-perf-users, linux-kernel
  Cc: James Clark, Arnaldo Carvalho de Melo

Originally posted and merged from:
https://lore.kernel.org/r/20240416061533.921723-10-irogers@google.com
This reverts commit 4f1b067359ac8364cdb7f9fda41085fa85789d0f although
the patch is now smaller due to related fixes being applied in commit
22a4db3c3603 ("perf evsel: Add alternate_hw_config and use in
evsel__match").
The original commit message was:

It was requested that RISC-V be able to add events to the perf tool so
the PMU driver didn't need to map legacy events to config encodings:
https://lore.kernel.org/lkml/20240217005738.3744121-1-atishp@rivosinc.com/

This change makes the priority of events specified without a PMU the
same as those specified with a PMU, namely sysfs and JSON events are
checked first before using the legacy encoding.

The hw_term is made more generic as a hardware_event that encodes a
pair of string and int value, allowing parse_events_multi_pmu_add to
fall back on a known encoding when the sysfs/JSON adding fails for
core events. As this covers PE_VALUE_SYM_HW, that token is removed and
related code simplified.

Signed-off-by: Ian Rogers <irogers@google.com>
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Beeman Strong <beeman@rivosinc.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/parse-events.c | 26 +++++++++---
 tools/perf/util/parse-events.l | 76 +++++++++++++++++-----------------
 tools/perf/util/parse-events.y | 60 ++++++++++++++++++---------
 3 files changed, 98 insertions(+), 64 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index afeb8d815bbf..fd33e2b4860b 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1549,8 +1549,8 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
 	struct list_head *list = NULL;
 	struct perf_pmu *pmu = NULL;
 	YYLTYPE *loc = loc_;
-	int ok = 0;
-	const char *config;
+	int ok = 0, core_ok = 0;
+	const char *tmp;
 	struct parse_events_terms parsed_terms;
 
 	*listp = NULL;
@@ -1563,15 +1563,15 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
 			return ret;
 	}
 
-	config = strdup(event_name);
-	if (!config)
+	tmp = strdup(event_name);
+	if (!tmp)
 		goto out_err;
 
 	if (parse_events_term__num(&term,
 				   PARSE_EVENTS__TERM_TYPE_USER,
-				   config, /*num=*/1, /*novalue=*/true,
+				   tmp, /*num=*/1, /*novalue=*/true,
 				   loc, /*loc_val=*/NULL) < 0) {
-		zfree(&config);
+		zfree(&tmp);
 		goto out_err;
 	}
 	list_add_tail(&term->list, &parsed_terms.terms);
@@ -1602,6 +1602,8 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
 			pr_debug("%s -> %s/%s/\n", event_name, pmu->name, sb.buf);
 			strbuf_release(&sb);
 			ok++;
+			if (pmu->is_core)
+				core_ok++;
 		}
 	}
 
@@ -1618,6 +1620,18 @@ int parse_events_multi_pmu_add(struct parse_events_state *parse_state,
 		}
 	}
 
+	if (hw_config != PERF_COUNT_HW_MAX && !core_ok) {
+		/*
+		 * The event wasn't found on core PMUs but it has a hardware
+		 * config version to try.
+		 */
+		if (!parse_events_add_numeric(parse_state, list,
+						PERF_TYPE_HARDWARE, hw_config,
+						const_parsed_terms,
+						/*wildcard=*/true))
+			ok++;
+	}
+
 out_err:
 	parse_events_terms__exit(&parsed_terms);
 	if (ok)
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 14e5bd856a18..7c2641cf9b79 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -113,12 +113,12 @@ do {								\
 	yyless(0);						\
 } while (0)
 
-static int sym(yyscan_t scanner, int type, int config)
+static int sym(yyscan_t scanner, int config)
 {
 	YYSTYPE *yylval = parse_events_get_lval(scanner);
 
-	yylval->num = (type << 16) + config;
-	return type == PERF_TYPE_HARDWARE ? PE_VALUE_SYM_HW : PE_VALUE_SYM_SW;
+	yylval->num = config;
+	return PE_VALUE_SYM_SW;
 }
 
 static int term(yyscan_t scanner, enum parse_events__term_type type)
@@ -129,13 +129,13 @@ static int term(yyscan_t scanner, enum parse_events__term_type type)
 	return PE_TERM;
 }
 
-static int hw_term(yyscan_t scanner, int config)
+static int hw(yyscan_t scanner, int config)
 {
 	YYSTYPE *yylval = parse_events_get_lval(scanner);
 	char *text = parse_events_get_text(scanner);
 
-	yylval->hardware_term.str = strdup(text);
-	yylval->hardware_term.num = PERF_TYPE_HARDWARE + config;
+	yylval->hardware_event.str = strdup(text);
+	yylval->hardware_event.num = config;
 	return PE_TERM_HW;
 }
 
@@ -323,16 +323,16 @@ percore			{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_PERCORE); }
 aux-output		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_OUTPUT); }
 aux-sample-size		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_SAMPLE_SIZE); }
 metric-id		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_METRIC_ID); }
-cpu-cycles|cycles				{ return hw_term(yyscanner, PERF_COUNT_HW_CPU_CYCLES); }
-stalled-cycles-frontend|idle-cycles-frontend	{ return hw_term(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); }
-stalled-cycles-backend|idle-cycles-backend	{ return hw_term(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); }
-instructions					{ return hw_term(yyscanner, PERF_COUNT_HW_INSTRUCTIONS); }
-cache-references				{ return hw_term(yyscanner, PERF_COUNT_HW_CACHE_REFERENCES); }
-cache-misses					{ return hw_term(yyscanner, PERF_COUNT_HW_CACHE_MISSES); }
-branch-instructions|branches			{ return hw_term(yyscanner, PERF_COUNT_HW_BRANCH_INSTRUCTIONS); }
-branch-misses					{ return hw_term(yyscanner, PERF_COUNT_HW_BRANCH_MISSES); }
-bus-cycles					{ return hw_term(yyscanner, PERF_COUNT_HW_BUS_CYCLES); }
-ref-cycles					{ return hw_term(yyscanner, PERF_COUNT_HW_REF_CPU_CYCLES); }
+cpu-cycles|cycles				{ return hw(yyscanner, PERF_COUNT_HW_CPU_CYCLES); }
+stalled-cycles-frontend|idle-cycles-frontend	{ return hw(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); }
+stalled-cycles-backend|idle-cycles-backend	{ return hw(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); }
+instructions					{ return hw(yyscanner, PERF_COUNT_HW_INSTRUCTIONS); }
+cache-references				{ return hw(yyscanner, PERF_COUNT_HW_CACHE_REFERENCES); }
+cache-misses					{ return hw(yyscanner, PERF_COUNT_HW_CACHE_MISSES); }
+branch-instructions|branches			{ return hw(yyscanner, PERF_COUNT_HW_BRANCH_INSTRUCTIONS); }
+branch-misses					{ return hw(yyscanner, PERF_COUNT_HW_BRANCH_MISSES); }
+bus-cycles					{ return hw(yyscanner, PERF_COUNT_HW_BUS_CYCLES); }
+ref-cycles					{ return hw(yyscanner, PERF_COUNT_HW_REF_CPU_CYCLES); }
 r{num_raw_hex}		{ return str(yyscanner, PE_RAW); }
 r0x{num_raw_hex}	{ return str(yyscanner, PE_RAW); }
 ,			{ return ','; }
@@ -376,28 +376,28 @@ r0x{num_raw_hex}	{ return str(yyscanner, PE_RAW); }
 <<EOF>>			{ BEGIN(INITIAL); }
 }
 
-cpu-cycles|cycles				{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_CPU_CYCLES); }
-stalled-cycles-frontend|idle-cycles-frontend	{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); }
-stalled-cycles-backend|idle-cycles-backend	{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); }
-instructions					{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_INSTRUCTIONS); }
-cache-references				{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_REFERENCES); }
-cache-misses					{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_CACHE_MISSES); }
-branch-instructions|branches			{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_BRANCH_INSTRUCTIONS); }
-branch-misses					{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_BRANCH_MISSES); }
-bus-cycles					{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_BUS_CYCLES); }
-ref-cycles					{ return sym(yyscanner, PERF_TYPE_HARDWARE, PERF_COUNT_HW_REF_CPU_CYCLES); }
-cpu-clock					{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CPU_CLOCK); }
-task-clock					{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_TASK_CLOCK); }
-page-faults|faults				{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_PAGE_FAULTS); }
-minor-faults					{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_PAGE_FAULTS_MIN); }
-major-faults					{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_PAGE_FAULTS_MAJ); }
-context-switches|cs				{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CONTEXT_SWITCHES); }
-cpu-migrations|migrations			{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CPU_MIGRATIONS); }
-alignment-faults				{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_ALIGNMENT_FAULTS); }
-emulation-faults				{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_EMULATION_FAULTS); }
-dummy						{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_DUMMY); }
-bpf-output					{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_BPF_OUTPUT); }
-cgroup-switches					{ return sym(yyscanner, PERF_TYPE_SOFTWARE, PERF_COUNT_SW_CGROUP_SWITCHES); }
+cpu-cycles|cycles				{ return hw(yyscanner, PERF_COUNT_HW_CPU_CYCLES); }
+stalled-cycles-frontend|idle-cycles-frontend	{ return hw(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND); }
+stalled-cycles-backend|idle-cycles-backend	{ return hw(yyscanner, PERF_COUNT_HW_STALLED_CYCLES_BACKEND); }
+instructions					{ return hw(yyscanner, PERF_COUNT_HW_INSTRUCTIONS); }
+cache-references				{ return hw(yyscanner, PERF_COUNT_HW_CACHE_REFERENCES); }
+cache-misses					{ return hw(yyscanner, PERF_COUNT_HW_CACHE_MISSES); }
+branch-instructions|branches			{ return hw(yyscanner, PERF_COUNT_HW_BRANCH_INSTRUCTIONS); }
+branch-misses					{ return hw(yyscanner, PERF_COUNT_HW_BRANCH_MISSES); }
+bus-cycles					{ return hw(yyscanner, PERF_COUNT_HW_BUS_CYCLES); }
+ref-cycles					{ return hw(yyscanner, PERF_COUNT_HW_REF_CPU_CYCLES); }
+cpu-clock					{ return sym(yyscanner, PERF_COUNT_SW_CPU_CLOCK); }
+task-clock					{ return sym(yyscanner, PERF_COUNT_SW_TASK_CLOCK); }
+page-faults|faults				{ return sym(yyscanner, PERF_COUNT_SW_PAGE_FAULTS); }
+minor-faults					{ return sym(yyscanner, PERF_COUNT_SW_PAGE_FAULTS_MIN); }
+major-faults					{ return sym(yyscanner, PERF_COUNT_SW_PAGE_FAULTS_MAJ); }
+context-switches|cs				{ return sym(yyscanner, PERF_COUNT_SW_CONTEXT_SWITCHES); }
+cpu-migrations|migrations			{ return sym(yyscanner, PERF_COUNT_SW_CPU_MIGRATIONS); }
+alignment-faults				{ return sym(yyscanner, PERF_COUNT_SW_ALIGNMENT_FAULTS); }
+emulation-faults				{ return sym(yyscanner, PERF_COUNT_SW_EMULATION_FAULTS); }
+dummy						{ return sym(yyscanner, PERF_COUNT_SW_DUMMY); }
+bpf-output					{ return sym(yyscanner, PERF_COUNT_SW_BPF_OUTPUT); }
+cgroup-switches					{ return sym(yyscanner, PERF_COUNT_SW_CGROUP_SWITCHES); }
 
 {lc_type}			{ return str(yyscanner, PE_LEGACY_CACHE); }
 {lc_type}-{lc_op_result}	{ return str(yyscanner, PE_LEGACY_CACHE); }
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index f888cbb076d6..d2ef1890007e 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -55,7 +55,7 @@ static void free_list_evsel(struct list_head* list_evsel)
 %}
 
 %token PE_START_EVENTS PE_START_TERMS
-%token PE_VALUE PE_VALUE_SYM_HW PE_VALUE_SYM_SW PE_TERM
+%token PE_VALUE PE_VALUE_SYM_SW PE_TERM
 %token PE_EVENT_NAME
 %token PE_RAW PE_NAME
 %token PE_MODIFIER_EVENT PE_MODIFIER_BP PE_BP_COLON PE_BP_SLASH
@@ -65,11 +65,9 @@ static void free_list_evsel(struct list_head* list_evsel)
 %token PE_DRV_CFG_TERM
 %token PE_TERM_HW
 %type <num> PE_VALUE
-%type <num> PE_VALUE_SYM_HW
 %type <num> PE_VALUE_SYM_SW
 %type <mod> PE_MODIFIER_EVENT
 %type <term_type> PE_TERM
-%type <num> value_sym
 %type <str> PE_RAW
 %type <str> PE_NAME
 %type <str> PE_LEGACY_CACHE
@@ -85,6 +83,7 @@ static void free_list_evsel(struct list_head* list_evsel)
 %type <list_terms> opt_pmu_config
 %destructor { parse_events_terms__delete ($$); } <list_terms>
 %type <list_evsel> event_pmu
+%type <list_evsel> event_legacy_hardware
 %type <list_evsel> event_legacy_symbol
 %type <list_evsel> event_legacy_cache
 %type <list_evsel> event_legacy_mem
@@ -102,8 +101,8 @@ static void free_list_evsel(struct list_head* list_evsel)
 %destructor { free_list_evsel ($$); } <list_evsel>
 %type <tracepoint_name> tracepoint_name
 %destructor { free ($$.sys); free ($$.event); } <tracepoint_name>
-%type <hardware_term> PE_TERM_HW
-%destructor { free ($$.str); } <hardware_term>
+%type <hardware_event> PE_TERM_HW
+%destructor { free ($$.str); } <hardware_event>
 
 %union
 {
@@ -118,10 +117,10 @@ static void free_list_evsel(struct list_head* list_evsel)
 		char *sys;
 		char *event;
 	} tracepoint_name;
-	struct hardware_term {
+	struct hardware_event {
 		char *str;
 		u64 num;
-	} hardware_term;
+	} hardware_event;
 }
 %%
 
@@ -264,6 +263,7 @@ PE_EVENT_NAME event_def
 event_def
 
 event_def: event_pmu |
+	   event_legacy_hardware |
 	   event_legacy_symbol |
 	   event_legacy_cache sep_dc |
 	   event_legacy_mem sep_dc |
@@ -306,24 +306,45 @@ PE_NAME sep_dc
 	$$ = list;
 }
 
-value_sym:
-PE_VALUE_SYM_HW
+event_legacy_hardware:
+PE_TERM_HW opt_pmu_config
+{
+	/* List of created evsels. */
+	struct list_head *list = NULL;
+	int err = parse_events_multi_pmu_add(_parse_state, $1.str, $1.num, $2, &list, &@1);
+
+	free($1.str);
+	parse_events_terms__delete($2);
+	if (err)
+		PE_ABORT(err);
+
+	$$ = list;
+}
 |
-PE_VALUE_SYM_SW
+PE_TERM_HW sep_dc
+{
+	struct list_head *list;
+	int err;
+
+	err = parse_events_multi_pmu_add(_parse_state, $1.str, $1.num, NULL, &list, &@1);
+	free($1.str);
+	if (err)
+		PE_ABORT(err);
+	$$ = list;
+}
 
 event_legacy_symbol:
-value_sym '/' event_config '/'
+PE_VALUE_SYM_SW '/' event_config '/'
 {
 	struct list_head *list;
-	int type = $1 >> 16;
-	int config = $1 & 255;
 	int err;
-	bool wildcard = (type == PERF_TYPE_HARDWARE || type == PERF_TYPE_HW_CACHE);
 
 	list = alloc_list();
 	if (!list)
 		YYNOMEM;
-	err = parse_events_add_numeric(_parse_state, list, type, config, $3, wildcard);
+	err = parse_events_add_numeric(_parse_state, list,
+				/*type=*/PERF_TYPE_SOFTWARE, /*config=*/$1,
+				$3, /*wildcard=*/false);
 	parse_events_terms__delete($3);
 	if (err) {
 		free_list_evsel(list);
@@ -332,18 +353,17 @@ value_sym '/' event_config '/'
 	$$ = list;
 }
 |
-value_sym sep_slash_slash_dc
+PE_VALUE_SYM_SW sep_slash_slash_dc
 {
 	struct list_head *list;
-	int type = $1 >> 16;
-	int config = $1 & 255;
-	bool wildcard = (type == PERF_TYPE_HARDWARE || type == PERF_TYPE_HW_CACHE);
 	int err;
 
 	list = alloc_list();
 	if (!list)
 		YYNOMEM;
-	err = parse_events_add_numeric(_parse_state, list, type, config, /*head_config=*/NULL, wildcard);
+	err = parse_events_add_numeric(_parse_state, list,
+				/*type=*/PERF_TYPE_SOFTWARE, /*config=*/$1,
+				/*head_config=*/NULL, /*wildcard=*/false);
 	if (err)
 		PE_ABORT(err);
 	$$ = list;
-- 
2.47.0.163.g1226f6d8fa-goog


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 0/4] Prefer sysfs/JSON events also when no PMU is provided
  2024-10-26 12:17 [PATCH v1 0/4] Prefer sysfs/JSON events also when no PMU is provided Ian Rogers
                   ` (3 preceding siblings ...)
  2024-10-26 12:17 ` [PATCH v1 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy" Ian Rogers
@ 2024-11-07 18:51 ` Ian Rogers
  2024-11-08 12:16   ` James Clark
  2024-11-11 23:23 ` Atish Kumar Patra
  5 siblings, 1 reply; 24+ messages in thread
From: Ian Rogers @ 2024-11-07 18:51 UTC (permalink / raw)
  To: Atish Patra, James Clark, linux-perf-users
  Cc: linux-kernel, Ben Gainey, Junhao He, linux-riscv, beeman,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Jiri Olsa, Alexander Shishkin,
	Kan Liang, Ze Gao, Weilin Wang, Dominique Martinet

On Sat, Oct 26, 2024 at 5:18 AM Ian Rogers <irogers@google.com> wrote:
>
> At the RISC-V summit the topic of avoiding event data being in the
> RISC-V PMU kernel driver came up. There is a preference for sysfs/JSON
> events being the priority when no PMU is provided so that legacy
> events maybe supported via json. Originally Mark Rutland also
> expressed at LPC 2023 that doing this would resolve bugs on ARM Apple
> M? processors, but James Clark more recently tested this and believes
> the driver issues there may not have existed or have been resolved. In
> any case, it is inconsistent that with a PMU event names avoid legacy
> encodings, but when wildcarding PMUs (ie without a PMU with the event
> name) the legacy encodings have priority.
>
> The patch doing this work was reverted in a v6.10 release candidate
> as, even though the patch was posted for weeks and had been on
> linux-next for weeks without issue, Linus was in the habit of using
> explicit legacy events with unsupported precision options on his
> Neoverse-N1. This machine has SLC PMU events for bus and CPU cycles
> where ARM decided to call the events bus_cycles and cycles, the latter
> being also a legacy event name. ARM haven't renamed the cycles event
> to a more consistent cpu_cycles and avoided the problem. With these
> changes the problematic event will now be skipped, a large warning
> produced, and perf record will continue for the other PMU events. This
> solution was proposed by Arnaldo.
>
> Two minor changes have been added to help with the error message and
> to work around issues occurring with "perf stat metrics (shadow stat)
> test".
>
> The patches have only been tested on my x86 non-hybrid laptop.

Hi Atish and James,

Could I get your tags for this series?

The patches were originally motivated by wanting to make the behavior
of events parsed like "cycles" match that of "cpu/cycles/", the PMU is
wildcarded to "cpu" in the first case. This was divergent because of
ARM we switched from preferring legacy (type = PERF_TYPE_HARDWARE,
config = PERF_COUNT_HW_CPU_CYCLES) to sysfs/json (type=<core PMU's
type>, config=<encoding from event>) when a PMU name was given. This
aligns with RISC-V wanting to use json encodings to avoid complexity
in the PMU driver.

James, could you show the neoverse with the cmn PMU behavior for perf
record of "cycles:pp" due to sensitivities there.

Thanks,
Ian




> Ian Rogers (4):
>   perf evsel: Add pmu_name helper
>   perf stat: Fix find_stat for mixed legacy/non-legacy events
>   perf record: Skip don't fail for events that don't open
>   perf parse-events: Reapply "Prefer sysfs/JSON hardware events over
>     legacy"
>
>  tools/perf/builtin-record.c    | 22 +++++++---
>  tools/perf/util/evsel.c        | 10 +++++
>  tools/perf/util/evsel.h        |  1 +
>  tools/perf/util/parse-events.c | 26 +++++++++---
>  tools/perf/util/parse-events.l | 76 +++++++++++++++++-----------------
>  tools/perf/util/parse-events.y | 60 ++++++++++++++++++---------
>  tools/perf/util/pmus.c         | 20 +++++++--
>  tools/perf/util/stat-shadow.c  |  3 +-
>  8 files changed, 145 insertions(+), 73 deletions(-)
>
> --
> 2.47.0.163.g1226f6d8fa-goog
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 0/4] Prefer sysfs/JSON events also when no PMU is provided
  2024-11-07 18:51 ` [PATCH v1 0/4] Prefer sysfs/JSON events also when no PMU is provided Ian Rogers
@ 2024-11-08 12:16   ` James Clark
  2024-11-08 18:37     ` Atish Kumar Patra
  0 siblings, 1 reply; 24+ messages in thread
From: James Clark @ 2024-11-08 12:16 UTC (permalink / raw)
  To: Ian Rogers, Atish Patra, linux-perf-users
  Cc: linux-kernel, Ben Gainey, Junhao He, linux-riscv, beeman,
	Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Mark Rutland, Jiri Olsa, Alexander Shishkin,
	Kan Liang, Ze Gao, Weilin Wang, Dominique Martinet



On 07/11/2024 18:51, Ian Rogers wrote:
> On Sat, Oct 26, 2024 at 5:18 AM Ian Rogers <irogers@google.com> wrote:
>>
>> At the RISC-V summit the topic of avoiding event data being in the
>> RISC-V PMU kernel driver came up. There is a preference for sysfs/JSON
>> events being the priority when no PMU is provided so that legacy
>> events maybe supported via json. Originally Mark Rutland also
>> expressed at LPC 2023 that doing this would resolve bugs on ARM Apple
>> M? processors, but James Clark more recently tested this and believes
>> the driver issues there may not have existed or have been resolved. In
>> any case, it is inconsistent that with a PMU event names avoid legacy
>> encodings, but when wildcarding PMUs (ie without a PMU with the event
>> name) the legacy encodings have priority.
>>
>> The patch doing this work was reverted in a v6.10 release candidate
>> as, even though the patch was posted for weeks and had been on
>> linux-next for weeks without issue, Linus was in the habit of using
>> explicit legacy events with unsupported precision options on his
>> Neoverse-N1. This machine has SLC PMU events for bus and CPU cycles
>> where ARM decided to call the events bus_cycles and cycles, the latter
>> being also a legacy event name. ARM haven't renamed the cycles event
>> to a more consistent cpu_cycles and avoided the problem. With these
>> changes the problematic event will now be skipped, a large warning
>> produced, and perf record will continue for the other PMU events. This
>> solution was proposed by Arnaldo.
>>
>> Two minor changes have been added to help with the error message and
>> to work around issues occurring with "perf stat metrics (shadow stat)
>> test".
>>
>> The patches have only been tested on my x86 non-hybrid laptop.
> 
> Hi Atish and James,
> 
> Could I get your tags for this series?
> 
> The patches were originally motivated by wanting to make the behavior
> of events parsed like "cycles" match that of "cpu/cycles/", the PMU is
> wildcarded to "cpu" in the first case. This was divergent because of
> ARM we switched from preferring legacy (type = PERF_TYPE_HARDWARE,
> config = PERF_COUNT_HW_CPU_CYCLES) to sysfs/json (type=<core PMU's
> type>, config=<encoding from event>) when a PMU name was given. This
> aligns with RISC-V wanting to use json encodings to avoid complexity
> in the PMU driver.
> 

I couldn't find the thread, but I remember fairly recently it was 
mentioned that RISC-V would be supporting the legacy events after all, 
maybe it was a comment from Atish? I'm not sure if that changes the 
requirements for this or not?

I still can't really imagine how tooling would work if every tool has to 
maintain the mappings of basic events like instructions and branches. 
For example all the perf_event_open tests in ltp use the legacy events.

And wouldn't porting existing software to RISC-V would be an issue if it 
doesn't behave in a similar way to what's there already?

> James, could you show the neoverse with the cmn PMU behavior for perf
> record of "cycles:pp" due to sensitivities there.
> 

Yep I can check this on Monday.

> Thanks,
> Ian
> 


> 
> 
> 
>> Ian Rogers (4):
>>    perf evsel: Add pmu_name helper
>>    perf stat: Fix find_stat for mixed legacy/non-legacy events
>>    perf record: Skip don't fail for events that don't open
>>    perf parse-events: Reapply "Prefer sysfs/JSON hardware events over
>>      legacy"
>>
>>   tools/perf/builtin-record.c    | 22 +++++++---
>>   tools/perf/util/evsel.c        | 10 +++++
>>   tools/perf/util/evsel.h        |  1 +
>>   tools/perf/util/parse-events.c | 26 +++++++++---
>>   tools/perf/util/parse-events.l | 76 +++++++++++++++++-----------------
>>   tools/perf/util/parse-events.y | 60 ++++++++++++++++++---------
>>   tools/perf/util/pmus.c         | 20 +++++++--
>>   tools/perf/util/stat-shadow.c  |  3 +-
>>   8 files changed, 145 insertions(+), 73 deletions(-)
>>
>> --
>> 2.47.0.163.g1226f6d8fa-goog
>>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 0/4] Prefer sysfs/JSON events also when no PMU is provided
  2024-11-08 12:16   ` James Clark
@ 2024-11-08 18:37     ` Atish Kumar Patra
  2024-11-08 18:59       ` Ian Rogers
  2024-11-11 10:45       ` James Clark
  0 siblings, 2 replies; 24+ messages in thread
From: Atish Kumar Patra @ 2024-11-08 18:37 UTC (permalink / raw)
  To: James Clark
  Cc: Ian Rogers, linux-perf-users, linux-kernel, Ben Gainey, Junhao He,
	linux-riscv, beeman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Jiri Olsa,
	Alexander Shishkin, Kan Liang, Ze Gao, Weilin Wang,
	Dominique Martinet

On Fri, Nov 8, 2024 at 4:16 AM James Clark <james.clark@linaro.org> wrote:
>
>
>
> On 07/11/2024 18:51, Ian Rogers wrote:
> > On Sat, Oct 26, 2024 at 5:18 AM Ian Rogers <irogers@google.com> wrote:
> >>
> >> At the RISC-V summit the topic of avoiding event data being in the
> >> RISC-V PMU kernel driver came up. There is a preference for sysfs/JSON
> >> events being the priority when no PMU is provided so that legacy
> >> events maybe supported via json. Originally Mark Rutland also
> >> expressed at LPC 2023 that doing this would resolve bugs on ARM Apple
> >> M? processors, but James Clark more recently tested this and believes
> >> the driver issues there may not have existed or have been resolved. In
> >> any case, it is inconsistent that with a PMU event names avoid legacy
> >> encodings, but when wildcarding PMUs (ie without a PMU with the event
> >> name) the legacy encodings have priority.
> >>
> >> The patch doing this work was reverted in a v6.10 release candidate
> >> as, even though the patch was posted for weeks and had been on
> >> linux-next for weeks without issue, Linus was in the habit of using
> >> explicit legacy events with unsupported precision options on his
> >> Neoverse-N1. This machine has SLC PMU events for bus and CPU cycles
> >> where ARM decided to call the events bus_cycles and cycles, the latter
> >> being also a legacy event name. ARM haven't renamed the cycles event
> >> to a more consistent cpu_cycles and avoided the problem. With these
> >> changes the problematic event will now be skipped, a large warning
> >> produced, and perf record will continue for the other PMU events. This
> >> solution was proposed by Arnaldo.
> >>
> >> Two minor changes have been added to help with the error message and
> >> to work around issues occurring with "perf stat metrics (shadow stat)
> >> test".
> >>
> >> The patches have only been tested on my x86 non-hybrid laptop.
> >
> > Hi Atish and James,
> >
> > Could I get your tags for this series?
> >

Hi Ian,
Thanks for your patches. It definitely helps to have a clean slate
implementation
for the perf tool. However, I have some open questions about other use cases
that we discussed during the RVI Summit.

> > The patches were originally motivated by wanting to make the behavior
> > of events parsed like "cycles" match that of "cpu/cycles/", the PMU is
> > wildcarded to "cpu" in the first case. This was divergent because of
> > ARM we switched from preferring legacy (type = PERF_TYPE_HARDWARE,
> > config = PERF_COUNT_HW_CPU_CYCLES) to sysfs/json (type=<core PMU's
> > type>, config=<encoding from event>) when a PMU name was given. This
> > aligns with RISC-V wanting to use json encodings to avoid complexity
> > in the PMU driver.
> >
>
> I couldn't find the thread, but I remember fairly recently it was
> mentioned that RISC-V would be supporting the legacy events after all,
> maybe it was a comment from Atish? I'm not sure if that changes the
> requirements for this or not?
>
> I still can't really imagine how tooling would work if every tool has to
> maintain the mappings of basic events like instructions and branches.
> For example all the perf_event_open tests in ltp use the legacy events.
>

No it has not changed. While this series helps to avoid clunky vendor
specific encodings
in the driver for perf tool, I am still unsure how we will manage
other applications
(directly passing legacy events through perf_event_open or
perf_evlist__open) will work.

I have only anecdotal data about folks relying perf legacy events
directly to profile
their application. All of them use mostly cycle/instruction events though.
Are there any users who use other legacy events directly without perf tool ?

If not, we may have only cycle/instruction mapping in the driver and
rely on json for everything else.
The other use case is virtualization. I have been playing with these
patches to find a clean solution to
enable all the use cases. If you have any other ideas, please let me know.

> And wouldn't porting existing software to RISC-V would be an issue if it
> doesn't behave in a similar way to what's there already?
>
> > James, could you show the neoverse with the cmn PMU behavior for perf
> > record of "cycles:pp" due to sensitivities there.
> >
>
> Yep I can check this on Monday.
>
> > Thanks,
> > Ian
> >
>
>
> >
> >
> >
> >> Ian Rogers (4):
> >>    perf evsel: Add pmu_name helper
> >>    perf stat: Fix find_stat for mixed legacy/non-legacy events
> >>    perf record: Skip don't fail for events that don't open
> >>    perf parse-events: Reapply "Prefer sysfs/JSON hardware events over
> >>      legacy"
> >>
> >>   tools/perf/builtin-record.c    | 22 +++++++---
> >>   tools/perf/util/evsel.c        | 10 +++++
> >>   tools/perf/util/evsel.h        |  1 +
> >>   tools/perf/util/parse-events.c | 26 +++++++++---
> >>   tools/perf/util/parse-events.l | 76 +++++++++++++++++-----------------
> >>   tools/perf/util/parse-events.y | 60 ++++++++++++++++++---------
> >>   tools/perf/util/pmus.c         | 20 +++++++--
> >>   tools/perf/util/stat-shadow.c  |  3 +-
> >>   8 files changed, 145 insertions(+), 73 deletions(-)
> >>
> >> --
> >> 2.47.0.163.g1226f6d8fa-goog
> >>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 0/4] Prefer sysfs/JSON events also when no PMU is provided
  2024-11-08 18:37     ` Atish Kumar Patra
@ 2024-11-08 18:59       ` Ian Rogers
  2024-11-08 22:06         ` Atish Kumar Patra
  2024-11-11 10:45       ` James Clark
  1 sibling, 1 reply; 24+ messages in thread
From: Ian Rogers @ 2024-11-08 18:59 UTC (permalink / raw)
  To: Atish Kumar Patra
  Cc: James Clark, linux-perf-users, linux-kernel, Ben Gainey,
	Junhao He, linux-riscv, beeman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Jiri Olsa,
	Alexander Shishkin, Kan Liang, Ze Gao, Weilin Wang,
	Dominique Martinet

On Fri, Nov 8, 2024 at 10:38 AM Atish Kumar Patra <atishp@rivosinc.com> wrote:
>
> On Fri, Nov 8, 2024 at 4:16 AM James Clark <james.clark@linaro.org> wrote:
> >
> >
> >
> > On 07/11/2024 18:51, Ian Rogers wrote:
> > > On Sat, Oct 26, 2024 at 5:18 AM Ian Rogers <irogers@google.com> wrote:
> > >>
> > >> At the RISC-V summit the topic of avoiding event data being in the
> > >> RISC-V PMU kernel driver came up. There is a preference for sysfs/JSON
> > >> events being the priority when no PMU is provided so that legacy
> > >> events maybe supported via json. Originally Mark Rutland also
> > >> expressed at LPC 2023 that doing this would resolve bugs on ARM Apple
> > >> M? processors, but James Clark more recently tested this and believes
> > >> the driver issues there may not have existed or have been resolved. In
> > >> any case, it is inconsistent that with a PMU event names avoid legacy
> > >> encodings, but when wildcarding PMUs (ie without a PMU with the event
> > >> name) the legacy encodings have priority.
> > >>
> > >> The patch doing this work was reverted in a v6.10 release candidate
> > >> as, even though the patch was posted for weeks and had been on
> > >> linux-next for weeks without issue, Linus was in the habit of using
> > >> explicit legacy events with unsupported precision options on his
> > >> Neoverse-N1. This machine has SLC PMU events for bus and CPU cycles
> > >> where ARM decided to call the events bus_cycles and cycles, the latter
> > >> being also a legacy event name. ARM haven't renamed the cycles event
> > >> to a more consistent cpu_cycles and avoided the problem. With these
> > >> changes the problematic event will now be skipped, a large warning
> > >> produced, and perf record will continue for the other PMU events. This
> > >> solution was proposed by Arnaldo.
> > >>
> > >> Two minor changes have been added to help with the error message and
> > >> to work around issues occurring with "perf stat metrics (shadow stat)
> > >> test".
> > >>
> > >> The patches have only been tested on my x86 non-hybrid laptop.
> > >
> > > Hi Atish and James,
> > >
> > > Could I get your tags for this series?
> > >
>
> Hi Ian,
> Thanks for your patches. It definitely helps to have a clean slate
> implementation
> for the perf tool. However, I have some open questions about other use cases
> that we discussed during the RVI Summit.

Thanks Atish, could I get your acked/reviewed/tested tags?

Ian

> > > The patches were originally motivated by wanting to make the behavior
> > > of events parsed like "cycles" match that of "cpu/cycles/", the PMU is
> > > wildcarded to "cpu" in the first case. This was divergent because of
> > > ARM we switched from preferring legacy (type = PERF_TYPE_HARDWARE,
> > > config = PERF_COUNT_HW_CPU_CYCLES) to sysfs/json (type=<core PMU's
> > > type>, config=<encoding from event>) when a PMU name was given. This
> > > aligns with RISC-V wanting to use json encodings to avoid complexity
> > > in the PMU driver.
> > >
> >
> > I couldn't find the thread, but I remember fairly recently it was
> > mentioned that RISC-V would be supporting the legacy events after all,
> > maybe it was a comment from Atish? I'm not sure if that changes the
> > requirements for this or not?
> >
> > I still can't really imagine how tooling would work if every tool has to
> > maintain the mappings of basic events like instructions and branches.
> > For example all the perf_event_open tests in ltp use the legacy events.
> >
>
> No it has not changed. While this series helps to avoid clunky vendor
> specific encodings
> in the driver for perf tool, I am still unsure how we will manage
> other applications
> (directly passing legacy events through perf_event_open or
> perf_evlist__open) will work.
>
> I have only anecdotal data about folks relying perf legacy events
> directly to profile
> their application. All of them use mostly cycle/instruction events though.
> Are there any users who use other legacy events directly without perf tool ?
>
> If not, we may have only cycle/instruction mapping in the driver and
> rely on json for everything else.
> The other use case is virtualization. I have been playing with these
> patches to find a clean solution to
> enable all the use cases. If you have any other ideas, please let me know.
>
> > And wouldn't porting existing software to RISC-V would be an issue if it
> > doesn't behave in a similar way to what's there already?
> >
> > > James, could you show the neoverse with the cmn PMU behavior for perf
> > > record of "cycles:pp" due to sensitivities there.
> > >
> >
> > Yep I can check this on Monday.
> >
> > > Thanks,
> > > Ian
> > >
> >
> >
> > >
> > >
> > >
> > >> Ian Rogers (4):
> > >>    perf evsel: Add pmu_name helper
> > >>    perf stat: Fix find_stat for mixed legacy/non-legacy events
> > >>    perf record: Skip don't fail for events that don't open
> > >>    perf parse-events: Reapply "Prefer sysfs/JSON hardware events over
> > >>      legacy"
> > >>
> > >>   tools/perf/builtin-record.c    | 22 +++++++---
> > >>   tools/perf/util/evsel.c        | 10 +++++
> > >>   tools/perf/util/evsel.h        |  1 +
> > >>   tools/perf/util/parse-events.c | 26 +++++++++---
> > >>   tools/perf/util/parse-events.l | 76 +++++++++++++++++-----------------
> > >>   tools/perf/util/parse-events.y | 60 ++++++++++++++++++---------
> > >>   tools/perf/util/pmus.c         | 20 +++++++--
> > >>   tools/perf/util/stat-shadow.c  |  3 +-
> > >>   8 files changed, 145 insertions(+), 73 deletions(-)
> > >>
> > >> --
> > >> 2.47.0.163.g1226f6d8fa-goog
> > >>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 0/4] Prefer sysfs/JSON events also when no PMU is provided
  2024-11-08 18:59       ` Ian Rogers
@ 2024-11-08 22:06         ` Atish Kumar Patra
  0 siblings, 0 replies; 24+ messages in thread
From: Atish Kumar Patra @ 2024-11-08 22:06 UTC (permalink / raw)
  To: Ian Rogers
  Cc: James Clark, linux-perf-users, linux-kernel, Ben Gainey,
	Junhao He, linux-riscv, beeman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Jiri Olsa,
	Alexander Shishkin, Kan Liang, Ze Gao, Weilin Wang,
	Dominique Martinet, Anup Patel, mikey

On Fri, Nov 8, 2024 at 11:00 AM Ian Rogers <irogers@google.com> wrote:
>
> On Fri, Nov 8, 2024 at 10:38 AM Atish Kumar Patra <atishp@rivosinc.com> wrote:
> >
> > On Fri, Nov 8, 2024 at 4:16 AM James Clark <james.clark@linaro.org> wrote:
> > >
> > >
> > >
> > > On 07/11/2024 18:51, Ian Rogers wrote:
> > > > On Sat, Oct 26, 2024 at 5:18 AM Ian Rogers <irogers@google.com> wrote:
> > > >>
> > > >> At the RISC-V summit the topic of avoiding event data being in the
> > > >> RISC-V PMU kernel driver came up. There is a preference for sysfs/JSON
> > > >> events being the priority when no PMU is provided so that legacy
> > > >> events maybe supported via json. Originally Mark Rutland also
> > > >> expressed at LPC 2023 that doing this would resolve bugs on ARM Apple
> > > >> M? processors, but James Clark more recently tested this and believes
> > > >> the driver issues there may not have existed or have been resolved. In
> > > >> any case, it is inconsistent that with a PMU event names avoid legacy
> > > >> encodings, but when wildcarding PMUs (ie without a PMU with the event
> > > >> name) the legacy encodings have priority.
> > > >>
> > > >> The patch doing this work was reverted in a v6.10 release candidate
> > > >> as, even though the patch was posted for weeks and had been on
> > > >> linux-next for weeks without issue, Linus was in the habit of using
> > > >> explicit legacy events with unsupported precision options on his
> > > >> Neoverse-N1. This machine has SLC PMU events for bus and CPU cycles
> > > >> where ARM decided to call the events bus_cycles and cycles, the latter
> > > >> being also a legacy event name. ARM haven't renamed the cycles event
> > > >> to a more consistent cpu_cycles and avoided the problem. With these
> > > >> changes the problematic event will now be skipped, a large warning
> > > >> produced, and perf record will continue for the other PMU events. This
> > > >> solution was proposed by Arnaldo.
> > > >>
> > > >> Two minor changes have been added to help with the error message and
> > > >> to work around issues occurring with "perf stat metrics (shadow stat)
> > > >> test".
> > > >>
> > > >> The patches have only been tested on my x86 non-hybrid laptop.
> > > >
> > > > Hi Atish and James,
> > > >
> > > > Could I get your tags for this series?
> > > >
> >
> > Hi Ian,
> > Thanks for your patches. It definitely helps to have a clean slate
> > implementation
> > for the perf tool. However, I have some open questions about other use cases
> > that we discussed during the RVI Summit.
>
> Thanks Atish, could I get your acked/reviewed/tested tags?
>

Sure. I will finish the testing and send those.

> Ian
>
> > > > The patches were originally motivated by wanting to make the behavior
> > > > of events parsed like "cycles" match that of "cpu/cycles/", the PMU is
> > > > wildcarded to "cpu" in the first case. This was divergent because of
> > > > ARM we switched from preferring legacy (type = PERF_TYPE_HARDWARE,
> > > > config = PERF_COUNT_HW_CPU_CYCLES) to sysfs/json (type=<core PMU's
> > > > type>, config=<encoding from event>) when a PMU name was given. This
> > > > aligns with RISC-V wanting to use json encodings to avoid complexity
> > > > in the PMU driver.
> > > >
> > >
> > > I couldn't find the thread, but I remember fairly recently it was
> > > mentioned that RISC-V would be supporting the legacy events after all,
> > > maybe it was a comment from Atish? I'm not sure if that changes the
> > > requirements for this or not?
> > >
> > > I still can't really imagine how tooling would work if every tool has to
> > > maintain the mappings of basic events like instructions and branches.
> > > For example all the perf_event_open tests in ltp use the legacy events.
> > >
> >
> > No it has not changed. While this series helps to avoid clunky vendor
> > specific encodings
> > in the driver for perf tool, I am still unsure how we will manage
> > other applications
> > (directly passing legacy events through perf_event_open or
> > perf_evlist__open) will work.
> >
> > I have only anecdotal data about folks relying perf legacy events
> > directly to profile
> > their application. All of them use mostly cycle/instruction events though.
> > Are there any users who use other legacy events directly without perf tool ?
> >

+ Michale from tensorrent who was suggesting that they use the direct perf calls
in their profiling application.

@Michale : Do you have more details of direct usage of perf legacy
events to profile your application ?

> > If not, we may have only cycle/instruction mapping in the driver and
> > rely on json for everything else.
> > The other use case is virtualization. I have been playing with these
> > patches to find a clean solution to
> > enable all the use cases. If you have any other ideas, please let me know.
> >

@Ian

Any thoughts on this ? Let me explain the hypervisor use case a little bit more.
RISC-V KVM relies on SBI PMU[1] (equivalent to hypercall in x86 or HVC in ARM).
As the RISC-V ISA doesn't have any event encodings, the SBI PMU
defines a standard set corresponding to
each perf legacy event. When a guest tries to allocate a counter for
an event, it makes an SBI call (CFG_MATCH)
with SBI event encodings (matching perf legacy) or a raw event
encoding. The host kernel allocates a virtual counter
and programs the corresponding event CSRs and enables the counter.

There are two possible approaches to support it.

1. The guest OS has the correct version of the perf tool with the json
file that provides the encoding of the events supported by
the running host. The guest OS passes the exact encoding of the event
during the CFG_MATCH SBI call as a raw event and the host programs
the event CSR.  It is a much simpler scheme and less management on the
host side. But the perf tool on guests has to pass any perf legacy
events
as raw events to the driver instead of PERF_HARDWARE/CACHE or indicate
event encoding is coming from json in some other way.

The other issue is that VMM can not modify the vendorid,implid,archid
shown to the guest (the default is the same as the host).
Migration across CPU generation or vendors won't be possible if perf
in use. This may not be an issue as VM migration across CPU
Generations are not a common thing.

2. The guest OS driver always relies on the SBI PMU event encoding
(i.e perf legacy event) which the host can translate to the event
encoding the hardware
supports if it is baked into the driver. The obvious downside is the
vendor specific encodings in the driver which we are trying to avoid.

[1] https://github.com/riscv-non-isa/riscv-sbi-doc/blob/master/src/ext-pmu.adoc
> > > And wouldn't porting existing software to RISC-V would be an issue if it
> > > doesn't behave in a similar way to what's there already?
> > >
> > > > James, could you show the neoverse with the cmn PMU behavior for perf
> > > > record of "cycles:pp" due to sensitivities there.
> > > >
> > >
> > > Yep I can check this on Monday.
> > >
> > > > Thanks,
> > > > Ian
> > > >
> > >
> > >
> > > >
> > > >
> > > >
> > > >> Ian Rogers (4):
> > > >>    perf evsel: Add pmu_name helper
> > > >>    perf stat: Fix find_stat for mixed legacy/non-legacy events
> > > >>    perf record: Skip don't fail for events that don't open
> > > >>    perf parse-events: Reapply "Prefer sysfs/JSON hardware events over
> > > >>      legacy"
> > > >>
> > > >>   tools/perf/builtin-record.c    | 22 +++++++---
> > > >>   tools/perf/util/evsel.c        | 10 +++++
> > > >>   tools/perf/util/evsel.h        |  1 +
> > > >>   tools/perf/util/parse-events.c | 26 +++++++++---
> > > >>   tools/perf/util/parse-events.l | 76 +++++++++++++++++-----------------
> > > >>   tools/perf/util/parse-events.y | 60 ++++++++++++++++++---------
> > > >>   tools/perf/util/pmus.c         | 20 +++++++--
> > > >>   tools/perf/util/stat-shadow.c  |  3 +-
> > > >>   8 files changed, 145 insertions(+), 73 deletions(-)
> > > >>
> > > >> --
> > > >> 2.47.0.163.g1226f6d8fa-goog
> > > >>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 0/4] Prefer sysfs/JSON events also when no PMU is provided
  2024-11-08 18:37     ` Atish Kumar Patra
  2024-11-08 18:59       ` Ian Rogers
@ 2024-11-11 10:45       ` James Clark
  2024-11-11 17:19         ` Ian Rogers
  1 sibling, 1 reply; 24+ messages in thread
From: James Clark @ 2024-11-11 10:45 UTC (permalink / raw)
  To: Atish Kumar Patra, Ian Rogers
  Cc: linux-perf-users, linux-kernel, Ben Gainey, Junhao He,
	linux-riscv, beeman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Jiri Olsa,
	Alexander Shishkin, Kan Liang, Ze Gao, Weilin Wang,
	Dominique Martinet



On 08/11/2024 6:37 pm, Atish Kumar Patra wrote:
> On Fri, Nov 8, 2024 at 4:16 AM James Clark <james.clark@linaro.org> wrote:
>>
>>
>>
>> On 07/11/2024 18:51, Ian Rogers wrote:
>>> On Sat, Oct 26, 2024 at 5:18 AM Ian Rogers <irogers@google.com> wrote:
>>>>
>>>> At the RISC-V summit the topic of avoiding event data being in the
>>>> RISC-V PMU kernel driver came up. There is a preference for sysfs/JSON
>>>> events being the priority when no PMU is provided so that legacy
>>>> events maybe supported via json. Originally Mark Rutland also
>>>> expressed at LPC 2023 that doing this would resolve bugs on ARM Apple
>>>> M? processors, but James Clark more recently tested this and believes
>>>> the driver issues there may not have existed or have been resolved. In
>>>> any case, it is inconsistent that with a PMU event names avoid legacy
>>>> encodings, but when wildcarding PMUs (ie without a PMU with the event
>>>> name) the legacy encodings have priority.
>>>>
>>>> The patch doing this work was reverted in a v6.10 release candidate
>>>> as, even though the patch was posted for weeks and had been on
>>>> linux-next for weeks without issue, Linus was in the habit of using
>>>> explicit legacy events with unsupported precision options on his
>>>> Neoverse-N1. This machine has SLC PMU events for bus and CPU cycles
>>>> where ARM decided to call the events bus_cycles and cycles, the latter
>>>> being also a legacy event name. ARM haven't renamed the cycles event
>>>> to a more consistent cpu_cycles and avoided the problem. With these
>>>> changes the problematic event will now be skipped, a large warning
>>>> produced, and perf record will continue for the other PMU events. This
>>>> solution was proposed by Arnaldo.
>>>>
>>>> Two minor changes have been added to help with the error message and
>>>> to work around issues occurring with "perf stat metrics (shadow stat)
>>>> test".
>>>>
>>>> The patches have only been tested on my x86 non-hybrid laptop.
>>>
>>> Hi Atish and James,
>>>
>>> Could I get your tags for this series?
>>>
> 
> Hi Ian,
> Thanks for your patches. It definitely helps to have a clean slate
> implementation
> for the perf tool. However, I have some open questions about other use cases
> that we discussed during the RVI Summit.
> 
>>> The patches were originally motivated by wanting to make the behavior
>>> of events parsed like "cycles" match that of "cpu/cycles/", the PMU is
>>> wildcarded to "cpu" in the first case. This was divergent because of
>>> ARM we switched from preferring legacy (type = PERF_TYPE_HARDWARE,
>>> config = PERF_COUNT_HW_CPU_CYCLES) to sysfs/json (type=<core PMU's
>>> type>, config=<encoding from event>) when a PMU name was given. This
>>> aligns with RISC-V wanting to use json encodings to avoid complexity
>>> in the PMU driver.
>>>
>>
>> I couldn't find the thread, but I remember fairly recently it was
>> mentioned that RISC-V would be supporting the legacy events after all,
>> maybe it was a comment from Atish? I'm not sure if that changes the
>> requirements for this or not?
>>
>> I still can't really imagine how tooling would work if every tool has to
>> maintain the mappings of basic events like instructions and branches.
>> For example all the perf_event_open tests in ltp use the legacy events.
>>
> 
> No it has not changed. While this series helps to avoid clunky vendor
> specific encodings
> in the driver for perf tool, I am still unsure how we will manage
> other applications
> (directly passing legacy events through perf_event_open or
> perf_evlist__open) will work.
> 
> I have only anecdotal data about folks relying perf legacy events
> directly to profile
> their application. All of them use mostly cycle/instruction events though.
> Are there any users who use other legacy events directly without perf tool ?
> 
> If not, we may have only cycle/instruction mapping in the driver and
> rely on json for everything else.
> The other use case is virtualization. I have been playing with these
> patches to find a clean solution to
> enable all the use cases. If you have any other ideas, please let me know.
> 

Yeah I would expect it's mostly cycles and instructions. I searched a 
bit for PERF_COUNT_HW_BRANCH_MISSES and only found tool/developer type 
usages, which I suppose we could expect to have to handle the mappings 
like perf. Although it's not the easiest thing to search for and 
obviously that only includes open source.

Usages do exist though, there are people posting on stack overflow using 
it, and other various bug tracker listings. So you would expect those 
same users to have to use raw event codes and some switch statement to 
pick the right one for their hardware, or use Perf.



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 3/4] perf record: Skip don't fail for events that don't open
  2024-10-26 12:17 ` [PATCH v1 3/4] perf record: Skip don't fail for events that don't open Ian Rogers
@ 2024-11-11 15:49   ` James Clark
  2024-11-11 17:00     ` Ian Rogers
  2024-11-12 19:53   ` Leo Yan
  1 sibling, 1 reply; 24+ messages in thread
From: James Clark @ 2024-11-11 15:49 UTC (permalink / raw)
  To: Ian Rogers, Atish Patra, Namhyung Kim, Arnaldo Carvalho de Melo
  Cc: linux-riscv, beeman, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang, Ze Gao,
	Weilin Wang, Ben Gainey, Dominique Martinet, Junhao He,
	linux-perf-users, linux-kernel



On 26/10/2024 1:17 pm, Ian Rogers wrote:
> Whilst for many tools it is an expected behavior that failure to open
> a perf event is a failure, ARM decided to name PMU events the same as
> legacy events and then failed to rename such events on a server uncore
> SLC PMU. As perf's default behavior when no PMU is specified is to
> open the event on all PMUs that advertise/"have" the event, this
> yielded failures when trying to make the priority of legacy and
> sysfs/json events uniform - something requested by RISC-V and ARM. A
> legacy event user on ARM hardware may find their event opened on an
> uncore PMU which for perf record will fail. Arnaldo suggested skipping
> such events which this patch implements. Rather than have the skipping
> conditional on running on ARM, the skipping is done on all
> architectures as such a fundamental behavioral difference could lead
> to problems with tools built/depending on perf.
> 
> An example of perf record failing to open events on x86 is:
> ```
> $ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
> Error:
> Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
> The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> "dmesg | grep -i perf" may provide additional information.
> 
> Error:
> Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
> The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> "dmesg | grep -i perf" may provide additional information.
> 

This makes me wonder if this message was overly wordy to begin with. 
This line is fine:

  Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0'
  which will be removed.

The next bit about the syscall just repeats. The exit code could be 
included on the previous line. And the dmesg bit is general advice that 
could possibly be printed once at the end.

> Error:
> Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> The LLC-prefetch-read event is not supported.
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]
> 
> $ perf report --stats
> Aggregated stats:
>                 TOTAL events:      17255
>                  MMAP events:        284  ( 1.6%)
>                  COMM events:       1961  (11.4%)
>                  EXIT events:          1  ( 0.0%)
>                  FORK events:       1960  (11.4%)
>                SAMPLE events:         87  ( 0.5%)
>                 MMAP2 events:      12836  (74.4%)
>               KSYMBOL events:         83  ( 0.5%)
>             BPF_EVENT events:         36  ( 0.2%)
>        FINISHED_ROUND events:          2  ( 0.0%)
>              ID_INDEX events:          1  ( 0.0%)
>            THREAD_MAP events:          1  ( 0.0%)
>               CPU_MAP events:          1  ( 0.0%)
>             TIME_CONV events:          1  ( 0.0%)
>         FINISHED_INIT events:          1  ( 0.0%)
> cycles stats:
>                SAMPLE events:         87
> ```
> 
> Note, if all events fail to open then the data file will contain no
> samples. This is deliberate as at the point the events are opened
> there are other events, such as the dummy event for sideband data, and
> these events will succeed in opening even if the user specified ones

Is a file with only sideband events useful? Is it possible to fail the 
record command if the event doesn't open anywhere?

I noticed this leads to some different behavior and a libperf warning 
when you have paranoid=3:

   $ perf record -e cycles -C 0 -- true

   Error:
   Failure to open event 'cpu_atom/cycles/u' on PMU 'cpu_atom' which will
   be removed.
   ...
   Consider adjusting /proc/sys/kernel/perf_event_paranoid setting
   ...
   libperf: Miscounted nr_mmaps 0 vs 28
   WARNING: No sample_id_all support, falling back to unordered
   processing
   [ perf record: Woken up 1 times to write data ]
   [ perf record: Captured and wrote 0.021 MB perf.data ]


> don't. Having a mix of open and broken events leads to a problem of
> identifying different sources of events.
 >

In my basic test I saw that the opened event was identified correctly in 
perf report, unless you have an example that you encountered that we 
should fix?

One place I saw an issue was with auxtrace events. If there's an event 
name clash then you're likely to not be able to open the file afterwards 
because the auxtrace code can't handle an event that didn't open. But I 
don't know of any name clashes there (I just faked one for testing), and 
maybe that could be fixed up later in the auxtrace code if there is ever 
a real one.

Other than the above it does seem to work ok.

> The issue with legacy events is that on RISC-V they want the driver to
> not have mappings from legacy to non-legacy config encodings for each
> vendor/model due to size, complexity and difficulty to update. It was
> reported that on ARM Apple-M? CPUs the legacy mapping in the driver
> was broken and the sysfs/json events should always take precedent,
> however, it isn't clear this is still the case. It is the case that
> without working around this issue a legacy event like cycles without a
> PMU can encode differently than when specified with a PMU - the
> non-PMU version favoring legacy encodings, the PMU one avoiding legacy
> encodings.
> 
> The patch removes events and then adjusts the idx value for each
> evsel. This is done so that the dense xyarrays used for file
> descriptors, etc. don't contain broken entries. As event opening
> happens relatively late in the record process, use of the idx value
> before the open will have become corrupted, so it is expected there
> are latent bugs hidden behind this change - the change is best
> effort. As the only vendor that has broken event names is ARM, this
> will principally effect ARM users. They will also experience warning
> messages like those above because of the uncore PMU advertising legacy
> event names.



> 
> Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
>   tools/perf/builtin-record.c | 22 +++++++++++++++++-----
>   1 file changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> index f83252472921..7e99743f7e42 100644
> --- a/tools/perf/builtin-record.c
> +++ b/tools/perf/builtin-record.c
> @@ -1364,6 +1364,7 @@ static int record__open(struct record *rec)
>   	struct perf_session *session = rec->session;
>   	struct record_opts *opts = &rec->opts;
>   	int rc = 0;
> +	bool skipped = false;
>   
>   	evlist__for_each_entry(evlist, pos) {
>   try_again:
> @@ -1379,15 +1380,26 @@ static int record__open(struct record *rec)
>   			        pos = evlist__reset_weak_group(evlist, pos, true);
>   				goto try_again;
>   			}
> -			rc = -errno;
>   			evsel__open_strerror(pos, &opts->target, errno, msg, sizeof(msg));
> -			ui__error("%s\n", msg);
> -			goto out;
> +			ui__error("Failure to open event '%s' on PMU '%s' which will be removed.\n%s\n",
> +				  evsel__name(pos), evsel__pmu_name(pos), msg);
> +			pos->skippable = true;
> +			skipped = true;
> +		} else {
> +			pos->supported = true;
>   		}
> -
> -		pos->supported = true;
>   	}
>   
> +	if (skipped) {
> +		struct evsel *tmp;
> +		int idx = 0;
> +
> +		evlist__for_each_entry_safe(evlist, tmp, pos) {
> +			if (pos->skippable)
> +				evlist__remove(evlist, pos);
> +			pos->core.idx = idx++;
> +		}
> +	}
>   	if (symbol_conf.kptr_restrict && !evlist__exclude_kernel(evlist)) {
>   		pr_warning(
>   "WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,\n"


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 3/4] perf record: Skip don't fail for events that don't open
  2024-11-11 15:49   ` James Clark
@ 2024-11-11 17:00     ` Ian Rogers
  2024-11-12 14:12       ` James Clark
  0 siblings, 1 reply; 24+ messages in thread
From: Ian Rogers @ 2024-11-11 17:00 UTC (permalink / raw)
  To: James Clark
  Cc: Atish Patra, Namhyung Kim, Arnaldo Carvalho de Melo, linux-riscv,
	beeman, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang, Ze Gao,
	Weilin Wang, Ben Gainey, Dominique Martinet, Junhao He,
	linux-perf-users, linux-kernel

On Mon, Nov 11, 2024 at 7:49 AM James Clark <james.clark@linaro.org> wrote:
>
>
>
> On 26/10/2024 1:17 pm, Ian Rogers wrote:
> > Whilst for many tools it is an expected behavior that failure to open
> > a perf event is a failure, ARM decided to name PMU events the same as
> > legacy events and then failed to rename such events on a server uncore
> > SLC PMU. As perf's default behavior when no PMU is specified is to
> > open the event on all PMUs that advertise/"have" the event, this
> > yielded failures when trying to make the priority of legacy and
> > sysfs/json events uniform - something requested by RISC-V and ARM. A
> > legacy event user on ARM hardware may find their event opened on an
> > uncore PMU which for perf record will fail. Arnaldo suggested skipping
> > such events which this patch implements. Rather than have the skipping
> > conditional on running on ARM, the skipping is done on all
> > architectures as such a fundamental behavioral difference could lead
> > to problems with tools built/depending on perf.
> >
> > An example of perf record failing to open events on x86 is:
> > ```
> > $ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
> > Error:
> > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
> > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > "dmesg | grep -i perf" may provide additional information.
> >
> > Error:
> > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
> > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > "dmesg | grep -i perf" may provide additional information.
> >
>
> This makes me wonder if this message was overly wordy to begin with.
> This line is fine:
>
>   Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0'
>   which will be removed.
>
> The next bit about the syscall just repeats. The exit code could be
> included on the previous line. And the dmesg bit is general advice that
> could possibly be printed once at the end.

Thanks for the time spent investigating this!

I agree on this. I wonder if we should have short and long messages,
or message+help like we do for parse events. One patch series like
this is improving EBUSY:
https://lore.kernel.org/lkml/20241106003007.2112584-1-ctshao@google.com/
The issue with printing at the end is knowing where/when the end is.
Printing once is easy enough.

> > Error:
> > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > The LLC-prefetch-read event is not supported.
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]
> >
> > $ perf report --stats
> > Aggregated stats:
> >                 TOTAL events:      17255
> >                  MMAP events:        284  ( 1.6%)
> >                  COMM events:       1961  (11.4%)
> >                  EXIT events:          1  ( 0.0%)
> >                  FORK events:       1960  (11.4%)
> >                SAMPLE events:         87  ( 0.5%)
> >                 MMAP2 events:      12836  (74.4%)
> >               KSYMBOL events:         83  ( 0.5%)
> >             BPF_EVENT events:         36  ( 0.2%)
> >        FINISHED_ROUND events:          2  ( 0.0%)
> >              ID_INDEX events:          1  ( 0.0%)
> >            THREAD_MAP events:          1  ( 0.0%)
> >               CPU_MAP events:          1  ( 0.0%)
> >             TIME_CONV events:          1  ( 0.0%)
> >         FINISHED_INIT events:          1  ( 0.0%)
> > cycles stats:
> >                SAMPLE events:         87
> > ```
> >
> > Note, if all events fail to open then the data file will contain no
> > samples. This is deliberate as at the point the events are opened
> > there are other events, such as the dummy event for sideband data, and
> > these events will succeed in opening even if the user specified ones
>
> Is a file with only sideband events useful? Is it possible to fail the
> record command if the event doesn't open anywhere?
>
> I noticed this leads to some different behavior and a libperf warning
> when you have paranoid=3:
>
>    $ perf record -e cycles -C 0 -- true
>
>    Error:
>    Failure to open event 'cpu_atom/cycles/u' on PMU 'cpu_atom' which will
>    be removed.
>    ...
>    Consider adjusting /proc/sys/kernel/perf_event_paranoid setting
>    ...
>    libperf: Miscounted nr_mmaps 0 vs 28
>    WARNING: No sample_id_all support, falling back to unordered
>    processing
>    [ perf record: Woken up 1 times to write data ]
>    [ perf record: Captured and wrote 0.021 MB perf.data ]

So paranoid=3 is a Debian extension that Peter Z doesn't agree with
and so isn't enabled in regular kernels. So we're dealing with cycles,
which we'd expect to open or fall-back to a software event like
task-clock. It feels like things could happen better here but that's
not necessarily the fault of this patch.

> > don't. Having a mix of open and broken events leads to a problem of
> > identifying different sources of events.
>  >
>
> In my basic test I saw that the opened event was identified correctly in
> perf report, unless you have an example that you encountered that we
> should fix?

I didn't find any but we also don't tend to test failing to open
events. I could imagine things failing in `perf test` on neoverse
testing given the l3 advertising the cycles event.

> One place I saw an issue was with auxtrace events. If there's an event
> name clash then you're likely to not be able to open the file afterwards
> because the auxtrace code can't handle an event that didn't open. But I
> don't know of any name clashes there (I just faked one for testing), and
> maybe that could be fixed up later in the auxtrace code if there is ever
> a real one.
>
> Other than the above it does seem to work ok.

Cool, can this be taken as a Tested-by?

Thanks,
Ian

> > The issue with legacy events is that on RISC-V they want the driver to
> > not have mappings from legacy to non-legacy config encodings for each
> > vendor/model due to size, complexity and difficulty to update. It was
> > reported that on ARM Apple-M? CPUs the legacy mapping in the driver
> > was broken and the sysfs/json events should always take precedent,
> > however, it isn't clear this is still the case. It is the case that
> > without working around this issue a legacy event like cycles without a
> > PMU can encode differently than when specified with a PMU - the
> > non-PMU version favoring legacy encodings, the PMU one avoiding legacy
> > encodings.
> >
> > The patch removes events and then adjusts the idx value for each
> > evsel. This is done so that the dense xyarrays used for file
> > descriptors, etc. don't contain broken entries. As event opening
> > happens relatively late in the record process, use of the idx value
> > before the open will have become corrupted, so it is expected there
> > are latent bugs hidden behind this change - the change is best
> > effort. As the only vendor that has broken event names is ARM, this
> > will principally effect ARM users. They will also experience warning
> > messages like those above because of the uncore PMU advertising legacy
> > event names.
>
>
>
> >
> > Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> >   tools/perf/builtin-record.c | 22 +++++++++++++++++-----
> >   1 file changed, 17 insertions(+), 5 deletions(-)
> >
> > diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> > index f83252472921..7e99743f7e42 100644
> > --- a/tools/perf/builtin-record.c
> > +++ b/tools/perf/builtin-record.c
> > @@ -1364,6 +1364,7 @@ static int record__open(struct record *rec)
> >       struct perf_session *session = rec->session;
> >       struct record_opts *opts = &rec->opts;
> >       int rc = 0;
> > +     bool skipped = false;
> >
> >       evlist__for_each_entry(evlist, pos) {
> >   try_again:
> > @@ -1379,15 +1380,26 @@ static int record__open(struct record *rec)
> >                               pos = evlist__reset_weak_group(evlist, pos, true);
> >                               goto try_again;
> >                       }
> > -                     rc = -errno;
> >                       evsel__open_strerror(pos, &opts->target, errno, msg, sizeof(msg));
> > -                     ui__error("%s\n", msg);
> > -                     goto out;
> > +                     ui__error("Failure to open event '%s' on PMU '%s' which will be removed.\n%s\n",
> > +                               evsel__name(pos), evsel__pmu_name(pos), msg);
> > +                     pos->skippable = true;
> > +                     skipped = true;
> > +             } else {
> > +                     pos->supported = true;
> >               }
> > -
> > -             pos->supported = true;
> >       }
> >
> > +     if (skipped) {
> > +             struct evsel *tmp;
> > +             int idx = 0;
> > +
> > +             evlist__for_each_entry_safe(evlist, tmp, pos) {
> > +                     if (pos->skippable)
> > +                             evlist__remove(evlist, pos);
> > +                     pos->core.idx = idx++;
> > +             }
> > +     }
> >       if (symbol_conf.kptr_restrict && !evlist__exclude_kernel(evlist)) {
> >               pr_warning(
> >   "WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,\n"
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 0/4] Prefer sysfs/JSON events also when no PMU is provided
  2024-11-11 10:45       ` James Clark
@ 2024-11-11 17:19         ` Ian Rogers
  2024-11-11 23:38           ` Atish Kumar Patra
  2024-11-12 14:21           ` James Clark
  0 siblings, 2 replies; 24+ messages in thread
From: Ian Rogers @ 2024-11-11 17:19 UTC (permalink / raw)
  To: James Clark
  Cc: Atish Kumar Patra, linux-perf-users, linux-kernel, Ben Gainey,
	Junhao He, linux-riscv, beeman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Jiri Olsa,
	Alexander Shishkin, Kan Liang, Ze Gao, Weilin Wang,
	Dominique Martinet

On Mon, Nov 11, 2024 at 2:45 AM James Clark <james.clark@linaro.org> wrote:
>
>
>
> On 08/11/2024 6:37 pm, Atish Kumar Patra wrote:
> > On Fri, Nov 8, 2024 at 4:16 AM James Clark <james.clark@linaro.org> wrote:
> >>
> >>
> >>
> >> On 07/11/2024 18:51, Ian Rogers wrote:
> >>> On Sat, Oct 26, 2024 at 5:18 AM Ian Rogers <irogers@google.com> wrote:
> >>>>
> >>>> At the RISC-V summit the topic of avoiding event data being in the
> >>>> RISC-V PMU kernel driver came up. There is a preference for sysfs/JSON
> >>>> events being the priority when no PMU is provided so that legacy
> >>>> events maybe supported via json. Originally Mark Rutland also
> >>>> expressed at LPC 2023 that doing this would resolve bugs on ARM Apple
> >>>> M? processors, but James Clark more recently tested this and believes
> >>>> the driver issues there may not have existed or have been resolved. In
> >>>> any case, it is inconsistent that with a PMU event names avoid legacy
> >>>> encodings, but when wildcarding PMUs (ie without a PMU with the event
> >>>> name) the legacy encodings have priority.
> >>>>
> >>>> The patch doing this work was reverted in a v6.10 release candidate
> >>>> as, even though the patch was posted for weeks and had been on
> >>>> linux-next for weeks without issue, Linus was in the habit of using
> >>>> explicit legacy events with unsupported precision options on his
> >>>> Neoverse-N1. This machine has SLC PMU events for bus and CPU cycles
> >>>> where ARM decided to call the events bus_cycles and cycles, the latter
> >>>> being also a legacy event name. ARM haven't renamed the cycles event
> >>>> to a more consistent cpu_cycles and avoided the problem. With these
> >>>> changes the problematic event will now be skipped, a large warning
> >>>> produced, and perf record will continue for the other PMU events. This
> >>>> solution was proposed by Arnaldo.
> >>>>
> >>>> Two minor changes have been added to help with the error message and
> >>>> to work around issues occurring with "perf stat metrics (shadow stat)
> >>>> test".
> >>>>
> >>>> The patches have only been tested on my x86 non-hybrid laptop.
> >>>
> >>> Hi Atish and James,
> >>>
> >>> Could I get your tags for this series?
> >>>
> >
> > Hi Ian,
> > Thanks for your patches. It definitely helps to have a clean slate
> > implementation
> > for the perf tool. However, I have some open questions about other use cases
> > that we discussed during the RVI Summit.
> >
> >>> The patches were originally motivated by wanting to make the behavior
> >>> of events parsed like "cycles" match that of "cpu/cycles/", the PMU is
> >>> wildcarded to "cpu" in the first case. This was divergent because of
> >>> ARM we switched from preferring legacy (type = PERF_TYPE_HARDWARE,
> >>> config = PERF_COUNT_HW_CPU_CYCLES) to sysfs/json (type=<core PMU's
> >>> type>, config=<encoding from event>) when a PMU name was given. This
> >>> aligns with RISC-V wanting to use json encodings to avoid complexity
> >>> in the PMU driver.
> >>>
> >>
> >> I couldn't find the thread, but I remember fairly recently it was
> >> mentioned that RISC-V would be supporting the legacy events after all,
> >> maybe it was a comment from Atish? I'm not sure if that changes the
> >> requirements for this or not?
> >>
> >> I still can't really imagine how tooling would work if every tool has to
> >> maintain the mappings of basic events like instructions and branches.
> >> For example all the perf_event_open tests in ltp use the legacy events.
> >>
> >
> > No it has not changed. While this series helps to avoid clunky vendor
> > specific encodings
> > in the driver for perf tool, I am still unsure how we will manage
> > other applications
> > (directly passing legacy events through perf_event_open or
> > perf_evlist__open) will work.
> >
> > I have only anecdotal data about folks relying perf legacy events
> > directly to profile
> > their application. All of them use mostly cycle/instruction events though.
> > Are there any users who use other legacy events directly without perf tool ?
> >
> > If not, we may have only cycle/instruction mapping in the driver and
> > rely on json for everything else.
> > The other use case is virtualization. I have been playing with these
> > patches to find a clean solution to
> > enable all the use cases. If you have any other ideas, please let me know.
> >
>
> Yeah I would expect it's mostly cycles and instructions. I searched a
> bit for PERF_COUNT_HW_BRANCH_MISSES and only found tool/developer type
> usages, which I suppose we could expect to have to handle the mappings
> like perf. Although it's not the easiest thing to search for and
> obviously that only includes open source.
>
> Usages do exist though, there are people posting on stack overflow using
> it, and other various bug tracker listings. So you would expect those
> same users to have to use raw event codes and some switch statement to
> pick the right one for their hardware, or use Perf.

I don't have any magic to solve this. My thoughts:

1) I thought legacy events were just going to hang around forever,
although the name hinting they'd kind of been deprecated. At LPC '23
Atish and Mark Rutland originally asked for the sysfs/json to be the
preference. I thought Kan was going to push back given the upheaval,
especially updating every test expectation. It went through but we're
in this weird state where wildcard events are encoded using legacy and
PMU specifying events aren't. What this series hopes to solve.

2) I think it is important that perf tool be a reference
implementation where others can look (strace, etc.) to base their
implementation. Moving perf to sysfs/json is 1 step closer to legacy
event deprecation. Please yell if deprecation isn't what is wanted as
personally other than cleanliness I don't mind. If we do decide not to
have sysfs/json be the priority then I think it sensible to revert the
changes making it the priority for events that specify a PMU. I'd like
some level of consistency.

3) I'd like event parsing to be a library that is as easy to link
against as libbpf (i.e. not a viral license). Event parsing is really
just mapping an event name to 1 or more perf_event_attr. The PMU
abstraction is tied into the event parsing, but this has only been
more true recently. The evsel/evlist is tied into event parsing but
that feels separable. As most json isn't distributed under a viral
license this feels achievable, then I suspect most tools can use this
library rather than have to reinvent a wheel.

Still looking for tags :-)

Thanks,
Ian

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 0/4] Prefer sysfs/JSON events also when no PMU is provided
  2024-10-26 12:17 [PATCH v1 0/4] Prefer sysfs/JSON events also when no PMU is provided Ian Rogers
                   ` (4 preceding siblings ...)
  2024-11-07 18:51 ` [PATCH v1 0/4] Prefer sysfs/JSON events also when no PMU is provided Ian Rogers
@ 2024-11-11 23:23 ` Atish Kumar Patra
  5 siblings, 0 replies; 24+ messages in thread
From: Atish Kumar Patra @ 2024-11-11 23:23 UTC (permalink / raw)
  To: Ian Rogers
  Cc: linux-riscv, beeman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang,
	James Clark, Ze Gao, Weilin Wang, Ben Gainey, Dominique Martinet,
	Junhao He, linux-perf-users, linux-kernel

On Sat, Oct 26, 2024 at 5:18 AM Ian Rogers <irogers@google.com> wrote:
>
> At the RISC-V summit the topic of avoiding event data being in the
> RISC-V PMU kernel driver came up. There is a preference for sysfs/JSON
> events being the priority when no PMU is provided so that legacy
> events maybe supported via json. Originally Mark Rutland also
> expressed at LPC 2023 that doing this would resolve bugs on ARM Apple
> M? processors, but James Clark more recently tested this and believes
> the driver issues there may not have existed or have been resolved. In
> any case, it is inconsistent that with a PMU event names avoid legacy
> encodings, but when wildcarding PMUs (ie without a PMU with the event
> name) the legacy encodings have priority.
>
> The patch doing this work was reverted in a v6.10 release candidate
> as, even though the patch was posted for weeks and had been on
> linux-next for weeks without issue, Linus was in the habit of using
> explicit legacy events with unsupported precision options on his
> Neoverse-N1. This machine has SLC PMU events for bus and CPU cycles
> where ARM decided to call the events bus_cycles and cycles, the latter
> being also a legacy event name. ARM haven't renamed the cycles event
> to a more consistent cpu_cycles and avoided the problem. With these
> changes the problematic event will now be skipped, a large warning
> produced, and perf record will continue for the other PMU events. This
> solution was proposed by Arnaldo.
>
> Two minor changes have been added to help with the error message and
> to work around issues occurring with "perf stat metrics (shadow stat)
> test".
>
> The patches have only been tested on my x86 non-hybrid laptop.
>
> Ian Rogers (4):
>   perf evsel: Add pmu_name helper
>   perf stat: Fix find_stat for mixed legacy/non-legacy events
>   perf record: Skip don't fail for events that don't open
>   perf parse-events: Reapply "Prefer sysfs/JSON hardware events over
>     legacy"
>
>  tools/perf/builtin-record.c    | 22 +++++++---
>  tools/perf/util/evsel.c        | 10 +++++
>  tools/perf/util/evsel.h        |  1 +
>  tools/perf/util/parse-events.c | 26 +++++++++---
>  tools/perf/util/parse-events.l | 76 +++++++++++++++++-----------------
>  tools/perf/util/parse-events.y | 60 ++++++++++++++++++---------
>  tools/perf/util/pmus.c         | 20 +++++++--
>  tools/perf/util/stat-shadow.c  |  3 +-
>  8 files changed, 145 insertions(+), 73 deletions(-)
>

Thanks for the series. Tested on RISC-V qemu with counter delegation enabled.
I am able to get the event encodings specified in the json file
instead of legacy event encodings.

--------------------------------
root@debian:~/host# ./perf stat -e cycles -e instructions -vvv ls
Control descriptor is not initialized
Opening: cycles
------------------------------------------------------------
perf_event_attr:
  type                             4 (cpu)
  size                             136
  config                           0x1 (cycles)
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  { bp_len, config2 }              0x7f9
------------------------------------------------------------
sys_perf_event_open: pid 424  cpu -1  group_fd -1  flags 0x8 = 3
Opening: instructions
------------------------------------------------------------
perf_event_attr:
  type                             4 (cpu)
  size                             136
  config                           0x2 (instructions)
  sample_type                      IDENTIFIER
  read_format                      TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
  disabled                         1
  inherit                          1
  enable_on_exec                   1
  { bp_len, config2 }              0x7fc
------------------------------------------------------------
sys_perf_event_open: pid 424  cpu -1  group_fd -1  flags 0x8 = 4
-----------------------------------------------------------------------------------

However, we may still have to provide the encodings in the driver for
vendors due to the reasons described in the
thread. Hopefully, we can get rid of legacy PMU events one day which
will allow us to remove those driver bindings in the future.

Tested-by: Atish Patra <atishp@rivosinc.com>

> --
> 2.47.0.163.g1226f6d8fa-goog
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 0/4] Prefer sysfs/JSON events also when no PMU is provided
  2024-11-11 17:19         ` Ian Rogers
@ 2024-11-11 23:38           ` Atish Kumar Patra
  2024-11-12 14:21           ` James Clark
  1 sibling, 0 replies; 24+ messages in thread
From: Atish Kumar Patra @ 2024-11-11 23:38 UTC (permalink / raw)
  To: Ian Rogers
  Cc: James Clark, linux-perf-users, linux-kernel, Ben Gainey,
	Junhao He, linux-riscv, beeman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Jiri Olsa,
	Alexander Shishkin, Kan Liang, Ze Gao, Weilin Wang,
	Dominique Martinet

On Mon, Nov 11, 2024 at 9:19 AM Ian Rogers <irogers@google.com> wrote:
>
> On Mon, Nov 11, 2024 at 2:45 AM James Clark <james.clark@linaro.org> wrote:
> >
> >
> >
> > On 08/11/2024 6:37 pm, Atish Kumar Patra wrote:
> > > On Fri, Nov 8, 2024 at 4:16 AM James Clark <james.clark@linaro.org> wrote:
> > >>
> > >>
> > >>
> > >> On 07/11/2024 18:51, Ian Rogers wrote:
> > >>> On Sat, Oct 26, 2024 at 5:18 AM Ian Rogers <irogers@google.com> wrote:
> > >>>>
> > >>>> At the RISC-V summit the topic of avoiding event data being in the
> > >>>> RISC-V PMU kernel driver came up. There is a preference for sysfs/JSON
> > >>>> events being the priority when no PMU is provided so that legacy
> > >>>> events maybe supported via json. Originally Mark Rutland also
> > >>>> expressed at LPC 2023 that doing this would resolve bugs on ARM Apple
> > >>>> M? processors, but James Clark more recently tested this and believes
> > >>>> the driver issues there may not have existed or have been resolved. In
> > >>>> any case, it is inconsistent that with a PMU event names avoid legacy
> > >>>> encodings, but when wildcarding PMUs (ie without a PMU with the event
> > >>>> name) the legacy encodings have priority.
> > >>>>
> > >>>> The patch doing this work was reverted in a v6.10 release candidate
> > >>>> as, even though the patch was posted for weeks and had been on
> > >>>> linux-next for weeks without issue, Linus was in the habit of using
> > >>>> explicit legacy events with unsupported precision options on his
> > >>>> Neoverse-N1. This machine has SLC PMU events for bus and CPU cycles
> > >>>> where ARM decided to call the events bus_cycles and cycles, the latter
> > >>>> being also a legacy event name. ARM haven't renamed the cycles event
> > >>>> to a more consistent cpu_cycles and avoided the problem. With these
> > >>>> changes the problematic event will now be skipped, a large warning
> > >>>> produced, and perf record will continue for the other PMU events. This
> > >>>> solution was proposed by Arnaldo.
> > >>>>
> > >>>> Two minor changes have been added to help with the error message and
> > >>>> to work around issues occurring with "perf stat metrics (shadow stat)
> > >>>> test".
> > >>>>
> > >>>> The patches have only been tested on my x86 non-hybrid laptop.
> > >>>
> > >>> Hi Atish and James,
> > >>>
> > >>> Could I get your tags for this series?
> > >>>
> > >
> > > Hi Ian,
> > > Thanks for your patches. It definitely helps to have a clean slate
> > > implementation
> > > for the perf tool. However, I have some open questions about other use cases
> > > that we discussed during the RVI Summit.
> > >
> > >>> The patches were originally motivated by wanting to make the behavior
> > >>> of events parsed like "cycles" match that of "cpu/cycles/", the PMU is
> > >>> wildcarded to "cpu" in the first case. This was divergent because of
> > >>> ARM we switched from preferring legacy (type = PERF_TYPE_HARDWARE,
> > >>> config = PERF_COUNT_HW_CPU_CYCLES) to sysfs/json (type=<core PMU's
> > >>> type>, config=<encoding from event>) when a PMU name was given. This
> > >>> aligns with RISC-V wanting to use json encodings to avoid complexity
> > >>> in the PMU driver.
> > >>>
> > >>
> > >> I couldn't find the thread, but I remember fairly recently it was
> > >> mentioned that RISC-V would be supporting the legacy events after all,
> > >> maybe it was a comment from Atish? I'm not sure if that changes the
> > >> requirements for this or not?
> > >>
> > >> I still can't really imagine how tooling would work if every tool has to
> > >> maintain the mappings of basic events like instructions and branches.
> > >> For example all the perf_event_open tests in ltp use the legacy events.
> > >>
> > >
> > > No it has not changed. While this series helps to avoid clunky vendor
> > > specific encodings
> > > in the driver for perf tool, I am still unsure how we will manage
> > > other applications
> > > (directly passing legacy events through perf_event_open or
> > > perf_evlist__open) will work.
> > >
> > > I have only anecdotal data about folks relying perf legacy events
> > > directly to profile
> > > their application. All of them use mostly cycle/instruction events though.
> > > Are there any users who use other legacy events directly without perf tool ?
> > >
> > > If not, we may have only cycle/instruction mapping in the driver and
> > > rely on json for everything else.
> > > The other use case is virtualization. I have been playing with these
> > > patches to find a clean solution to
> > > enable all the use cases. If you have any other ideas, please let me know.
> > >
> >
> > Yeah I would expect it's mostly cycles and instructions. I searched a
> > bit for PERF_COUNT_HW_BRANCH_MISSES and only found tool/developer type
> > usages, which I suppose we could expect to have to handle the mappings
> > like perf. Although it's not the easiest thing to search for and
> > obviously that only includes open source.
> >
> > Usages do exist though, there are people posting on stack overflow using
> > it, and other various bug tracker listings. So you would expect those
> > same users to have to use raw event codes and some switch statement to
> > pick the right one for their hardware, or use Perf.
>
> I don't have any magic to solve this. My thoughts:
>
> 1) I thought legacy events were just going to hang around forever,
> although the name hinting they'd kind of been deprecated. At LPC '23
> Atish and Mark Rutland originally asked for the sysfs/json to be the
> preference. I thought Kan was going to push back given the upheaval,
> especially updating every test expectation. It went through but we're
> in this weird state where wildcard events are encoded using legacy and
> PMU specifying events aren't. What this series hopes to solve.
>
> 2) I think it is important that perf tool be a reference
> implementation where others can look (strace, etc.) to base their
> implementation. Moving perf to sysfs/json is 1 step closer to legacy
> event deprecation. Please yell if deprecation isn't what is wanted as
> personally other than cleanliness I don't mind. If we do decide not to
> have sysfs/json be the priority then I think it sensible to revert the
> changes making it the priority for events that specify a PMU. I'd like
> some level of consistency.
>
> 3) I'd like event parsing to be a library that is as easy to link
> against as libbpf (i.e. not a viral license). Event parsing is really
> just mapping an event name to 1 or more perf_event_attr. The PMU
> abstraction is tied into the event parsing, but this has only been
> more true recently. The evsel/evlist is tied into event parsing but
> that feels separable. As most json isn't distributed under a viral
> license this feels achievable, then I suspect most tools can use this
> library rather than have to reinvent a wheel.

That would be the ideal solution IMHO for any other tools.

For virtualization, the guest OS would pass the event attribute as RAW
if the matching json
file is present. The hypervisor would simply program that in mhpmevent CSR.

In absence of the json file, guest OS would pass the legacy event
encoding which must be
mapped to a vendor specific encoding present in the driver. Hopefully,
this can be removed
in the future once legacy events are deprecated.  Thoughts ?

>
> Still looking for tags :-)
>

Done. Sorry for the delay. Just wanted to ensure that this solutions
works properly
for all the above cases I mentioned.

> Thanks,
> Ian

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 3/4] perf record: Skip don't fail for events that don't open
  2024-11-11 17:00     ` Ian Rogers
@ 2024-11-12 14:12       ` James Clark
  2024-11-12 16:25         ` Ian Rogers
  0 siblings, 1 reply; 24+ messages in thread
From: James Clark @ 2024-11-12 14:12 UTC (permalink / raw)
  To: Ian Rogers, Leo Yan
  Cc: Atish Patra, Namhyung Kim, Arnaldo Carvalho de Melo, linux-riscv,
	beeman, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang, Ze Gao,
	Weilin Wang, Ben Gainey, Dominique Martinet, Junhao He,
	linux-perf-users, linux-kernel



On 11/11/2024 5:00 pm, Ian Rogers wrote:
> On Mon, Nov 11, 2024 at 7:49 AM James Clark <james.clark@linaro.org> wrote:
>>
>>
>>
>> On 26/10/2024 1:17 pm, Ian Rogers wrote:
>>> Whilst for many tools it is an expected behavior that failure to open
>>> a perf event is a failure, ARM decided to name PMU events the same as
>>> legacy events and then failed to rename such events on a server uncore
>>> SLC PMU. As perf's default behavior when no PMU is specified is to
>>> open the event on all PMUs that advertise/"have" the event, this
>>> yielded failures when trying to make the priority of legacy and
>>> sysfs/json events uniform - something requested by RISC-V and ARM. A
>>> legacy event user on ARM hardware may find their event opened on an
>>> uncore PMU which for perf record will fail. Arnaldo suggested skipping
>>> such events which this patch implements. Rather than have the skipping
>>> conditional on running on ARM, the skipping is done on all
>>> architectures as such a fundamental behavioral difference could lead
>>> to problems with tools built/depending on perf.
>>>
>>> An example of perf record failing to open events on x86 is:
>>> ```
>>> $ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
>>> Error:
>>> Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
>>> The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
>>> "dmesg | grep -i perf" may provide additional information.
>>>
>>> Error:
>>> Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
>>> The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
>>> "dmesg | grep -i perf" may provide additional information.
>>>
>>
>> This makes me wonder if this message was overly wordy to begin with.
>> This line is fine:
>>
>>    Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0'
>>    which will be removed.
>>
>> The next bit about the syscall just repeats. The exit code could be
>> included on the previous line. And the dmesg bit is general advice that
>> could possibly be printed once at the end.
> 
> Thanks for the time spent investigating this!
> 
> I agree on this. I wonder if we should have short and long messages,
> or message+help like we do for parse events. One patch series like
> this is improving EBUSY:
> https://lore.kernel.org/lkml/20241106003007.2112584-1-ctshao@google.com/
> The issue with printing at the end is knowing where/when the end is.
> Printing once is easy enough.
> 
>>> Error:
>>> Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
>>> The LLC-prefetch-read event is not supported.
>>> [ perf record: Woken up 1 times to write data ]
>>> [ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]
>>>
>>> $ perf report --stats
>>> Aggregated stats:
>>>                  TOTAL events:      17255
>>>                   MMAP events:        284  ( 1.6%)
>>>                   COMM events:       1961  (11.4%)
>>>                   EXIT events:          1  ( 0.0%)
>>>                   FORK events:       1960  (11.4%)
>>>                 SAMPLE events:         87  ( 0.5%)
>>>                  MMAP2 events:      12836  (74.4%)
>>>                KSYMBOL events:         83  ( 0.5%)
>>>              BPF_EVENT events:         36  ( 0.2%)
>>>         FINISHED_ROUND events:          2  ( 0.0%)
>>>               ID_INDEX events:          1  ( 0.0%)
>>>             THREAD_MAP events:          1  ( 0.0%)
>>>                CPU_MAP events:          1  ( 0.0%)
>>>              TIME_CONV events:          1  ( 0.0%)
>>>          FINISHED_INIT events:          1  ( 0.0%)
>>> cycles stats:
>>>                 SAMPLE events:         87
>>> ```
>>>
>>> Note, if all events fail to open then the data file will contain no
>>> samples. This is deliberate as at the point the events are opened
>>> there are other events, such as the dummy event for sideband data, and
>>> these events will succeed in opening even if the user specified ones
>>
>> Is a file with only sideband events useful? Is it possible to fail the
>> record command if the event doesn't open anywhere?
>>
>> I noticed this leads to some different behavior and a libperf warning
>> when you have paranoid=3:
>>
>>     $ perf record -e cycles -C 0 -- true
>>
>>     Error:
>>     Failure to open event 'cpu_atom/cycles/u' on PMU 'cpu_atom' which will
>>     be removed.
>>     ...
>>     Consider adjusting /proc/sys/kernel/perf_event_paranoid setting
>>     ...
>>     libperf: Miscounted nr_mmaps 0 vs 28
>>     WARNING: No sample_id_all support, falling back to unordered
>>     processing
>>     [ perf record: Woken up 1 times to write data ]
>>     [ perf record: Captured and wrote 0.021 MB perf.data ]
> 
> So paranoid=3 is a Debian extension that Peter Z doesn't agree with
> and so isn't enabled in regular kernels. So we're dealing with cycles,
> which we'd expect to open or fall-back to a software event like
> task-clock. It feels like things could happen better here but that's
> not necessarily the fault of this patch.
> 

paranoid=3 is a bit of a red-herring. I actually meant any paranoid 
value other than -1. The important part is "-C 0" because only root is 
allowed to open on a CPU. Or "-a" also causes the issue.

Previously you got the command failure and the hint about the paranoid 
value, but now you get the libperf error too and it still makes the file.

I still don't think a file with only sideband events is useful, I don't 
think there's even a commandline that you can run currently that results 
in that? You always need at least one event to open for it to succeed 
and I'm thinking we could keep that.

It would be nice to fix the libperf warning too because it's warning 
about something that was already warned about.

>>> don't. Having a mix of open and broken events leads to a problem of
>>> identifying different sources of events.
>>   >
>>
>> In my basic test I saw that the opened event was identified correctly in
>> perf report, unless you have an example that you encountered that we
>> should fix?
> 
> I didn't find any but we also don't tend to test failing to open
> events. I could imagine things failing in `perf test` on neoverse
> testing given the l3 advertising the cycles event.
> 
>> One place I saw an issue was with auxtrace events. If there's an event
>> name clash then you're likely to not be able to open the file afterwards
>> because the auxtrace code can't handle an event that didn't open. But I
>> don't know of any name clashes there (I just faked one for testing), and
>> maybe that could be fixed up later in the auxtrace code if there is ever
>> a real one.
>>
>> Other than the above it does seem to work ok.
> 
> Cool, can this be taken as a Tested-by?
> 

Yep, although with a caveat that I faked the second cycles PMU that 
wouldn't open, but I don't think that should make a difference:

Tested-by: James Clark <james.clark@linaro.org>

I can also test with the real thing if you like but would take a bit 
longer as I have to borrow it off Leo.

> Thanks,
> Ian
> 
>>> The issue with legacy events is that on RISC-V they want the driver to
>>> not have mappings from legacy to non-legacy config encodings for each
>>> vendor/model due to size, complexity and difficulty to update. It was
>>> reported that on ARM Apple-M? CPUs the legacy mapping in the driver
>>> was broken and the sysfs/json events should always take precedent,
>>> however, it isn't clear this is still the case. It is the case that
>>> without working around this issue a legacy event like cycles without a
>>> PMU can encode differently than when specified with a PMU - the
>>> non-PMU version favoring legacy encodings, the PMU one avoiding legacy
>>> encodings.
>>>
>>> The patch removes events and then adjusts the idx value for each
>>> evsel. This is done so that the dense xyarrays used for file
>>> descriptors, etc. don't contain broken entries. As event opening
>>> happens relatively late in the record process, use of the idx value
>>> before the open will have become corrupted, so it is expected there
>>> are latent bugs hidden behind this change - the change is best
>>> effort. As the only vendor that has broken event names is ARM, this
>>> will principally effect ARM users. They will also experience warning
>>> messages like those above because of the uncore PMU advertising legacy
>>> event names.
>>
>>
>>
>>>
>>> Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
>>> Signed-off-by: Ian Rogers <irogers@google.com>
>>> ---
>>>    tools/perf/builtin-record.c | 22 +++++++++++++++++-----
>>>    1 file changed, 17 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
>>> index f83252472921..7e99743f7e42 100644
>>> --- a/tools/perf/builtin-record.c
>>> +++ b/tools/perf/builtin-record.c
>>> @@ -1364,6 +1364,7 @@ static int record__open(struct record *rec)
>>>        struct perf_session *session = rec->session;
>>>        struct record_opts *opts = &rec->opts;
>>>        int rc = 0;
>>> +     bool skipped = false;
>>>
>>>        evlist__for_each_entry(evlist, pos) {
>>>    try_again:
>>> @@ -1379,15 +1380,26 @@ static int record__open(struct record *rec)
>>>                                pos = evlist__reset_weak_group(evlist, pos, true);
>>>                                goto try_again;
>>>                        }
>>> -                     rc = -errno;
>>>                        evsel__open_strerror(pos, &opts->target, errno, msg, sizeof(msg));
>>> -                     ui__error("%s\n", msg);
>>> -                     goto out;
>>> +                     ui__error("Failure to open event '%s' on PMU '%s' which will be removed.\n%s\n",
>>> +                               evsel__name(pos), evsel__pmu_name(pos), msg);
>>> +                     pos->skippable = true;
>>> +                     skipped = true;
>>> +             } else {
>>> +                     pos->supported = true;
>>>                }
>>> -
>>> -             pos->supported = true;
>>>        }
>>>
>>> +     if (skipped) {
>>> +             struct evsel *tmp;
>>> +             int idx = 0;
>>> +
>>> +             evlist__for_each_entry_safe(evlist, tmp, pos) {
>>> +                     if (pos->skippable)
>>> +                             evlist__remove(evlist, pos);
>>> +                     pos->core.idx = idx++;
>>> +             }
>>> +     }
>>>        if (symbol_conf.kptr_restrict && !evlist__exclude_kernel(evlist)) {
>>>                pr_warning(
>>>    "WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,\n"
>>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 0/4] Prefer sysfs/JSON events also when no PMU is provided
  2024-11-11 17:19         ` Ian Rogers
  2024-11-11 23:38           ` Atish Kumar Patra
@ 2024-11-12 14:21           ` James Clark
  2024-11-12 17:23             ` Ian Rogers
  1 sibling, 1 reply; 24+ messages in thread
From: James Clark @ 2024-11-12 14:21 UTC (permalink / raw)
  To: Ian Rogers, Atish Kumar Patra
  Cc: linux-perf-users, linux-kernel, Ben Gainey, Junhao He,
	linux-riscv, beeman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Jiri Olsa,
	Alexander Shishkin, Kan Liang, Ze Gao, Weilin Wang,
	Dominique Martinet



On 11/11/2024 5:19 pm, Ian Rogers wrote:
> On Mon, Nov 11, 2024 at 2:45 AM James Clark <james.clark@linaro.org> wrote:
>>
>>
>>
>> On 08/11/2024 6:37 pm, Atish Kumar Patra wrote:
>>> On Fri, Nov 8, 2024 at 4:16 AM James Clark <james.clark@linaro.org> wrote:
>>>>
>>>>
>>>>
>>>> On 07/11/2024 18:51, Ian Rogers wrote:
>>>>> On Sat, Oct 26, 2024 at 5:18 AM Ian Rogers <irogers@google.com> wrote:
>>>>>>
>>>>>> At the RISC-V summit the topic of avoiding event data being in the
>>>>>> RISC-V PMU kernel driver came up. There is a preference for sysfs/JSON
>>>>>> events being the priority when no PMU is provided so that legacy
>>>>>> events maybe supported via json. Originally Mark Rutland also
>>>>>> expressed at LPC 2023 that doing this would resolve bugs on ARM Apple
>>>>>> M? processors, but James Clark more recently tested this and believes
>>>>>> the driver issues there may not have existed or have been resolved. In
>>>>>> any case, it is inconsistent that with a PMU event names avoid legacy
>>>>>> encodings, but when wildcarding PMUs (ie without a PMU with the event
>>>>>> name) the legacy encodings have priority.
>>>>>>
>>>>>> The patch doing this work was reverted in a v6.10 release candidate
>>>>>> as, even though the patch was posted for weeks and had been on
>>>>>> linux-next for weeks without issue, Linus was in the habit of using
>>>>>> explicit legacy events with unsupported precision options on his
>>>>>> Neoverse-N1. This machine has SLC PMU events for bus and CPU cycles
>>>>>> where ARM decided to call the events bus_cycles and cycles, the latter
>>>>>> being also a legacy event name. ARM haven't renamed the cycles event
>>>>>> to a more consistent cpu_cycles and avoided the problem. With these
>>>>>> changes the problematic event will now be skipped, a large warning
>>>>>> produced, and perf record will continue for the other PMU events. This
>>>>>> solution was proposed by Arnaldo.
>>>>>>
>>>>>> Two minor changes have been added to help with the error message and
>>>>>> to work around issues occurring with "perf stat metrics (shadow stat)
>>>>>> test".
>>>>>>
>>>>>> The patches have only been tested on my x86 non-hybrid laptop.
>>>>>
>>>>> Hi Atish and James,
>>>>>
>>>>> Could I get your tags for this series?
>>>>>
>>>
>>> Hi Ian,
>>> Thanks for your patches. It definitely helps to have a clean slate
>>> implementation
>>> for the perf tool. However, I have some open questions about other use cases
>>> that we discussed during the RVI Summit.
>>>
>>>>> The patches were originally motivated by wanting to make the behavior
>>>>> of events parsed like "cycles" match that of "cpu/cycles/", the PMU is
>>>>> wildcarded to "cpu" in the first case. This was divergent because of
>>>>> ARM we switched from preferring legacy (type = PERF_TYPE_HARDWARE,
>>>>> config = PERF_COUNT_HW_CPU_CYCLES) to sysfs/json (type=<core PMU's
>>>>> type>, config=<encoding from event>) when a PMU name was given. This
>>>>> aligns with RISC-V wanting to use json encodings to avoid complexity
>>>>> in the PMU driver.
>>>>>
>>>>
>>>> I couldn't find the thread, but I remember fairly recently it was
>>>> mentioned that RISC-V would be supporting the legacy events after all,
>>>> maybe it was a comment from Atish? I'm not sure if that changes the
>>>> requirements for this or not?
>>>>
>>>> I still can't really imagine how tooling would work if every tool has to
>>>> maintain the mappings of basic events like instructions and branches.
>>>> For example all the perf_event_open tests in ltp use the legacy events.
>>>>
>>>
>>> No it has not changed. While this series helps to avoid clunky vendor
>>> specific encodings
>>> in the driver for perf tool, I am still unsure how we will manage
>>> other applications
>>> (directly passing legacy events through perf_event_open or
>>> perf_evlist__open) will work.
>>>
>>> I have only anecdotal data about folks relying perf legacy events
>>> directly to profile
>>> their application. All of them use mostly cycle/instruction events though.
>>> Are there any users who use other legacy events directly without perf tool ?
>>>
>>> If not, we may have only cycle/instruction mapping in the driver and
>>> rely on json for everything else.
>>> The other use case is virtualization. I have been playing with these
>>> patches to find a clean solution to
>>> enable all the use cases. If you have any other ideas, please let me know.
>>>
>>
>> Yeah I would expect it's mostly cycles and instructions. I searched a
>> bit for PERF_COUNT_HW_BRANCH_MISSES and only found tool/developer type
>> usages, which I suppose we could expect to have to handle the mappings
>> like perf. Although it's not the easiest thing to search for and
>> obviously that only includes open source.
>>
>> Usages do exist though, there are people posting on stack overflow using
>> it, and other various bug tracker listings. So you would expect those
>> same users to have to use raw event codes and some switch statement to
>> pick the right one for their hardware, or use Perf.
> 
> I don't have any magic to solve this. My thoughts:
> 
> 1) I thought legacy events were just going to hang around forever,
> although the name hinting they'd kind of been deprecated. At LPC '23
> Atish and Mark Rutland originally asked for the sysfs/json to be the
> preference. I thought Kan was going to push back given the upheaval,
> especially updating every test expectation. It went through but we're
> in this weird state where wildcard events are encoded using legacy and
> PMU specifying events aren't. What this series hopes to solve.
> 
> 2) I think it is important that perf tool be a reference
> implementation where others can look (strace, etc.) to base their
> implementation. Moving perf to sysfs/json is 1 step closer to legacy
> event deprecation. Please yell if deprecation isn't what is wanted as
> personally other than cleanliness I don't mind. If we do decide not to
> have sysfs/json be the priority then I think it sensible to revert the
> changes making it the priority for events that specify a PMU. I'd like
> some level of consistency.

Personally this change feels like it's encouraging fragmentation, the 
cleanest would be if RISC-V supports the legacy events like the other 
platforms. It's not a huge set of events anyway, and then existing 
software continues to work in addition to Perf continuing to work.

If we're still thinking that RISC-V will support the legacy events 
anyway in the future, then it weakens the argument to change this and 
risk any breakages that fall out of it.

I'm also not sure whether they really are legacy or just a common base 
set of events. Since the extended type thing was added even the legacy 
events support hybrid, so it's not like they expired from lack of 
features. Even if we claim they're legacy, is that really ever going to 
make them go away?

Although I admit nobody has come out with a definitive use case that 
doesn't use either the cycles or instructions events, so maybe its all fine.

> 
> 3) I'd like event parsing to be a library that is as easy to link
> against as libbpf (i.e. not a viral license). Event parsing is really
> just mapping an event name to 1 or more perf_event_attr. The PMU
> abstraction is tied into the event parsing, but this has only been
> more true recently. The evsel/evlist is tied into event parsing but
> that feels separable. As most json isn't distributed under a viral
> license this feels achievable, then I suspect most tools can use this
> library rather than have to reinvent a wheel.
> 
> Still looking for tags :-)
> 
> Thanks,
> Ian


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 3/4] perf record: Skip don't fail for events that don't open
  2024-11-12 14:12       ` James Clark
@ 2024-11-12 16:25         ` Ian Rogers
  0 siblings, 0 replies; 24+ messages in thread
From: Ian Rogers @ 2024-11-12 16:25 UTC (permalink / raw)
  To: James Clark
  Cc: Leo Yan, Atish Patra, Namhyung Kim, Arnaldo Carvalho de Melo,
	linux-riscv, beeman, Peter Zijlstra, Ingo Molnar, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang, Ze Gao,
	Weilin Wang, Ben Gainey, Dominique Martinet, Junhao He,
	linux-perf-users, linux-kernel

On Tue, Nov 12, 2024 at 6:12 AM James Clark <james.clark@linaro.org> wrote:
>
>
>
> On 11/11/2024 5:00 pm, Ian Rogers wrote:
> > On Mon, Nov 11, 2024 at 7:49 AM James Clark <james.clark@linaro.org> wrote:
> >>
> >>
> >>
> >> On 26/10/2024 1:17 pm, Ian Rogers wrote:
> >>> Whilst for many tools it is an expected behavior that failure to open
> >>> a perf event is a failure, ARM decided to name PMU events the same as
> >>> legacy events and then failed to rename such events on a server uncore
> >>> SLC PMU. As perf's default behavior when no PMU is specified is to
> >>> open the event on all PMUs that advertise/"have" the event, this
> >>> yielded failures when trying to make the priority of legacy and
> >>> sysfs/json events uniform - something requested by RISC-V and ARM. A
> >>> legacy event user on ARM hardware may find their event opened on an
> >>> uncore PMU which for perf record will fail. Arnaldo suggested skipping
> >>> such events which this patch implements. Rather than have the skipping
> >>> conditional on running on ARM, the skipping is done on all
> >>> architectures as such a fundamental behavioral difference could lead
> >>> to problems with tools built/depending on perf.
> >>>
> >>> An example of perf record failing to open events on x86 is:
> >>> ```
> >>> $ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
> >>> Error:
> >>> Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
> >>> The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> >>> "dmesg | grep -i perf" may provide additional information.
> >>>
> >>> Error:
> >>> Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
> >>> The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> >>> "dmesg | grep -i perf" may provide additional information.
> >>>
> >>
> >> This makes me wonder if this message was overly wordy to begin with.
> >> This line is fine:
> >>
> >>    Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0'
> >>    which will be removed.
> >>
> >> The next bit about the syscall just repeats. The exit code could be
> >> included on the previous line. And the dmesg bit is general advice that
> >> could possibly be printed once at the end.
> >
> > Thanks for the time spent investigating this!
> >
> > I agree on this. I wonder if we should have short and long messages,
> > or message+help like we do for parse events. One patch series like
> > this is improving EBUSY:
> > https://lore.kernel.org/lkml/20241106003007.2112584-1-ctshao@google.com/
> > The issue with printing at the end is knowing where/when the end is.
> > Printing once is easy enough.
> >
> >>> Error:
> >>> Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> >>> The LLC-prefetch-read event is not supported.
> >>> [ perf record: Woken up 1 times to write data ]
> >>> [ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]
> >>>
> >>> $ perf report --stats
> >>> Aggregated stats:
> >>>                  TOTAL events:      17255
> >>>                   MMAP events:        284  ( 1.6%)
> >>>                   COMM events:       1961  (11.4%)
> >>>                   EXIT events:          1  ( 0.0%)
> >>>                   FORK events:       1960  (11.4%)
> >>>                 SAMPLE events:         87  ( 0.5%)
> >>>                  MMAP2 events:      12836  (74.4%)
> >>>                KSYMBOL events:         83  ( 0.5%)
> >>>              BPF_EVENT events:         36  ( 0.2%)
> >>>         FINISHED_ROUND events:          2  ( 0.0%)
> >>>               ID_INDEX events:          1  ( 0.0%)
> >>>             THREAD_MAP events:          1  ( 0.0%)
> >>>                CPU_MAP events:          1  ( 0.0%)
> >>>              TIME_CONV events:          1  ( 0.0%)
> >>>          FINISHED_INIT events:          1  ( 0.0%)
> >>> cycles stats:
> >>>                 SAMPLE events:         87
> >>> ```
> >>>
> >>> Note, if all events fail to open then the data file will contain no
> >>> samples. This is deliberate as at the point the events are opened
> >>> there are other events, such as the dummy event for sideband data, and
> >>> these events will succeed in opening even if the user specified ones
> >>
> >> Is a file with only sideband events useful? Is it possible to fail the
> >> record command if the event doesn't open anywhere?
> >>
> >> I noticed this leads to some different behavior and a libperf warning
> >> when you have paranoid=3:
> >>
> >>     $ perf record -e cycles -C 0 -- true
> >>
> >>     Error:
> >>     Failure to open event 'cpu_atom/cycles/u' on PMU 'cpu_atom' which will
> >>     be removed.
> >>     ...
> >>     Consider adjusting /proc/sys/kernel/perf_event_paranoid setting
> >>     ...
> >>     libperf: Miscounted nr_mmaps 0 vs 28
> >>     WARNING: No sample_id_all support, falling back to unordered
> >>     processing
> >>     [ perf record: Woken up 1 times to write data ]
> >>     [ perf record: Captured and wrote 0.021 MB perf.data ]
> >
> > So paranoid=3 is a Debian extension that Peter Z doesn't agree with
> > and so isn't enabled in regular kernels. So we're dealing with cycles,
> > which we'd expect to open or fall-back to a software event like
> > task-clock. It feels like things could happen better here but that's
> > not necessarily the fault of this patch.
> >
>
> paranoid=3 is a bit of a red-herring. I actually meant any paranoid
> value other than -1. The important part is "-C 0" because only root is
> allowed to open on a CPU. Or "-a" also causes the issue.
>
> Previously you got the command failure and the hint about the paranoid
> value, but now you get the libperf error too and it still makes the file.
>
> I still don't think a file with only sideband events is useful, I don't
> think there's even a commandline that you can run currently that results
> in that? You always need at least one event to open for it to succeed
> and I'm thinking we could keep that.

You can currently do `perf record -e dummy ...` for example in this
ARM SPE test (I see similar in offcpu, attr test, intel_pt, switch
tracking):
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/tests/shell/test_arm_spe.sh?h=perf-tools-next#n94
If we add a condition that an evlist with only sideband events is
erroneous, and fail perf record, then I think that will break these
expectations. Any change like that would at least need to fix up the
tests, but the tests could be a good indicator that users are also
doing things like this and that may lead to complaints.

> It would be nice to fix the libperf warning too because it's warning
> about something that was already warned about.

Do you have something in mind to clean up the warning? Perhaps you can
send it as a patch.

> >>> don't. Having a mix of open and broken events leads to a problem of
> >>> identifying different sources of events.
> >>   >
> >>
> >> In my basic test I saw that the opened event was identified correctly in
> >> perf report, unless you have an example that you encountered that we
> >> should fix?
> >
> > I didn't find any but we also don't tend to test failing to open
> > events. I could imagine things failing in `perf test` on neoverse
> > testing given the l3 advertising the cycles event.
> >
> >> One place I saw an issue was with auxtrace events. If there's an event
> >> name clash then you're likely to not be able to open the file afterwards
> >> because the auxtrace code can't handle an event that didn't open. But I
> >> don't know of any name clashes there (I just faked one for testing), and
> >> maybe that could be fixed up later in the auxtrace code if there is ever
> >> a real one.
> >>
> >> Other than the above it does seem to work ok.
> >
> > Cool, can this be taken as a Tested-by?
> >
>
> Yep, although with a caveat that I faked the second cycles PMU that
> wouldn't open, but I don't think that should make a difference:
>
> Tested-by: James Clark <james.clark@linaro.org>

Thanks James, as this patch is before reapplying the "Prefer
sysfs/JSON hardware events over legacy" patch, can I take it that the
tag is for the whole series?

> I can also test with the real thing if you like but would take a bit
> longer as I have to borrow it off Leo.

More testing is always good.

Thanks,
Ian

> > Thanks,
> > Ian
> >
> >>> The issue with legacy events is that on RISC-V they want the driver to
> >>> not have mappings from legacy to non-legacy config encodings for each
> >>> vendor/model due to size, complexity and difficulty to update. It was
> >>> reported that on ARM Apple-M? CPUs the legacy mapping in the driver
> >>> was broken and the sysfs/json events should always take precedent,
> >>> however, it isn't clear this is still the case. It is the case that
> >>> without working around this issue a legacy event like cycles without a
> >>> PMU can encode differently than when specified with a PMU - the
> >>> non-PMU version favoring legacy encodings, the PMU one avoiding legacy
> >>> encodings.
> >>>
> >>> The patch removes events and then adjusts the idx value for each
> >>> evsel. This is done so that the dense xyarrays used for file
> >>> descriptors, etc. don't contain broken entries. As event opening
> >>> happens relatively late in the record process, use of the idx value
> >>> before the open will have become corrupted, so it is expected there
> >>> are latent bugs hidden behind this change - the change is best
> >>> effort. As the only vendor that has broken event names is ARM, this
> >>> will principally effect ARM users. They will also experience warning
> >>> messages like those above because of the uncore PMU advertising legacy
> >>> event names.
> >>
> >>
> >>
> >>>
> >>> Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org>
> >>> Signed-off-by: Ian Rogers <irogers@google.com>
> >>> ---
> >>>    tools/perf/builtin-record.c | 22 +++++++++++++++++-----
> >>>    1 file changed, 17 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
> >>> index f83252472921..7e99743f7e42 100644
> >>> --- a/tools/perf/builtin-record.c
> >>> +++ b/tools/perf/builtin-record.c
> >>> @@ -1364,6 +1364,7 @@ static int record__open(struct record *rec)
> >>>        struct perf_session *session = rec->session;
> >>>        struct record_opts *opts = &rec->opts;
> >>>        int rc = 0;
> >>> +     bool skipped = false;
> >>>
> >>>        evlist__for_each_entry(evlist, pos) {
> >>>    try_again:
> >>> @@ -1379,15 +1380,26 @@ static int record__open(struct record *rec)
> >>>                                pos = evlist__reset_weak_group(evlist, pos, true);
> >>>                                goto try_again;
> >>>                        }
> >>> -                     rc = -errno;
> >>>                        evsel__open_strerror(pos, &opts->target, errno, msg, sizeof(msg));
> >>> -                     ui__error("%s\n", msg);
> >>> -                     goto out;
> >>> +                     ui__error("Failure to open event '%s' on PMU '%s' which will be removed.\n%s\n",
> >>> +                               evsel__name(pos), evsel__pmu_name(pos), msg);
> >>> +                     pos->skippable = true;
> >>> +                     skipped = true;
> >>> +             } else {
> >>> +                     pos->supported = true;
> >>>                }
> >>> -
> >>> -             pos->supported = true;
> >>>        }
> >>>
> >>> +     if (skipped) {
> >>> +             struct evsel *tmp;
> >>> +             int idx = 0;
> >>> +
> >>> +             evlist__for_each_entry_safe(evlist, tmp, pos) {
> >>> +                     if (pos->skippable)
> >>> +                             evlist__remove(evlist, pos);
> >>> +                     pos->core.idx = idx++;
> >>> +             }
> >>> +     }
> >>>        if (symbol_conf.kptr_restrict && !evlist__exclude_kernel(evlist)) {
> >>>                pr_warning(
> >>>    "WARNING: Kernel address maps (/proc/{kallsyms,modules}) are restricted,\n"
> >>
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 0/4] Prefer sysfs/JSON events also when no PMU is provided
  2024-11-12 14:21           ` James Clark
@ 2024-11-12 17:23             ` Ian Rogers
  2024-11-12 19:55               ` Atish Kumar Patra
  0 siblings, 1 reply; 24+ messages in thread
From: Ian Rogers @ 2024-11-12 17:23 UTC (permalink / raw)
  To: James Clark
  Cc: Atish Kumar Patra, linux-perf-users, linux-kernel, Ben Gainey,
	Junhao He, linux-riscv, beeman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Jiri Olsa,
	Alexander Shishkin, Kan Liang, Ze Gao, Weilin Wang,
	Dominique Martinet

On Tue, Nov 12, 2024 at 6:22 AM James Clark <james.clark@linaro.org> wrote:
>
> On 11/11/2024 5:19 pm, Ian Rogers wrote:
> > On Mon, Nov 11, 2024 at 2:45 AM James Clark <james.clark@linaro.org> wrote:
> >>
> >> On 08/11/2024 6:37 pm, Atish Kumar Patra wrote:
> >>> On Fri, Nov 8, 2024 at 4:16 AM James Clark <james.clark@linaro.org> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 07/11/2024 18:51, Ian Rogers wrote:
> >>>>> On Sat, Oct 26, 2024 at 5:18 AM Ian Rogers <irogers@google.com> wrote:
> >>>>>>
> >>>>>> At the RISC-V summit the topic of avoiding event data being in the
> >>>>>> RISC-V PMU kernel driver came up. There is a preference for sysfs/JSON
> >>>>>> events being the priority when no PMU is provided so that legacy
> >>>>>> events maybe supported via json. Originally Mark Rutland also
> >>>>>> expressed at LPC 2023 that doing this would resolve bugs on ARM Apple
> >>>>>> M? processors, but James Clark more recently tested this and believes
> >>>>>> the driver issues there may not have existed or have been resolved. In
> >>>>>> any case, it is inconsistent that with a PMU event names avoid legacy
> >>>>>> encodings, but when wildcarding PMUs (ie without a PMU with the event
> >>>>>> name) the legacy encodings have priority.
> >>>>>>
> >>>>>> The patch doing this work was reverted in a v6.10 release candidate
> >>>>>> as, even though the patch was posted for weeks and had been on
> >>>>>> linux-next for weeks without issue, Linus was in the habit of using
> >>>>>> explicit legacy events with unsupported precision options on his
> >>>>>> Neoverse-N1. This machine has SLC PMU events for bus and CPU cycles
> >>>>>> where ARM decided to call the events bus_cycles and cycles, the latter
> >>>>>> being also a legacy event name. ARM haven't renamed the cycles event
> >>>>>> to a more consistent cpu_cycles and avoided the problem. With these
> >>>>>> changes the problematic event will now be skipped, a large warning
> >>>>>> produced, and perf record will continue for the other PMU events. This
> >>>>>> solution was proposed by Arnaldo.
> >>>>>>
> >>>>>> Two minor changes have been added to help with the error message and
> >>>>>> to work around issues occurring with "perf stat metrics (shadow stat)
> >>>>>> test".
> >>>>>>
> >>>>>> The patches have only been tested on my x86 non-hybrid laptop.
> >>>>>
> >>>>> Hi Atish and James,
> >>>>>
> >>>>> Could I get your tags for this series?
> >>>>>
> >>>
> >>> Hi Ian,
> >>> Thanks for your patches. It definitely helps to have a clean slate
> >>> implementation
> >>> for the perf tool. However, I have some open questions about other use cases
> >>> that we discussed during the RVI Summit.
> >>>
> >>>>> The patches were originally motivated by wanting to make the behavior
> >>>>> of events parsed like "cycles" match that of "cpu/cycles/", the PMU is
> >>>>> wildcarded to "cpu" in the first case. This was divergent because of
> >>>>> ARM we switched from preferring legacy (type = PERF_TYPE_HARDWARE,
> >>>>> config = PERF_COUNT_HW_CPU_CYCLES) to sysfs/json (type=<core PMU's
> >>>>> type>, config=<encoding from event>) when a PMU name was given. This
> >>>>> aligns with RISC-V wanting to use json encodings to avoid complexity
> >>>>> in the PMU driver.
> >>>>>
> >>>>
> >>>> I couldn't find the thread, but I remember fairly recently it was
> >>>> mentioned that RISC-V would be supporting the legacy events after all,
> >>>> maybe it was a comment from Atish? I'm not sure if that changes the
> >>>> requirements for this or not?
> >>>>
> >>>> I still can't really imagine how tooling would work if every tool has to
> >>>> maintain the mappings of basic events like instructions and branches.
> >>>> For example all the perf_event_open tests in ltp use the legacy events.
> >>>>
> >>>
> >>> No it has not changed. While this series helps to avoid clunky vendor
> >>> specific encodings
> >>> in the driver for perf tool, I am still unsure how we will manage
> >>> other applications
> >>> (directly passing legacy events through perf_event_open or
> >>> perf_evlist__open) will work.
> >>>
> >>> I have only anecdotal data about folks relying perf legacy events
> >>> directly to profile
> >>> their application. All of them use mostly cycle/instruction events though.
> >>> Are there any users who use other legacy events directly without perf tool ?
> >>>
> >>> If not, we may have only cycle/instruction mapping in the driver and
> >>> rely on json for everything else.
> >>> The other use case is virtualization. I have been playing with these
> >>> patches to find a clean solution to
> >>> enable all the use cases. If you have any other ideas, please let me know.
> >>>
> >>
> >> Yeah I would expect it's mostly cycles and instructions. I searched a
> >> bit for PERF_COUNT_HW_BRANCH_MISSES and only found tool/developer type
> >> usages, which I suppose we could expect to have to handle the mappings
> >> like perf. Although it's not the easiest thing to search for and
> >> obviously that only includes open source.
> >>
> >> Usages do exist though, there are people posting on stack overflow using
> >> it, and other various bug tracker listings. So you would expect those
> >> same users to have to use raw event codes and some switch statement to
> >> pick the right one for their hardware, or use Perf.
> >
> > I don't have any magic to solve this. My thoughts:
> >
> > 1) I thought legacy events were just going to hang around forever,
> > although the name hinting they'd kind of been deprecated. At LPC '23
> > Atish and Mark Rutland originally asked for the sysfs/json to be the
> > preference. I thought Kan was going to push back given the upheaval,
> > especially updating every test expectation. It went through but we're
> > in this weird state where wildcard events are encoded using legacy and
> > PMU specifying events aren't. What this series hopes to solve.
> >
> > 2) I think it is important that perf tool be a reference
> > implementation where others can look (strace, etc.) to base their
> > implementation. Moving perf to sysfs/json is 1 step closer to legacy
> > event deprecation. Please yell if deprecation isn't what is wanted as
> > personally other than cleanliness I don't mind. If we do decide not to
> > have sysfs/json be the priority then I think it sensible to revert the
> > changes making it the priority for events that specify a PMU. I'd like
> > some level of consistency.
>
> Personally this change feels like it's encouraging fragmentation, the
> cleanest would be if RISC-V supports the legacy events like the other
> platforms. It's not a huge set of events anyway, and then existing
> software continues to work in addition to Perf continuing to work.

So I try to be agnostic on the issue but we got to this point because
of events being broken on ARM Apple chips. I fixed an issue where the
ARM core PMU appeared as an uncore PMU as ARM's PMU naming differs
from everyone else's. That fix made it so we used legacy events for
the ARM core PMU, the PMU driver didn't handle this correctly (at the
time) on ARM Apple Linux breaking perf over multiple Linux releases
affecting a number of users.

Having a pool of standard events isn't on the face of it a crazy idea,
I'll try to remember things that have come up:

1) instructions and cycles seem like fairly easy events to agree upon.
However, we also have cpu-cycles that means cycles. With legacy events
the hyphen is often used as a PMU separator. Now what does the cpu
mean here? On ARM you also have cpu_cycles as a sysfs event, ie an
underscore and not a hyphen:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/drivers/perf/arm_pmuv3.c?h=perf-tools-next#n192
What does 'cpu' mean when ARM has no core PMU called 'cpu' and in
heterogeneous systems there are multiple notions of 'cpu' for the core
PMU.
What's my point? Just trying to make a minimal 2 event set of common
events with the way things currently parse leads to ambiguity.

2) missing/ambiguous legacy events. Beyond instructions and cycles
there are cache references, cache misses, branch instructions, branch
misses, bus cycles, stalled cycles frontend, stalled cycles backend
and reference CPU cycles. This set of events captures an out-of-order
machine with a single bus, but even then there's ambiguity over
whether the events include speculatively executed instructions or not.
What's my point? Using these events leads to unpredictable counts in
software. With a sysfs/json event there is a description but none was
ever given about what these legacy events should do, or do if you have
a system with >1 bus, counts for speculatively executed instructions,
isn't out-of-order, etc.

3) "legacy cache" events never really took off. The legacy cache event
names encoded data around speculation and appeared less ambiguous, a
good source for all the names supported by perf is here:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/evsel.c?h=perf-tools-next#n734
but choices were made like prefetch is the same as a speculative read
in the encoding. Vendors no doubt agree and disagree on this. LLC and
L2 are synonyms, but today LLC generally means L3 and often an
entirely separate PMU.
What's my point? The code tried to do a smarter less ambiguous thing
than PERF_TYPE_HARDWARE with PERF_TYPE_HW_CACHE but again it was
flawed, carried too many architectural assumptions and without the
drivers adopting it neither did users.

4) PMU drivers mapping issues and the aforementioned ARM Apple Linux
breakage, RISC-V wanting to keep mappings out of a more simplistic
driver.

So I wasn't there for all of this, but I think this is what led to the
legacy events effectively becoming deprecated.

> If we're still thinking that RISC-V will support the legacy events
> anyway in the future, then it weakens the argument to change this and
> risk any breakages that fall out of it.

Agreed, that's why I keep saying I'm only doing this because ARM and
RISC-V want it. What I want is that for things like the encoding of
"cpu-cycles" and "cpu/cpu-cycles/" when there is only 1 PMU providing
such an event called "cpu" the perf_event_attr encodings should match,
we don't have this today but did prior to changing the priority to fix
the ARM Apple Linux issue.

I think for what Atish is asking for perhaps the best route is to add
vendor standard events like ARM has. Anything done in Linux won't
carry over to other operating systems you can be trying to virtualize.
This doesn't to me feel like a problem we should be trying to solve
either in the kernel/driver or the tool, well I've not heard an idea
how we can.

> I'm also not sure whether they really are legacy or just a common base
> set of events. Since the extended type thing was added even the legacy
> events support hybrid, so it's not like they expired from lack of
> features. Even if we claim they're legacy, is that really ever going to
> make them go away?

So for heterogeneous chips ARM first did BIG.little over ten years
ago. More recently Intel did hybrid and it wasn't until Intel added
hybrid support for legacy events that the idea of this worked. As you
know, ARM''s support came later. Does software outside of the perf
tool know to encode information in the extended type information in
legacy events? I doubt it. For example, libpfm4 has only recently been
adding support for heterogeneous CPUs.

So yes there are legacy events and they are a common pool of events
which is convenient for developers, however, this approach quickly
showed shortcomings and often lacked proper vendor support. My company
advocates developers against using common events, which admittedly is
more work for the developer, because we've had multiple issues over
the years primarily due to baked in assumptions.

Thanks,
Ian

> Although I admit nobody has come out with a definitive use case that
> doesn't use either the cycles or instructions events, so maybe its all fine.
>
> >
> > 3) I'd like event parsing to be a library that is as easy to link
> > against as libbpf (i.e. not a viral license). Event parsing is really
> > just mapping an event name to 1 or more perf_event_attr. The PMU
> > abstraction is tied into the event parsing, but this has only been
> > more true recently. The evsel/evlist is tied into event parsing but
> > that feels separable. As most json isn't distributed under a viral
> > license this feels achievable, then I suspect most tools can use this
> > library rather than have to reinvent a wheel.
> >
> > Still looking for tags :-)
> >
> > Thanks,
> > Ian
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 3/4] perf record: Skip don't fail for events that don't open
  2024-10-26 12:17 ` [PATCH v1 3/4] perf record: Skip don't fail for events that don't open Ian Rogers
  2024-11-11 15:49   ` James Clark
@ 2024-11-12 19:53   ` Leo Yan
  2024-11-12 22:24     ` Ian Rogers
  1 sibling, 1 reply; 24+ messages in thread
From: Leo Yan @ 2024-11-12 19:53 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Atish Patra, linux-riscv, beeman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang,
	James Clark, Ze Gao, Weilin Wang, Ben Gainey, Dominique Martinet,
	Junhao He, linux-perf-users, linux-kernel

On Sat, Oct 26, 2024 at 05:17:57AM -0700, Ian Rogers wrote:
> 
> Whilst for many tools it is an expected behavior that failure to open
> a perf event is a failure, ARM decided to name PMU events the same as
> legacy events and then failed to rename such events on a server uncore
> SLC PMU. As perf's default behavior when no PMU is specified is to
> open the event on all PMUs that advertise/"have" the event, this
> yielded failures when trying to make the priority of legacy and
> sysfs/json events uniform - something requested by RISC-V and ARM. A
> legacy event user on ARM hardware may find their event opened on an
> uncore PMU which for perf record will fail. Arnaldo suggested skipping
> such events which this patch implements. Rather than have the skipping
> conditional on running on ARM, the skipping is done on all
> architectures as such a fundamental behavioral difference could lead
> to problems with tools built/depending on perf.
> 
> An example of perf record failing to open events on x86 is:
> ```
> $ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
> Error:
> Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
> The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> "dmesg | grep -i perf" may provide additional information.
> 
> Error:
> Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
> The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> "dmesg | grep -i perf" may provide additional information.
> 
> Error:
> Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> The LLC-prefetch-read event is not supported.
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]
> 
> $ perf report --stats
> Aggregated stats:
>                TOTAL events:      17255
>                 MMAP events:        284  ( 1.6%)
>                 COMM events:       1961  (11.4%)
>                 EXIT events:          1  ( 0.0%)
>                 FORK events:       1960  (11.4%)
>               SAMPLE events:         87  ( 0.5%)
>                MMAP2 events:      12836  (74.4%)
>              KSYMBOL events:         83  ( 0.5%)
>            BPF_EVENT events:         36  ( 0.2%)
>       FINISHED_ROUND events:          2  ( 0.0%)
>             ID_INDEX events:          1  ( 0.0%)
>           THREAD_MAP events:          1  ( 0.0%)
>              CPU_MAP events:          1  ( 0.0%)
>            TIME_CONV events:          1  ( 0.0%)
>        FINISHED_INIT events:          1  ( 0.0%)
> cycles stats:
>               SAMPLE events:         87
> ```

Thanks for James reminding me.  Tested on AVA platform:

# tree /sys/bus/event_source/devices/arm_dsu_*/events
  ...
  /sys/bus/event_source/devices/arm_dsu_9/events
  ├── bus_access
  ├── bus_cycles
  ├── cycles
  ├── l3d_cache
  ├── l3d_cache_allocate
  ├── l3d_cache_refill
  ├── l3d_cache_wb
  └── memory_error

# ./perf record -- sleep 0.1
 Error:
 Failure to open event 'cycles:PH' on PMU 'arm_dsu_0' which will be
 removed.
 cycles:PH: PMU Hardware doesn't support sampling/overflow-interrupts.
 Try 'perf stat'
 Error:
 Failure to open event 'cycles:PH' on PMU 'arm_dsu_1' which will be
 removed.
 cycles:PH: PMU Hardware doesn't support sampling/overflow-interrupts.
 Try 'perf stat'
 ...
 Error:
 Failure to open event 'cycles:PH' on PMU 'arm_dsu_15' which will be
 removed.
 cycles:PH: PMU Hardware doesn't support sampling/overflow-interrupts.
 Try 'perf stat'
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.008 MB perf.data (8 samples) ]

# ./perf report --stats

 Aggregated stats:
                TOTAL events:         67
                 MMAP events:         40  (59.7%)
                 COMM events:          1  ( 1.5%)
               SAMPLE events:          8  (11.9%)
              KSYMBOL events:          6  ( 9.0%)
            BPF_EVENT events:          6  ( 9.0%)
       FINISHED_ROUND events:          1  ( 1.5%)
             ID_INDEX events:          1  ( 1.5%)
           THREAD_MAP events:          1  ( 1.5%)
              CPU_MAP events:          1  ( 1.5%)
            TIME_CONV events:          1  ( 1.5%)
        FINISHED_INIT events:          1  ( 1.5%)
 cycles:P stats:
               SAMPLE events:          8

# ./perf stat -- sleep 0.1

 Performance counter stats for 'sleep 0.1':

              0.87 msec task-clock                       #    0.009 CPUs utilized
                 1      context-switches                 #    1.148 K/sec
                 0      cpu-migrations                   #    0.000 /sec
                52      page-faults                      #   59.685 K/sec
           877,835      instructions                     #    1.14  insn per cycle
                                                  #    0.25  stalled cycles per insn
           772,102      cycles                           #  886.210 M/sec
           191,914      stalled-cycles-frontend          #   24.86% frontend cycles idle
           219,183      stalled-cycles-backend           #   28.39% backend cycles idle
           184,099      branches                         #  211.307 M/sec
             8,548      branch-misses                    #    4.64% of all branches

       0.101623529 seconds time elapsed

       0.001645000 seconds user
       0.000000000 seconds sys

Tested-by: Leo Yan <leo.yan@arm.com>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 0/4] Prefer sysfs/JSON events also when no PMU is provided
  2024-11-12 17:23             ` Ian Rogers
@ 2024-11-12 19:55               ` Atish Kumar Patra
  0 siblings, 0 replies; 24+ messages in thread
From: Atish Kumar Patra @ 2024-11-12 19:55 UTC (permalink / raw)
  To: Ian Rogers
  Cc: James Clark, linux-perf-users, linux-kernel, Ben Gainey,
	Junhao He, linux-riscv, beeman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Jiri Olsa,
	Alexander Shishkin, Kan Liang, Ze Gao, Weilin Wang,
	Dominique Martinet

On Tue, Nov 12, 2024 at 9:23 AM Ian Rogers <irogers@google.com> wrote:
>
> On Tue, Nov 12, 2024 at 6:22 AM James Clark <james.clark@linaro.org> wrote:
> >
> > On 11/11/2024 5:19 pm, Ian Rogers wrote:
> > > On Mon, Nov 11, 2024 at 2:45 AM James Clark <james.clark@linaro.org> wrote:
> > >>
> > >> On 08/11/2024 6:37 pm, Atish Kumar Patra wrote:
> > >>> On Fri, Nov 8, 2024 at 4:16 AM James Clark <james.clark@linaro.org> wrote:
> > >>>>
> > >>>>
> > >>>>
> > >>>> On 07/11/2024 18:51, Ian Rogers wrote:
> > >>>>> On Sat, Oct 26, 2024 at 5:18 AM Ian Rogers <irogers@google.com> wrote:
> > >>>>>>
> > >>>>>> At the RISC-V summit the topic of avoiding event data being in the
> > >>>>>> RISC-V PMU kernel driver came up. There is a preference for sysfs/JSON
> > >>>>>> events being the priority when no PMU is provided so that legacy
> > >>>>>> events maybe supported via json. Originally Mark Rutland also
> > >>>>>> expressed at LPC 2023 that doing this would resolve bugs on ARM Apple
> > >>>>>> M? processors, but James Clark more recently tested this and believes
> > >>>>>> the driver issues there may not have existed or have been resolved. In
> > >>>>>> any case, it is inconsistent that with a PMU event names avoid legacy
> > >>>>>> encodings, but when wildcarding PMUs (ie without a PMU with the event
> > >>>>>> name) the legacy encodings have priority.
> > >>>>>>
> > >>>>>> The patch doing this work was reverted in a v6.10 release candidate
> > >>>>>> as, even though the patch was posted for weeks and had been on
> > >>>>>> linux-next for weeks without issue, Linus was in the habit of using
> > >>>>>> explicit legacy events with unsupported precision options on his
> > >>>>>> Neoverse-N1. This machine has SLC PMU events for bus and CPU cycles
> > >>>>>> where ARM decided to call the events bus_cycles and cycles, the latter
> > >>>>>> being also a legacy event name. ARM haven't renamed the cycles event
> > >>>>>> to a more consistent cpu_cycles and avoided the problem. With these
> > >>>>>> changes the problematic event will now be skipped, a large warning
> > >>>>>> produced, and perf record will continue for the other PMU events. This
> > >>>>>> solution was proposed by Arnaldo.
> > >>>>>>
> > >>>>>> Two minor changes have been added to help with the error message and
> > >>>>>> to work around issues occurring with "perf stat metrics (shadow stat)
> > >>>>>> test".
> > >>>>>>
> > >>>>>> The patches have only been tested on my x86 non-hybrid laptop.
> > >>>>>
> > >>>>> Hi Atish and James,
> > >>>>>
> > >>>>> Could I get your tags for this series?
> > >>>>>
> > >>>
> > >>> Hi Ian,
> > >>> Thanks for your patches. It definitely helps to have a clean slate
> > >>> implementation
> > >>> for the perf tool. However, I have some open questions about other use cases
> > >>> that we discussed during the RVI Summit.
> > >>>
> > >>>>> The patches were originally motivated by wanting to make the behavior
> > >>>>> of events parsed like "cycles" match that of "cpu/cycles/", the PMU is
> > >>>>> wildcarded to "cpu" in the first case. This was divergent because of
> > >>>>> ARM we switched from preferring legacy (type = PERF_TYPE_HARDWARE,
> > >>>>> config = PERF_COUNT_HW_CPU_CYCLES) to sysfs/json (type=<core PMU's
> > >>>>> type>, config=<encoding from event>) when a PMU name was given. This
> > >>>>> aligns with RISC-V wanting to use json encodings to avoid complexity
> > >>>>> in the PMU driver.
> > >>>>>
> > >>>>
> > >>>> I couldn't find the thread, but I remember fairly recently it was
> > >>>> mentioned that RISC-V would be supporting the legacy events after all,
> > >>>> maybe it was a comment from Atish? I'm not sure if that changes the
> > >>>> requirements for this or not?
> > >>>>
> > >>>> I still can't really imagine how tooling would work if every tool has to
> > >>>> maintain the mappings of basic events like instructions and branches.
> > >>>> For example all the perf_event_open tests in ltp use the legacy events.
> > >>>>
> > >>>
> > >>> No it has not changed. While this series helps to avoid clunky vendor
> > >>> specific encodings
> > >>> in the driver for perf tool, I am still unsure how we will manage
> > >>> other applications
> > >>> (directly passing legacy events through perf_event_open or
> > >>> perf_evlist__open) will work.
> > >>>
> > >>> I have only anecdotal data about folks relying perf legacy events
> > >>> directly to profile
> > >>> their application. All of them use mostly cycle/instruction events though.
> > >>> Are there any users who use other legacy events directly without perf tool ?
> > >>>
> > >>> If not, we may have only cycle/instruction mapping in the driver and
> > >>> rely on json for everything else.
> > >>> The other use case is virtualization. I have been playing with these
> > >>> patches to find a clean solution to
> > >>> enable all the use cases. If you have any other ideas, please let me know.
> > >>>
> > >>
> > >> Yeah I would expect it's mostly cycles and instructions. I searched a
> > >> bit for PERF_COUNT_HW_BRANCH_MISSES and only found tool/developer type
> > >> usages, which I suppose we could expect to have to handle the mappings
> > >> like perf. Although it's not the easiest thing to search for and
> > >> obviously that only includes open source.
> > >>
> > >> Usages do exist though, there are people posting on stack overflow using
> > >> it, and other various bug tracker listings. So you would expect those
> > >> same users to have to use raw event codes and some switch statement to
> > >> pick the right one for their hardware, or use Perf.
> > >
> > > I don't have any magic to solve this. My thoughts:
> > >
> > > 1) I thought legacy events were just going to hang around forever,
> > > although the name hinting they'd kind of been deprecated. At LPC '23
> > > Atish and Mark Rutland originally asked for the sysfs/json to be the
> > > preference. I thought Kan was going to push back given the upheaval,
> > > especially updating every test expectation. It went through but we're
> > > in this weird state where wildcard events are encoded using legacy and
> > > PMU specifying events aren't. What this series hopes to solve.
> > >
> > > 2) I think it is important that perf tool be a reference
> > > implementation where others can look (strace, etc.) to base their
> > > implementation. Moving perf to sysfs/json is 1 step closer to legacy
> > > event deprecation. Please yell if deprecation isn't what is wanted as
> > > personally other than cleanliness I don't mind. If we do decide not to
> > > have sysfs/json be the priority then I think it sensible to revert the
> > > changes making it the priority for events that specify a PMU. I'd like
> > > some level of consistency.
> >
> > Personally this change feels like it's encouraging fragmentation, the
> > cleanest would be if RISC-V supports the legacy events like the other
> > platforms. It's not a huge set of events anyway, and then existing
> > software continues to work in addition to Perf continuing to work.
>
> So I try to be agnostic on the issue but we got to this point because
> of events being broken on ARM Apple chips. I fixed an issue where the
> ARM core PMU appeared as an uncore PMU as ARM's PMU naming differs
> from everyone else's. That fix made it so we used legacy events for
> the ARM core PMU, the PMU driver didn't handle this correctly (at the
> time) on ARM Apple Linux breaking perf over multiple Linux releases
> affecting a number of users.
>
> Having a pool of standard events isn't on the face of it a crazy idea,
> I'll try to remember things that have come up:
>
> 1) instructions and cycles seem like fairly easy events to agree upon.
> However, we also have cpu-cycles that means cycles. With legacy events
> the hyphen is often used as a PMU separator. Now what does the cpu
> mean here? On ARM you also have cpu_cycles as a sysfs event, ie an
> underscore and not a hyphen:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/drivers/perf/arm_pmuv3.c?h=perf-tools-next#n192
> What does 'cpu' mean when ARM has no core PMU called 'cpu' and in
> heterogeneous systems there are multiple notions of 'cpu' for the core
> PMU.
> What's my point? Just trying to make a minimal 2 event set of common
> events with the way things currently parse leads to ambiguity.
>
> 2) missing/ambiguous legacy events. Beyond instructions and cycles
> there are cache references, cache misses, branch instructions, branch
> misses, bus cycles, stalled cycles frontend, stalled cycles backend
> and reference CPU cycles. This set of events captures an out-of-order
> machine with a single bus, but even then there's ambiguity over
> whether the events include speculatively executed instructions or not.
> What's my point? Using these events leads to unpredictable counts in
> software. With a sysfs/json event there is a description but none was
> ever given about what these legacy events should do, or do if you have
> a system with >1 bus, counts for speculatively executed instructions,
> isn't out-of-order, etc.
>
> 3) "legacy cache" events never really took off. The legacy cache event
> names encoded data around speculation and appeared less ambiguous, a
> good source for all the names supported by perf is here:
> https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/evsel.c?h=perf-tools-next#n734
> but choices were made like prefetch is the same as a speculative read
> in the encoding. Vendors no doubt agree and disagree on this. LLC and
> L2 are synonyms, but today LLC generally means L3 and often an
> entirely separate PMU.
> What's my point? The code tried to do a smarter less ambiguous thing
> than PERF_TYPE_HARDWARE with PERF_TYPE_HW_CACHE but again it was
> flawed, carried too many architectural assumptions and without the
> drivers adopting it neither did users.
>
> 4) PMU drivers mapping issues and the aforementioned ARM Apple Linux
> breakage, RISC-V wanting to keep mappings out of a more simplistic
> driver.
>
> So I wasn't there for all of this, but I think this is what led to the
> legacy events effectively becoming deprecated.
>
> > If we're still thinking that RISC-V will support the legacy events
> > anyway in the future, then it weakens the argument to change this and
> > risk any breakages that fall out of it.
>
> Agreed, that's why I keep saying I'm only doing this because ARM and
> RISC-V want it. What I want is that for things like the encoding of

In the beginning, we wanted it for the RISC-V ecosystem because we had
an opportunity to start
from a clean slate and attempt a better implementation. However, the
burden of supporting the existing
existing users who try to run their preferred way of profiling
applications by using
legacy events (especially cycle/instructions) are forcing us to follow
the convention (have the encodings specified in the driver as well.
) unfortunately.

However, we can add the encodings under a Kconfig which can be
disabled or removed in the future if we get
to a point where there are no legacy users.

> "cpu-cycles" and "cpu/cpu-cycles/" when there is only 1 PMU providing
> such an event called "cpu" the perf_event_attr encodings should match,
> we don't have this today but did prior to changing the priority to fix
> the ARM Apple Linux issue.
>
> I think for what Atish is asking for perhaps the best route is to add
> vendor standard events like ARM has. Anything done in Linux won't
> carry over to other operating systems you can be trying to virtualize.
> This doesn't to me feel like a problem we should be trying to solve
> either in the kernel/driver or the tool, well I've not heard an idea
> how we can.
>

Apart from the non-Linux guests using SBI PMU events, the host
hypervisor(also running Linux) may need
event mapping available in the driver for the following reasons even
for guests running Linux.

1. The guest Linux is not running the updated perf tool with the
correct json file
2. VMM wants to fake the unique identifiers (vendorid, archid, implid)
for migration or some
other reasons.

Both these use cases are probably corner cases at best. That's why, I
am inclined to keep the encoding
in the driver to begin with with the hope we won't need to serve these
use cases in the future.

The driver will use the raw events coming from json as a preferred
method for all the events anyways. In that case,
the hypervisor doesn't have to do anything apart from filtering. In
absence of it, the events are encoded as legacy
(HARDWARE or HARDWARE_CACHE) and the hypervisor will try its best to
map those to a platform specific encoding.

> > I'm also not sure whether they really are legacy or just a common base
> > set of events. Since the extended type thing was added even the legacy
> > events support hybrid, so it's not like they expired from lack of
> > features. Even if we claim they're legacy, is that really ever going to
> > make them go away?
>
> So for heterogeneous chips ARM first did BIG.little over ten years
> ago. More recently Intel did hybrid and it wasn't until Intel added
> hybrid support for legacy events that the idea of this worked. As you
> know, ARM''s support came later. Does software outside of the perf
> tool know to encode information in the extended type information in
> legacy events? I doubt it. For example, libpfm4 has only recently been
> adding support for heterogeneous CPUs.
>
> So yes there are legacy events and they are a common pool of events
> which is convenient for developers, however, this approach quickly
> showed shortcomings and often lacked proper vendor support. My company
> advocates developers against using common events, which admittedly is
> more work for the developer, because we've had multiple issues over
> the years primarily due to baked in assumptions.
>
> Thanks,
> Ian
>
> > Although I admit nobody has come out with a definitive use case that
> > doesn't use either the cycles or instructions events, so maybe its all fine.
> >
> > >
> > > 3) I'd like event parsing to be a library that is as easy to link
> > > against as libbpf (i.e. not a viral license). Event parsing is really
> > > just mapping an event name to 1 or more perf_event_attr. The PMU
> > > abstraction is tied into the event parsing, but this has only been
> > > more true recently. The evsel/evlist is tied into event parsing but
> > > that feels separable. As most json isn't distributed under a viral
> > > license this feels achievable, then I suspect most tools can use this
> > > library rather than have to reinvent a wheel.
> > >
> > > Still looking for tags :-)
> > >
> > > Thanks,
> > > Ian
> >

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 3/4] perf record: Skip don't fail for events that don't open
  2024-11-12 19:53   ` Leo Yan
@ 2024-11-12 22:24     ` Ian Rogers
  2024-11-13 10:00       ` James Clark
  0 siblings, 1 reply; 24+ messages in thread
From: Ian Rogers @ 2024-11-12 22:24 UTC (permalink / raw)
  To: Leo Yan
  Cc: Atish Patra, linux-riscv, beeman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang,
	James Clark, Ze Gao, Weilin Wang, Ben Gainey, Dominique Martinet,
	Junhao He, linux-perf-users, linux-kernel

On Tue, Nov 12, 2024 at 11:53 AM Leo Yan <leo.yan@arm.com> wrote:
>
> On Sat, Oct 26, 2024 at 05:17:57AM -0700, Ian Rogers wrote:
> >
> > Whilst for many tools it is an expected behavior that failure to open
> > a perf event is a failure, ARM decided to name PMU events the same as
> > legacy events and then failed to rename such events on a server uncore
> > SLC PMU. As perf's default behavior when no PMU is specified is to
> > open the event on all PMUs that advertise/"have" the event, this
> > yielded failures when trying to make the priority of legacy and
> > sysfs/json events uniform - something requested by RISC-V and ARM. A
> > legacy event user on ARM hardware may find their event opened on an
> > uncore PMU which for perf record will fail. Arnaldo suggested skipping
> > such events which this patch implements. Rather than have the skipping
> > conditional on running on ARM, the skipping is done on all
> > architectures as such a fundamental behavioral difference could lead
> > to problems with tools built/depending on perf.
> >
> > An example of perf record failing to open events on x86 is:
> > ```
> > $ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
> > Error:
> > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
> > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > "dmesg | grep -i perf" may provide additional information.
> >
> > Error:
> > Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
> > The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
> > "dmesg | grep -i perf" may provide additional information.
> >
> > Error:
> > Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
> > The LLC-prefetch-read event is not supported.
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]
> >
> > $ perf report --stats
> > Aggregated stats:
> >                TOTAL events:      17255
> >                 MMAP events:        284  ( 1.6%)
> >                 COMM events:       1961  (11.4%)
> >                 EXIT events:          1  ( 0.0%)
> >                 FORK events:       1960  (11.4%)
> >               SAMPLE events:         87  ( 0.5%)
> >                MMAP2 events:      12836  (74.4%)
> >              KSYMBOL events:         83  ( 0.5%)
> >            BPF_EVENT events:         36  ( 0.2%)
> >       FINISHED_ROUND events:          2  ( 0.0%)
> >             ID_INDEX events:          1  ( 0.0%)
> >           THREAD_MAP events:          1  ( 0.0%)
> >              CPU_MAP events:          1  ( 0.0%)
> >            TIME_CONV events:          1  ( 0.0%)
> >        FINISHED_INIT events:          1  ( 0.0%)
> > cycles stats:
> >               SAMPLE events:         87
> > ```
>
> Thanks for James reminding me.  Tested on AVA platform:
>
> # tree /sys/bus/event_source/devices/arm_dsu_*/events
>   ...
>   /sys/bus/event_source/devices/arm_dsu_9/events
>   ├── bus_access
>   ├── bus_cycles
>   ├── cycles
>   ├── l3d_cache
>   ├── l3d_cache_allocate
>   ├── l3d_cache_refill
>   ├── l3d_cache_wb
>   └── memory_error
>
> # ./perf record -- sleep 0.1
>  Error:
>  Failure to open event 'cycles:PH' on PMU 'arm_dsu_0' which will be
>  removed.
>  cycles:PH: PMU Hardware doesn't support sampling/overflow-interrupts.
>  Try 'perf stat'
>  Error:
>  Failure to open event 'cycles:PH' on PMU 'arm_dsu_1' which will be
>  removed.
>  cycles:PH: PMU Hardware doesn't support sampling/overflow-interrupts.
>  Try 'perf stat'
>  ...
>  Error:
>  Failure to open event 'cycles:PH' on PMU 'arm_dsu_15' which will be
>  removed.
>  cycles:PH: PMU Hardware doesn't support sampling/overflow-interrupts.
>  Try 'perf stat'
>  [ perf record: Woken up 1 times to write data ]
>  [ perf record: Captured and wrote 0.008 MB perf.data (8 samples) ]
>
> # ./perf report --stats
>
>  Aggregated stats:
>                 TOTAL events:         67
>                  MMAP events:         40  (59.7%)
>                  COMM events:          1  ( 1.5%)
>                SAMPLE events:          8  (11.9%)
>               KSYMBOL events:          6  ( 9.0%)
>             BPF_EVENT events:          6  ( 9.0%)
>        FINISHED_ROUND events:          1  ( 1.5%)
>              ID_INDEX events:          1  ( 1.5%)
>            THREAD_MAP events:          1  ( 1.5%)
>               CPU_MAP events:          1  ( 1.5%)
>             TIME_CONV events:          1  ( 1.5%)
>         FINISHED_INIT events:          1  ( 1.5%)
>  cycles:P stats:
>                SAMPLE events:          8
>
> # ./perf stat -- sleep 0.1
>
>  Performance counter stats for 'sleep 0.1':
>
>               0.87 msec task-clock                       #    0.009 CPUs utilized
>                  1      context-switches                 #    1.148 K/sec
>                  0      cpu-migrations                   #    0.000 /sec
>                 52      page-faults                      #   59.685 K/sec
>            877,835      instructions                     #    1.14  insn per cycle
>                                                   #    0.25  stalled cycles per insn
>            772,102      cycles                           #  886.210 M/sec
>            191,914      stalled-cycles-frontend          #   24.86% frontend cycles idle
>            219,183      stalled-cycles-backend           #   28.39% backend cycles idle
>            184,099      branches                         #  211.307 M/sec
>              8,548      branch-misses                    #    4.64% of all branches
>
>        0.101623529 seconds time elapsed
>
>        0.001645000 seconds user
>        0.000000000 seconds sys
>
> Tested-by: Leo Yan <leo.yan@arm.com>

Thanks Leo! As the Tested-by makes sense only if you've applied all 4
patches, which your testing and James' testing shows you've both done,
I'll add the tags to all 4 patches. I'll do likewise with Atish,
rebase and resend the patches.

Thanks,
Ian

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH v1 3/4] perf record: Skip don't fail for events that don't open
  2024-11-12 22:24     ` Ian Rogers
@ 2024-11-13 10:00       ` James Clark
  0 siblings, 0 replies; 24+ messages in thread
From: James Clark @ 2024-11-13 10:00 UTC (permalink / raw)
  To: Ian Rogers, Leo Yan
  Cc: Atish Patra, linux-riscv, beeman, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
	Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang, Ze Gao,
	Weilin Wang, Ben Gainey, Dominique Martinet, Junhao He,
	linux-perf-users, linux-kernel



On 12/11/2024 10:24 pm, Ian Rogers wrote:
> On Tue, Nov 12, 2024 at 11:53 AM Leo Yan <leo.yan@arm.com> wrote:
>>
>> On Sat, Oct 26, 2024 at 05:17:57AM -0700, Ian Rogers wrote:
>>>
>>> Whilst for many tools it is an expected behavior that failure to open
>>> a perf event is a failure, ARM decided to name PMU events the same as
>>> legacy events and then failed to rename such events on a server uncore
>>> SLC PMU. As perf's default behavior when no PMU is specified is to
>>> open the event on all PMUs that advertise/"have" the event, this
>>> yielded failures when trying to make the priority of legacy and
>>> sysfs/json events uniform - something requested by RISC-V and ARM. A
>>> legacy event user on ARM hardware may find their event opened on an
>>> uncore PMU which for perf record will fail. Arnaldo suggested skipping
>>> such events which this patch implements. Rather than have the skipping
>>> conditional on running on ARM, the skipping is done on all
>>> architectures as such a fundamental behavioral difference could lead
>>> to problems with tools built/depending on perf.
>>>
>>> An example of perf record failing to open events on x86 is:
>>> ```
>>> $ perf record -e data_read,cycles,LLC-prefetch-read -a sleep 0.1
>>> Error:
>>> Failure to open event 'data_read' on PMU 'uncore_imc_free_running_0' which will be removed.
>>> The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
>>> "dmesg | grep -i perf" may provide additional information.
>>>
>>> Error:
>>> Failure to open event 'data_read' on PMU 'uncore_imc_free_running_1' which will be removed.
>>> The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (data_read).
>>> "dmesg | grep -i perf" may provide additional information.
>>>
>>> Error:
>>> Failure to open event 'LLC-prefetch-read' on PMU 'cpu' which will be removed.
>>> The LLC-prefetch-read event is not supported.
>>> [ perf record: Woken up 1 times to write data ]
>>> [ perf record: Captured and wrote 2.188 MB perf.data (87 samples) ]
>>>
>>> $ perf report --stats
>>> Aggregated stats:
>>>                 TOTAL events:      17255
>>>                  MMAP events:        284  ( 1.6%)
>>>                  COMM events:       1961  (11.4%)
>>>                  EXIT events:          1  ( 0.0%)
>>>                  FORK events:       1960  (11.4%)
>>>                SAMPLE events:         87  ( 0.5%)
>>>                 MMAP2 events:      12836  (74.4%)
>>>               KSYMBOL events:         83  ( 0.5%)
>>>             BPF_EVENT events:         36  ( 0.2%)
>>>        FINISHED_ROUND events:          2  ( 0.0%)
>>>              ID_INDEX events:          1  ( 0.0%)
>>>            THREAD_MAP events:          1  ( 0.0%)
>>>               CPU_MAP events:          1  ( 0.0%)
>>>             TIME_CONV events:          1  ( 0.0%)
>>>         FINISHED_INIT events:          1  ( 0.0%)
>>> cycles stats:
>>>                SAMPLE events:         87
>>> ```
>>
>> Thanks for James reminding me.  Tested on AVA platform:
>>
>> # tree /sys/bus/event_source/devices/arm_dsu_*/events
>>    ...
>>    /sys/bus/event_source/devices/arm_dsu_9/events
>>    ├── bus_access
>>    ├── bus_cycles
>>    ├── cycles
>>    ├── l3d_cache
>>    ├── l3d_cache_allocate
>>    ├── l3d_cache_refill
>>    ├── l3d_cache_wb
>>    └── memory_error
>>
>> # ./perf record -- sleep 0.1
>>   Error:
>>   Failure to open event 'cycles:PH' on PMU 'arm_dsu_0' which will be
>>   removed.
>>   cycles:PH: PMU Hardware doesn't support sampling/overflow-interrupts.
>>   Try 'perf stat'
>>   Error:
>>   Failure to open event 'cycles:PH' on PMU 'arm_dsu_1' which will be
>>   removed.
>>   cycles:PH: PMU Hardware doesn't support sampling/overflow-interrupts.
>>   Try 'perf stat'
>>   ...
>>   Error:
>>   Failure to open event 'cycles:PH' on PMU 'arm_dsu_15' which will be
>>   removed.
>>   cycles:PH: PMU Hardware doesn't support sampling/overflow-interrupts.
>>   Try 'perf stat'
>>   [ perf record: Woken up 1 times to write data ]
>>   [ perf record: Captured and wrote 0.008 MB perf.data (8 samples) ]
>>
>> # ./perf report --stats
>>
>>   Aggregated stats:
>>                  TOTAL events:         67
>>                   MMAP events:         40  (59.7%)
>>                   COMM events:          1  ( 1.5%)
>>                 SAMPLE events:          8  (11.9%)
>>                KSYMBOL events:          6  ( 9.0%)
>>              BPF_EVENT events:          6  ( 9.0%)
>>         FINISHED_ROUND events:          1  ( 1.5%)
>>               ID_INDEX events:          1  ( 1.5%)
>>             THREAD_MAP events:          1  ( 1.5%)
>>                CPU_MAP events:          1  ( 1.5%)
>>              TIME_CONV events:          1  ( 1.5%)
>>          FINISHED_INIT events:          1  ( 1.5%)
>>   cycles:P stats:
>>                 SAMPLE events:          8
>>
>> # ./perf stat -- sleep 0.1
>>
>>   Performance counter stats for 'sleep 0.1':
>>
>>                0.87 msec task-clock                       #    0.009 CPUs utilized
>>                   1      context-switches                 #    1.148 K/sec
>>                   0      cpu-migrations                   #    0.000 /sec
>>                  52      page-faults                      #   59.685 K/sec
>>             877,835      instructions                     #    1.14  insn per cycle
>>                                                    #    0.25  stalled cycles per insn
>>             772,102      cycles                           #  886.210 M/sec
>>             191,914      stalled-cycles-frontend          #   24.86% frontend cycles idle
>>             219,183      stalled-cycles-backend           #   28.39% backend cycles idle
>>             184,099      branches                         #  211.307 M/sec
>>               8,548      branch-misses                    #    4.64% of all branches
>>
>>         0.101623529 seconds time elapsed
>>
>>         0.001645000 seconds user
>>         0.000000000 seconds sys
>>
>> Tested-by: Leo Yan <leo.yan@arm.com>
> 
> Thanks Leo! As the Tested-by makes sense only if you've applied all 4
> patches, which your testing and James' testing shows you've both done,
> I'll add the tags to all 4 patches. I'll do likewise with Atish,
> rebase and resend the patches.
> 
> Thanks,
> Ian

Yep makes sense

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2024-11-13 10:00 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-26 12:17 [PATCH v1 0/4] Prefer sysfs/JSON events also when no PMU is provided Ian Rogers
2024-10-26 12:17 ` [PATCH v1 1/4] perf evsel: Add pmu_name helper Ian Rogers
2024-10-26 12:17 ` [PATCH v1 2/4] perf stat: Fix find_stat for mixed legacy/non-legacy events Ian Rogers
2024-10-26 12:17 ` [PATCH v1 3/4] perf record: Skip don't fail for events that don't open Ian Rogers
2024-11-11 15:49   ` James Clark
2024-11-11 17:00     ` Ian Rogers
2024-11-12 14:12       ` James Clark
2024-11-12 16:25         ` Ian Rogers
2024-11-12 19:53   ` Leo Yan
2024-11-12 22:24     ` Ian Rogers
2024-11-13 10:00       ` James Clark
2024-10-26 12:17 ` [PATCH v1 4/4] perf parse-events: Reapply "Prefer sysfs/JSON hardware events over legacy" Ian Rogers
2024-11-07 18:51 ` [PATCH v1 0/4] Prefer sysfs/JSON events also when no PMU is provided Ian Rogers
2024-11-08 12:16   ` James Clark
2024-11-08 18:37     ` Atish Kumar Patra
2024-11-08 18:59       ` Ian Rogers
2024-11-08 22:06         ` Atish Kumar Patra
2024-11-11 10:45       ` James Clark
2024-11-11 17:19         ` Ian Rogers
2024-11-11 23:38           ` Atish Kumar Patra
2024-11-12 14:21           ` James Clark
2024-11-12 17:23             ` Ian Rogers
2024-11-12 19:55               ` Atish Kumar Patra
2024-11-11 23:23 ` Atish Kumar Patra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).