* [PATCH v1 00/12] CPU mask improvements/fixes particularly for hybrid
@ 2025-06-27 19:24 Ian Rogers
From: Ian Rogers @ 2025-06-27 19:24 UTC (permalink / raw)
To: Thomas Falcon, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Kan Liang, Ben Gainey, James Clark, Howard Chu, Weilin Wang,
Levi Yun, Dr. David Alan Gilbert, Zhongqiu Han, Blake Jones,
Yicong Yang, Anubhav Shelat, Thomas Richter, Jean-Philippe Romain,
Song Liu, linux-perf-users, linux-kernel
On hybrid systems some PMUs apply to all core types; for metrics this is particularly true of the msr PMU and its tsc event. Metrics often only want the counter's value for their specific core type. These patches allow the cpu term in an event to name a PMU from which to take the cpumask. For example:
$ perf stat -e msr/tsc,cpu=cpu_atom/ ...
will aggregate the msr/tsc/ value, but only for atom cores. In doing
this, problems were identified in how cpumasks are handled by event
parsing and event setup when cpumasks are specified along with a task
to profile. The event parsing, cpumask evlist propagation and perf
stat code are updated accordingly.
The final result of the patch series is to be able to run:
```
$ perf stat --no-scale -e 'msr/tsc/,msr/tsc,cpu=cpu_core/,msr/tsc,cpu=cpu_atom/' perf test -F 10
10.1: Basic parsing test : Ok
10.2: Parsing without PMU name : Ok
10.3: Parsing with PMU name : Ok
Performance counter stats for 'perf test -F 10':
63,704,975 msr/tsc/
47,060,704 msr/tsc,cpu=cpu_core/ (4.62%)
16,640,591 msr/tsc,cpu=cpu_atom/ (2.18%)
```
This has further identified a kernel bug for task events, where the
enabled time is too large, leading to invalid scaling (hence the
--no-scale in the command line above).
Ian Rogers (12):
perf parse-events: Warn if a cpu term is unsupported by a CPU
perf stat: Avoid buffer overflow to the aggregation map
perf stat: Don't size aggregation ids from user_requested_cpus
perf parse-events: Allow the cpu term to be a PMU
perf tool_pmu: Allow num_cpus(_online) to be specific to a cpumask
libperf evsel: Rename own_cpus to pmu_cpus
libperf evsel: Factor perf_evsel__exit out of perf_evsel__delete
perf evsel: Use libperf perf_evsel__exit
perf pmus: Factor perf_pmus__find_by_attr out of evsel__find_pmu
perf parse-events: Minor __add_event refactoring
perf evsel: Add evsel__open_per_cpu_and_thread
perf parse-events: Support user CPUs mixed with threads/processes
tools/lib/perf/evlist.c | 118 ++++++++++++++++--------
tools/lib/perf/evsel.c | 9 +-
tools/lib/perf/include/internal/evsel.h | 3 +-
tools/perf/builtin-stat.c | 9 +-
tools/perf/tests/event_update.c | 4 +-
tools/perf/util/evlist.c | 15 +--
tools/perf/util/evsel.c | 55 +++++++++--
tools/perf/util/evsel.h | 5 +
tools/perf/util/expr.c | 2 +-
tools/perf/util/header.c | 4 +-
tools/perf/util/parse-events.c | 102 ++++++++++++++------
tools/perf/util/pmus.c | 29 +++---
tools/perf/util/pmus.h | 2 +
tools/perf/util/stat.c | 6 +-
tools/perf/util/synthetic-events.c | 4 +-
tools/perf/util/tool_pmu.c | 56 +++++++++--
tools/perf/util/tool_pmu.h | 2 +-
17 files changed, 297 insertions(+), 128 deletions(-)
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v1 01/12] perf parse-events: Warn if a cpu term is unsupported by a CPU
From: Ian Rogers @ 2025-06-27 19:24 UTC (permalink / raw)
Factor the requested-CPU warning out of evlist and into evsel. Perform
the warning check at the end of adding an event. To avoid repeatedly
testing whether the cpu_list is empty, add a local variable.
```
$ perf stat -e cpu_atom/cycles,cpu=1/ -a true
WARNING: A requested CPU in '1' is not supported by PMU 'cpu_atom' (CPUs 16-27) for event 'cpu_atom/cycles/'
Performance counter stats for 'system wide':
<not supported> cpu_atom/cycles/
0.000781511 seconds time elapsed
```
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/evlist.c | 15 +--------------
tools/perf/util/evsel.c | 24 ++++++++++++++++++++++++
tools/perf/util/evsel.h | 2 ++
tools/perf/util/parse-events.c | 12 ++++++++----
4 files changed, 35 insertions(+), 18 deletions(-)
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 5664ebf6bbc6..a3c4d8558d29 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -2546,20 +2546,7 @@ void evlist__warn_user_requested_cpus(struct evlist *evlist, const char *cpu_lis
return;
evlist__for_each_entry(evlist, pos) {
- struct perf_cpu_map *intersect, *to_test, *online = cpu_map__online();
- const struct perf_pmu *pmu = evsel__find_pmu(pos);
-
- to_test = pmu && pmu->is_core ? pmu->cpus : online;
- intersect = perf_cpu_map__intersect(to_test, user_requested_cpus);
- if (!perf_cpu_map__equal(intersect, user_requested_cpus)) {
- char buf[128];
-
- cpu_map__snprint(to_test, buf, sizeof(buf));
- pr_warning("WARNING: A requested CPU in '%s' is not supported by PMU '%s' (CPUs %s) for event '%s'\n",
- cpu_list, pmu ? pmu->name : "cpu", buf, evsel__name(pos));
- }
- perf_cpu_map__put(intersect);
- perf_cpu_map__put(online);
+ evsel__warn_user_requested_cpus(pos, user_requested_cpus);
}
perf_cpu_map__put(user_requested_cpus);
}
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index d55482f094bf..0208d999da24 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -4071,3 +4071,27 @@ void evsel__uniquify_counter(struct evsel *counter)
counter->uniquified_name = false;
}
}
+
+void evsel__warn_user_requested_cpus(struct evsel *evsel, struct perf_cpu_map *user_requested_cpus)
+{
+ struct perf_cpu_map *intersect, *online = NULL;
+ const struct perf_pmu *pmu = evsel__find_pmu(evsel);
+
+ if (pmu && pmu->is_core) {
+ intersect = perf_cpu_map__intersect(pmu->cpus, user_requested_cpus);
+ } else {
+ online = cpu_map__online();
+ intersect = perf_cpu_map__intersect(online, user_requested_cpus);
+ }
+ if (!perf_cpu_map__equal(intersect, user_requested_cpus)) {
+ char buf1[128];
+ char buf2[128];
+
+ cpu_map__snprint(user_requested_cpus, buf1, sizeof(buf1));
+ cpu_map__snprint(online ?: pmu->cpus, buf2, sizeof(buf2));
+ pr_warning("WARNING: A requested CPU in '%s' is not supported by PMU '%s' (CPUs %s) for event '%s'\n",
+ buf1, pmu ? pmu->name : "cpu", buf2, evsel__name(evsel));
+ }
+ perf_cpu_map__put(intersect);
+ perf_cpu_map__put(online);
+}
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 6dbc9690e0c9..8b5962a1e814 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -572,4 +572,6 @@ void evsel__set_config_if_unset(struct perf_pmu *pmu, struct evsel *evsel,
bool evsel__is_offcpu_event(struct evsel *evsel);
+void evsel__warn_user_requested_cpus(struct evsel *evsel, struct perf_cpu_map *user_requested_cpus);
+
#endif /* __PERF_EVSEL_H */
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index d1965a7b97ed..7a32d5234a64 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -257,6 +257,7 @@ __add_event(struct list_head *list, int *idx,
struct evsel *evsel;
bool is_pmu_core;
struct perf_cpu_map *cpus;
+ bool has_cpu_list = !perf_cpu_map__is_empty(cpu_list);
/*
* Ensure the first_wildcard_match's PMU matches that of the new event
@@ -281,7 +282,7 @@ __add_event(struct list_head *list, int *idx,
if (pmu) {
is_pmu_core = pmu->is_core;
- cpus = perf_cpu_map__get(perf_cpu_map__is_empty(cpu_list) ? pmu->cpus : cpu_list);
+ cpus = perf_cpu_map__get(has_cpu_list ? cpu_list : pmu->cpus);
perf_pmu__warn_invalid_formats(pmu);
if (attr->type == PERF_TYPE_RAW || attr->type >= PERF_TYPE_MAX) {
perf_pmu__warn_invalid_config(pmu, attr->config, name,
@@ -296,10 +297,10 @@ __add_event(struct list_head *list, int *idx,
} else {
is_pmu_core = (attr->type == PERF_TYPE_HARDWARE ||
attr->type == PERF_TYPE_HW_CACHE);
- if (perf_cpu_map__is_empty(cpu_list))
- cpus = is_pmu_core ? perf_cpu_map__new_online_cpus() : NULL;
- else
+ if (has_cpu_list)
cpus = perf_cpu_map__get(cpu_list);
+ else
+ cpus = is_pmu_core ? cpu_map__online() : NULL;
}
if (init_attr)
event_attr_init(attr);
@@ -331,6 +332,9 @@ __add_event(struct list_head *list, int *idx,
if (list)
list_add_tail(&evsel->core.node, list);
+ if (has_cpu_list)
+ evsel__warn_user_requested_cpus(evsel, cpu_list);
+
return evsel;
}
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v1 02/12] perf stat: Avoid buffer overflow to the aggregation map
From: Ian Rogers @ 2025-06-27 19:24 UTC (permalink / raw)
CPU values may be created and passed to perf_stat__get_aggr (via
config->aggr_get_id), such as from should_skip_zero_counter in the
stat display code. There may be no aggregation ID for such a CPU, for
example when profiling a thread. Add the missing bounds check and just
create IDs for these cases.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/builtin-stat.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 50fc53adb7e4..803bdcf89c0d 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1365,7 +1365,7 @@ static struct aggr_cpu_id perf_stat__get_aggr(struct perf_stat_config *config,
struct aggr_cpu_id id;
/* per-process mode - should use global aggr mode */
- if (cpu.cpu == -1)
+ if (cpu.cpu == -1 || cpu.cpu >= config->cpus_aggr_map->nr)
return get_id(config, cpu);
if (aggr_cpu_id__is_empty(&config->cpus_aggr_map->map[cpu.cpu]))
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v1 03/12] perf stat: Don't size aggregation ids from user_requested_cpus
From: Ian Rogers @ 2025-06-27 19:24 UTC (permalink / raw)
As evsels may have additional CPU terms, user_requested_cpus may not
reflect all of the CPUs requested. Use evlist->all_cpus to size the
array instead, as it reflects all of the CPUs potentially needed by
the evlist.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/builtin-stat.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 803bdcf89c0d..ff726f3e42ea 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -1513,11 +1513,8 @@ static int perf_stat_init_aggr_mode(void)
* taking the highest cpu number to be the size of
* the aggregation translate cpumap.
*/
- if (!perf_cpu_map__is_any_cpu_or_is_empty(evsel_list->core.user_requested_cpus))
- nr = perf_cpu_map__max(evsel_list->core.user_requested_cpus).cpu;
- else
- nr = 0;
- stat_config.cpus_aggr_map = cpu_aggr_map__empty_new(nr + 1);
+ nr = perf_cpu_map__max(evsel_list->core.all_cpus).cpu + 1;
+ stat_config.cpus_aggr_map = cpu_aggr_map__empty_new(nr);
return stat_config.cpus_aggr_map ? 0 : -ENOMEM;
}
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v1 04/12] perf parse-events: Allow the cpu term to be a PMU
From: Ian Rogers @ 2025-06-27 19:24 UTC (permalink / raw)
On hybrid systems, events like msr/tsc/ aggregate counts across all
CPUs. Often metrics only want a value like msr/tsc/ for the cores on
which the metric is being computed. Listing each CPU with terms like
cpu=0,cpu=1,... is laborious and would need to be encoded for every
variation of a CPU model.
Allow the cpumask of a PMU to be the argument to the cpu term. For
example, in the following the cpumask of the cstate_pkg PMU selects
the CPUs on which the msr/tsc/ counter is counted:
```
$ cat /sys/bus/event_source/devices/cstate_pkg/cpumask
0
$ perf stat -A -e 'msr/tsc,cpu=cstate_pkg/' -a sleep 0.1
Performance counter stats for 'system wide':
CPU0 252,621,253 msr/tsc,cpu=cstate_pkg/
0.101184092 seconds time elapsed
```
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/parse-events.c | 37 +++++++++++++++++++++++++---------
1 file changed, 28 insertions(+), 9 deletions(-)
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 7a32d5234a64..ef38eb082342 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -192,10 +192,20 @@ static struct perf_cpu_map *get_config_cpu(const struct parse_events_terms *head
list_for_each_entry(term, &head_terms->terms, list) {
if (term->type_term == PARSE_EVENTS__TERM_TYPE_CPU) {
- struct perf_cpu_map *cpu = perf_cpu_map__new_int(term->val.num);
+ struct perf_cpu_map *term_cpus;
- perf_cpu_map__merge(&cpus, cpu);
- perf_cpu_map__put(cpu);
+ if (term->type_val == PARSE_EVENTS__TERM_TYPE_NUM) {
+ term_cpus = perf_cpu_map__new_int(term->val.num);
+ } else {
+ struct perf_pmu *pmu = perf_pmus__find(term->val.str);
+
+ if (perf_cpu_map__is_empty(pmu->cpus))
+ term_cpus = pmu->is_core ? cpu_map__online() : NULL;
+ else
+ term_cpus = perf_cpu_map__get(pmu->cpus);
+ }
+ perf_cpu_map__merge(&cpus, term_cpus);
+ perf_cpu_map__put(term_cpus);
}
}
@@ -1054,12 +1064,21 @@ do { \
}
break;
case PARSE_EVENTS__TERM_TYPE_CPU:
- CHECK_TYPE_VAL(NUM);
- if (term->val.num >= (u64)cpu__max_present_cpu().cpu) {
- parse_events_error__handle(err, term->err_val,
- strdup("too big"),
- NULL);
- return -EINVAL;
+ if (term->type_val == PARSE_EVENTS__TERM_TYPE_NUM) {
+ if (term->val.num >= (u64)cpu__max_present_cpu().cpu) {
+ parse_events_error__handle(err, term->err_val,
+ strdup("too big"),
+ /*help=*/NULL);
+ return -EINVAL;
+ }
+ } else {
+ assert(term->type_val == PARSE_EVENTS__TERM_TYPE_STR);
+ if (perf_pmus__find(term->val.str) == NULL) {
+ parse_events_error__handle(err, term->err_val,
+ strdup("not a valid PMU"),
+ /*help=*/NULL);
+ return -EINVAL;
+ }
}
break;
case PARSE_EVENTS__TERM_TYPE_DRV_CFG:
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v1 05/12] perf tool_pmu: Allow num_cpus(_online) to be specific to a cpumask
From: Ian Rogers @ 2025-06-27 19:24 UTC (permalink / raw)
For hybrid metrics it is useful to know the number of p-core or e-core
CPUs. If a cpumask is specified for the num_cpus or num_cpus_online
tool events, compute the value relative to the given mask rather than
for the full system.
```
$ sudo /tmp/perf/perf stat -e 'tool/num_cpus/,tool/num_cpus,cpu=cpu_core/,
tool/num_cpus,cpu=cpu_atom/,tool/num_cpus_online/,tool/num_cpus_online,
cpu=cpu_core/,tool/num_cpus_online,cpu=cpu_atom/' true
Performance counter stats for 'true':
28 tool/num_cpus/
16 tool/num_cpus,cpu=cpu_core/
12 tool/num_cpus,cpu=cpu_atom/
28 tool/num_cpus_online/
16 tool/num_cpus_online,cpu=cpu_core/
12 tool/num_cpus_online,cpu=cpu_atom/
0.000767205 seconds time elapsed
0.000938000 seconds user
0.000000000 seconds sys
```
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/expr.c | 2 +-
tools/perf/util/tool_pmu.c | 56 +++++++++++++++++++++++++++++++++-----
tools/perf/util/tool_pmu.h | 2 +-
3 files changed, 51 insertions(+), 9 deletions(-)
diff --git a/tools/perf/util/expr.c b/tools/perf/util/expr.c
index 6413537442aa..ffd55bc06fa4 100644
--- a/tools/perf/util/expr.c
+++ b/tools/perf/util/expr.c
@@ -397,7 +397,7 @@ double expr__get_literal(const char *literal, const struct expr_scanner_ctx *ctx
if (ev != TOOL_PMU__EVENT_NONE) {
u64 count;
- if (tool_pmu__read_event(ev, &count))
+ if (tool_pmu__read_event(ev, /*evsel=*/NULL, &count))
result = count;
else
pr_err("Failure to read '%s'", literal);
diff --git a/tools/perf/util/tool_pmu.c b/tools/perf/util/tool_pmu.c
index 4630b8cc8e52..7aa4f315b0ac 100644
--- a/tools/perf/util/tool_pmu.c
+++ b/tools/perf/util/tool_pmu.c
@@ -332,7 +332,7 @@ static bool has_pmem(void)
return has_pmem;
}
-bool tool_pmu__read_event(enum tool_pmu_event ev, u64 *result)
+bool tool_pmu__read_event(enum tool_pmu_event ev, struct evsel *evsel, u64 *result)
{
const struct cpu_topology *topology;
@@ -347,18 +347,60 @@ bool tool_pmu__read_event(enum tool_pmu_event ev, u64 *result)
return true;
case TOOL_PMU__EVENT_NUM_CPUS:
- *result = cpu__max_present_cpu().cpu;
+ if (!evsel || perf_cpu_map__is_empty(evsel->core.cpus)) {
+ /* No evsel to be specific to. */
+ *result = cpu__max_present_cpu().cpu;
+ } else if (!perf_cpu_map__has_any_cpu(evsel->core.cpus)) {
+ /* Evsel just has specific CPUs. */
+ *result = perf_cpu_map__nr(evsel->core.cpus);
+ } else {
+ /*
+ * "Any CPU" event that can be scheduled on any CPU in
+ * the PMU's cpumask. The PMU cpumask should be saved in
+ * own_cpus. If not present fall back to max.
+ */
+ if (!perf_cpu_map__is_empty(evsel->core.own_cpus))
+ *result = perf_cpu_map__nr(evsel->core.own_cpus);
+ else
+ *result = cpu__max_present_cpu().cpu;
+ }
return true;
case TOOL_PMU__EVENT_NUM_CPUS_ONLINE: {
struct perf_cpu_map *online = cpu_map__online();
- if (online) {
+ if (!online)
+ return false;
+
+ if (!evsel || perf_cpu_map__is_empty(evsel->core.cpus)) {
+ /* No evsel to be specific to. */
*result = perf_cpu_map__nr(online);
- perf_cpu_map__put(online);
- return true;
+ } else if (!perf_cpu_map__has_any_cpu(evsel->core.cpus)) {
+ /* Evsel just has specific CPUs. */
+ struct perf_cpu_map *tmp =
+ perf_cpu_map__intersect(online, evsel->core.cpus);
+
+ *result = perf_cpu_map__nr(tmp);
+ perf_cpu_map__put(tmp);
+ } else {
+ /*
+ * "Any CPU" event that can be scheduled on any CPU in
+ * the PMU's cpumask. The PMU cpumask should be saved in
+ * own_cpus, if not present then just the online cpu
+ * mask.
+ */
+ if (!perf_cpu_map__is_empty(evsel->core.own_cpus)) {
+ struct perf_cpu_map *tmp =
+ perf_cpu_map__intersect(online, evsel->core.own_cpus);
+
+ *result = perf_cpu_map__nr(tmp);
+ perf_cpu_map__put(tmp);
+ } else {
+ *result = perf_cpu_map__nr(online);
+ }
}
- return false;
+ perf_cpu_map__put(online);
+ return true;
}
case TOOL_PMU__EVENT_NUM_DIES:
topology = online_topology();
@@ -417,7 +459,7 @@ int evsel__tool_pmu_read(struct evsel *evsel, int cpu_map_idx, int thread)
old_count = perf_counts(evsel->prev_raw_counts, cpu_map_idx, thread);
val = 0;
if (cpu_map_idx == 0 && thread == 0) {
- if (!tool_pmu__read_event(ev, &val)) {
+ if (!tool_pmu__read_event(ev, evsel, &val)) {
count->lost++;
val = 0;
}
diff --git a/tools/perf/util/tool_pmu.h b/tools/perf/util/tool_pmu.h
index c6ad1dd90a56..d642e7d73910 100644
--- a/tools/perf/util/tool_pmu.h
+++ b/tools/perf/util/tool_pmu.h
@@ -34,7 +34,7 @@ enum tool_pmu_event tool_pmu__str_to_event(const char *str);
bool tool_pmu__skip_event(const char *name);
int tool_pmu__num_skip_events(void);
-bool tool_pmu__read_event(enum tool_pmu_event ev, u64 *result);
+bool tool_pmu__read_event(enum tool_pmu_event ev, struct evsel *evsel, u64 *result);
u64 tool_pmu__cpu_slots_per_cycle(void);
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v1 06/12] libperf evsel: Rename own_cpus to pmu_cpus
From: Ian Rogers @ 2025-06-27 19:24 UTC (permalink / raw)
own_cpus is generally the cpumask from the PMU. Rename it to pmu_cpus
to make this clearer. This is a variable rename with no other changes.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/lib/perf/evlist.c | 8 ++++----
tools/lib/perf/evsel.c | 2 +-
tools/lib/perf/include/internal/evsel.h | 2 +-
tools/perf/tests/event_update.c | 4 ++--
tools/perf/util/evsel.c | 6 +++---
tools/perf/util/header.c | 4 ++--
tools/perf/util/parse-events.c | 2 +-
tools/perf/util/synthetic-events.c | 4 ++--
tools/perf/util/tool_pmu.c | 12 ++++++------
9 files changed, 22 insertions(+), 22 deletions(-)
diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
index b1f4c8176b32..9d9dec21f510 100644
--- a/tools/lib/perf/evlist.c
+++ b/tools/lib/perf/evlist.c
@@ -46,7 +46,7 @@ static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
* are valid by intersecting with those of the PMU.
*/
perf_cpu_map__put(evsel->cpus);
- evsel->cpus = perf_cpu_map__intersect(evlist->user_requested_cpus, evsel->own_cpus);
+ evsel->cpus = perf_cpu_map__intersect(evlist->user_requested_cpus, evsel->pmu_cpus);
/*
* Empty cpu lists would eventually get opened as "any" so remove
@@ -61,7 +61,7 @@ static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
list_for_each_entry_from(next, &evlist->entries, node)
next->idx--;
}
- } else if (!evsel->own_cpus || evlist->has_user_cpus ||
+ } else if (!evsel->pmu_cpus || evlist->has_user_cpus ||
(!evsel->requires_cpu && perf_cpu_map__has_any_cpu(evlist->user_requested_cpus))) {
/*
* The PMU didn't specify a default cpu map, this isn't a core
@@ -72,13 +72,13 @@ static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
*/
perf_cpu_map__put(evsel->cpus);
evsel->cpus = perf_cpu_map__get(evlist->user_requested_cpus);
- } else if (evsel->cpus != evsel->own_cpus) {
+ } else if (evsel->cpus != evsel->pmu_cpus) {
/*
* No user requested cpu map but the PMU cpu map doesn't match
* the evsel's. Reset it back to the PMU cpu map.
*/
perf_cpu_map__put(evsel->cpus);
- evsel->cpus = perf_cpu_map__get(evsel->own_cpus);
+ evsel->cpus = perf_cpu_map__get(evsel->pmu_cpus);
}
if (evsel->system_wide) {
diff --git a/tools/lib/perf/evsel.c b/tools/lib/perf/evsel.c
index 2a85e0bfee1e..127abe7df63d 100644
--- a/tools/lib/perf/evsel.c
+++ b/tools/lib/perf/evsel.c
@@ -46,7 +46,7 @@ void perf_evsel__delete(struct perf_evsel *evsel)
assert(evsel->mmap == NULL); /* If not munmap wasn't called. */
assert(evsel->sample_id == NULL); /* If not free_id wasn't called. */
perf_cpu_map__put(evsel->cpus);
- perf_cpu_map__put(evsel->own_cpus);
+ perf_cpu_map__put(evsel->pmu_cpus);
perf_thread_map__put(evsel->threads);
free(evsel);
}
diff --git a/tools/lib/perf/include/internal/evsel.h b/tools/lib/perf/include/internal/evsel.h
index ea78defa77d0..b97dc8c92882 100644
--- a/tools/lib/perf/include/internal/evsel.h
+++ b/tools/lib/perf/include/internal/evsel.h
@@ -99,7 +99,7 @@ struct perf_evsel {
* cpu map for opening the event on, for example, the first CPU on a
* socket for an uncore event.
*/
- struct perf_cpu_map *own_cpus;
+ struct perf_cpu_map *pmu_cpus;
struct perf_thread_map *threads;
struct xyarray *fd;
struct xyarray *mmap;
diff --git a/tools/perf/tests/event_update.c b/tools/perf/tests/event_update.c
index 9301fde11366..cb9e6de2e033 100644
--- a/tools/perf/tests/event_update.c
+++ b/tools/perf/tests/event_update.c
@@ -109,8 +109,8 @@ static int test__event_update(struct test_suite *test __maybe_unused, int subtes
TEST_ASSERT_VAL("failed to synthesize attr update name",
!perf_event__synthesize_event_update_name(&tmp.tool, evsel, process_event_name));
- perf_cpu_map__put(evsel->core.own_cpus);
- evsel->core.own_cpus = perf_cpu_map__new("1,2,3");
+ perf_cpu_map__put(evsel->core.pmu_cpus);
+ evsel->core.pmu_cpus = perf_cpu_map__new("1,2,3");
TEST_ASSERT_VAL("failed to synthesize attr update cpus",
!perf_event__synthesize_event_update_cpus(&tmp.tool, evsel, process_event_cpus));
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 0208d999da24..8caee925bd4c 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -487,7 +487,7 @@ struct evsel *evsel__clone(struct evsel *dest, struct evsel *orig)
return NULL;
evsel->core.cpus = perf_cpu_map__get(orig->core.cpus);
- evsel->core.own_cpus = perf_cpu_map__get(orig->core.own_cpus);
+ evsel->core.pmu_cpus = perf_cpu_map__get(orig->core.pmu_cpus);
evsel->core.threads = perf_thread_map__get(orig->core.threads);
evsel->core.nr_members = orig->core.nr_members;
evsel->core.system_wide = orig->core.system_wide;
@@ -1526,7 +1526,7 @@ void evsel__config(struct evsel *evsel, struct record_opts *opts,
attr->exclude_user = 1;
}
- if (evsel->core.own_cpus || evsel->unit)
+ if (evsel->core.pmu_cpus || evsel->unit)
evsel->core.attr.read_format |= PERF_FORMAT_ID;
/*
@@ -1670,7 +1670,7 @@ void evsel__exit(struct evsel *evsel)
evsel__free_config_terms(evsel);
cgroup__put(evsel->cgrp);
perf_cpu_map__put(evsel->core.cpus);
- perf_cpu_map__put(evsel->core.own_cpus);
+ perf_cpu_map__put(evsel->core.pmu_cpus);
perf_thread_map__put(evsel->core.threads);
zfree(&evsel->group_name);
zfree(&evsel->name);
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 2dea35237e81..234641aa6b13 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -4480,8 +4480,8 @@ int perf_event__process_event_update(const struct perf_tool *tool __maybe_unused
case PERF_EVENT_UPDATE__CPUS:
map = cpu_map__new_data(&ev->cpus.cpus);
if (map) {
- perf_cpu_map__put(evsel->core.own_cpus);
- evsel->core.own_cpus = map;
+ perf_cpu_map__put(evsel->core.pmu_cpus);
+ evsel->core.pmu_cpus = map;
} else
pr_err("failed to get event_update cpus\n");
default:
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index ef38eb082342..a78a4bc4e8fe 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -323,7 +323,7 @@ __add_event(struct list_head *list, int *idx,
(*idx)++;
evsel->core.cpus = cpus;
- evsel->core.own_cpus = perf_cpu_map__get(cpus);
+ evsel->core.pmu_cpus = perf_cpu_map__get(cpus);
evsel->core.requires_cpu = pmu ? pmu->is_uncore : false;
evsel->core.is_pmu_core = is_pmu_core;
evsel->pmu = pmu;
diff --git a/tools/perf/util/synthetic-events.c b/tools/perf/util/synthetic-events.c
index 2fc4d0537840..7c00b09e3a93 100644
--- a/tools/perf/util/synthetic-events.c
+++ b/tools/perf/util/synthetic-events.c
@@ -2045,7 +2045,7 @@ int perf_event__synthesize_event_update_name(const struct perf_tool *tool, struc
int perf_event__synthesize_event_update_cpus(const struct perf_tool *tool, struct evsel *evsel,
perf_event__handler_t process)
{
- struct synthesize_cpu_map_data syn_data = { .map = evsel->core.own_cpus };
+ struct synthesize_cpu_map_data syn_data = { .map = evsel->core.pmu_cpus };
struct perf_record_event_update *ev;
int err;
@@ -2126,7 +2126,7 @@ int perf_event__synthesize_extra_attr(const struct perf_tool *tool, struct evlis
}
}
- if (evsel->core.own_cpus) {
+ if (evsel->core.pmu_cpus) {
err = perf_event__synthesize_event_update_cpus(tool, evsel, process);
if (err < 0) {
pr_err("Couldn't synthesize evsel cpus.\n");
diff --git a/tools/perf/util/tool_pmu.c b/tools/perf/util/tool_pmu.c
index 7aa4f315b0ac..d99e699e646d 100644
--- a/tools/perf/util/tool_pmu.c
+++ b/tools/perf/util/tool_pmu.c
@@ -357,10 +357,10 @@ bool tool_pmu__read_event(enum tool_pmu_event ev, struct evsel *evsel, u64 *resu
/*
* "Any CPU" event that can be scheduled on any CPU in
* the PMU's cpumask. The PMU cpumask should be saved in
- * own_cpus. If not present fall back to max.
+ * pmu_cpus. If not present fall back to max.
*/
- if (!perf_cpu_map__is_empty(evsel->core.own_cpus))
- *result = perf_cpu_map__nr(evsel->core.own_cpus);
+ if (!perf_cpu_map__is_empty(evsel->core.pmu_cpus))
+ *result = perf_cpu_map__nr(evsel->core.pmu_cpus);
else
*result = cpu__max_present_cpu().cpu;
}
@@ -386,12 +386,12 @@ bool tool_pmu__read_event(enum tool_pmu_event ev, struct evsel *evsel, u64 *resu
/*
* "Any CPU" event that can be scheduled on any CPU in
* the PMU's cpumask. The PMU cpumask should be saved in
- * own_cpus, if not present then just the online cpu
+ * pmu_cpus, if not present then just the online cpu
* mask.
*/
- if (!perf_cpu_map__is_empty(evsel->core.own_cpus)) {
+ if (!perf_cpu_map__is_empty(evsel->core.pmu_cpus)) {
struct perf_cpu_map *tmp =
- perf_cpu_map__intersect(online, evsel->core.own_cpus);
+ perf_cpu_map__intersect(online, evsel->core.pmu_cpus);
*result = perf_cpu_map__nr(tmp);
perf_cpu_map__put(tmp);
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v1 07/12] libperf evsel: Factor perf_evsel__exit out of perf_evsel__delete
2025-06-27 19:24 [PATCH v1 00/12] CPU mask improvements/fixes particularly for hybrid Ian Rogers
` (5 preceding siblings ...)
2025-06-27 19:24 ` [PATCH v1 06/12] libperf evsel: Rename own_cpus to pmu_cpus Ian Rogers
@ 2025-06-27 19:24 ` Ian Rogers
2025-06-27 19:24 ` [PATCH v1 08/12] perf evsel: Use libperf perf_evsel__exit Ian Rogers
` (6 subsequent siblings)
13 siblings, 0 replies; 22+ messages in thread
From: Ian Rogers @ 2025-06-27 19:24 UTC (permalink / raw)
To: Thomas Falcon, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Kan Liang, Ben Gainey, James Clark, Howard Chu, Weilin Wang,
Levi Yun, Dr. David Alan Gilbert, Zhongqiu Han, Blake Jones,
Yicong Yang, Anubhav Shelat, Thomas Richter, Jean-Philippe Romain,
Song Liu, linux-perf-users, linux-kernel
This allows perf_evsel__exit to be called when the struct
perf_evsel is embedded inside another struct, such as struct evsel in
the perf tool.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/lib/perf/evsel.c | 7 ++++++-
tools/lib/perf/include/internal/evsel.h | 1 +
2 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/tools/lib/perf/evsel.c b/tools/lib/perf/evsel.c
index 127abe7df63d..13a307fc75ae 100644
--- a/tools/lib/perf/evsel.c
+++ b/tools/lib/perf/evsel.c
@@ -40,7 +40,7 @@ struct perf_evsel *perf_evsel__new(struct perf_event_attr *attr)
return evsel;
}
-void perf_evsel__delete(struct perf_evsel *evsel)
+void perf_evsel__exit(struct perf_evsel *evsel)
{
assert(evsel->fd == NULL); /* If not fds were not closed. */
assert(evsel->mmap == NULL); /* If not munmap wasn't called. */
@@ -48,6 +48,11 @@ void perf_evsel__delete(struct perf_evsel *evsel)
perf_cpu_map__put(evsel->cpus);
perf_cpu_map__put(evsel->pmu_cpus);
perf_thread_map__put(evsel->threads);
+}
+
+void perf_evsel__delete(struct perf_evsel *evsel)
+{
+ perf_evsel__exit(evsel);
free(evsel);
}
diff --git a/tools/lib/perf/include/internal/evsel.h b/tools/lib/perf/include/internal/evsel.h
index b97dc8c92882..fefe64ba5e26 100644
--- a/tools/lib/perf/include/internal/evsel.h
+++ b/tools/lib/perf/include/internal/evsel.h
@@ -133,6 +133,7 @@ struct perf_evsel {
void perf_evsel__init(struct perf_evsel *evsel, struct perf_event_attr *attr,
int idx);
+void perf_evsel__exit(struct perf_evsel *evsel);
int perf_evsel__alloc_fd(struct perf_evsel *evsel, int ncpus, int nthreads);
void perf_evsel__close_fd(struct perf_evsel *evsel);
void perf_evsel__free_fd(struct perf_evsel *evsel);
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v1 08/12] perf evsel: Use libperf perf_evsel__exit
2025-06-27 19:24 [PATCH v1 00/12] CPU mask improvements/fixes particularly for hybrid Ian Rogers
` (6 preceding siblings ...)
2025-06-27 19:24 ` [PATCH v1 07/12] libperf evsel: Factor perf_evsel__exit out of perf_evsel__delete Ian Rogers
@ 2025-06-27 19:24 ` Ian Rogers
2025-06-27 19:24 ` [PATCH v1 09/12] perf pmus: Factor perf_pmus__find_by_attr out of evsel__find_pmu Ian Rogers
` (5 subsequent siblings)
13 siblings, 0 replies; 22+ messages in thread
From: Ian Rogers @ 2025-06-27 19:24 UTC (permalink / raw)
To: Thomas Falcon, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Kan Liang, Ben Gainey, James Clark, Howard Chu, Weilin Wang,
Levi Yun, Dr. David Alan Gilbert, Zhongqiu Han, Blake Jones,
Yicong Yang, Anubhav Shelat, Thomas Richter, Jean-Philippe Romain,
Song Liu, linux-perf-users, linux-kernel
Avoid duplicating code and make it easier for struct perf_evsel to change.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/evsel.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 8caee925bd4c..1169aa60c5fc 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1669,9 +1669,7 @@ void evsel__exit(struct evsel *evsel)
perf_evsel__free_id(&evsel->core);
evsel__free_config_terms(evsel);
cgroup__put(evsel->cgrp);
- perf_cpu_map__put(evsel->core.cpus);
- perf_cpu_map__put(evsel->core.pmu_cpus);
- perf_thread_map__put(evsel->core.threads);
+ perf_evsel__exit(&evsel->core);
zfree(&evsel->group_name);
zfree(&evsel->name);
#ifdef HAVE_LIBTRACEEVENT
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v1 09/12] perf pmus: Factor perf_pmus__find_by_attr out of evsel__find_pmu
2025-06-27 19:24 [PATCH v1 00/12] CPU mask improvements/fixes particularly for hybrid Ian Rogers
` (7 preceding siblings ...)
2025-06-27 19:24 ` [PATCH v1 08/12] perf evsel: Use libperf perf_evsel__exit Ian Rogers
@ 2025-06-27 19:24 ` Ian Rogers
2025-06-27 19:24 ` [PATCH v1 10/12] perf parse-events: Minor __add_event refactoring Ian Rogers
` (4 subsequent siblings)
13 siblings, 0 replies; 22+ messages in thread
From: Ian Rogers @ 2025-06-27 19:24 UTC (permalink / raw)
To: Thomas Falcon, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Kan Liang, Ben Gainey, James Clark, Howard Chu, Weilin Wang,
Levi Yun, Dr. David Alan Gilbert, Zhongqiu Han, Blake Jones,
Yicong Yang, Anubhav Shelat, Thomas Richter, Jean-Philippe Romain,
Song Liu, linux-perf-users, linux-kernel
Allow a PMU to be found from a perf_event_attr, which is useful when
creating evsels.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/pmus.c | 29 +++++++++++++++++------------
tools/perf/util/pmus.h | 2 ++
2 files changed, 19 insertions(+), 12 deletions(-)
diff --git a/tools/perf/util/pmus.c b/tools/perf/util/pmus.c
index 3bbd26fec78a..8bf698badaa7 100644
--- a/tools/perf/util/pmus.c
+++ b/tools/perf/util/pmus.c
@@ -715,24 +715,18 @@ bool perf_pmus__supports_extended_type(void)
return perf_pmus__do_support_extended_type;
}
-struct perf_pmu *evsel__find_pmu(const struct evsel *evsel)
+struct perf_pmu *perf_pmus__find_by_attr(const struct perf_event_attr *attr)
{
- struct perf_pmu *pmu = evsel->pmu;
- bool legacy_core_type;
-
- if (pmu)
- return pmu;
+ struct perf_pmu *pmu = perf_pmus__find_by_type(attr->type);
+ u32 type = attr->type;
+ bool legacy_core_type = type == PERF_TYPE_HARDWARE || type == PERF_TYPE_HW_CACHE;
- pmu = perf_pmus__find_by_type(evsel->core.attr.type);
- legacy_core_type =
- evsel->core.attr.type == PERF_TYPE_HARDWARE ||
- evsel->core.attr.type == PERF_TYPE_HW_CACHE;
if (!pmu && legacy_core_type && perf_pmus__supports_extended_type()) {
- u32 type = evsel->core.attr.config >> PERF_PMU_TYPE_SHIFT;
+ type = attr->config >> PERF_PMU_TYPE_SHIFT;
pmu = perf_pmus__find_by_type(type);
}
- if (!pmu && (legacy_core_type || evsel->core.attr.type == PERF_TYPE_RAW)) {
+ if (!pmu && (legacy_core_type || type == PERF_TYPE_RAW)) {
/*
* For legacy events, if there was no extended type info then
* assume the PMU is the first core PMU.
@@ -743,6 +737,17 @@ struct perf_pmu *evsel__find_pmu(const struct evsel *evsel)
*/
pmu = perf_pmus__find_core_pmu();
}
+ return pmu;
+}
+
+struct perf_pmu *evsel__find_pmu(const struct evsel *evsel)
+{
+ struct perf_pmu *pmu = evsel->pmu;
+
+ if (pmu)
+ return pmu;
+
+ pmu = perf_pmus__find_by_attr(&evsel->core.attr);
((struct evsel *)evsel)->pmu = pmu;
return pmu;
}
diff --git a/tools/perf/util/pmus.h b/tools/perf/util/pmus.h
index 8def20e615ad..09590b1057ef 100644
--- a/tools/perf/util/pmus.h
+++ b/tools/perf/util/pmus.h
@@ -5,6 +5,7 @@
#include <stdbool.h>
#include <stddef.h>
+struct perf_event_attr;
struct perf_pmu;
struct print_callbacks;
@@ -16,6 +17,7 @@ void perf_pmus__destroy(void);
struct perf_pmu *perf_pmus__find(const char *name);
struct perf_pmu *perf_pmus__find_by_type(unsigned int type);
+struct perf_pmu *perf_pmus__find_by_attr(const struct perf_event_attr *attr);
struct perf_pmu *perf_pmus__scan(struct perf_pmu *pmu);
struct perf_pmu *perf_pmus__scan_core(struct perf_pmu *pmu);
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v1 10/12] perf parse-events: Minor __add_event refactoring
2025-06-27 19:24 [PATCH v1 00/12] CPU mask improvements/fixes particularly for hybrid Ian Rogers
` (8 preceding siblings ...)
2025-06-27 19:24 ` [PATCH v1 09/12] perf pmus: Factor perf_pmus__find_by_attr out of evsel__find_pmu Ian Rogers
@ 2025-06-27 19:24 ` Ian Rogers
2025-06-27 19:24 ` [PATCH v1 11/12] perf evsel: Add evsel__open_per_cpu_and_thread Ian Rogers
` (3 subsequent siblings)
13 siblings, 0 replies; 22+ messages in thread
From: Ian Rogers @ 2025-06-27 19:24 UTC (permalink / raw)
To: Thomas Falcon, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Kan Liang, Ben Gainey, James Clark, Howard Chu, Weilin Wang,
Levi Yun, Dr. David Alan Gilbert, Zhongqiu Han, Blake Jones,
Yicong Yang, Anubhav Shelat, Thomas Richter, Jean-Philippe Romain,
Song Liu, linux-perf-users, linux-kernel
Rename cpu_list to user_cpus. If a PMU isn't given, find it early from
the perf_event_attr. Make the pmu_cpus more explicitly a copy from the
PMU (except when user_cpus are given). Derive the cpus from pmu_cpus
and user_cpus as appropriate. Handle strdup errors on name and
metric_id.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/parse-events.c | 69 +++++++++++++++++++++++-----------
1 file changed, 48 insertions(+), 21 deletions(-)
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index a78a4bc4e8fe..4092a43aa84e 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -262,12 +262,12 @@ __add_event(struct list_head *list, int *idx,
bool init_attr,
const char *name, const char *metric_id, struct perf_pmu *pmu,
struct list_head *config_terms, struct evsel *first_wildcard_match,
- struct perf_cpu_map *cpu_list, u64 alternate_hw_config)
+ struct perf_cpu_map *user_cpus, u64 alternate_hw_config)
{
struct evsel *evsel;
bool is_pmu_core;
- struct perf_cpu_map *cpus;
- bool has_cpu_list = !perf_cpu_map__is_empty(cpu_list);
+ struct perf_cpu_map *cpus, *pmu_cpus;
+ bool has_user_cpus = !perf_cpu_map__is_empty(user_cpus);
/*
* Ensure the first_wildcard_match's PMU matches that of the new event
@@ -291,8 +291,6 @@ __add_event(struct list_head *list, int *idx,
}
if (pmu) {
- is_pmu_core = pmu->is_core;
- cpus = perf_cpu_map__get(has_cpu_list ? cpu_list : pmu->cpus);
perf_pmu__warn_invalid_formats(pmu);
if (attr->type == PERF_TYPE_RAW || attr->type >= PERF_TYPE_MAX) {
perf_pmu__warn_invalid_config(pmu, attr->config, name,
@@ -304,48 +302,77 @@ __add_event(struct list_head *list, int *idx,
perf_pmu__warn_invalid_config(pmu, attr->config3, name,
PERF_PMU_FORMAT_VALUE_CONFIG3, "config3");
}
+ }
+ /*
+ * If a PMU wasn't given, such as for legacy events, find now that
+ * warnings won't be generated.
+ */
+ if (!pmu)
+ pmu = perf_pmus__find_by_attr(attr);
+
+ if (pmu) {
+ is_pmu_core = pmu->is_core;
+ pmu_cpus = perf_cpu_map__get(pmu->cpus);
} else {
is_pmu_core = (attr->type == PERF_TYPE_HARDWARE ||
attr->type == PERF_TYPE_HW_CACHE);
- if (has_cpu_list)
- cpus = perf_cpu_map__get(cpu_list);
- else
- cpus = is_pmu_core ? cpu_map__online() : NULL;
+ pmu_cpus = is_pmu_core ? cpu_map__online() : NULL;
+ }
+
+ if (has_user_cpus) {
+ cpus = perf_cpu_map__get(user_cpus);
+ /* Existing behavior that pmu_cpus matches the given user ones. */
+ perf_cpu_map__put(pmu_cpus);
+ pmu_cpus = perf_cpu_map__get(user_cpus);
+ } else {
+ cpus = perf_cpu_map__get(pmu_cpus);
}
+
if (init_attr)
event_attr_init(attr);
evsel = evsel__new_idx(attr, *idx);
- if (!evsel) {
- perf_cpu_map__put(cpus);
- return NULL;
+ if (!evsel)
+ goto out_err;
+
+ if (name) {
+ evsel->name = strdup(name);
+ if (!evsel->name)
+ goto out_err;
+ }
+
+ if (metric_id) {
+ evsel->metric_id = strdup(metric_id);
+ if (!evsel->metric_id)
+ goto out_err;
}
(*idx)++;
evsel->core.cpus = cpus;
- evsel->core.pmu_cpus = perf_cpu_map__get(cpus);
+ evsel->core.pmu_cpus = pmu_cpus;
evsel->core.requires_cpu = pmu ? pmu->is_uncore : false;
evsel->core.is_pmu_core = is_pmu_core;
evsel->pmu = pmu;
evsel->alternate_hw_config = alternate_hw_config;
evsel->first_wildcard_match = first_wildcard_match;
- if (name)
- evsel->name = strdup(name);
-
- if (metric_id)
- evsel->metric_id = strdup(metric_id);
-
if (config_terms)
list_splice_init(config_terms, &evsel->config_terms);
if (list)
list_add_tail(&evsel->core.node, list);
- if (has_cpu_list)
- evsel__warn_user_requested_cpus(evsel, cpu_list);
+ if (has_user_cpus)
+ evsel__warn_user_requested_cpus(evsel, user_cpus);
return evsel;
+out_err:
+ perf_cpu_map__put(cpus);
+ perf_cpu_map__put(pmu_cpus);
+ zfree(&evsel->name);
+ zfree(&evsel->metric_id);
+ free(evsel);
+ return NULL;
}
struct evsel *parse_events__add_event(int idx, struct perf_event_attr *attr,
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v1 11/12] perf evsel: Add evsel__open_per_cpu_and_thread
2025-06-27 19:24 [PATCH v1 00/12] CPU mask improvements/fixes particularly for hybrid Ian Rogers
` (9 preceding siblings ...)
2025-06-27 19:24 ` [PATCH v1 10/12] perf parse-events: Minor __add_event refactoring Ian Rogers
@ 2025-06-27 19:24 ` Ian Rogers
2025-06-27 19:24 ` [PATCH v1 12/12] perf parse-events: Support user CPUs mixed with threads/processes Ian Rogers
` (2 subsequent siblings)
13 siblings, 0 replies; 22+ messages in thread
From: Ian Rogers @ 2025-06-27 19:24 UTC (permalink / raw)
To: Thomas Falcon, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Kan Liang, Ben Gainey, James Clark, Howard Chu, Weilin Wang,
Levi Yun, Dr. David Alan Gilbert, Zhongqiu Han, Blake Jones,
Yicong Yang, Anubhav Shelat, Thomas Richter, Jean-Philippe Romain,
Song Liu, linux-perf-users, linux-kernel
Add evsel__open_per_cpu_and_thread, which combines the operations of
evsel__open_per_cpu and evsel__open_per_thread so that an event
without the "any" cpumask can be opened with its cpumask and with the
threads it specifies. Change the implementations of
evsel__open_per_cpu and evsel__open_per_thread to use
evsel__open_per_cpu_and_thread, making those functions clearer.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/perf/util/evsel.c | 23 +++++++++++++++++++----
tools/perf/util/evsel.h | 3 +++
2 files changed, 22 insertions(+), 4 deletions(-)
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 1169aa60c5fc..9abc62635e76 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2741,17 +2741,32 @@ void evsel__close(struct evsel *evsel)
perf_evsel__free_id(&evsel->core);
}
-int evsel__open_per_cpu(struct evsel *evsel, struct perf_cpu_map *cpus, int cpu_map_idx)
+int evsel__open_per_cpu_and_thread(struct evsel *evsel,
+ struct perf_cpu_map *cpus, int cpu_map_idx,
+ struct perf_thread_map *threads)
{
if (cpu_map_idx == -1)
- return evsel__open_cpu(evsel, cpus, NULL, 0, perf_cpu_map__nr(cpus));
+ return evsel__open_cpu(evsel, cpus, threads, 0, perf_cpu_map__nr(cpus));
- return evsel__open_cpu(evsel, cpus, NULL, cpu_map_idx, cpu_map_idx + 1);
+ return evsel__open_cpu(evsel, cpus, threads, cpu_map_idx, cpu_map_idx + 1);
+}
+
+int evsel__open_per_cpu(struct evsel *evsel, struct perf_cpu_map *cpus, int cpu_map_idx)
+{
+ struct perf_thread_map *threads = thread_map__new_by_tid(-1);
+ int ret = evsel__open_per_cpu_and_thread(evsel, cpus, cpu_map_idx, threads);
+
+ perf_thread_map__put(threads);
+ return ret;
}
int evsel__open_per_thread(struct evsel *evsel, struct perf_thread_map *threads)
{
- return evsel__open(evsel, NULL, threads);
+ struct perf_cpu_map *cpus = perf_cpu_map__new_any_cpu();
+ int ret = evsel__open_per_cpu_and_thread(evsel, cpus, -1, threads);
+
+ perf_cpu_map__put(cpus);
+ return ret;
}
static int perf_evsel__parse_id_sample(const struct evsel *evsel,
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 8b5962a1e814..4099812d9548 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -349,6 +349,9 @@ int evsel__enable(struct evsel *evsel);
int evsel__disable(struct evsel *evsel);
int evsel__disable_cpu(struct evsel *evsel, int cpu_map_idx);
+int evsel__open_per_cpu_and_thread(struct evsel *evsel,
+ struct perf_cpu_map *cpus, int cpu_map_idx,
+ struct perf_thread_map *threads);
int evsel__open_per_cpu(struct evsel *evsel, struct perf_cpu_map *cpus, int cpu_map_idx);
int evsel__open_per_thread(struct evsel *evsel, struct perf_thread_map *threads);
int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus,
--
2.50.0.727.gbf7dc18ff4-goog
* [PATCH v1 12/12] perf parse-events: Support user CPUs mixed with threads/processes
2025-06-27 19:24 [PATCH v1 00/12] CPU mask improvements/fixes particularly for hybrid Ian Rogers
` (10 preceding siblings ...)
2025-06-27 19:24 ` [PATCH v1 11/12] perf evsel: Add evsel__open_per_cpu_and_thread Ian Rogers
@ 2025-06-27 19:24 ` Ian Rogers
2025-07-16 20:28 ` Namhyung Kim
2025-07-15 19:55 ` [PATCH v1 00/12] CPU mask improvements/fixes particularly for hybrid Ian Rogers
2025-07-21 16:13 ` James Clark
13 siblings, 1 reply; 22+ messages in thread
From: Ian Rogers @ 2025-06-27 19:24 UTC (permalink / raw)
To: Thomas Falcon, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Kan Liang, Ben Gainey, James Clark, Howard Chu, Weilin Wang,
Levi Yun, Dr. David Alan Gilbert, Zhongqiu Han, Blake Jones,
Yicong Yang, Anubhav Shelat, Thomas Richter, Jean-Philippe Romain,
Song Liu, linux-perf-users, linux-kernel
Prior to this change, counting events system-wide with a specified CPU worked:
```
$ perf stat -e 'msr/tsc/,msr/tsc,cpu=cpu_core/,msr/tsc,cpu=cpu_atom/' -a sleep 1
Performance counter stats for 'system wide':
59,393,419,099 msr/tsc/
33,927,965,927 msr/tsc,cpu=cpu_core/
25,465,608,044 msr/tsc,cpu=cpu_atom/
```
However, when counting with a process the counts became system-wide:
```
$ perf stat -e 'msr/tsc/,msr/tsc,cpu=cpu_core/,msr/tsc,cpu=cpu_atom/' perf test -F 10
10.1: Basic parsing test : Ok
10.2: Parsing without PMU name : Ok
10.3: Parsing with PMU name : Ok
Performance counter stats for 'perf test -F 10':
59,233,549 msr/tsc/
59,227,556 msr/tsc,cpu=cpu_core/
59,224,053 msr/tsc,cpu=cpu_atom/
```
Make the handling of CPU maps in event parsing clearer. When an event
is parsed and an evsel is created, its cpus should be either the PMU's
cpumask or the user-specified CPUs.
Update perf_evlist__propagate_maps so that it doesn't clobber the
user-specified CPUs. To make the behavior clearer, first fix up
missing cpumasks. Next, perform sanity checks and adjustments from the
global evlist CPU requests and for the PMU, including simplifying to
the "any CPU" (-1) value. Finally, remove the event if its cpumask is
empty.
So that events are opened with both a CPU and a thread, change stat's
create_perf_stat_counter to pass both.
With the change things are fixed:
```
$ perf stat --no-scale -e 'msr/tsc/,msr/tsc,cpu=cpu_core/,msr/tsc,cpu=cpu_atom/' perf test -F 10
10.1: Basic parsing test : Ok
10.2: Parsing without PMU name : Ok
10.3: Parsing with PMU name : Ok
Performance counter stats for 'perf test -F 10':
63,704,975 msr/tsc/
47,060,704 msr/tsc,cpu=cpu_core/ (4.62%)
16,640,591 msr/tsc,cpu=cpu_atom/ (2.18%)
```
However, note the "--no-scale" option is used. This is necessary as
the running time for the event on the counter isn't the same as the
enabled time because the thread doesn't necessarily run on the CPUs
specified for the counter. All counter values are scaled with:
scaled_value = value * time_enabled / time_running
and so without --no-scale the scaled_value becomes very large. This
problem already exists on hybrid systems for the same reason. Here are
two runs of the same code with an instructions event that counts the
same on both types of core; there is no real multiplexing happening on
the event:
```
$ perf stat -e instructions perf test -F 10
...
Performance counter stats for 'perf test -F 10':
87,896,447 cpu_atom/instructions/ (14.37%)
98,171,964 cpu_core/instructions/ (85.63%)
...
$ perf stat --no-scale -e instructions perf test -F 10
...
Performance counter stats for 'perf test -F 10':
13,069,890 cpu_atom/instructions/ (19.32%)
83,460,274 cpu_core/instructions/ (80.68%)
...
```
The scaling has inflated the per-PMU instruction counts and the
overall count by roughly 2x.
To fix this, the kernel needs changing for when a task+CPU event (or
just a task event on hybrid) is scheduled out. A fix could be to make
the state of such events off rather than inactive, so that
time_enabled doesn't accumulate on them.
Signed-off-by: Ian Rogers <irogers@google.com>
---
tools/lib/perf/evlist.c | 118 ++++++++++++++++++++++-----------
tools/perf/util/parse-events.c | 10 ++-
tools/perf/util/stat.c | 6 +-
3 files changed, 86 insertions(+), 48 deletions(-)
diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
index 9d9dec21f510..2d2236400220 100644
--- a/tools/lib/perf/evlist.c
+++ b/tools/lib/perf/evlist.c
@@ -36,49 +36,87 @@ void perf_evlist__init(struct perf_evlist *evlist)
static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
struct perf_evsel *evsel)
{
- if (evsel->system_wide) {
- /* System wide: set the cpu map of the evsel to all online CPUs. */
- perf_cpu_map__put(evsel->cpus);
- evsel->cpus = perf_cpu_map__new_online_cpus();
- } else if (evlist->has_user_cpus && evsel->is_pmu_core) {
- /*
- * User requested CPUs on a core PMU, ensure the requested CPUs
- * are valid by intersecting with those of the PMU.
- */
+ if (perf_cpu_map__is_empty(evsel->cpus)) {
+ if (perf_cpu_map__is_empty(evsel->pmu_cpus)) {
+ /*
+ * Assume the unset PMU cpus were for a system-wide
+ * event, like a software or tracepoint.
+ */
+ evsel->pmu_cpus = perf_cpu_map__new_online_cpus();
+ }
+ if (evlist->has_user_cpus && !evsel->system_wide) {
+ /*
+ * Use the user CPUs unless the evsel is set to be
+ * system wide, such as the dummy event.
+ */
+ evsel->cpus = perf_cpu_map__get(evlist->user_requested_cpus);
+ } else {
+ /*
+ * System wide and other modes, assume the cpu map
+ * should be set to all PMU CPUs.
+ */
+ evsel->cpus = perf_cpu_map__get(evsel->pmu_cpus);
+ }
+ }
+ /*
+ * Avoid "any CPU"(-1) for uncore and PMUs that require a CPU, even if
+ * requested.
+ */
+ if (evsel->requires_cpu && perf_cpu_map__has_any_cpu(evsel->cpus)) {
perf_cpu_map__put(evsel->cpus);
- evsel->cpus = perf_cpu_map__intersect(evlist->user_requested_cpus, evsel->pmu_cpus);
+ evsel->cpus = perf_cpu_map__get(evsel->pmu_cpus);
+ }
- /*
- * Empty cpu lists would eventually get opened as "any" so remove
- * genuinely empty ones before they're opened in the wrong place.
- */
- if (perf_cpu_map__is_empty(evsel->cpus)) {
- struct perf_evsel *next = perf_evlist__next(evlist, evsel);
-
- perf_evlist__remove(evlist, evsel);
- /* Keep idx contiguous */
- if (next)
- list_for_each_entry_from(next, &evlist->entries, node)
- next->idx--;
+ /*
+ * Globally requested CPUs replace user requested unless the evsel is
+ * set to be system wide.
+ */
+ if (evlist->has_user_cpus && !evsel->system_wide) {
+ assert(!perf_cpu_map__has_any_cpu(evlist->user_requested_cpus));
+ if (!perf_cpu_map__equal(evsel->cpus, evlist->user_requested_cpus)) {
+ perf_cpu_map__put(evsel->cpus);
+ evsel->cpus = perf_cpu_map__get(evlist->user_requested_cpus);
}
- } else if (!evsel->pmu_cpus || evlist->has_user_cpus ||
- (!evsel->requires_cpu && perf_cpu_map__has_any_cpu(evlist->user_requested_cpus))) {
- /*
- * The PMU didn't specify a default cpu map, this isn't a core
- * event and the user requested CPUs or the evlist user
- * requested CPUs have the "any CPU" (aka dummy) CPU value. In
- * which case use the user requested CPUs rather than the PMU
- * ones.
- */
+ }
+
+ /* Ensure cpus only references valid PMU CPUs. */
+ if (!perf_cpu_map__has_any_cpu(evsel->cpus) &&
+ !perf_cpu_map__is_subset(evsel->pmu_cpus, evsel->cpus)) {
+ struct perf_cpu_map *tmp = perf_cpu_map__intersect(evsel->pmu_cpus, evsel->cpus);
+
perf_cpu_map__put(evsel->cpus);
- evsel->cpus = perf_cpu_map__get(evlist->user_requested_cpus);
- } else if (evsel->cpus != evsel->pmu_cpus) {
- /*
- * No user requested cpu map but the PMU cpu map doesn't match
- * the evsel's. Reset it back to the PMU cpu map.
- */
+ evsel->cpus = tmp;
+ }
+
+ /*
+ * Was event requested on all the PMU's CPUs but the user requested is
+ * any CPU (-1)? If so switch to using any CPU (-1) to reduce the number
+ * of events.
+ */
+ if (!evsel->system_wide &&
+ perf_cpu_map__equal(evsel->cpus, evsel->pmu_cpus) &&
+ perf_cpu_map__has_any_cpu(evlist->user_requested_cpus)) {
perf_cpu_map__put(evsel->cpus);
- evsel->cpus = perf_cpu_map__get(evsel->pmu_cpus);
+ evsel->cpus = perf_cpu_map__get(evlist->user_requested_cpus);
+ }
+
+ /* Sanity check assert before the evsel is potentially removed. */
+ assert(!evsel->requires_cpu || !perf_cpu_map__has_any_cpu(evsel->cpus));
+
+ /*
+ * Empty cpu lists would eventually get opened as "any" so remove
+ * genuinely empty ones before they're opened in the wrong place.
+ */
+ if (perf_cpu_map__is_empty(evsel->cpus)) {
+ struct perf_evsel *next = perf_evlist__next(evlist, evsel);
+
+ perf_evlist__remove(evlist, evsel);
+ /* Keep idx contiguous */
+ if (next)
+ list_for_each_entry_from(next, &evlist->entries, node)
+ next->idx--;
+
+ return;
}
if (evsel->system_wide) {
@@ -98,6 +136,10 @@ static void perf_evlist__propagate_maps(struct perf_evlist *evlist)
evlist->needs_map_propagation = true;
+ /* Clear the all_cpus set which will be merged into during propagation. */
+ perf_cpu_map__put(evlist->all_cpus);
+ evlist->all_cpus = NULL;
+
list_for_each_entry_safe(evsel, n, &evlist->entries, node)
__perf_evlist__propagate_maps(evlist, evsel);
}
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 4092a43aa84e..0ff7ae75d8f9 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -313,20 +313,18 @@ __add_event(struct list_head *list, int *idx,
if (pmu) {
is_pmu_core = pmu->is_core;
pmu_cpus = perf_cpu_map__get(pmu->cpus);
+ if (perf_cpu_map__is_empty(pmu_cpus))
+ pmu_cpus = cpu_map__online();
} else {
is_pmu_core = (attr->type == PERF_TYPE_HARDWARE ||
attr->type == PERF_TYPE_HW_CACHE);
pmu_cpus = is_pmu_core ? cpu_map__online() : NULL;
}
- if (has_user_cpus) {
+ if (has_user_cpus)
cpus = perf_cpu_map__get(user_cpus);
- /* Existing behavior that pmu_cpus matches the given user ones. */
- perf_cpu_map__put(pmu_cpus);
- pmu_cpus = perf_cpu_map__get(user_cpus);
- } else {
+ else
cpus = perf_cpu_map__get(pmu_cpus);
- }
if (init_attr)
event_attr_init(attr);
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 355a7d5c8ab8..8d3bcdb69d37 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -769,8 +769,6 @@ int create_perf_stat_counter(struct evsel *evsel,
attr->enable_on_exec = 1;
}
- if (target__has_cpu(target) && !target__has_per_thread(target))
- return evsel__open_per_cpu(evsel, evsel__cpus(evsel), cpu_map_idx);
-
- return evsel__open_per_thread(evsel, evsel->core.threads);
+ return evsel__open_per_cpu_and_thread(evsel, evsel__cpus(evsel), cpu_map_idx,
+ evsel->core.threads);
}
--
2.50.0.727.gbf7dc18ff4-goog
* Re: [PATCH v1 00/12] CPU mask improvements/fixes particularly for hybrid
2025-06-27 19:24 [PATCH v1 00/12] CPU mask improvements/fixes particularly for hybrid Ian Rogers
` (11 preceding siblings ...)
2025-06-27 19:24 ` [PATCH v1 12/12] perf parse-events: Support user CPUs mixed with threads/processes Ian Rogers
@ 2025-07-15 19:55 ` Ian Rogers
2025-07-16 20:03 ` Falcon, Thomas
2025-07-21 16:13 ` James Clark
13 siblings, 1 reply; 22+ messages in thread
From: Ian Rogers @ 2025-07-15 19:55 UTC (permalink / raw)
To: Thomas Falcon, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
Kan Liang, Ben Gainey, James Clark, Howard Chu, Weilin Wang,
Levi Yun, Dr. David Alan Gilbert, Zhongqiu Han, Blake Jones,
Yicong Yang, Anubhav Shelat, Thomas Richter, Jean-Philippe Romain,
Song Liu, linux-perf-users, linux-kernel
On Fri, Jun 27, 2025 at 12:24 PM Ian Rogers <irogers@google.com> wrote:
>
> On hybrid systems some PMUs apply to all core types, particularly for
> metrics the msr PMU and the tsc event. The metrics often only want the
> values of the counter for their specific core type. These patches
> allow the cpu term in an event to give a PMU name to take the cpumask
> from. For example:
>
> $ perf stat -e msr/tsc,cpu=cpu_atom/ ...
>
> will aggregate the msr/tsc/ value but only for atom cores. In doing
> this problems were identified in how cpumasks are handled by parsing
> and event setup when cpumasks are specified along with a task to
> profile. The event parsing, cpumask evlist propagation code and perf
> stat code are updated accordingly.
>
> The final result of the patch series is to be able to run:
> ```
> $ perf stat --no-scale -e 'msr/tsc/,msr/tsc,cpu=cpu_core/,msr/tsc,cpu=cpu_atom/' perf test -F 10
> 10.1: Basic parsing test : Ok
> 10.2: Parsing without PMU name : Ok
> 10.3: Parsing with PMU name : Ok
>
> Performance counter stats for 'perf test -F 10':
>
> 63,704,975 msr/tsc/
> 47,060,704 msr/tsc,cpu=cpu_core/ (4.62%)
> 16,640,591 msr/tsc,cpu=cpu_atom/ (2.18%)
> ```
>
> This has (further) identified a kernel bug for task events around the
> enabled time being too large leading to invalid scaling (hence the
> --no-scale in the command line above).
>
> Ian Rogers (12):
> perf parse-events: Warn if a cpu term is unsupported by a CPU
> perf stat: Avoid buffer overflow to the aggregation map
> perf stat: Don't size aggregation ids from user_requested_cpus
> perf parse-events: Allow the cpu term to be a PMU
> perf tool_pmu: Allow num_cpus(_online) to be specific to a cpumask
> libperf evsel: Rename own_cpus to pmu_cpus
> libperf evsel: Factor perf_evsel__exit out of perf_evsel__delete
> perf evsel: Use libperf perf_evsel__exit
> perf pmus: Factor perf_pmus__find_by_attr out of evsel__find_pmu
> perf parse-events: Minor __add_event refactoring
> perf evsel: Add evsel__open_per_cpu_and_thread
> perf parse-events: Support user CPUs mixed with threads/processes
Ping.
Thanks,
Ian
> tools/lib/perf/evlist.c | 118 ++++++++++++++++--------
> tools/lib/perf/evsel.c | 9 +-
> tools/lib/perf/include/internal/evsel.h | 3 +-
> tools/perf/builtin-stat.c | 9 +-
> tools/perf/tests/event_update.c | 4 +-
> tools/perf/util/evlist.c | 15 +--
> tools/perf/util/evsel.c | 55 +++++++++--
> tools/perf/util/evsel.h | 5 +
> tools/perf/util/expr.c | 2 +-
> tools/perf/util/header.c | 4 +-
> tools/perf/util/parse-events.c | 102 ++++++++++++++------
> tools/perf/util/pmus.c | 29 +++---
> tools/perf/util/pmus.h | 2 +
> tools/perf/util/stat.c | 6 +-
> tools/perf/util/synthetic-events.c | 4 +-
> tools/perf/util/tool_pmu.c | 56 +++++++++--
> tools/perf/util/tool_pmu.h | 2 +-
> 17 files changed, 297 insertions(+), 128 deletions(-)
>
> --
> 2.50.0.727.gbf7dc18ff4-goog
>
* Re: [PATCH v1 00/12] CPU mask improvements/fixes particularly for hybrid
2025-07-15 19:55 ` [PATCH v1 00/12] CPU mask improvements/fixes particularly for hybrid Ian Rogers
@ 2025-07-16 20:03 ` Falcon, Thomas
0 siblings, 0 replies; 22+ messages in thread
From: Falcon, Thomas @ 2025-07-16 20:03 UTC (permalink / raw)
To: ben.gainey@arm.com, alexander.shishkin@linux.intel.com,
blakejones@google.com, tmricht@linux.ibm.com, song@kernel.org,
howardchu95@gmail.com, Hunter, Adrian,
jean-philippe.romain@foss.st.com, linux-kernel@vger.kernel.org,
mingo@redhat.com, irogers@google.com, ashelat@redhat.com,
linux-perf-users@vger.kernel.org, james.clark@linaro.org,
kan.liang@linux.intel.com, mark.rutland@arm.com,
peterz@infradead.org, linux@treblig.org, yeoreum.yun@arm.com,
Wang, Weilin, acme@kernel.org, yangyicong@hisilicon.com,
jolsa@kernel.org, namhyung@kernel.org, quic_zhonhan@quicinc.com
On Tue, 2025-07-15 at 12:55 -0700, Ian Rogers wrote:
> On Fri, Jun 27, 2025 at 12:24 PM Ian Rogers <irogers@google.com> wrote:
> >
> > On hybrid systems some PMUs apply to all core types, particularly for
> > metrics the msr PMU and the tsc event. The metrics often only want the
> > values of the counter for their specific core type. These patches
> > allow the cpu term in an event to give a PMU name to take the cpumask
> > from. For example:
> >
> > $ perf stat -e msr/tsc,cpu=cpu_atom/ ...
> >
> > will aggregate the msr/tsc/ value but only for atom cores. In doing
> > this problems were identified in how cpumasks are handled by parsing
> > and event setup when cpumasks are specified along with a task to
> > profile. The event parsing, cpumask evlist propagation code and perf
> > stat code are updated accordingly.
> >
> > The final result of the patch series is to be able to run:
> > ```
> > $ perf stat --no-scale -e 'msr/tsc/,msr/tsc,cpu=cpu_core/,msr/tsc,cpu=cpu_atom/' perf test -F 10
> > 10.1: Basic parsing test : Ok
> > 10.2: Parsing without PMU name : Ok
> > 10.3: Parsing with PMU name : Ok
> >
> > Performance counter stats for 'perf test -F 10':
> >
> > 63,704,975 msr/tsc/
> > 47,060,704 msr/tsc,cpu=cpu_core/ (4.62%)
> > 16,640,591 msr/tsc,cpu=cpu_atom/ (2.18%)
> > ```
> >
> > This has (further) identified a kernel bug for task events around the
> > enabled time being too large leading to invalid scaling (hence the
> > --no-scale in the command line above).
> >
> > Ian Rogers (12):
> > perf parse-events: Warn if a cpu term is unsupported by a CPU
> > perf stat: Avoid buffer overflow to the aggregation map
> > perf stat: Don't size aggregation ids from user_requested_cpus
> > perf parse-events: Allow the cpu term to be a PMU
> > perf tool_pmu: Allow num_cpus(_online) to be specific to a cpumask
> > libperf evsel: Rename own_cpus to pmu_cpus
> > libperf evsel: Factor perf_evsel__exit out of perf_evsel__delete
> > perf evsel: Use libperf perf_evsel__exit
> > perf pmus: Factor perf_pmus__find_by_attr out of evsel__find_pmu
> > perf parse-events: Minor __add_event refactoring
> > perf evsel: Add evsel__open_per_cpu_and_thread
> > perf parse-events: Support user CPUs mixed with threads/processes
>
> Ping.
Hi Ian,
Looks good to me.
Reviewed-by: Thomas Falcon <thomas.falcon@intel.com>
Thanks,
Tom
>
> Thanks,
> Ian
>
> > tools/lib/perf/evlist.c | 118 ++++++++++++++++--------
> > tools/lib/perf/evsel.c | 9 +-
> > tools/lib/perf/include/internal/evsel.h | 3 +-
> > tools/perf/builtin-stat.c | 9 +-
> > tools/perf/tests/event_update.c | 4 +-
> > tools/perf/util/evlist.c | 15 +--
> > tools/perf/util/evsel.c | 55 +++++++++--
> > tools/perf/util/evsel.h | 5 +
> > tools/perf/util/expr.c | 2 +-
> > tools/perf/util/header.c | 4 +-
> > tools/perf/util/parse-events.c | 102 ++++++++++++++------
> > tools/perf/util/pmus.c | 29 +++---
> > tools/perf/util/pmus.h | 2 +
> > tools/perf/util/stat.c | 6 +-
> > tools/perf/util/synthetic-events.c | 4 +-
> > tools/perf/util/tool_pmu.c | 56 +++++++++--
> > tools/perf/util/tool_pmu.h | 2 +-
> > 17 files changed, 297 insertions(+), 128 deletions(-)
> >
> > --
> > 2.50.0.727.gbf7dc18ff4-goog
> >
* Re: [PATCH v1 04/12] perf parse-events: Allow the cpu term to be a PMU
2025-06-27 19:24 ` [PATCH v1 04/12] perf parse-events: Allow the cpu term to be a PMU Ian Rogers
@ 2025-07-16 20:09 ` Namhyung Kim
2025-07-16 20:25 ` Ian Rogers
0 siblings, 1 reply; 22+ messages in thread
From: Namhyung Kim @ 2025-07-16 20:09 UTC (permalink / raw)
To: Ian Rogers
Cc: Thomas Falcon, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Jiri Olsa, Adrian Hunter, Kan Liang, Ben Gainey, James Clark,
Howard Chu, Weilin Wang, Levi Yun, Dr. David Alan Gilbert,
Zhongqiu Han, Blake Jones, Yicong Yang, Anubhav Shelat,
Thomas Richter, Jean-Philippe Romain, Song Liu, linux-perf-users,
linux-kernel
On Fri, Jun 27, 2025 at 12:24:09PM -0700, Ian Rogers wrote:
> On hybrid systems, events like msr/tsc/ will aggregate counts across
> all CPUs. Often metrics only want a value like msr/tsc/ for the cores
> on which the metric is being computed. Listing each CPU with terms
> cpu=0,cpu=1.. is laborious and would need to be encoded for all
> variations of a CPU model.
>
> Allow the cpumask from a PMU to be an argument to the cpu term. For
> example, in the following, the cpumask of the cstate_pkg PMU selects
> the CPUs on which to count the msr/tsc/ counter:
> ```
> $ cat /sys/bus/event_source/devices/cstate_pkg/cpumask
> 0
> $ perf stat -A -e 'msr/tsc,cpu=cstate_pkg/' -a sleep 0.1
It can be confusing that 'cpu' can take either a number or a PMU name. What about
adding a new term (maybe 'cpu_from') to handle this case?
Also please update the documentation.
Thanks,
Namhyung
>
> Performance counter stats for 'system wide':
>
> CPU0 252,621,253 msr/tsc,cpu=cstate_pkg/
>
> 0.101184092 seconds time elapsed
> ```
>
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
> tools/perf/util/parse-events.c | 37 +++++++++++++++++++++++++---------
> 1 file changed, 28 insertions(+), 9 deletions(-)
>
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index 7a32d5234a64..ef38eb082342 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -192,10 +192,20 @@ static struct perf_cpu_map *get_config_cpu(const struct parse_events_terms *head
>
> list_for_each_entry(term, &head_terms->terms, list) {
> if (term->type_term == PARSE_EVENTS__TERM_TYPE_CPU) {
> - struct perf_cpu_map *cpu = perf_cpu_map__new_int(term->val.num);
> + struct perf_cpu_map *term_cpus;
>
> - perf_cpu_map__merge(&cpus, cpu);
> - perf_cpu_map__put(cpu);
> + if (term->type_val == PARSE_EVENTS__TERM_TYPE_NUM) {
> + term_cpus = perf_cpu_map__new_int(term->val.num);
> + } else {
> + struct perf_pmu *pmu = perf_pmus__find(term->val.str);
> +
> + if (perf_cpu_map__is_empty(pmu->cpus))
> + term_cpus = pmu->is_core ? cpu_map__online() : NULL;
> + else
> + term_cpus = perf_cpu_map__get(pmu->cpus);
> + }
> + perf_cpu_map__merge(&cpus, term_cpus);
> + perf_cpu_map__put(term_cpus);
> }
> }
>
> @@ -1054,12 +1064,21 @@ do { \
> }
> break;
> case PARSE_EVENTS__TERM_TYPE_CPU:
> - CHECK_TYPE_VAL(NUM);
> - if (term->val.num >= (u64)cpu__max_present_cpu().cpu) {
> - parse_events_error__handle(err, term->err_val,
> - strdup("too big"),
> - NULL);
> - return -EINVAL;
> + if (term->type_val == PARSE_EVENTS__TERM_TYPE_NUM) {
> + if (term->val.num >= (u64)cpu__max_present_cpu().cpu) {
> + parse_events_error__handle(err, term->err_val,
> + strdup("too big"),
> + /*help=*/NULL);
> + return -EINVAL;
> + }
> + } else {
> + assert(term->type_val == PARSE_EVENTS__TERM_TYPE_STR);
> + if (perf_pmus__find(term->val.str) == NULL) {
> + parse_events_error__handle(err, term->err_val,
> + strdup("not a valid PMU"),
> + /*help=*/NULL);
> + return -EINVAL;
> + }
> }
> break;
> case PARSE_EVENTS__TERM_TYPE_DRV_CFG:
> --
> 2.50.0.727.gbf7dc18ff4-goog
>
* Re: [PATCH v1 04/12] perf parse-events: Allow the cpu term to be a PMU
2025-07-16 20:09 ` Namhyung Kim
@ 2025-07-16 20:25 ` Ian Rogers
2025-07-18 17:56 ` Namhyung Kim
0 siblings, 1 reply; 22+ messages in thread
From: Ian Rogers @ 2025-07-16 20:25 UTC (permalink / raw)
To: Namhyung Kim
Cc: Thomas Falcon, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Jiri Olsa, Adrian Hunter, Kan Liang, Ben Gainey, James Clark,
Howard Chu, Weilin Wang, Levi Yun, Dr. David Alan Gilbert,
Zhongqiu Han, Blake Jones, Yicong Yang, Anubhav Shelat,
Thomas Richter, Jean-Philippe Romain, Song Liu, linux-perf-users,
linux-kernel
On Wed, Jul 16, 2025 at 1:09 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Fri, Jun 27, 2025 at 12:24:09PM -0700, Ian Rogers wrote:
> > On hybrid systems, events like msr/tsc/ will aggregate counts across
> > all CPUs. Often metrics only want a value like msr/tsc/ for the cores
> > on which the metric is being computed. Listing each CPU with terms
> > cpu=0,cpu=1.. is laborious and would need to be encoded for all
> > variations of a CPU model.
> >
> > Allow the cpumask from a PMU to be an argument to the cpu term. For
> > example, in the following, the cpumask of the cstate_pkg PMU selects
> > the CPUs on which to count the msr/tsc/ counter:
> > ```
> > $ cat /sys/bus/event_source/devices/cstate_pkg/cpumask
> > 0
> > $ perf stat -A -e 'msr/tsc,cpu=cstate_pkg/' -a sleep 0.1
>
> It can be confusing if 'cpu' takes a number or a PMU name. What about
> adding a new term (maybe 'cpu_from') to handle this case?
So it is possible for terms to be defined in sysfs in the 'format/' folder:
```
$ ls /sys/bus/event_source/devices/cpu_core/format/
cmask edge event frontend inv ldlat offcore_rsp pc umask
```
By not introducing a new term, we leave 'cpu_from' open for use in this
way. When I spoke to Kan, we thought using the existing term made sense
and fit the idea of leaving things open for the kernel/drivers to
use. It is possible to add a new term, though. Let me know and I can
update the patch and documentation accordingly.
Thanks,
Ian
> Also please update the documentation.
>
> Thanks,
> Namhyung
>
> >
> > Performance counter stats for 'system wide':
> >
> > CPU0 252,621,253 msr/tsc,cpu=cstate_pkg/
> >
> > 0.101184092 seconds time elapsed
> > ```
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> > tools/perf/util/parse-events.c | 37 +++++++++++++++++++++++++---------
> > 1 file changed, 28 insertions(+), 9 deletions(-)
> >
> > diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> > index 7a32d5234a64..ef38eb082342 100644
> > --- a/tools/perf/util/parse-events.c
> > +++ b/tools/perf/util/parse-events.c
> > @@ -192,10 +192,20 @@ static struct perf_cpu_map *get_config_cpu(const struct parse_events_terms *head
> >
> > list_for_each_entry(term, &head_terms->terms, list) {
> > if (term->type_term == PARSE_EVENTS__TERM_TYPE_CPU) {
> > - struct perf_cpu_map *cpu = perf_cpu_map__new_int(term->val.num);
> > + struct perf_cpu_map *term_cpus;
> >
> > - perf_cpu_map__merge(&cpus, cpu);
> > - perf_cpu_map__put(cpu);
> > + if (term->type_val == PARSE_EVENTS__TERM_TYPE_NUM) {
> > + term_cpus = perf_cpu_map__new_int(term->val.num);
> > + } else {
> > + struct perf_pmu *pmu = perf_pmus__find(term->val.str);
> > +
> > + if (perf_cpu_map__is_empty(pmu->cpus))
> > + term_cpus = pmu->is_core ? cpu_map__online() : NULL;
> > + else
> > + term_cpus = perf_cpu_map__get(pmu->cpus);
> > + }
> > + perf_cpu_map__merge(&cpus, term_cpus);
> > + perf_cpu_map__put(term_cpus);
> > }
> > }
> >
> > @@ -1054,12 +1064,21 @@ do { \
> > }
> > break;
> > case PARSE_EVENTS__TERM_TYPE_CPU:
> > - CHECK_TYPE_VAL(NUM);
> > - if (term->val.num >= (u64)cpu__max_present_cpu().cpu) {
> > - parse_events_error__handle(err, term->err_val,
> > - strdup("too big"),
> > - NULL);
> > - return -EINVAL;
> > + if (term->type_val == PARSE_EVENTS__TERM_TYPE_NUM) {
> > + if (term->val.num >= (u64)cpu__max_present_cpu().cpu) {
> > + parse_events_error__handle(err, term->err_val,
> > + strdup("too big"),
> > + /*help=*/NULL);
> > + return -EINVAL;
> > + }
> > + } else {
> > + assert(term->type_val == PARSE_EVENTS__TERM_TYPE_STR);
> > + if (perf_pmus__find(term->val.str) == NULL) {
> > + parse_events_error__handle(err, term->err_val,
> > + strdup("not a valid PMU"),
> > + /*help=*/NULL);
> > + return -EINVAL;
> > + }
> > }
> > break;
> > case PARSE_EVENTS__TERM_TYPE_DRV_CFG:
> > --
> > 2.50.0.727.gbf7dc18ff4-goog
> >
* Re: [PATCH v1 12/12] perf parse-events: Support user CPUs mixed with threads/processes
2025-06-27 19:24 ` [PATCH v1 12/12] perf parse-events: Support user CPUs mixed with threads/processes Ian Rogers
@ 2025-07-16 20:28 ` Namhyung Kim
2025-07-17 0:04 ` Ian Rogers
0 siblings, 1 reply; 22+ messages in thread
From: Namhyung Kim @ 2025-07-16 20:28 UTC (permalink / raw)
To: Ian Rogers
Cc: Thomas Falcon, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Jiri Olsa, Adrian Hunter, Kan Liang, Ben Gainey, James Clark,
Howard Chu, Weilin Wang, Levi Yun, Dr. David Alan Gilbert,
Zhongqiu Han, Blake Jones, Yicong Yang, Anubhav Shelat,
Thomas Richter, Jean-Philippe Romain, Song Liu, linux-perf-users,
linux-kernel
On Fri, Jun 27, 2025 at 12:24:17PM -0700, Ian Rogers wrote:
> Counting events system-wide with a specified CPU prior to this change worked:
> ```
> $ perf stat -e 'msr/tsc/,msr/tsc,cpu=cpu_core/,msr/tsc,cpu=cpu_atom/' -a sleep 1
>
> Performance counter stats for 'system wide':
>
> 59,393,419,099 msr/tsc/
> 33,927,965,927 msr/tsc,cpu=cpu_core/
> 25,465,608,044 msr/tsc,cpu=cpu_atom/
> ```
>
> However, when counting with a process the counts became system wide:
> ```
> $ perf stat -e 'msr/tsc/,msr/tsc,cpu=cpu_core/,msr/tsc,cpu=cpu_atom/' perf test -F 10
> 10.1: Basic parsing test : Ok
> 10.2: Parsing without PMU name : Ok
> 10.3: Parsing with PMU name : Ok
>
> Performance counter stats for 'perf test -F 10':
>
> 59,233,549 msr/tsc/
> 59,227,556 msr/tsc,cpu=cpu_core/
> 59,224,053 msr/tsc,cpu=cpu_atom/
> ```
>
> Make the handling of CPU maps with event parsing clearer. When an
> event is parsed, creating an evsel, the cpus should be either the
> PMU's cpumask or the user-specified CPUs.
>
> Update perf_evlist__propagate_maps so that it doesn't clobber the
> user-specified CPUs. To make the behavior clearer: first, fix up
> missing cpumasks; next, perform sanity checks and adjustments from the
> global evlist CPU requests and for the PMU, including simplifying to
> the "any CPU" (-1) value; finally, remove the event if the cpumask is
> empty.
>
> So that events are opened with both a CPU and a thread, change stat's
> create_perf_stat_counter to pass both.
>
> With the change things are fixed:
> ```
> $ perf stat --no-scale -e 'msr/tsc/,msr/tsc,cpu=cpu_core/,msr/tsc,cpu=cpu_atom/' perf test -F 10
> 10.1: Basic parsing test : Ok
> 10.2: Parsing without PMU name : Ok
> 10.3: Parsing with PMU name : Ok
>
> Performance counter stats for 'perf test -F 10':
>
> 63,704,975 msr/tsc/
> 47,060,704 msr/tsc,cpu=cpu_core/ (4.62%)
> 16,640,591 msr/tsc,cpu=cpu_atom/ (2.18%)
> ```
>
> However, note the "--no-scale" option is used. This is necessary as
> the running time for the event on the counter isn't the same as the
> enabled time because the thread doesn't necessarily run on the CPUs
> specified for the counter. All counter values are scaled with:
>
> scaled_value = value * time_enabled / time_running
>
> and so without --no-scale the scaled_value becomes very large. This
> problem already exists on hybrid systems for the same reason. Here are
> two runs of the same code with an instructions event that counts the
> same on both types of core; there is no real multiplexing happening on
> the event:
This is unfortunate. The event is in a task context but also has a CPU
constraint. So it's not schedulable on other CPUs.
A problem is that this cannot be distinguished from real multiplexing.
>
> ```
> $ perf stat -e instructions perf test -F 10
> ...
> Performance counter stats for 'perf test -F 10':
>
> 87,896,447 cpu_atom/instructions/ (14.37%)
> 98,171,964 cpu_core/instructions/ (85.63%)
> ...
> $ perf stat --no-scale -e instructions perf test -F 10
> ...
> Performance counter stats for 'perf test -F 10':
>
> 13,069,890 cpu_atom/instructions/ (19.32%)
> 83,460,274 cpu_core/instructions/ (80.68%)
> ...
> ```
> The scaling has inflated per-PMU instruction counts and the overall
> count by 2x.
>
> To fix this, the kernel needs changing for when a task+CPU event (or
> just a task event on hybrid) is scheduled out. A fix could be that the
> state isn't 'inactive' but 'off' for such events, so that time_enabled
> counts don't accumulate on them.
Right, maybe we need to add a new state (UNSCHEDULABLE?) to skip
updating the enabled time.
Thanks,
Namhyung
>
> Signed-off-by: Ian Rogers <irogers@google.com>
> ---
> tools/lib/perf/evlist.c | 118 ++++++++++++++++++++++-----------
> tools/perf/util/parse-events.c | 10 ++-
> tools/perf/util/stat.c | 6 +-
> 3 files changed, 86 insertions(+), 48 deletions(-)
>
> diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
> index 9d9dec21f510..2d2236400220 100644
> --- a/tools/lib/perf/evlist.c
> +++ b/tools/lib/perf/evlist.c
> @@ -36,49 +36,87 @@ void perf_evlist__init(struct perf_evlist *evlist)
> static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
> struct perf_evsel *evsel)
> {
> - if (evsel->system_wide) {
> - /* System wide: set the cpu map of the evsel to all online CPUs. */
> - perf_cpu_map__put(evsel->cpus);
> - evsel->cpus = perf_cpu_map__new_online_cpus();
> - } else if (evlist->has_user_cpus && evsel->is_pmu_core) {
> - /*
> - * User requested CPUs on a core PMU, ensure the requested CPUs
> - * are valid by intersecting with those of the PMU.
> - */
> + if (perf_cpu_map__is_empty(evsel->cpus)) {
> + if (perf_cpu_map__is_empty(evsel->pmu_cpus)) {
> + /*
> + * Assume the unset PMU cpus were for a system-wide
> + * event, like a software or tracepoint.
> + */
> + evsel->pmu_cpus = perf_cpu_map__new_online_cpus();
> + }
> + if (evlist->has_user_cpus && !evsel->system_wide) {
> + /*
> + * Use the user CPUs unless the evsel is set to be
> + * system wide, such as the dummy event.
> + */
> + evsel->cpus = perf_cpu_map__get(evlist->user_requested_cpus);
> + } else {
> + /*
> + * System wide and other modes, assume the cpu map
> + * should be set to all PMU CPUs.
> + */
> + evsel->cpus = perf_cpu_map__get(evsel->pmu_cpus);
> + }
> + }
> + /*
> + * Avoid "any CPU"(-1) for uncore and PMUs that require a CPU, even if
> + * requested.
> + */
> + if (evsel->requires_cpu && perf_cpu_map__has_any_cpu(evsel->cpus)) {
> perf_cpu_map__put(evsel->cpus);
> - evsel->cpus = perf_cpu_map__intersect(evlist->user_requested_cpus, evsel->pmu_cpus);
> + evsel->cpus = perf_cpu_map__get(evsel->pmu_cpus);
> + }
>
> - /*
> - * Empty cpu lists would eventually get opened as "any" so remove
> - * genuinely empty ones before they're opened in the wrong place.
> - */
> - if (perf_cpu_map__is_empty(evsel->cpus)) {
> - struct perf_evsel *next = perf_evlist__next(evlist, evsel);
> -
> - perf_evlist__remove(evlist, evsel);
> - /* Keep idx contiguous */
> - if (next)
> - list_for_each_entry_from(next, &evlist->entries, node)
> - next->idx--;
> + /*
> + * Globally requested CPUs replace user requested unless the evsel is
> + * set to be system wide.
> + */
> + if (evlist->has_user_cpus && !evsel->system_wide) {
> + assert(!perf_cpu_map__has_any_cpu(evlist->user_requested_cpus));
> + if (!perf_cpu_map__equal(evsel->cpus, evlist->user_requested_cpus)) {
> + perf_cpu_map__put(evsel->cpus);
> + evsel->cpus = perf_cpu_map__get(evlist->user_requested_cpus);
> }
> - } else if (!evsel->pmu_cpus || evlist->has_user_cpus ||
> - (!evsel->requires_cpu && perf_cpu_map__has_any_cpu(evlist->user_requested_cpus))) {
> - /*
> - * The PMU didn't specify a default cpu map, this isn't a core
> - * event and the user requested CPUs or the evlist user
> - * requested CPUs have the "any CPU" (aka dummy) CPU value. In
> - * which case use the user requested CPUs rather than the PMU
> - * ones.
> - */
> + }
> +
> + /* Ensure cpus only references valid PMU CPUs. */
> + if (!perf_cpu_map__has_any_cpu(evsel->cpus) &&
> + !perf_cpu_map__is_subset(evsel->pmu_cpus, evsel->cpus)) {
> + struct perf_cpu_map *tmp = perf_cpu_map__intersect(evsel->pmu_cpus, evsel->cpus);
> +
> perf_cpu_map__put(evsel->cpus);
> - evsel->cpus = perf_cpu_map__get(evlist->user_requested_cpus);
> - } else if (evsel->cpus != evsel->pmu_cpus) {
> - /*
> - * No user requested cpu map but the PMU cpu map doesn't match
> - * the evsel's. Reset it back to the PMU cpu map.
> - */
> + evsel->cpus = tmp;
> + }
> +
> + /*
> + * Was event requested on all the PMU's CPUs but the user requested is
> + * any CPU (-1)? If so switch to using any CPU (-1) to reduce the number
> + * of events.
> + */
> + if (!evsel->system_wide &&
> + perf_cpu_map__equal(evsel->cpus, evsel->pmu_cpus) &&
> + perf_cpu_map__has_any_cpu(evlist->user_requested_cpus)) {
> perf_cpu_map__put(evsel->cpus);
> - evsel->cpus = perf_cpu_map__get(evsel->pmu_cpus);
> + evsel->cpus = perf_cpu_map__get(evlist->user_requested_cpus);
> + }
> +
> + /* Sanity check assert before the evsel is potentially removed. */
> + assert(!evsel->requires_cpu || !perf_cpu_map__has_any_cpu(evsel->cpus));
> +
> + /*
> + * Empty cpu lists would eventually get opened as "any" so remove
> + * genuinely empty ones before they're opened in the wrong place.
> + */
> + if (perf_cpu_map__is_empty(evsel->cpus)) {
> + struct perf_evsel *next = perf_evlist__next(evlist, evsel);
> +
> + perf_evlist__remove(evlist, evsel);
> + /* Keep idx contiguous */
> + if (next)
> + list_for_each_entry_from(next, &evlist->entries, node)
> + next->idx--;
> +
> + return;
> }
>
> if (evsel->system_wide) {
> @@ -98,6 +136,10 @@ static void perf_evlist__propagate_maps(struct perf_evlist *evlist)
>
> evlist->needs_map_propagation = true;
>
> + /* Clear the all_cpus set which will be merged into during propagation. */
> + perf_cpu_map__put(evlist->all_cpus);
> + evlist->all_cpus = NULL;
> +
> list_for_each_entry_safe(evsel, n, &evlist->entries, node)
> __perf_evlist__propagate_maps(evlist, evsel);
> }
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index 4092a43aa84e..0ff7ae75d8f9 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -313,20 +313,18 @@ __add_event(struct list_head *list, int *idx,
> if (pmu) {
> is_pmu_core = pmu->is_core;
> pmu_cpus = perf_cpu_map__get(pmu->cpus);
> + if (perf_cpu_map__is_empty(pmu_cpus))
> + pmu_cpus = cpu_map__online();
> } else {
> is_pmu_core = (attr->type == PERF_TYPE_HARDWARE ||
> attr->type == PERF_TYPE_HW_CACHE);
> pmu_cpus = is_pmu_core ? cpu_map__online() : NULL;
> }
>
> - if (has_user_cpus) {
> + if (has_user_cpus)
> cpus = perf_cpu_map__get(user_cpus);
> - /* Existing behavior that pmu_cpus matches the given user ones. */
> - perf_cpu_map__put(pmu_cpus);
> - pmu_cpus = perf_cpu_map__get(user_cpus);
> - } else {
> + else
> cpus = perf_cpu_map__get(pmu_cpus);
> - }
>
> if (init_attr)
> event_attr_init(attr);
> diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> index 355a7d5c8ab8..8d3bcdb69d37 100644
> --- a/tools/perf/util/stat.c
> +++ b/tools/perf/util/stat.c
> @@ -769,8 +769,6 @@ int create_perf_stat_counter(struct evsel *evsel,
> attr->enable_on_exec = 1;
> }
>
> - if (target__has_cpu(target) && !target__has_per_thread(target))
> - return evsel__open_per_cpu(evsel, evsel__cpus(evsel), cpu_map_idx);
> -
> - return evsel__open_per_thread(evsel, evsel->core.threads);
> + return evsel__open_per_cpu_and_thread(evsel, evsel__cpus(evsel), cpu_map_idx,
> + evsel->core.threads);
> }
> --
> 2.50.0.727.gbf7dc18ff4-goog
>
* Re: [PATCH v1 12/12] perf parse-events: Support user CPUs mixed with threads/processes
2025-07-16 20:28 ` Namhyung Kim
@ 2025-07-17 0:04 ` Ian Rogers
0 siblings, 0 replies; 22+ messages in thread
From: Ian Rogers @ 2025-07-17 0:04 UTC (permalink / raw)
To: Namhyung Kim, Mi, Dapeng1, Andi Kleen, Peter Zijlstra,
Ingo Molnar
Cc: Thomas Falcon, Arnaldo Carvalho de Melo, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang,
Ben Gainey, James Clark, Howard Chu, Weilin Wang, Levi Yun,
Dr. David Alan Gilbert, Zhongqiu Han, Blake Jones, Yicong Yang,
Anubhav Shelat, Thomas Richter, Jean-Philippe Romain, Song Liu,
linux-perf-users, linux-kernel
On Wed, Jul 16, 2025 at 1:28 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Fri, Jun 27, 2025 at 12:24:17PM -0700, Ian Rogers wrote:
> > Counting events system-wide with a specified CPU prior to this change worked:
> > ```
> > $ perf stat -e 'msr/tsc/,msr/tsc,cpu=cpu_core/,msr/tsc,cpu=cpu_atom/' -a sleep 1
> >
> > Performance counter stats for 'system wide':
> >
> > 59,393,419,099 msr/tsc/
> > 33,927,965,927 msr/tsc,cpu=cpu_core/
> > 25,465,608,044 msr/tsc,cpu=cpu_atom/
> > ```
> >
> > However, when counting with a process the counts became system wide:
> > ```
> > $ perf stat -e 'msr/tsc/,msr/tsc,cpu=cpu_core/,msr/tsc,cpu=cpu_atom/' perf test -F 10
> > 10.1: Basic parsing test : Ok
> > 10.2: Parsing without PMU name : Ok
> > 10.3: Parsing with PMU name : Ok
> >
> > Performance counter stats for 'perf test -F 10':
> >
> > 59,233,549 msr/tsc/
> > 59,227,556 msr/tsc,cpu=cpu_core/
> > 59,224,053 msr/tsc,cpu=cpu_atom/
> > ```
> >
> > Make the handling of CPU maps with event parsing clearer. When an
> > event is parsed, creating an evsel, the cpus should be either the
> > PMU's cpumask or the user-specified CPUs.
> >
> > Update perf_evlist__propagate_maps so that it doesn't clobber the
> > user-specified CPUs. To make the behavior clearer: first, fix up
> > missing cpumasks; next, perform sanity checks and adjustments from the
> > global evlist CPU requests and for the PMU, including simplifying to
> > the "any CPU" (-1) value; finally, remove the event if the cpumask is
> > empty.
> >
> > So that events are opened with both a CPU and a thread, change stat's
> > create_perf_stat_counter to pass both.
> >
> > With the change things are fixed:
> > ```
> > $ perf stat --no-scale -e 'msr/tsc/,msr/tsc,cpu=cpu_core/,msr/tsc,cpu=cpu_atom/' perf test -F 10
> > 10.1: Basic parsing test : Ok
> > 10.2: Parsing without PMU name : Ok
> > 10.3: Parsing with PMU name : Ok
> >
> > Performance counter stats for 'perf test -F 10':
> >
> > 63,704,975 msr/tsc/
> > 47,060,704 msr/tsc,cpu=cpu_core/ (4.62%)
> > 16,640,591 msr/tsc,cpu=cpu_atom/ (2.18%)
> > ```
> >
> > However, note the "--no-scale" option is used. This is necessary as
> > the running time for the event on the counter isn't the same as the
> > enabled time because the thread doesn't necessarily run on the CPUs
> > specified for the counter. All counter values are scaled with:
> >
> > scaled_value = value * time_enabled / time_running
> >
> > and so without --no-scale the scaled_value becomes very large. This
> > problem already exists on hybrid systems for the same reason. Here are
> > two runs of the same code with an instructions event that counts the
> > same on both types of core; there is no real multiplexing happening on
> > the event:
>
> This is unfortunate. The event is in a task context but also has a CPU
> constraint. So it's not schedulable on other CPUs.
>
> A problem is that this cannot be distinguished from real multiplexing.
>
> >
> > ```
> > $ perf stat -e instructions perf test -F 10
> > ...
> > Performance counter stats for 'perf test -F 10':
> >
> > 87,896,447 cpu_atom/instructions/ (14.37%)
> > 98,171,964 cpu_core/instructions/ (85.63%)
> > ...
> > $ perf stat --no-scale -e instructions perf test -F 10
> > ...
> > Performance counter stats for 'perf test -F 10':
> >
> > 13,069,890 cpu_atom/instructions/ (19.32%)
> > 83,460,274 cpu_core/instructions/ (80.68%)
> > ...
> > ```
> > The scaling has inflated per-PMU instruction counts and the overall
> > count by 2x.
> >
> > To fix this, the kernel needs changing for when a task+CPU event (or
> > just a task event on hybrid) is scheduled out. A fix could be that the
> > state isn't 'inactive' but 'off' for such events, so that time_enabled
> > counts don't accumulate on them.
>
> Right, maybe we need to add a new state (UNSCHEDULABLE?) to skip
> updating the enabled time.
Right, having a new state would mean that in __perf_update_times the
enabled time wouldn't increase if the filter happened:
https://web.git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/tree/kernel/events/core.c#n716
```
enum perf_event_state state = __perf_effective_state(event);
u64 delta = now - event->tstamp;
*enabled = event->total_time_enabled;
if (state >= PERF_EVENT_STATE_INACTIVE)
*enabled += delta;
*running = event->total_time_running;
if (state >= PERF_EVENT_STATE_ACTIVE)
*running += delta;
```
I sent out this RFC patch that just about makes this change:
https://lore.kernel.org/lkml/20250716223924.825772-1-irogers@google.com/
but for now it seems the only workaround is to use `--no-scale` :-(
This series is still necessary for the hybrid TMA fix for msr/tsc/.
Thanks,
Ian
> Thanks,
> Namhyung
>
> >
> > Signed-off-by: Ian Rogers <irogers@google.com>
> > ---
> > tools/lib/perf/evlist.c | 118 ++++++++++++++++++++++-----------
> > tools/perf/util/parse-events.c | 10 ++-
> > tools/perf/util/stat.c | 6 +-
> > 3 files changed, 86 insertions(+), 48 deletions(-)
> >
> > diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
> > index 9d9dec21f510..2d2236400220 100644
> > --- a/tools/lib/perf/evlist.c
> > +++ b/tools/lib/perf/evlist.c
> > @@ -36,49 +36,87 @@ void perf_evlist__init(struct perf_evlist *evlist)
> > static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
> > struct perf_evsel *evsel)
> > {
> > - if (evsel->system_wide) {
> > - /* System wide: set the cpu map of the evsel to all online CPUs. */
> > - perf_cpu_map__put(evsel->cpus);
> > - evsel->cpus = perf_cpu_map__new_online_cpus();
> > - } else if (evlist->has_user_cpus && evsel->is_pmu_core) {
> > - /*
> > - * User requested CPUs on a core PMU, ensure the requested CPUs
> > - * are valid by intersecting with those of the PMU.
> > - */
> > + if (perf_cpu_map__is_empty(evsel->cpus)) {
> > + if (perf_cpu_map__is_empty(evsel->pmu_cpus)) {
> > + /*
> > + * Assume the unset PMU cpus were for a system-wide
> > + * event, like a software or tracepoint.
> > + */
> > + evsel->pmu_cpus = perf_cpu_map__new_online_cpus();
> > + }
> > + if (evlist->has_user_cpus && !evsel->system_wide) {
> > + /*
> > + * Use the user CPUs unless the evsel is set to be
> > + * system wide, such as the dummy event.
> > + */
> > + evsel->cpus = perf_cpu_map__get(evlist->user_requested_cpus);
> > + } else {
> > + /*
> > + * System wide and other modes, assume the cpu map
> > + * should be set to all PMU CPUs.
> > + */
> > + evsel->cpus = perf_cpu_map__get(evsel->pmu_cpus);
> > + }
> > + }
> > + /*
> > + * Avoid "any CPU"(-1) for uncore and PMUs that require a CPU, even if
> > + * requested.
> > + */
> > + if (evsel->requires_cpu && perf_cpu_map__has_any_cpu(evsel->cpus)) {
> > perf_cpu_map__put(evsel->cpus);
> > - evsel->cpus = perf_cpu_map__intersect(evlist->user_requested_cpus, evsel->pmu_cpus);
> > + evsel->cpus = perf_cpu_map__get(evsel->pmu_cpus);
> > + }
> >
> > - /*
> > - * Empty cpu lists would eventually get opened as "any" so remove
> > - * genuinely empty ones before they're opened in the wrong place.
> > - */
> > - if (perf_cpu_map__is_empty(evsel->cpus)) {
> > - struct perf_evsel *next = perf_evlist__next(evlist, evsel);
> > -
> > - perf_evlist__remove(evlist, evsel);
> > - /* Keep idx contiguous */
> > - if (next)
> > - list_for_each_entry_from(next, &evlist->entries, node)
> > - next->idx--;
> > + /*
> > + * Globally requested CPUs replace user requested unless the evsel is
> > + * set to be system wide.
> > + */
> > + if (evlist->has_user_cpus && !evsel->system_wide) {
> > + assert(!perf_cpu_map__has_any_cpu(evlist->user_requested_cpus));
> > + if (!perf_cpu_map__equal(evsel->cpus, evlist->user_requested_cpus)) {
> > + perf_cpu_map__put(evsel->cpus);
> > + evsel->cpus = perf_cpu_map__get(evlist->user_requested_cpus);
> > }
> > - } else if (!evsel->pmu_cpus || evlist->has_user_cpus ||
> > - (!evsel->requires_cpu && perf_cpu_map__has_any_cpu(evlist->user_requested_cpus))) {
> > - /*
> > - * The PMU didn't specify a default cpu map, this isn't a core
> > - * event and the user requested CPUs or the evlist user
> > - * requested CPUs have the "any CPU" (aka dummy) CPU value. In
> > - * which case use the user requested CPUs rather than the PMU
> > - * ones.
> > - */
> > + }
> > +
> > + /* Ensure cpus only references valid PMU CPUs. */
> > + if (!perf_cpu_map__has_any_cpu(evsel->cpus) &&
> > + !perf_cpu_map__is_subset(evsel->pmu_cpus, evsel->cpus)) {
> > + struct perf_cpu_map *tmp = perf_cpu_map__intersect(evsel->pmu_cpus, evsel->cpus);
> > +
> > perf_cpu_map__put(evsel->cpus);
> > - evsel->cpus = perf_cpu_map__get(evlist->user_requested_cpus);
> > - } else if (evsel->cpus != evsel->pmu_cpus) {
> > - /*
> > - * No user requested cpu map but the PMU cpu map doesn't match
> > - * the evsel's. Reset it back to the PMU cpu map.
> > - */
> > + evsel->cpus = tmp;
> > + }
> > +
> > + /*
> > + * Was event requested on all the PMU's CPUs but the user requested is
> > + * any CPU (-1)? If so switch to using any CPU (-1) to reduce the number
> > + * of events.
> > + */
> > + if (!evsel->system_wide &&
> > + perf_cpu_map__equal(evsel->cpus, evsel->pmu_cpus) &&
> > + perf_cpu_map__has_any_cpu(evlist->user_requested_cpus)) {
> > perf_cpu_map__put(evsel->cpus);
> > - evsel->cpus = perf_cpu_map__get(evsel->pmu_cpus);
> > + evsel->cpus = perf_cpu_map__get(evlist->user_requested_cpus);
> > + }
> > +
> > + /* Sanity check assert before the evsel is potentially removed. */
> > + assert(!evsel->requires_cpu || !perf_cpu_map__has_any_cpu(evsel->cpus));
> > +
> > + /*
> > + * Empty cpu lists would eventually get opened as "any" so remove
> > + * genuinely empty ones before they're opened in the wrong place.
> > + */
> > + if (perf_cpu_map__is_empty(evsel->cpus)) {
> > + struct perf_evsel *next = perf_evlist__next(evlist, evsel);
> > +
> > + perf_evlist__remove(evlist, evsel);
> > + /* Keep idx contiguous */
> > + if (next)
> > + list_for_each_entry_from(next, &evlist->entries, node)
> > + next->idx--;
> > +
> > + return;
> > }
> >
> > if (evsel->system_wide) {
> > @@ -98,6 +136,10 @@ static void perf_evlist__propagate_maps(struct perf_evlist *evlist)
> >
> > evlist->needs_map_propagation = true;
> >
> > + /* Clear the all_cpus set which will be merged into during propagation. */
> > + perf_cpu_map__put(evlist->all_cpus);
> > + evlist->all_cpus = NULL;
> > +
> > list_for_each_entry_safe(evsel, n, &evlist->entries, node)
> > __perf_evlist__propagate_maps(evlist, evsel);
> > }
> > diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> > index 4092a43aa84e..0ff7ae75d8f9 100644
> > --- a/tools/perf/util/parse-events.c
> > +++ b/tools/perf/util/parse-events.c
> > @@ -313,20 +313,18 @@ __add_event(struct list_head *list, int *idx,
> > if (pmu) {
> > is_pmu_core = pmu->is_core;
> > pmu_cpus = perf_cpu_map__get(pmu->cpus);
> > + if (perf_cpu_map__is_empty(pmu_cpus))
> > + pmu_cpus = cpu_map__online();
> > } else {
> > is_pmu_core = (attr->type == PERF_TYPE_HARDWARE ||
> > attr->type == PERF_TYPE_HW_CACHE);
> > pmu_cpus = is_pmu_core ? cpu_map__online() : NULL;
> > }
> >
> > - if (has_user_cpus) {
> > + if (has_user_cpus)
> > cpus = perf_cpu_map__get(user_cpus);
> > - /* Existing behavior that pmu_cpus matches the given user ones. */
> > - perf_cpu_map__put(pmu_cpus);
> > - pmu_cpus = perf_cpu_map__get(user_cpus);
> > - } else {
> > + else
> > cpus = perf_cpu_map__get(pmu_cpus);
> > - }
> >
> > if (init_attr)
> > event_attr_init(attr);
> > diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
> > index 355a7d5c8ab8..8d3bcdb69d37 100644
> > --- a/tools/perf/util/stat.c
> > +++ b/tools/perf/util/stat.c
> > @@ -769,8 +769,6 @@ int create_perf_stat_counter(struct evsel *evsel,
> > attr->enable_on_exec = 1;
> > }
> >
> > - if (target__has_cpu(target) && !target__has_per_thread(target))
> > - return evsel__open_per_cpu(evsel, evsel__cpus(evsel), cpu_map_idx);
> > -
> > - return evsel__open_per_thread(evsel, evsel->core.threads);
> > + return evsel__open_per_cpu_and_thread(evsel, evsel__cpus(evsel), cpu_map_idx,
> > + evsel->core.threads);
> > }
> > --
> > 2.50.0.727.gbf7dc18ff4-goog
> >
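[Editorial note: the cpumask precedence implemented by the quoted __perf_evlist__propagate_maps hunk can be modeled roughly as below. This is an illustrative Python sketch, not the actual libperf code; the helper name, the 8-CPU online set, and the `ANY` sentinel are assumptions, and the real code has additional asserts and system-wide handling.]

```python
ANY = "any"  # stand-in for the "any CPU" (-1) map value

def propagate(pmu_cpus, user_cpus, *, system_wide=False, requires_cpu=False,
              online=frozenset(range(8))):
    """Rough model of the precedence in the quoted hunk (not the real API)."""
    # Unset PMU cpus: assume a system-wide event (software/tracepoint).
    pmu = set(pmu_cpus) if pmu_cpus else set(online)
    has_user = user_cpus is not None
    # User-requested CPUs win unless the evsel is pinned system wide.
    if has_user and user_cpus != ANY and not system_wide:
        cpus = set(user_cpus)
    else:
        cpus = set(pmu)
    # PMUs that require a CPU (e.g. uncore) never get "any CPU" (-1).
    if requires_cpu and ANY in cpus:
        cpus = set(pmu)
    # Ensure cpus only references valid PMU CPUs.
    if ANY not in cpus and not cpus <= pmu:
        cpus &= pmu
    # All PMU CPUs covered and the user asked for -1: collapse to "any CPU"
    # to reduce the number of events.
    if not system_wide and not requires_cpu and cpus == pmu and user_cpus == ANY:
        cpus = {ANY}
    return cpus or None  # None models removing the now-empty evsel

# Example: user asked for CPUs 0-1 but the PMU only covers 4-7, so the
# intersection is empty and the evsel would be removed.
print(propagate({4, 5, 6, 7}, {0, 1}))
```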
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v1 04/12] perf parse-events: Allow the cpu term to be a PMU
2025-07-16 20:25 ` Ian Rogers
@ 2025-07-18 17:56 ` Namhyung Kim
0 siblings, 0 replies; 22+ messages in thread
From: Namhyung Kim @ 2025-07-18 17:56 UTC (permalink / raw)
To: Ian Rogers
Cc: Thomas Falcon, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
Jiri Olsa, Adrian Hunter, Kan Liang, Ben Gainey, James Clark,
Howard Chu, Weilin Wang, Levi Yun, Dr. David Alan Gilbert,
Zhongqiu Han, Blake Jones, Yicong Yang, Anubhav Shelat,
Thomas Richter, Jean-Philippe Romain, Song Liu, linux-perf-users,
linux-kernel
Hi Ian,
On Wed, Jul 16, 2025 at 01:25:17PM -0700, Ian Rogers wrote:
> On Wed, Jul 16, 2025 at 1:09 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Fri, Jun 27, 2025 at 12:24:09PM -0700, Ian Rogers wrote:
> > > On hybrid systems, events like msr/tsc/ will aggregate counts across
> > > all CPUs. Often metrics only want a value like msr/tsc/ for the cores
> > > on which the metric is being computed. Listing each CPU with terms
> > > cpu=0,cpu=1.. is laborious and would need to be encoded for all
> > > variations of a CPU model.
> > >
> > > Allow the cpumask from a PMU to be an argument to the cpu term. For
> > > example, in the following the cpumask of the cstate_pkg PMU selects the
> > > CPUs on which the msr/tsc/ event is counted:
> > > ```
> > > $ cat /sys/bus/event_source/devices/cstate_pkg/cpumask
> > > 0
> > > $ perf stat -A -e 'msr/tsc,cpu=cstate_pkg/' -a sleep 0.1
> >
> > It can be confusing if 'cpu' takes either a number or a PMU name. What
> > about adding a new term (maybe 'cpu_from') to handle this case?
>
> So it is possible for terms to be defined in sysfs in the 'format/' folder:
> ```
> $ ls /sys/bus/event_source/devices/cpu_core/format/
> cmask edge event frontend inv ldlat offcore_rsp pc umask
> ```
> By not introducing a new term we leave 'cpu_from' open for use in this
> way. When I spoke to Kan we thought using the existing term made sense
> and fits the idea of leaving things open for the kernel/drivers to
> use. It is possible to add a new term though. Let me know and I can
> update the patch and documentation accordingly.
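[Editorial note: the sysfs 'format/' layout described above can be probed for such name clashes. A minimal sketch, assuming the standard /sys/bus/event_source layout; the scanned term name is hypothetical and the script prints nothing when no clash exists:]

```shell
# Report every PMU whose format/ directory already defines a given term,
# i.e. a term name the tool could not safely claim for itself.
find_term_clashes() {
    term="$1"
    for d in /sys/bus/event_source/devices/*/format; do
        [ -e "$d/$term" ] && echo "$d/$term"
    done
    return 0
}

# Check whether any PMU already defines a 'cpu_from' format term.
find_term_clashes cpu_from
```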
Oh, you thought about this already. It's true that it's possible to
clash with PMU formats in sysfs unless we have a separate namespace for
tools somehow. But that would add (maybe unnecessary) complexity.
So I'm not against this change. I just wanted to raise an alarm about
potential issues. Up to you. :)
Thanks,
Namhyung
* Re: [PATCH v1 00/12] CPU mask improvements/fixes particularly for hybrid
2025-06-27 19:24 [PATCH v1 00/12] CPU mask improvements/fixes particularly for hybrid Ian Rogers
` (12 preceding siblings ...)
2025-07-15 19:55 ` [PATCH v1 00/12] CPU mask improvements/fixes particularly for hybrid Ian Rogers
@ 2025-07-21 16:13 ` James Clark
2025-07-21 17:44 ` Ian Rogers
13 siblings, 1 reply; 22+ messages in thread
From: James Clark @ 2025-07-21 16:13 UTC (permalink / raw)
To: Ian Rogers
Cc: Thomas Falcon, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang,
Ben Gainey, Howard Chu, Weilin Wang, Levi Yun,
Dr. David Alan Gilbert, Zhongqiu Han, Blake Jones, Yicong Yang,
Anubhav Shelat, Thomas Richter, Jean-Philippe Romain, Song Liu,
linux-perf-users, linux-kernel
On 27/06/2025 8:24 pm, Ian Rogers wrote:
> On hybrid systems some PMUs apply to all core types, particularly for
> metrics the msr PMU and the tsc event. The metrics often only want the
> values of the counter for their specific core type. These patches
> allow the cpu term in an event to give a PMU name to take the cpumask
> from. For example:
>
> $ perf stat -e msr/tsc,cpu=cpu_atom/ ...
>
> will aggregate the msr/tsc/ value but only for atom cores. In doing
> this problems were identified in how cpumasks are handled by parsing
> and event setup when cpumasks are specified along with a task to
> profile. The event parsing, cpumask evlist propagation code and perf
> stat code are updated accordingly.
>
> The final result of the patch series is to be able to run:
> ```
> $ perf stat --no-scale -e 'msr/tsc/,msr/tsc,cpu=cpu_core/,msr/tsc,cpu=cpu_atom/' perf test -F 10
> 10.1: Basic parsing test : Ok
> 10.2: Parsing without PMU name : Ok
> 10.3: Parsing with PMU name : Ok
>
> Performance counter stats for 'perf test -F 10':
>
> 63,704,975 msr/tsc/
> 47,060,704 msr/tsc,cpu=cpu_core/ (4.62%)
> 16,640,591 msr/tsc,cpu=cpu_atom/ (2.18%)
> ```
>
> This has (further) identified a kernel bug for task events around the
> enabled time being too large leading to invalid scaling (hence the
> --no-scale in the command line above).
>
> Ian Rogers (12):
> perf parse-events: Warn if a cpu term is unsupported by a CPU
> perf stat: Avoid buffer overflow to the aggregation map
> perf stat: Don't size aggregation ids from user_requested_cpus
> perf parse-events: Allow the cpu term to be a PMU
> perf tool_pmu: Allow num_cpus(_online) to be specific to a cpumask
> libperf evsel: Rename own_cpus to pmu_cpus
> libperf evsel: Factor perf_evsel__exit out of perf_evsel__delete
> perf evsel: Use libperf perf_evsel__exit
> perf pmus: Factor perf_pmus__find_by_attr out of evsel__find_pmu
> perf parse-events: Minor __add_event refactoring
> perf evsel: Add evsel__open_per_cpu_and_thread
> perf parse-events: Support user CPUs mixed with threads/processes
>
> tools/lib/perf/evlist.c | 118 ++++++++++++++++--------
> tools/lib/perf/evsel.c | 9 +-
> tools/lib/perf/include/internal/evsel.h | 3 +-
> tools/perf/builtin-stat.c | 9 +-
> tools/perf/tests/event_update.c | 4 +-
> tools/perf/util/evlist.c | 15 +--
> tools/perf/util/evsel.c | 55 +++++++++--
> tools/perf/util/evsel.h | 5 +
> tools/perf/util/expr.c | 2 +-
> tools/perf/util/header.c | 4 +-
> tools/perf/util/parse-events.c | 102 ++++++++++++++------
> tools/perf/util/pmus.c | 29 +++---
> tools/perf/util/pmus.h | 2 +
> tools/perf/util/stat.c | 6 +-
> tools/perf/util/synthetic-events.c | 4 +-
> tools/perf/util/tool_pmu.c | 56 +++++++++--
> tools/perf/util/tool_pmu.h | 2 +-
> 17 files changed, 297 insertions(+), 128 deletions(-)
>
Tested-by: James Clark <james.clark@linaro.org>
* Re: [PATCH v1 00/12] CPU mask improvements/fixes particularly for hybrid
2025-07-21 16:13 ` James Clark
@ 2025-07-21 17:44 ` Ian Rogers
0 siblings, 0 replies; 22+ messages in thread
From: Ian Rogers @ 2025-07-21 17:44 UTC (permalink / raw)
To: James Clark
Cc: Thomas Falcon, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Adrian Hunter, Kan Liang,
Ben Gainey, Howard Chu, Weilin Wang, Levi Yun,
Dr. David Alan Gilbert, Zhongqiu Han, Blake Jones, Yicong Yang,
Anubhav Shelat, Thomas Richter, Jean-Philippe Romain, Song Liu,
linux-perf-users, linux-kernel
On Mon, Jul 21, 2025 at 9:13 AM James Clark <james.clark@linaro.org> wrote:
>
>
>
> On 27/06/2025 8:24 pm, Ian Rogers wrote:
> > [cover letter and diffstat trimmed; quoted in full earlier in the thread]
>
> Tested-by: James Clark <james.clark@linaro.org>
Much appreciated, thanks James!
There's a v2 patch set, but the Tested-by will be good for the majority
of patches, which are unchanged in it:
https://lore.kernel.org/lkml/20250717210233.1143622-1-irogers@google.com/
I'm of course interested in getting RFC feedback on:
https://lore.kernel.org/lkml/20250716223924.825772-1-irogers@google.com/
which introduces an extra state to avoid gathering enabled time on
CPUs an event can't run on.
Thanks,
Ian
Thread overview: 22+ messages
2025-06-27 19:24 [PATCH v1 00/12] CPU mask improvements/fixes particularly for hybrid Ian Rogers
2025-06-27 19:24 ` [PATCH v1 01/12] perf parse-events: Warn if a cpu term is unsupported by a CPU Ian Rogers
2025-06-27 19:24 ` [PATCH v1 02/12] perf stat: Avoid buffer overflow to the aggregation map Ian Rogers
2025-06-27 19:24 ` [PATCH v1 03/12] perf stat: Don't size aggregation ids from user_requested_cpus Ian Rogers
2025-06-27 19:24 ` [PATCH v1 04/12] perf parse-events: Allow the cpu term to be a PMU Ian Rogers
2025-07-16 20:09 ` Namhyung Kim
2025-07-16 20:25 ` Ian Rogers
2025-07-18 17:56 ` Namhyung Kim
2025-06-27 19:24 ` [PATCH v1 05/12] perf tool_pmu: Allow num_cpus(_online) to be specific to a cpumask Ian Rogers
2025-06-27 19:24 ` [PATCH v1 06/12] libperf evsel: Rename own_cpus to pmu_cpus Ian Rogers
2025-06-27 19:24 ` [PATCH v1 07/12] libperf evsel: Factor perf_evsel__exit out of perf_evsel__delete Ian Rogers
2025-06-27 19:24 ` [PATCH v1 08/12] perf evsel: Use libperf perf_evsel__exit Ian Rogers
2025-06-27 19:24 ` [PATCH v1 09/12] perf pmus: Factor perf_pmus__find_by_attr out of evsel__find_pmu Ian Rogers
2025-06-27 19:24 ` [PATCH v1 10/12] perf parse-events: Minor __add_event refactoring Ian Rogers
2025-06-27 19:24 ` [PATCH v1 11/12] perf evsel: Add evsel__open_per_cpu_and_thread Ian Rogers
2025-06-27 19:24 ` [PATCH v1 12/12] perf parse-events: Support user CPUs mixed with threads/processes Ian Rogers
2025-07-16 20:28 ` Namhyung Kim
2025-07-17 0:04 ` Ian Rogers
2025-07-15 19:55 ` [PATCH v1 00/12] CPU mask improvements/fixes particularly for hybrid Ian Rogers
2025-07-16 20:03 ` Falcon, Thomas
2025-07-21 16:13 ` James Clark
2025-07-21 17:44 ` Ian Rogers