[PATCH] perf tools topdown: Fix incorrect topdown slots regroup

linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] perf tools topdown: Fix incorrect topdown slots regroup
@ 2025-08-22  8:22 Dapeng Mi
  2025-08-22 17:35 ` Ian Rogers
  0 siblings, 1 reply; 6+ messages in thread
From: Dapeng Mi @ 2025-08-22  8:22 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
	Kan Liang
  Cc: linux-perf-users, linux-kernel, Dapeng Mi, Dapeng Mi, Xudong Hao

When running the command "perf stat -e "slots,slots" -C0 sleep 1", we
see below error.

perf stat -e "slots,slots" -C0 sleep 1
WARNING: events were regrouped to match PMUs
 Performance counter stats for 'CPU(s) 0':
     <not counted>      slots
   <not supported>      slots

     1.002265769 seconds time elapsed

The topdown slots events are not correctly counted. The root cause is
that perf tools incorrectly regroup these 2 slots events into a group.
If there are only topdown slots events but no topdown metrics events,
the regroup should not be done since topdown slots event can only be
programed on fixed counter 3 and multiple slots events can only be
multiplexed to run on fixed counter 3, but grouping them blocks
multiplexing.

So avoid to regroup topdown slots events if there is no topdown metrics
events.

With this change, above command can be run successfully.

perf stat -e "slots,slots" -C0 sleep 1
 Performance counter stats for 'CPU(s) 0':
       103,973,791      slots
       106,488,170      slots

       1.003517284 seconds time elapsed

Besides, run perf stats/record test on SPR and PTL, both passed.

Reported-by: Xudong Hao <xudong.hao@intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
 tools/perf/arch/x86/util/topdown.h |  2 --
 tools/perf/util/evsel.c            | 10 ++++++++++
 tools/perf/util/evsel.h            |  2 ++
 tools/perf/util/parse-events.c     | 11 +++++++++++
 4 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/tools/perf/arch/x86/util/topdown.h b/tools/perf/arch/x86/util/topdown.h
index 69035565e649..6a917b2066f7 100644
--- a/tools/perf/arch/x86/util/topdown.h
+++ b/tools/perf/arch/x86/util/topdown.h
@@ -8,8 +8,6 @@ struct evsel;
 struct list_head;
 
 bool topdown_sys_has_perf_metrics(void);
-bool arch_is_topdown_slots(const struct evsel *evsel);
-bool arch_is_topdown_metrics(const struct evsel *evsel);
 int topdown_insert_slots_event(struct list_head *list, int idx, struct evsel *metric_event);
 
 #endif
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index d264c143b592..6aaae1ac026e 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -3965,6 +3965,16 @@ int evsel__source_count(const struct evsel *evsel)
 	return count;
 }
 
+bool __weak arch_is_topdown_slots(const struct evsel *evsel __maybe_unused)
+{
+	return false;
+}
+
+bool __weak arch_is_topdown_metrics(const struct evsel *evsel __maybe_unused)
+{
+	return false;
+}
+
 bool __weak arch_evsel__must_be_in_group(const struct evsel *evsel __maybe_unused)
 {
 	return false;
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 5797a02e5d6a..33f8aab675a9 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -556,6 +556,8 @@ void evsel__set_leader(struct evsel *evsel, struct evsel *leader);
 int evsel__source_count(const struct evsel *evsel);
 void evsel__remove_from_group(struct evsel *evsel, struct evsel *leader);
 
+bool arch_is_topdown_slots(const struct evsel *evsel);
+bool arch_is_topdown_metrics(const struct evsel *evsel);
 bool arch_evsel__must_be_in_group(const struct evsel *evsel);
 
 bool evsel__set_needs_uniquify(struct evsel *counter, const struct perf_stat_config *config);
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 8282ddf68b98..bd09fc47ea90 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -2127,6 +2127,8 @@ static int parse_events__sort_events_and_fix_groups(struct list_head *list)
 	int ret;
 	struct evsel *force_grouped_leader = NULL;
 	bool last_event_was_forced_leader = false;
+	bool has_slots = false;
+	bool has_metrics = false;
 
 	/* On x86 topdown metrics events require a slots event. */
 	ret = arch_evlist__add_required_events(list);
@@ -2147,6 +2149,11 @@ static int parse_events__sort_events_and_fix_groups(struct list_head *list)
 		if (pos == pos_leader)
 			orig_num_leaders++;
 
+		if (!has_slots)
+			has_slots = arch_is_topdown_slots(pos);
+		if (!has_metrics)
+			has_metrics = arch_is_topdown_metrics(pos);
+
 		/*
 		 * Ensure indexes are sequential, in particular for multiple
 		 * event lists being merged. The indexes are used to detect when
@@ -2163,6 +2170,10 @@ static int parse_events__sort_events_and_fix_groups(struct list_head *list)
 			force_grouped_idx = pos_leader->core.idx;
 	}
 
+	/* Don't regroup if there are only topdown slots events. */
+	if (force_grouped_idx != -1 && has_slots && !has_metrics)
+		force_grouped_idx = -1;
+
 	/* Sort events. */
 	list_sort(&force_grouped_idx, list, evlist__cmp);
 

base-commit: 6235ce77749f45cac27f630337e2fdf04e8a6c73
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] perf tools topdown: Fix incorrect topdown slots regroup
  2025-08-22  8:22 [PATCH] perf tools topdown: Fix incorrect topdown slots regroup Dapeng Mi
@ 2025-08-22 17:35 ` Ian Rogers
  2025-08-25  0:54   ` Mi, Dapeng
  0 siblings, 1 reply; 6+ messages in thread
From: Ian Rogers @ 2025-08-22 17:35 UTC (permalink / raw)
  To: Dapeng Mi
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Adrian Hunter, Alexander Shishkin, Kan Liang,
	linux-perf-users, linux-kernel, Dapeng Mi, Xudong Hao

On Fri, Aug 22, 2025 at 1:23 AM Dapeng Mi <dapeng1.mi@linux.intel.com> wrote:
>
> When running the command "perf stat -e "slots,slots" -C0 sleep 1", we
> see below error.
>
> perf stat -e "slots,slots" -C0 sleep 1
> WARNING: events were regrouped to match PMUs
>  Performance counter stats for 'CPU(s) 0':
>      <not counted>      slots
>    <not supported>      slots
>
>      1.002265769 seconds time elapsed
>
> The topdown slots events are not correctly counted. The root cause is
> that perf tools incorrectly regroup these 2 slots events into a group.
> If there are only topdown slots events but no topdown metrics events,
> the regroup should not be done since topdown slots event can only be
> programed on fixed counter 3 and multiple slots events can only be
> multiplexed to run on fixed counter 3, but grouping them blocks
> multiplexing.
>
> So avoid to regroup topdown slots events if there is no topdown metrics
> events.
>
> With this change, above command can be run successfully.
>
> perf stat -e "slots,slots" -C0 sleep 1
>  Performance counter stats for 'CPU(s) 0':
>        103,973,791      slots
>        106,488,170      slots
>
>        1.003517284 seconds time elapsed
>
> Besides, run perf stats/record test on SPR and PTL, both passed.
>
> Reported-by: Xudong Hao <xudong.hao@intel.com>
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>

I don't think we should do this and if we were to do it we shouldn't
do it in the common code. The perf metrics requiring a slots event is
a massive mess that never seems to end. What should we do with:
```
$ perf stat -e "topdown-fe-bound,topdown-fe-bound" true

Performance counter stats for 'true':

    <not counted>      slots
    <not counted>      topdown-fe-bound
  <not supported>      topdown-fe-bound

      0.000960472 seconds time elapsed

      0.001060000 seconds user
      0.000000000 seconds sys


Some events weren't counted. Try disabling the NMI watchdog:
       echo 0 > /proc/sys/kernel/nmi_watchdog
       perf stat ...
       echo 1 > /proc/sys/kernel/nmi_watchdog
```

We've injected the slots event but the repeated topdown-fe-bound
causes the group to fail in a similar way. Why is repeating slots a
case we care about more?
Were we to say, okay slots is more special than the other perf metric
events then I'd prefer arch_evsel__must_be_in_group to return false
for the slots event when there are no other perf metric events in the
evlist. But then what do you do if the slots event is in a different
group like:
```
$ perf stat -e "slots,{slots,topdown-fe-bound}" true
```
It is pretty easy to teach the code to deduplicate events, but then
again, what about the group behavior?
It is not clear to me we can ever clean up all the mess that the perf
metric events on p-cores create and I'm in favor of having the code be
no more complex than it needs to be, so I'm not sure we should be
solving this problem.

Thanks,
Ian

> ---
>  tools/perf/arch/x86/util/topdown.h |  2 --
>  tools/perf/util/evsel.c            | 10 ++++++++++
>  tools/perf/util/evsel.h            |  2 ++
>  tools/perf/util/parse-events.c     | 11 +++++++++++
>  4 files changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/arch/x86/util/topdown.h b/tools/perf/arch/x86/util/topdown.h
> index 69035565e649..6a917b2066f7 100644
> --- a/tools/perf/arch/x86/util/topdown.h
> +++ b/tools/perf/arch/x86/util/topdown.h
> @@ -8,8 +8,6 @@ struct evsel;
>  struct list_head;
>
>  bool topdown_sys_has_perf_metrics(void);
> -bool arch_is_topdown_slots(const struct evsel *evsel);
> -bool arch_is_topdown_metrics(const struct evsel *evsel);
>  int topdown_insert_slots_event(struct list_head *list, int idx, struct evsel *metric_event);
>
>  #endif
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index d264c143b592..6aaae1ac026e 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -3965,6 +3965,16 @@ int evsel__source_count(const struct evsel *evsel)
>         return count;
>  }
>
> +bool __weak arch_is_topdown_slots(const struct evsel *evsel __maybe_unused)
> +{
> +       return false;
> +}
> +
> +bool __weak arch_is_topdown_metrics(const struct evsel *evsel __maybe_unused)
> +{
> +       return false;
> +}
> +
>  bool __weak arch_evsel__must_be_in_group(const struct evsel *evsel __maybe_unused)
>  {
>         return false;
> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> index 5797a02e5d6a..33f8aab675a9 100644
> --- a/tools/perf/util/evsel.h
> +++ b/tools/perf/util/evsel.h
> @@ -556,6 +556,8 @@ void evsel__set_leader(struct evsel *evsel, struct evsel *leader);
>  int evsel__source_count(const struct evsel *evsel);
>  void evsel__remove_from_group(struct evsel *evsel, struct evsel *leader);
>
> +bool arch_is_topdown_slots(const struct evsel *evsel);
> +bool arch_is_topdown_metrics(const struct evsel *evsel);
>  bool arch_evsel__must_be_in_group(const struct evsel *evsel);
>
>  bool evsel__set_needs_uniquify(struct evsel *counter, const struct perf_stat_config *config);
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index 8282ddf68b98..bd09fc47ea90 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -2127,6 +2127,8 @@ static int parse_events__sort_events_and_fix_groups(struct list_head *list)
>         int ret;
>         struct evsel *force_grouped_leader = NULL;
>         bool last_event_was_forced_leader = false;
> +       bool has_slots = false;
> +       bool has_metrics = false;
>
>         /* On x86 topdown metrics events require a slots event. */
>         ret = arch_evlist__add_required_events(list);
> @@ -2147,6 +2149,11 @@ static int parse_events__sort_events_and_fix_groups(struct list_head *list)
>                 if (pos == pos_leader)
>                         orig_num_leaders++;
>
> +               if (!has_slots)
> +                       has_slots = arch_is_topdown_slots(pos);
> +               if (!has_metrics)
> +                       has_metrics = arch_is_topdown_metrics(pos);
> +
>                 /*
>                  * Ensure indexes are sequential, in particular for multiple
>                  * event lists being merged. The indexes are used to detect when
> @@ -2163,6 +2170,10 @@ static int parse_events__sort_events_and_fix_groups(struct list_head *list)
>                         force_grouped_idx = pos_leader->core.idx;
>         }
>
> +       /* Don't regroup if there are only topdown slots events. */
> +       if (force_grouped_idx != -1 && has_slots && !has_metrics)
> +               force_grouped_idx = -1;
> +
>         /* Sort events. */
>         list_sort(&force_grouped_idx, list, evlist__cmp);
>
>
> base-commit: 6235ce77749f45cac27f630337e2fdf04e8a6c73
> --
> 2.34.1
>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] perf tools topdown: Fix incorrect topdown slots regroup
  2025-08-22 17:35 ` Ian Rogers
@ 2025-08-25  0:54   ` Mi, Dapeng
  2025-08-25 15:29     ` Ian Rogers
  0 siblings, 1 reply; 6+ messages in thread
From: Mi, Dapeng @ 2025-08-25  0:54 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Adrian Hunter, Alexander Shishkin, Kan Liang,
	linux-perf-users, linux-kernel, Dapeng Mi, Xudong Hao


On 8/23/2025 1:35 AM, Ian Rogers wrote:
> On Fri, Aug 22, 2025 at 1:23 AM Dapeng Mi <dapeng1.mi@linux.intel.com> wrote:
>> When running the command "perf stat -e "slots,slots" -C0 sleep 1", we
>> see below error.
>>
>> perf stat -e "slots,slots" -C0 sleep 1
>> WARNING: events were regrouped to match PMUs
>>  Performance counter stats for 'CPU(s) 0':
>>      <not counted>      slots
>>    <not supported>      slots
>>
>>      1.002265769 seconds time elapsed
>>
>> The topdown slots events are not correctly counted. The root cause is
>> that perf tools incorrectly regroup these 2 slots events into a group.
>> If there are only topdown slots events but no topdown metrics events,
>> the regroup should not be done since topdown slots event can only be
>> programed on fixed counter 3 and multiple slots events can only be
>> multiplexed to run on fixed counter 3, but grouping them blocks
>> multiplexing.
>>
>> So avoid to regroup topdown slots events if there is no topdown metrics
>> events.
>>
>> With this change, above command can be run successfully.
>>
>> perf stat -e "slots,slots" -C0 sleep 1
>>  Performance counter stats for 'CPU(s) 0':
>>        103,973,791      slots
>>        106,488,170      slots
>>
>>        1.003517284 seconds time elapsed
>>
>> Besides, run perf stats/record test on SPR and PTL, both passed.
>>
>> Reported-by: Xudong Hao <xudong.hao@intel.com>
>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> I don't think we should do this and if we were to do it we shouldn't
> do it in the common code. The perf metrics requiring a slots event is
> a massive mess that never seems to end. What should we do with:
> ```
> $ perf stat -e "topdown-fe-bound,topdown-fe-bound" true
>
> Performance counter stats for 'true':
>
>     <not counted>      slots
>     <not counted>      topdown-fe-bound
>   <not supported>      topdown-fe-bound
>
>       0.000960472 seconds time elapsed
>
>       0.001060000 seconds user
>       0.000000000 seconds sys
>
>
> Some events weren't counted. Try disabling the NMI watchdog:
>        echo 0 > /proc/sys/kernel/nmi_watchdog
>        perf stat ...
>        echo 1 > /proc/sys/kernel/nmi_watchdog
> ```
>
> We've injected the slots event but the repeated topdown-fe-bound
> causes the group to fail in a similar way. Why is repeating slots a
> case we care about more?
> Were we to say, okay slots is more special than the other perf metric
> events then I'd prefer arch_evsel__must_be_in_group to return false
> for the slots event when there are no other perf metric events in the
> evlist. But then what do you do if the slots event is in a different
> group like:
> ```
> $ perf stat -e "slots,{slots,topdown-fe-bound}" true
> ```
> It is pretty easy to teach the code to deduplicate events, but then
> again, what about the group behavior?
> It is not clear to me we can ever clean up all the mess that the perf
> metric events on p-cores create and I'm in favor of having the code be
> no more complex than it needs to be, so I'm not sure we should be
> solving this problem.

Ian, thanks for reviewing. Yeah, I agree what you said "topdown events are
a massive mess", we could never solve all issues. But it's annoying for
users that it reports "not counted" instead of an error. Is it possible
that we take a step back and don't try to resolve this issue and just
report an error (maybe plus a message) to warn users that multiple same
topdown events are not allowed? Thanks. 


>
> Thanks,
> Ian
>
>> ---
>>  tools/perf/arch/x86/util/topdown.h |  2 --
>>  tools/perf/util/evsel.c            | 10 ++++++++++
>>  tools/perf/util/evsel.h            |  2 ++
>>  tools/perf/util/parse-events.c     | 11 +++++++++++
>>  4 files changed, 23 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/perf/arch/x86/util/topdown.h b/tools/perf/arch/x86/util/topdown.h
>> index 69035565e649..6a917b2066f7 100644
>> --- a/tools/perf/arch/x86/util/topdown.h
>> +++ b/tools/perf/arch/x86/util/topdown.h
>> @@ -8,8 +8,6 @@ struct evsel;
>>  struct list_head;
>>
>>  bool topdown_sys_has_perf_metrics(void);
>> -bool arch_is_topdown_slots(const struct evsel *evsel);
>> -bool arch_is_topdown_metrics(const struct evsel *evsel);
>>  int topdown_insert_slots_event(struct list_head *list, int idx, struct evsel *metric_event);
>>
>>  #endif
>> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
>> index d264c143b592..6aaae1ac026e 100644
>> --- a/tools/perf/util/evsel.c
>> +++ b/tools/perf/util/evsel.c
>> @@ -3965,6 +3965,16 @@ int evsel__source_count(const struct evsel *evsel)
>>         return count;
>>  }
>>
>> +bool __weak arch_is_topdown_slots(const struct evsel *evsel __maybe_unused)
>> +{
>> +       return false;
>> +}
>> +
>> +bool __weak arch_is_topdown_metrics(const struct evsel *evsel __maybe_unused)
>> +{
>> +       return false;
>> +}
>> +
>>  bool __weak arch_evsel__must_be_in_group(const struct evsel *evsel __maybe_unused)
>>  {
>>         return false;
>> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
>> index 5797a02e5d6a..33f8aab675a9 100644
>> --- a/tools/perf/util/evsel.h
>> +++ b/tools/perf/util/evsel.h
>> @@ -556,6 +556,8 @@ void evsel__set_leader(struct evsel *evsel, struct evsel *leader);
>>  int evsel__source_count(const struct evsel *evsel);
>>  void evsel__remove_from_group(struct evsel *evsel, struct evsel *leader);
>>
>> +bool arch_is_topdown_slots(const struct evsel *evsel);
>> +bool arch_is_topdown_metrics(const struct evsel *evsel);
>>  bool arch_evsel__must_be_in_group(const struct evsel *evsel);
>>
>>  bool evsel__set_needs_uniquify(struct evsel *counter, const struct perf_stat_config *config);
>> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
>> index 8282ddf68b98..bd09fc47ea90 100644
>> --- a/tools/perf/util/parse-events.c
>> +++ b/tools/perf/util/parse-events.c
>> @@ -2127,6 +2127,8 @@ static int parse_events__sort_events_and_fix_groups(struct list_head *list)
>>         int ret;
>>         struct evsel *force_grouped_leader = NULL;
>>         bool last_event_was_forced_leader = false;
>> +       bool has_slots = false;
>> +       bool has_metrics = false;
>>
>>         /* On x86 topdown metrics events require a slots event. */
>>         ret = arch_evlist__add_required_events(list);
>> @@ -2147,6 +2149,11 @@ static int parse_events__sort_events_and_fix_groups(struct list_head *list)
>>                 if (pos == pos_leader)
>>                         orig_num_leaders++;
>>
>> +               if (!has_slots)
>> +                       has_slots = arch_is_topdown_slots(pos);
>> +               if (!has_metrics)
>> +                       has_metrics = arch_is_topdown_metrics(pos);
>> +
>>                 /*
>>                  * Ensure indexes are sequential, in particular for multiple
>>                  * event lists being merged. The indexes are used to detect when
>> @@ -2163,6 +2170,10 @@ static int parse_events__sort_events_and_fix_groups(struct list_head *list)
>>                         force_grouped_idx = pos_leader->core.idx;
>>         }
>>
>> +       /* Don't regroup if there are only topdown slots events. */
>> +       if (force_grouped_idx != -1 && has_slots && !has_metrics)
>> +               force_grouped_idx = -1;
>> +
>>         /* Sort events. */
>>         list_sort(&force_grouped_idx, list, evlist__cmp);
>>
>>
>> base-commit: 6235ce77749f45cac27f630337e2fdf04e8a6c73
>> --
>> 2.34.1
>>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] perf tools topdown: Fix incorrect topdown slots regroup
  2025-08-25  0:54   ` Mi, Dapeng
@ 2025-08-25 15:29     ` Ian Rogers
  2025-08-25 15:54       ` Ian Rogers
  0 siblings, 1 reply; 6+ messages in thread
From: Ian Rogers @ 2025-08-25 15:29 UTC (permalink / raw)
  To: Mi, Dapeng
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Adrian Hunter, Alexander Shishkin, Kan Liang,
	linux-perf-users, linux-kernel, Dapeng Mi, Xudong Hao

On Sun, Aug 24, 2025 at 5:54 PM Mi, Dapeng <dapeng1.mi@linux.intel.com> wrote:
>
>
> On 8/23/2025 1:35 AM, Ian Rogers wrote:
> > On Fri, Aug 22, 2025 at 1:23 AM Dapeng Mi <dapeng1.mi@linux.intel.com> wrote:
> >> When running the command "perf stat -e "slots,slots" -C0 sleep 1", we
> >> see below error.
> >>
> >> perf stat -e "slots,slots" -C0 sleep 1
> >> WARNING: events were regrouped to match PMUs
> >>  Performance counter stats for 'CPU(s) 0':
> >>      <not counted>      slots
> >>    <not supported>      slots
> >>
> >>      1.002265769 seconds time elapsed
> >>
> >> The topdown slots events are not correctly counted. The root cause is
> >> that perf tools incorrectly regroup these 2 slots events into a group.
> >> If there are only topdown slots events but no topdown metrics events,
> >> the regroup should not be done since topdown slots event can only be
> >> programed on fixed counter 3 and multiple slots events can only be
> >> multiplexed to run on fixed counter 3, but grouping them blocks
> >> multiplexing.
> >>
> >> So avoid to regroup topdown slots events if there is no topdown metrics
> >> events.
> >>
> >> With this change, above command can be run successfully.
> >>
> >> perf stat -e "slots,slots" -C0 sleep 1
> >>  Performance counter stats for 'CPU(s) 0':
> >>        103,973,791      slots
> >>        106,488,170      slots
> >>
> >>        1.003517284 seconds time elapsed
> >>
> >> Besides, run perf stats/record test on SPR and PTL, both passed.
> >>
> >> Reported-by: Xudong Hao <xudong.hao@intel.com>
> >> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> > I don't think we should do this and if we were to do it we shouldn't
> > do it in the common code. The perf metrics requiring a slots event is
> > a massive mess that never seems to end. What should we do with:
> > ```
> > $ perf stat -e "topdown-fe-bound,topdown-fe-bound" true
> >
> > Performance counter stats for 'true':
> >
> >     <not counted>      slots
> >     <not counted>      topdown-fe-bound
> >   <not supported>      topdown-fe-bound
> >
> >       0.000960472 seconds time elapsed
> >
> >       0.001060000 seconds user
> >       0.000000000 seconds sys
> >
> >
> > Some events weren't counted. Try disabling the NMI watchdog:
> >        echo 0 > /proc/sys/kernel/nmi_watchdog
> >        perf stat ...
> >        echo 1 > /proc/sys/kernel/nmi_watchdog
> > ```
> >
> > We've injected the slots event but the repeated topdown-fe-bound
> > causes the group to fail in a similar way. Why is repeating slots a
> > case we care about more?
> > Were we to say, okay slots is more special than the other perf metric
> > events then I'd prefer arch_evsel__must_be_in_group to return false
> > for the slots event when there are no other perf metric events in the
> > evlist. But then what do you do if the slots event is in a different
> > group like:
> > ```
> > $ perf stat -e "slots,{slots,topdown-fe-bound}" true
> > ```
> > It is pretty easy to teach the code to deduplicate events, but then
> > again, what about the group behavior?
> > It is not clear to me we can ever clean up all the mess that the perf
> > metric events on p-cores create and I'm in favor of having the code be
> > no more complex than it needs to be, so I'm not sure we should be
> > solving this problem.
>
> Ian, thanks for reviewing. Yeah, I agree what you said "topdown events are
> a massive mess", we could never solve all issues. But it's annoying for
> users that it reports "not counted" instead of an error. Is it possible
> that we take a step back and don't try to resolve this issue and just
> report an error (maybe plus a message) to warn users that multiple same
> topdown events are not allowed? Thanks.

Hi Dapeng,

looking at:
```
$ sudo perf stat -vv -e
'cpu_core/topdown-fe-bound/,cpu_core/topdown-fe-bound/' true
Using CPUID GenuineIntel-6-B7-1
Attempt to add: cpu_core/topdown-fe-bound/
..after resolving event: cpu_core/event=0,umask=0x82/
Attempt to add: cpu_core/topdown-fe-bound/
..after resolving event: cpu_core/event=0,umask=0x82/
WARNING: events were regrouped to match PMUs
evlist after sorting/fixing:
'{cpu_core/slots/,cpu_core/topdown-fe-bound/,cpu_core/topdown-fe-bound/}'
Control descriptor is not initialized
------------------------------------------------------------
perf_event_attr:
  type                             4 (cpu_core)
  size                             136
  config                           0x400 (slots)
  sample_type                      IDENTIFIER
  read_format
TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
  disabled                         1
  inherit                          1
  enable_on_exec                   1
------------------------------------------------------------
sys_perf_event_open: pid 2631204  cpu -1  group_fd -1  flags 0x8 = 3
------------------------------------------------------------
perf_event_attr:
  type                             4 (cpu_core)
  size                             136
  config                           0x8200 (topdown-fe-bound)
  sample_type                      IDENTIFIER
  read_format
TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
  inherit                          1
------------------------------------------------------------
sys_perf_event_open: pid 2631204  cpu -1  group_fd 3  flags 0x8 = 4
------------------------------------------------------------
perf_event_attr:
  type                             4 (cpu_core)
  size                             136
  config                           0x8200 (topdown-fe-bound)
  sample_type                      IDENTIFIER
  read_format
TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
  inherit                          1
------------------------------------------------------------
sys_perf_event_open: pid 2631204  cpu -1  group_fd 3  flags 0x8
sys_perf_event_open failed, error -22
switching off exclude_guest for PMU cpu_core
Using PERF_SAMPLE_READ / :S modifier is not compatible with inherit,
falling back to no-inherit.
Warning:
cpu_core/topdown-fe-bound/ event is not supported by the kernel.
failed to read counter cpu_core/slots/
failed to read counter cpu_core/topdown-fe-bound/
failed to read counter cpu_core/topdown-fe-bound/

 Performance counter stats for 'true':

     <not counted>      cpu_core/slots/
     <not counted>      cpu_core/topdown-fe-bound/
   <not supported>      cpu_core/topdown-fe-bound/

       0.000975600 seconds time elapsed

       0.001034000 seconds user
       0.000000000 seconds sys
```
We can see that we have two good perf_event_opens and then the evsel
open code is trying to disable features because of the perf event open
failing. We can improve the error message:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/evsel.c?h=perf-tools-next#n3828
so that a perf metric event that returns EINVAL warns about the broken
grouping (perhaps some arch/PMU specific EINVAL handling). Perhaps if
this is a slots event that isn't the group leader, the event can be
removed from the group at this point. There is some overlap with
breaking weak groups, when too many events are placed within a group:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/evlist.c?h=perf-tools-next#n1848

Generating the warning message looks relatively easy, making this a
case that fails or changes the evlist looks to be a larger refactor.
Should the failing perf_event_open cause the other members of the
group to break? That seems like an kernel and Intel PMU decision that
was already made. My suggestion is we create a warning message for
these perf metric event EINVAL cases but otherwise leave things
unchanged.

Thanks,
Ian

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] perf tools topdown: Fix incorrect topdown slots regroup
  2025-08-25 15:29     ` Ian Rogers
@ 2025-08-25 15:54       ` Ian Rogers
  2025-08-25 21:13         ` Ian Rogers
  0 siblings, 1 reply; 6+ messages in thread
From: Ian Rogers @ 2025-08-25 15:54 UTC (permalink / raw)
  To: Mi, Dapeng
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Adrian Hunter, Alexander Shishkin, Kan Liang,
	linux-perf-users, linux-kernel, Dapeng Mi, Xudong Hao

On Mon, Aug 25, 2025 at 8:29 AM Ian Rogers <irogers@google.com> wrote:
>
> On Sun, Aug 24, 2025 at 5:54 PM Mi, Dapeng <dapeng1.mi@linux.intel.com> wrote:
> >
> >
> > On 8/23/2025 1:35 AM, Ian Rogers wrote:
> > > On Fri, Aug 22, 2025 at 1:23 AM Dapeng Mi <dapeng1.mi@linux.intel.com> wrote:
> > >> When running the command "perf stat -e "slots,slots" -C0 sleep 1", we
> > >> see below error.
> > >>
> > >> perf stat -e "slots,slots" -C0 sleep 1
> > >> WARNING: events were regrouped to match PMUs
> > >>  Performance counter stats for 'CPU(s) 0':
> > >>      <not counted>      slots
> > >>    <not supported>      slots
> > >>
> > >>      1.002265769 seconds time elapsed
> > >>
> > >> The topdown slots events are not correctly counted. The root cause is
> > >> that perf tools incorrectly regroup these 2 slots events into a group.
> > >> If there are only topdown slots events but no topdown metrics events,
> > >> the regroup should not be done since topdown slots event can only be
> > >> programed on fixed counter 3 and multiple slots events can only be
> > >> multiplexed to run on fixed counter 3, but grouping them blocks
> > >> multiplexing.
> > >>
> > >> So avoid to regroup topdown slots events if there is no topdown metrics
> > >> events.
> > >>
> > >> With this change, above command can be run successfully.
> > >>
> > >> perf stat -e "slots,slots" -C0 sleep 1
> > >>  Performance counter stats for 'CPU(s) 0':
> > >>        103,973,791      slots
> > >>        106,488,170      slots
> > >>
> > >>        1.003517284 seconds time elapsed
> > >>
> > >> Besides, run perf stats/record test on SPR and PTL, both passed.
> > >>
> > >> Reported-by: Xudong Hao <xudong.hao@intel.com>
> > >> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> > > I don't think we should do this and if we were to do it we shouldn't
> > > do it in the common code. The perf metrics requiring a slots event is
> > > a massive mess that never seems to end. What should we do with:
> > > ```
> > > $ perf stat -e "topdown-fe-bound,topdown-fe-bound" true
> > >
> > > Performance counter stats for 'true':
> > >
> > >     <not counted>      slots
> > >     <not counted>      topdown-fe-bound
> > >   <not supported>      topdown-fe-bound
> > >
> > >       0.000960472 seconds time elapsed
> > >
> > >       0.001060000 seconds user
> > >       0.000000000 seconds sys
> > >
> > >
> > > Some events weren't counted. Try disabling the NMI watchdog:
> > >        echo 0 > /proc/sys/kernel/nmi_watchdog
> > >        perf stat ...
> > >        echo 1 > /proc/sys/kernel/nmi_watchdog
> > > ```
> > >
> > > We've injected the slots event but the repeated topdown-fe-bound
> > > causes the group to fail in a similar way. Why is repeating slots a
> > > case we care about more?
> > > Were we to say, okay slots is more special than the other perf metric
> > > events then I'd prefer arch_evsel__must_be_in_group to return false
> > > for the slots event when there are no other perf metric events in the
> > > evlist. But then what do you do if the slots event is in a different
> > > group like:
> > > ```
> > > $ perf stat -e "slots,{slots,topdown-fe-bound}" true
> > > ```
> > > It is pretty easy to teach the code to deduplicate events, but then
> > > again, what about the group behavior?
> > > It is not clear to me we can ever clean up all the mess that the perf
> > > metric events on p-cores create and I'm in favor of having the code be
> > > no more complex than it needs to be, so I'm not sure we should be
> > > solving this problem.
> >
> > Ian, thanks for reviewing. Yeah, I agree what you said "topdown events are
> > a massive mess", we could never solve all issues. But it's annoying for
> > users that it reports "not counted" instead of an error. Is it possible
> > that we take a step back and don't try to resolve this issue and just
> > report an error (maybe plus a message) to warn users that multiple same
> > topdown events are not allowed? Thanks.
>
> Hi Dapeng,
>
> looking at:
> ```
> $ sudo perf stat -vv -e
> 'cpu_core/topdown-fe-bound/,cpu_core/topdown-fe-bound/' true
> Using CPUID GenuineIntel-6-B7-1
> Attempt to add: cpu_core/topdown-fe-bound/
> ..after resolving event: cpu_core/event=0,umask=0x82/
> Attempt to add: cpu_core/topdown-fe-bound/
> ..after resolving event: cpu_core/event=0,umask=0x82/
> WARNING: events were regrouped to match PMUs
> evlist after sorting/fixing:
> '{cpu_core/slots/,cpu_core/topdown-fe-bound/,cpu_core/topdown-fe-bound/}'
> Control descriptor is not initialized
> ------------------------------------------------------------
> perf_event_attr:
>   type                             4 (cpu_core)
>   size                             136
>   config                           0x400 (slots)
>   sample_type                      IDENTIFIER
>   read_format
> TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
>   disabled                         1
>   inherit                          1
>   enable_on_exec                   1
> ------------------------------------------------------------
> sys_perf_event_open: pid 2631204  cpu -1  group_fd -1  flags 0x8 = 3
> ------------------------------------------------------------
> perf_event_attr:
>   type                             4 (cpu_core)
>   size                             136
>   config                           0x8200 (topdown-fe-bound)
>   sample_type                      IDENTIFIER
>   read_format
> TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
>   inherit                          1
> ------------------------------------------------------------
> sys_perf_event_open: pid 2631204  cpu -1  group_fd 3  flags 0x8 = 4
> ------------------------------------------------------------
> perf_event_attr:
>   type                             4 (cpu_core)
>   size                             136
>   config                           0x8200 (topdown-fe-bound)
>   sample_type                      IDENTIFIER
>   read_format
> TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
>   inherit                          1
> ------------------------------------------------------------
> sys_perf_event_open: pid 2631204  cpu -1  group_fd 3  flags 0x8
> sys_perf_event_open failed, error -22
> switching off exclude_guest for PMU cpu_core
> Using PERF_SAMPLE_READ / :S modifier is not compatible with inherit,
> falling back to no-inherit.
> Warning:
> cpu_core/topdown-fe-bound/ event is not supported by the kernel.
> failed to read counter cpu_core/slots/
> failed to read counter cpu_core/topdown-fe-bound/
> failed to read counter cpu_core/topdown-fe-bound/
>
>  Performance counter stats for 'true':
>
>      <not counted>      cpu_core/slots/
>      <not counted>      cpu_core/topdown-fe-bound/
>    <not supported>      cpu_core/topdown-fe-bound/
>
>        0.000975600 seconds time elapsed
>
>        0.001034000 seconds user
>        0.000000000 seconds sys
> ```
> We can see that we have two good perf_event_opens and then the evsel
> open code is trying to disable features because of the perf event open
> failing. We can improve the error message:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/evsel.c?h=perf-tools-next#n3828
> so that a perf metric event that returns EINVAL warns about the broken
> grouping (perhaps some arch/PMU specific EINVAL handling). Perhaps if
> this is a slots event that isn't the group leader, the event can be
> removed from the group at this point. There is some overlap with
> breaking weak groups, when too many events are placed within a group:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/evlist.c?h=perf-tools-next#n1848
>
> Generating the warning message looks relatively easy, making this a
> case that fails or changes the evlist looks to be a larger refactor.
> Should the failing perf_event_open cause the other members of the
> group to break? That seems like an kernel and Intel PMU decision that
> was already made. My suggestion is we create a warning message for
> these perf metric event EINVAL cases but otherwise leave things
> unchanged.

I also think it'd be worthwhile to add a command line option to
disable parse_events__sort_events_and_fix_groups code so that things
like "slots,slots" can be just passed through and not get broken. This
may be unfortunate for uncore things like:
```
$ sudo perf stat -v -e '{data_read,data_write}' -a -A true
Using CPUID GenuineIntel-6-B7-1
data_read -> uncore_imc_free_running_0/data_read/
data_read -> uncore_imc_free_running_1/data_read/
data_write -> uncore_imc_free_running_0/data_write/
data_write -> uncore_imc_free_running_1/data_write/
WARNING: events were regrouped to match PMUs
evlist after sorting/fixing: '{data_read,data_write},{data_read,data_write}'
Control descriptor is not initialized

Performance counter stats for 'system wide':

CPU0                 1.85 MiB  uncore_imc_free_running_0/data_read/

CPU0                 0.08 MiB  uncore_imc_free_running_0/data_write/

CPU0                 1.87 MiB  uncore_imc_free_running_1/data_read/

CPU0                 0.07 MiB  uncore_imc_free_running_1/data_write/


      0.000851335 seconds time elapsed
```
where the different uncore PMUs for the data_read and data_write
events won't get resorted and the groups broken, etc. Perhaps rather
than a command line option for the evlist it can be an option on the
event, so something like:
```
$ perf stat -e 'slots:X,slots:X,{data_read,data_write}' ...
```
where the X modifier is indicating not to change the group of the event.

Thanks,
Ian

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] perf tools topdown: Fix incorrect topdown slots regroup
  2025-08-25 15:54       ` Ian Rogers
@ 2025-08-25 21:13         ` Ian Rogers
  0 siblings, 0 replies; 6+ messages in thread
From: Ian Rogers @ 2025-08-25 21:13 UTC (permalink / raw)
  To: Mi, Dapeng
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	Namhyung Kim, Adrian Hunter, Alexander Shishkin, Kan Liang,
	linux-perf-users, linux-kernel, Dapeng Mi, Xudong Hao

On Mon, Aug 25, 2025 at 8:54 AM Ian Rogers <irogers@google.com> wrote:
>
> On Mon, Aug 25, 2025 at 8:29 AM Ian Rogers <irogers@google.com> wrote:
> >
> > On Sun, Aug 24, 2025 at 5:54 PM Mi, Dapeng <dapeng1.mi@linux.intel.com> wrote:
> > >
> > >
> > > On 8/23/2025 1:35 AM, Ian Rogers wrote:
> > > > On Fri, Aug 22, 2025 at 1:23 AM Dapeng Mi <dapeng1.mi@linux.intel.com> wrote:
> > > >> When running the command "perf stat -e "slots,slots" -C0 sleep 1", we
> > > >> see below error.
> > > >>
> > > >> perf stat -e "slots,slots" -C0 sleep 1
> > > >> WARNING: events were regrouped to match PMUs
> > > >>  Performance counter stats for 'CPU(s) 0':
> > > >>      <not counted>      slots
> > > >>    <not supported>      slots
> > > >>
> > > >>      1.002265769 seconds time elapsed
> > > >>
> > > >> The topdown slots events are not correctly counted. The root cause is
> > > >> that perf tools incorrectly regroup these 2 slots events into a group.
> > > >> If there are only topdown slots events but no topdown metrics events,
> > > >> the regroup should not be done since topdown slots event can only be
> > > >> programed on fixed counter 3 and multiple slots events can only be
> > > >> multiplexed to run on fixed counter 3, but grouping them blocks
> > > >> multiplexing.
> > > >>
> > > >> So avoid to regroup topdown slots events if there is no topdown metrics
> > > >> events.
> > > >>
> > > >> With this change, above command can be run successfully.
> > > >>
> > > >> perf stat -e "slots,slots" -C0 sleep 1
> > > >>  Performance counter stats for 'CPU(s) 0':
> > > >>        103,973,791      slots
> > > >>        106,488,170      slots
> > > >>
> > > >>        1.003517284 seconds time elapsed
> > > >>
> > > >> Besides, run perf stats/record test on SPR and PTL, both passed.
> > > >>
> > > >> Reported-by: Xudong Hao <xudong.hao@intel.com>
> > > >> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> > > > I don't think we should do this and if we were to do it we shouldn't
> > > > do it in the common code. The perf metrics requiring a slots event is
> > > > a massive mess that never seems to end. What should we do with:
> > > > ```
> > > > $ perf stat -e "topdown-fe-bound,topdown-fe-bound" true
> > > >
> > > > Performance counter stats for 'true':
> > > >
> > > >     <not counted>      slots
> > > >     <not counted>      topdown-fe-bound
> > > >   <not supported>      topdown-fe-bound
> > > >
> > > >       0.000960472 seconds time elapsed
> > > >
> > > >       0.001060000 seconds user
> > > >       0.000000000 seconds sys
> > > >
> > > >
> > > > Some events weren't counted. Try disabling the NMI watchdog:
> > > >        echo 0 > /proc/sys/kernel/nmi_watchdog
> > > >        perf stat ...
> > > >        echo 1 > /proc/sys/kernel/nmi_watchdog
> > > > ```
> > > >
> > > > We've injected the slots event but the repeated topdown-fe-bound
> > > > causes the group to fail in a similar way. Why is repeating slots a
> > > > case we care about more?
> > > > Were we to say, okay slots is more special than the other perf metric
> > > > events then I'd prefer arch_evsel__must_be_in_group to return false
> > > > for the slots event when there are no other perf metric events in the
> > > > evlist. But then what do you do if the slots event is in a different
> > > > group like:
> > > > ```
> > > > $ perf stat -e "slots,{slots,topdown-fe-bound}" true
> > > > ```
> > > > It is pretty easy to teach the code to deduplicate events, but then
> > > > again, what about the group behavior?
> > > > It is not clear to me we can ever clean up all the mess that the perf
> > > > metric events on p-cores create and I'm in favor of having the code be
> > > > no more complex than it needs to be, so I'm not sure we should be
> > > > solving this problem.
> > >
> > > Ian, thanks for reviewing. Yeah, I agree what you said "topdown events are
> > > a massive mess", we could never solve all issues. But it's annoying for
> > > users that it reports "not counted" instead of an error. Is it possible
> > > that we take a step back and don't try to resolve this issue and just
> > > report an error (maybe plus a message) to warn users that multiple same
> > > topdown events are not allowed? Thanks.
> >
> > Hi Dapeng,
> >
> > looking at:
> > ```
> > $ sudo perf stat -vv -e
> > 'cpu_core/topdown-fe-bound/,cpu_core/topdown-fe-bound/' true
> > Using CPUID GenuineIntel-6-B7-1
> > Attempt to add: cpu_core/topdown-fe-bound/
> > ..after resolving event: cpu_core/event=0,umask=0x82/
> > Attempt to add: cpu_core/topdown-fe-bound/
> > ..after resolving event: cpu_core/event=0,umask=0x82/
> > WARNING: events were regrouped to match PMUs
> > evlist after sorting/fixing:
> > '{cpu_core/slots/,cpu_core/topdown-fe-bound/,cpu_core/topdown-fe-bound/}'
> > Control descriptor is not initialized
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             4 (cpu_core)
> >   size                             136
> >   config                           0x400 (slots)
> >   sample_type                      IDENTIFIER
> >   read_format
> > TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
> >   disabled                         1
> >   inherit                          1
> >   enable_on_exec                   1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid 2631204  cpu -1  group_fd -1  flags 0x8 = 3
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             4 (cpu_core)
> >   size                             136
> >   config                           0x8200 (topdown-fe-bound)
> >   sample_type                      IDENTIFIER
> >   read_format
> > TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
> >   inherit                          1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid 2631204  cpu -1  group_fd 3  flags 0x8 = 4
> > ------------------------------------------------------------
> > perf_event_attr:
> >   type                             4 (cpu_core)
> >   size                             136
> >   config                           0x8200 (topdown-fe-bound)
> >   sample_type                      IDENTIFIER
> >   read_format
> > TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING|ID|GROUP
> >   inherit                          1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid 2631204  cpu -1  group_fd 3  flags 0x8
> > sys_perf_event_open failed, error -22
> > switching off exclude_guest for PMU cpu_core
> > Using PERF_SAMPLE_READ / :S modifier is not compatible with inherit,
> > falling back to no-inherit.
> > Warning:
> > cpu_core/topdown-fe-bound/ event is not supported by the kernel.
> > failed to read counter cpu_core/slots/
> > failed to read counter cpu_core/topdown-fe-bound/
> > failed to read counter cpu_core/topdown-fe-bound/
> >
> >  Performance counter stats for 'true':
> >
> >      <not counted>      cpu_core/slots/
> >      <not counted>      cpu_core/topdown-fe-bound/
> >    <not supported>      cpu_core/topdown-fe-bound/
> >
> >        0.000975600 seconds time elapsed
> >
> >        0.001034000 seconds user
> >        0.000000000 seconds sys
> > ```
> > We can see that we have two good perf_event_opens and then the evsel
> > open code is trying to disable features because of the perf event open
> > failing. We can improve the error message:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/evsel.c?h=perf-tools-next#n3828
> > so that a perf metric event that returns EINVAL warns about the broken
> > grouping (perhaps some arch/PMU specific EINVAL handling). Perhaps if
> > this is a slots event that isn't the group leader, the event can be
> > removed from the group at this point. There is some overlap with
> > breaking weak groups, when too many events are placed within a group:
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/evlist.c?h=perf-tools-next#n1848
> >
> > Generating the warning message looks relatively easy, making this a
> > case that fails or changes the evlist looks to be a larger refactor.
> > Should the failing perf_event_open cause the other members of the
> > group to break? That seems like an kernel and Intel PMU decision that
> > was already made. My suggestion is we create a warning message for
> > these perf metric event EINVAL cases but otherwise leave things
> > unchanged.
>
> I also think it'd be worthwhile to add a command line option to
> disable parse_events__sort_events_and_fix_groups code so that things
> like "slots,slots" can be just passed through and not get broken. This
> may be unfortunate for uncore things like:
> ```
> $ sudo perf stat -v -e '{data_read,data_write}' -a -A true
> Using CPUID GenuineIntel-6-B7-1
> data_read -> uncore_imc_free_running_0/data_read/
> data_read -> uncore_imc_free_running_1/data_read/
> data_write -> uncore_imc_free_running_0/data_write/
> data_write -> uncore_imc_free_running_1/data_write/
> WARNING: events were regrouped to match PMUs
> evlist after sorting/fixing: '{data_read,data_write},{data_read,data_write}'
> Control descriptor is not initialized
>
> Performance counter stats for 'system wide':
>
> CPU0                 1.85 MiB  uncore_imc_free_running_0/data_read/
>
> CPU0                 0.08 MiB  uncore_imc_free_running_0/data_write/
>
> CPU0                 1.87 MiB  uncore_imc_free_running_1/data_read/
>
> CPU0                 0.07 MiB  uncore_imc_free_running_1/data_write/
>
>
>       0.000851335 seconds time elapsed
> ```
> where the different uncore PMUs for the data_read and data_write
> events won't get resorted and the groups broken, etc. Perhaps rather
> than a command line option for the evlist it can be an option on the
> event, so something like:
> ```
> $ perf stat -e 'slots:X,slots:X,{data_read,data_write}' ...
> ```
> where the X modifier is indicating not to change the group of the event.

So a patch series doing this is here:
https://lore.kernel.org/lkml/20250825211204.2784695-1-irogers@google.com/

Thanks,
Ian

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2025-08-25 21:14 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-08-22  8:22 [PATCH] perf tools topdown: Fix incorrect topdown slots regroup Dapeng Mi
2025-08-22 17:35 ` Ian Rogers
2025-08-25  0:54   ` Mi, Dapeng
2025-08-25 15:29     ` Ian Rogers
2025-08-25 15:54       ` Ian Rogers
2025-08-25 21:13         ` Ian Rogers

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).