* Re: [PATCH 1/4] perf topdown: Correct leader selection with sample_read enabled
2024-07-02 22:40 ` [PATCH 1/4] perf topdown: Correct leader selection with sample_read enabled Dapeng Mi
@ 2024-07-02 16:05 ` Liang, Kan
2024-07-03 2:46 ` Mi, Dapeng
0 siblings, 1 reply; 12+ messages in thread
From: Liang, Kan @ 2024-07-02 16:05 UTC (permalink / raw)
To: Dapeng Mi, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin
Cc: linux-perf-users, linux-kernel, Yanfei Xu, Dapeng Mi
On 2024-07-02 6:40 p.m., Dapeng Mi wrote:
> Addresses an issue where, in the absence of a topdown metrics event
> within a sampling group, the slots event was incorrectly bypassed as
> the sampling leader when sample_read was enabled.
>
> perf record -e '{slots,branches}:S' -c 10000 -vv sleep 1
>
> In this case, the slots event should be the sampling leader, but in fact
> the branches event is sampled, as the verbose output shows.
>
> perf_event_attr:
> type 4 (cpu)
> size 168
> config 0x400 (slots)
> sample_type IP|TID|TIME|READ|CPU|IDENTIFIER
> read_format ID|GROUP|LOST
> disabled 1
> sample_id_all 1
> exclude_guest 1
> ------------------------------------------------------------
> sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
> ------------------------------------------------------------
> perf_event_attr:
> type 0 (PERF_TYPE_HARDWARE)
> size 168
> config 0x4 (PERF_COUNT_HW_BRANCH_INSTRUCTIONS)
> { sample_period, sample_freq } 10000
> sample_type IP|TID|TIME|READ|CPU|IDENTIFIER
> read_format ID|GROUP|LOST
> sample_id_all 1
> exclude_guest 1
>
> The sample period of the slots event, rather than that of the branches
> event, is reset to 0.
>
> This fix ensures the slots event remains the leader under these
> conditions.
>
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> ---
> tools/perf/arch/x86/util/topdown.c | 16 ++++++++++++++--
> 1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/arch/x86/util/topdown.c b/tools/perf/arch/x86/util/topdown.c
> index 3f9a267d4501..5d7b78eb7516 100644
> --- a/tools/perf/arch/x86/util/topdown.c
> +++ b/tools/perf/arch/x86/util/topdown.c
> @@ -1,6 +1,7 @@
> // SPDX-License-Identifier: GPL-2.0
> #include "api/fs/fs.h"
> #include "util/evsel.h"
> +#include "util/evlist.h"
> #include "util/pmu.h"
> #include "util/pmus.h"
> #include "util/topdown.h"
> @@ -41,11 +42,22 @@ bool topdown_sys_has_perf_metrics(void)
> */
> bool arch_topdown_sample_read(struct evsel *leader)
> {
> + struct evsel *event;
> +
> if (!evsel__sys_has_perf_metrics(leader))
> return false;
>
> - if (leader->core.attr.config == TOPDOWN_SLOTS)
> - return true;
> + if (leader->core.attr.config != TOPDOWN_SLOTS)
> + return false;
> +
> + /*
> + * If slots event as leader event but no topdown metric events in group,
> + * slots event should still sample as leader.
> + */
> + evlist__for_each_entry(leader->evlist, event) {
> + if (event != leader && strcasestr(event->name, "topdown"))
Users may use the RAW format. It may not be good enough to just check
the event name.
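For example (an illustrative command only, reusing the raw encoding 0x8000
that the verbose output above shows for topdown-retiring):

  perf record -e '{slots,r8000}:S' -c 10000 sleep 1

Here the metrics event's name is just "r8000", so a
strcasestr(event->name, "topdown") check would not catch it.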
I recall you had complete support for this in the previous patch. Why
drop it?
Thanks,
Kan
> + return true;
> + }
>
> return false;
> }
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 3/4] perf parse-events: Don't move slots event when no topdown metrics event
2024-07-02 22:40 ` [PATCH 3/4] perf parse-events: Don't move slots event when no topdown metrics event Dapeng Mi
@ 2024-07-02 18:03 ` Liang, Kan
2024-07-03 2:51 ` Mi, Dapeng
0 siblings, 1 reply; 12+ messages in thread
From: Liang, Kan @ 2024-07-02 18:03 UTC (permalink / raw)
To: Dapeng Mi, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin
Cc: linux-perf-users, linux-kernel, Yanfei Xu, Dapeng Mi
On 2024-07-02 6:40 p.m., Dapeng Mi wrote:
> Currently the helper arch_evlist__cmp() would unconditionally move slots
> event to be the leader event even though there is no topdown metrics
> event in the group.
>
> perf stat -e "{instructions,slots}" -C 0 sleep 1
> WARNING: events were regrouped to match PMUs
>
> Performance counter stats for 'CPU(s) 0':
>
> 27,581,148 slots
> 8,390,827 instructions
>
> 1.045546595 seconds time elapsed
>
> This is overkill. It's not necessary to make the slots event the leader
> if there is no topdown metrics event.
>
> Thus only regroup events when there are both topdown slots and metrics
> events in a group.
>
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> ---
> tools/perf/arch/x86/util/evlist.c | 6 ++++--
> tools/perf/util/evlist.h | 7 ++++++-
> tools/perf/util/parse-events.c | 35 ++++++++++++++++++-------------
> 3 files changed, 31 insertions(+), 17 deletions(-)
>
> diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
> index 7215c7c7b435..a1e78be6ebd1 100644
> --- a/tools/perf/arch/x86/util/evlist.c
> +++ b/tools/perf/arch/x86/util/evlist.c
> @@ -73,9 +73,11 @@ int arch_evlist__add_default_attrs(struct evlist *evlist,
> return ___evlist__add_default_attrs(evlist, attrs, nr_attrs);
> }
>
> -int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs)
> +int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs, void *priv)
> {
> - if (topdown_sys_has_perf_metrics() &&
> + struct sort_priv *_priv = priv;
> +
> + if (topdown_sys_has_perf_metrics() && _priv->topdown_metrics_in_group &&
> (arch_evsel__must_be_in_group(lhs) || arch_evsel__must_be_in_group(rhs))) {
> /* Ensure the topdown slots comes first. */
> if (strcasestr(lhs->name, "slots") && !strcasestr(lhs->name, "uops_retired.slots"))
> diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
> index cb91dc9117a2..14c858dcf5a2 100644
> --- a/tools/perf/util/evlist.h
> +++ b/tools/perf/util/evlist.h
> @@ -89,6 +89,11 @@ struct evsel_str_handler {
> void *handler;
> };
>
> +struct sort_priv {
> + int force_grouped_idx;
> + bool topdown_metrics_in_group;
The topdown metrics are only available on some Intel platforms. I
don't think we want to add such a platform-specific variable in the
generic code.
The current code just re-orders the events, it doesn't re-group them. So
it doesn't impact the result accuracy.
So the issue is just an annoying WARNING, right?
It seems the issue has been there for more than a year. No complaints
except for one internal test case, which can be easily fixed.
Considering the complexity of the fix, I guess we may leave it as is.
Thanks,
Kan
> +};
> +
> struct evlist *evlist__new(void);
> struct evlist *evlist__new_default(void);
> struct evlist *evlist__new_dummy(void);
> @@ -112,7 +117,7 @@ int arch_evlist__add_default_attrs(struct evlist *evlist,
> #define evlist__add_default_attrs(evlist, array) \
> arch_evlist__add_default_attrs(evlist, array, ARRAY_SIZE(array))
>
> -int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs);
> +int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs, void *priv);
>
> int evlist__add_dummy(struct evlist *evlist);
> struct evsel *evlist__add_aux_dummy(struct evlist *evlist, bool system_wide);
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index 6ed0f9c5581d..a3f7173a7ae2 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -1962,19 +1962,21 @@ static int evsel__compute_group_pmu_name(struct evsel *evsel,
> return evsel->group_pmu_name ? 0 : -ENOMEM;
> }
>
> -__weak int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs)
> +__weak int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs,
> + void *priv __maybe_unused)
> {
> /* Order by insertion index. */
> return lhs->core.idx - rhs->core.idx;
> }
>
> -static int evlist__cmp(void *_fg_idx, const struct list_head *l, const struct list_head *r)
> +static int evlist__cmp(void *_sort_priv, const struct list_head *l, const struct list_head *r)
> {
> const struct perf_evsel *lhs_core = container_of(l, struct perf_evsel, node);
> const struct evsel *lhs = container_of(lhs_core, struct evsel, core);
> const struct perf_evsel *rhs_core = container_of(r, struct perf_evsel, node);
> const struct evsel *rhs = container_of(rhs_core, struct evsel, core);
> - int *force_grouped_idx = _fg_idx;
> + struct sort_priv *sort_priv = _sort_priv;
> + int force_grouped_idx = sort_priv->force_grouped_idx;
> int lhs_sort_idx, rhs_sort_idx, ret;
> const char *lhs_pmu_name, *rhs_pmu_name;
> bool lhs_has_group, rhs_has_group;
> @@ -1992,8 +1994,8 @@ static int evlist__cmp(void *_fg_idx, const struct list_head *l, const struct li
> lhs_sort_idx = lhs_core->leader->idx;
> } else {
> lhs_has_group = false;
> - lhs_sort_idx = *force_grouped_idx != -1 && arch_evsel__must_be_in_group(lhs)
> - ? *force_grouped_idx
> + lhs_sort_idx = force_grouped_idx != -1 && arch_evsel__must_be_in_group(lhs)
> + ? force_grouped_idx
> : lhs_core->idx;
> }
> if (rhs_core->leader != rhs_core || rhs_core->nr_members > 1) {
> @@ -2001,8 +2003,8 @@ static int evlist__cmp(void *_fg_idx, const struct list_head *l, const struct li
> rhs_sort_idx = rhs_core->leader->idx;
> } else {
> rhs_has_group = false;
> - rhs_sort_idx = *force_grouped_idx != -1 && arch_evsel__must_be_in_group(rhs)
> - ? *force_grouped_idx
> + rhs_sort_idx = force_grouped_idx != -1 && arch_evsel__must_be_in_group(rhs)
> + ? force_grouped_idx
> : rhs_core->idx;
> }
>
> @@ -2019,16 +2021,17 @@ static int evlist__cmp(void *_fg_idx, const struct list_head *l, const struct li
> }
>
> /* Architecture specific sorting. */
> - return arch_evlist__cmp(lhs, rhs);
> + return arch_evlist__cmp(lhs, rhs, _sort_priv);
> }
>
> static int parse_events__sort_events_and_fix_groups(struct list_head *list)
> {
> - int idx = 0, force_grouped_idx = -1;
> struct evsel *pos, *cur_leader = NULL;
> struct perf_evsel *cur_leaders_grp = NULL;
> bool idx_changed = false, cur_leader_force_grouped = false;
> int orig_num_leaders = 0, num_leaders = 0;
> + struct sort_priv sort_priv = {-1, false};
> + int idx = 0;
> int ret;
>
> /*
> @@ -2053,13 +2056,17 @@ static int parse_events__sort_events_and_fix_groups(struct list_head *list)
> pos->core.idx = idx++;
>
> /* Remember an index to sort all forced grouped events together to. */
> - if (force_grouped_idx == -1 && pos == pos_leader && pos->core.nr_members < 2 &&
> - arch_evsel__must_be_in_group(pos))
> - force_grouped_idx = pos->core.idx;
> + if (sort_priv.force_grouped_idx == -1 && pos == pos_leader &&
> + pos->core.nr_members < 2 && arch_evsel__must_be_in_group(pos))
> + sort_priv.force_grouped_idx = pos->core.idx;
> +
> + if (!sort_priv.topdown_metrics_in_group &&
> + strcasestr(pos->name, "topdown"))
> + sort_priv.topdown_metrics_in_group = true;
> }
>
> /* Sort events. */
> - list_sort(&force_grouped_idx, list, evlist__cmp);
> + list_sort(&sort_priv, list, evlist__cmp);
>
> /*
> * Recompute groups, splitting for PMUs and adding groups for events
> @@ -2070,7 +2077,7 @@ static int parse_events__sort_events_and_fix_groups(struct list_head *list)
> const struct evsel *pos_leader = evsel__leader(pos);
> const char *pos_pmu_name = pos->group_pmu_name;
> const char *cur_leader_pmu_name;
> - bool pos_force_grouped = force_grouped_idx != -1 &&
> + bool pos_force_grouped = sort_priv.force_grouped_idx != -1 &&
> arch_evsel__must_be_in_group(pos);
>
> /* Reset index and nr_members. */
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 4/4] perf tests: Add leader sampling test in record tests
2024-07-02 22:40 ` [PATCH 4/4] perf tests: Add leader sampling test in record tests Dapeng Mi
@ 2024-07-02 18:07 ` Liang, Kan
2024-07-03 2:53 ` Mi, Dapeng
0 siblings, 1 reply; 12+ messages in thread
From: Liang, Kan @ 2024-07-02 18:07 UTC (permalink / raw)
To: Dapeng Mi, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin
Cc: linux-perf-users, linux-kernel, Yanfei Xu, Dapeng Mi
On 2024-07-02 6:40 p.m., Dapeng Mi wrote:
> Add a leader sampling test to validate that event counts are captured
> into the record and that the count values are consistent.
>
> Suggested-by: Kan Liang <kan.liang@linux.intel.com>
> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
> ---
> tools/perf/tests/shell/record.sh | 28 ++++++++++++++++++++++++++++
> 1 file changed, 28 insertions(+)
>
> diff --git a/tools/perf/tests/shell/record.sh b/tools/perf/tests/shell/record.sh
> index 3d1a7759a7b2..8e3e66780fed 100755
> --- a/tools/perf/tests/shell/record.sh
> +++ b/tools/perf/tests/shell/record.sh
> @@ -17,6 +17,7 @@ skip_test_missing_symbol ${testsym}
>
> err=0
> perfdata=$(mktemp /tmp/__perf_test.perf.data.XXXXX)
> +script_output=$(mktemp /tmp/__perf_test.perf.data.XXXXX.script)
> testprog="perf test -w thloop"
> cpu_pmu_dir="/sys/bus/event_source/devices/cpu*"
> br_cntr_file="/caps/branch_counter_nr"
> @@ -190,11 +191,38 @@ test_branch_counter() {
> echo "Basic branch counter test [Success]"
> }
>
> +test_leader_sampling() {
> + echo "Basic leader sampling test"
> + if ! perf record -o "${perfdata}" -e "{branches,branches}:Su" perf test -w brstack 2> /dev/null
I think we still need a case to verify the topdown fix you did in this
patch set.
Since topdown is platform-specific, you may want to skip it on platforms
which don't support perf metrics. For example, check if the slots event
exists in the event folder.
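A rough sketch of such a skip check (an untested idea, assuming the slots
event is exposed under the core PMU's sysfs events directory only when
perf metrics are supported):

  # Hypothetical skip check for the topdown leader sampling case.
  if ! ls /sys/bus/event_source/devices/cpu*/events/slots >/dev/null 2>&1
  then
          echo "Topdown leader sampling [Skipped no slots event]"
          return
  fi

Similar in spirit to the existing branch counter capability check in
record.sh.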
Thanks,
Kan
> + then
> + echo "Leader sampling [Failed record]"
> + err=1
> + return
> + fi
> + index=0
> + perf script -i "${perfdata}" > $script_output
> + while IFS= read -r line
> + do
> + # Check if the two branches counts are equal in each record
> + branches=$(echo $line | awk '{for(i=1;i<=NF;i++) if($i=="branches:") print $(i-1)}')
> + if [ $(($index%2)) -ne 0 ] && [ ${branches}x != ${prev_branches}x ]
> + then
> + echo "Leader sampling [Failed inconsistent branches count]"
> + err=1
> + return
> + fi
> + index=$(($index+1))
> + prev_branches=$branches
> + done < $script_output
> + echo "Basic leader sampling test [Success]"
> +}
> +
> test_per_thread
> test_register_capture
> test_system_wide
> test_workload
> test_branch_counter
> +test_leader_sampling
>
> cleanup
> exit $err
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 0/4] Bug fixes on topdown metrics group leader selection
@ 2024-07-02 22:40 Dapeng Mi
2024-07-02 22:40 ` [PATCH 1/4] perf topdown: Correct leader selection with sample_read enabled Dapeng Mi
` (3 more replies)
0 siblings, 4 replies; 12+ messages in thread
From: Dapeng Mi @ 2024-07-02 22:40 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Kan Liang
Cc: linux-perf-users, linux-kernel, Yanfei Xu, Dapeng Mi, Dapeng Mi
When counting/sampling topdown slots and metrics events, the following
issues are found.
a. Incorrect sampling leader selection if the group only contains the
topdown slots event without a topdown metrics event, such as
perf record -e '{slots,branches}:S' -c 10000 -vv sleep 1
In this case, the slots event should be the sampling leader, but in fact
the branches event is sampled, as the verbose output shows.
perf_event_attr:
type 4 (cpu)
size 168
config 0x400 (slots)
sample_type IP|TID|TIME|READ|CPU|IDENTIFIER
read_format ID|GROUP|LOST
disabled 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 168
config 0x4 (PERF_COUNT_HW_BRANCH_INSTRUCTIONS)
{ sample_period, sample_freq } 10000
sample_type IP|TID|TIME|READ|CPU|IDENTIFIER
read_format ID|GROUP|LOST
sample_id_all 1
exclude_guest 1
b. Failure to run the following perf command
perf record -e "{slots,instructions,topdown-retiring}:S" -vv -C0 sleep 1
------------------------------------------------------------
perf_event_attr:
type 4 (cpu)
size 168
config 0x400 (slots)
sample_type IP|TID|TIME|READ|CPU|PERIOD|IDENTIFIER
read_format ID|GROUP|LOST
disabled 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 4 (cpu)
size 168
config 0x8000 (topdown-retiring)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|READ|CPU|PERIOD|IDENTIFIER
read_format ID|GROUP|LOST
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd 5 flags 0x8
sys_perf_event_open failed, error -22
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (topdown-retiring).
The perf tool tries to regroup the events and moves the topdown-retiring
event to immediately after the slots event, which leads to the
topdown-retiring event being selected for sampling. This is incorrect.
c. Unnecessary event regrouping for a group that only has the slots event
but no topdown metrics events, such as
perf stat -e "{instructions,slots}" -C 0 sleep 1
WARNING: events were regrouped to match PMUs
Performance counter stats for 'CPU(s) 0':
27,581,148 slots
8,390,827 instructions
1.045546595 seconds time elapsed
Obviously, this event regrouping is unnecessary.
Patches 1-3 fix the above 3 issues in order, and patch 4/4 adds a new
perf test to verify leader sampling.
Dapeng Mi (4):
perf topdown: Correct leader selection with sample_read enabled
perf parse-events: Don't move topdown metrics events when sorting
events
perf parse-events: Don't move slots event when no topdown metrics
event
perf tests: Add leader sampling test in record tests
tools/perf/arch/x86/util/evlist.c | 11 ++++------
tools/perf/arch/x86/util/topdown.c | 16 ++++++++++++--
tools/perf/tests/shell/record.sh | 28 ++++++++++++++++++++++++
tools/perf/util/evlist.h | 7 +++++-
tools/perf/util/parse-events.c | 35 ++++++++++++++++++------------
5 files changed, 73 insertions(+), 24 deletions(-)
base-commit: 73e931504f8e0d42978bfcda37b323dbbd1afc08
--
2.40.1
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 1/4] perf topdown: Correct leader selection with sample_read enabled
2024-07-02 22:40 [PATCH 0/4] Bug fixes on topdown metrics group leader selection Dapeng Mi
@ 2024-07-02 22:40 ` Dapeng Mi
2024-07-02 16:05 ` Liang, Kan
2024-07-02 22:40 ` [PATCH 2/4] perf parse-events: Don't move topdown metrics events when sorting events Dapeng Mi
` (2 subsequent siblings)
3 siblings, 1 reply; 12+ messages in thread
From: Dapeng Mi @ 2024-07-02 22:40 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Kan Liang
Cc: linux-perf-users, linux-kernel, Yanfei Xu, Dapeng Mi, Dapeng Mi
Addresses an issue where, in the absence of a topdown metrics event
within a sampling group, the slots event was incorrectly bypassed as
the sampling leader when sample_read was enabled.
perf record -e '{slots,branches}:S' -c 10000 -vv sleep 1
In this case, the slots event should be the sampling leader, but in fact
the branches event is sampled, as the verbose output shows.
perf_event_attr:
type 4 (cpu)
size 168
config 0x400 (slots)
sample_type IP|TID|TIME|READ|CPU|IDENTIFIER
read_format ID|GROUP|LOST
disabled 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 168
config 0x4 (PERF_COUNT_HW_BRANCH_INSTRUCTIONS)
{ sample_period, sample_freq } 10000
sample_type IP|TID|TIME|READ|CPU|IDENTIFIER
read_format ID|GROUP|LOST
sample_id_all 1
exclude_guest 1
The sample period of the slots event, rather than that of the branches
event, is reset to 0.
This fix ensures the slots event remains the leader under these
conditions.
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
tools/perf/arch/x86/util/topdown.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/tools/perf/arch/x86/util/topdown.c b/tools/perf/arch/x86/util/topdown.c
index 3f9a267d4501..5d7b78eb7516 100644
--- a/tools/perf/arch/x86/util/topdown.c
+++ b/tools/perf/arch/x86/util/topdown.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
#include "api/fs/fs.h"
#include "util/evsel.h"
+#include "util/evlist.h"
#include "util/pmu.h"
#include "util/pmus.h"
#include "util/topdown.h"
@@ -41,11 +42,22 @@ bool topdown_sys_has_perf_metrics(void)
*/
bool arch_topdown_sample_read(struct evsel *leader)
{
+ struct evsel *event;
+
if (!evsel__sys_has_perf_metrics(leader))
return false;
- if (leader->core.attr.config == TOPDOWN_SLOTS)
- return true;
+ if (leader->core.attr.config != TOPDOWN_SLOTS)
+ return false;
+
+ /*
+ * If slots event as leader event but no topdown metric events in group,
+ * slots event should still sample as leader.
+ */
+ evlist__for_each_entry(leader->evlist, event) {
+ if (event != leader && strcasestr(event->name, "topdown"))
+ return true;
+ }
return false;
}
--
2.40.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 2/4] perf parse-events: Don't move topdown metrics events when sorting events
2024-07-02 22:40 [PATCH 0/4] Bug fixes on topdown metrics group leader selection Dapeng Mi
2024-07-02 22:40 ` [PATCH 1/4] perf topdown: Correct leader selection with sample_read enabled Dapeng Mi
@ 2024-07-02 22:40 ` Dapeng Mi
2024-07-02 22:40 ` [PATCH 3/4] perf parse-events: Don't move slots event when no topdown metrics event Dapeng Mi
2024-07-02 22:40 ` [PATCH 4/4] perf tests: Add leader sampling test in record tests Dapeng Mi
3 siblings, 0 replies; 12+ messages in thread
From: Dapeng Mi @ 2024-07-02 22:40 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Kan Liang
Cc: linux-perf-users, linux-kernel, Yanfei Xu, Dapeng Mi, Dapeng Mi
When running the below perf command, the following error is reported.
perf record -e "{slots,instructions,topdown-retiring}:S" -vv -C0 sleep 1
------------------------------------------------------------
perf_event_attr:
type 4 (cpu)
size 168
config 0x400 (slots)
sample_type IP|TID|TIME|READ|CPU|PERIOD|IDENTIFIER
read_format ID|GROUP|LOST
disabled 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 4 (cpu)
size 168
config 0x8000 (topdown-retiring)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|READ|CPU|PERIOD|IDENTIFIER
read_format ID|GROUP|LOST
freq 1
sample_id_all 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd 5 flags 0x8
sys_perf_event_open failed, error -22
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (topdown-retiring).
The reason for the error is that the events are regrouped, the
topdown-retiring event is moved to immediately after the slots event and
ends up doing the sampling, but the Intel PMU driver doesn't support
sampling topdown metrics events.
A topdown metrics event only requires being in a group that has the slots
event as leader. It doesn't have to be placed immediately after the slots
event. Thus it's overkill to move topdown metrics events to immediately
after the slots event when regrouping, and doing so causes the above
issue.
Thus delete the code that moves topdown metrics events to fix the issue.
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
tools/perf/arch/x86/util/evlist.c | 5 -----
1 file changed, 5 deletions(-)
diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
index b1ce0c52d88d..7215c7c7b435 100644
--- a/tools/perf/arch/x86/util/evlist.c
+++ b/tools/perf/arch/x86/util/evlist.c
@@ -82,11 +82,6 @@ int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs)
return -1;
if (strcasestr(rhs->name, "slots") && !strcasestr(rhs->name, "uops_retired.slots"))
return 1;
- /* Followed by topdown events. */
- if (strcasestr(lhs->name, "topdown") && !strcasestr(rhs->name, "topdown"))
- return -1;
- if (!strcasestr(lhs->name, "topdown") && strcasestr(rhs->name, "topdown"))
- return 1;
}
/* Default ordering by insertion index. */
--
2.40.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 3/4] perf parse-events: Don't move slots event when no topdown metrics event
2024-07-02 22:40 [PATCH 0/4] Bug fixes on topdown metrics group leader selection Dapeng Mi
2024-07-02 22:40 ` [PATCH 1/4] perf topdown: Correct leader selection with sample_read enabled Dapeng Mi
2024-07-02 22:40 ` [PATCH 2/4] perf parse-events: Don't move topdown metrics events when sorting events Dapeng Mi
@ 2024-07-02 22:40 ` Dapeng Mi
2024-07-02 18:03 ` Liang, Kan
2024-07-02 22:40 ` [PATCH 4/4] perf tests: Add leader sampling test in record tests Dapeng Mi
3 siblings, 1 reply; 12+ messages in thread
From: Dapeng Mi @ 2024-07-02 22:40 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Kan Liang
Cc: linux-perf-users, linux-kernel, Yanfei Xu, Dapeng Mi, Dapeng Mi
Currently the helper arch_evlist__cmp() would unconditionally move slots
event to be the leader event even though there is no topdown metrics
event in the group.
perf stat -e "{instructions,slots}" -C 0 sleep 1
WARNING: events were regrouped to match PMUs
Performance counter stats for 'CPU(s) 0':
27,581,148 slots
8,390,827 instructions
1.045546595 seconds time elapsed
This is overkill. It's not necessary to make the slots event the leader
if there is no topdown metrics event.
Thus only regroup events when there are both topdown slots and metrics
events in a group.
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
tools/perf/arch/x86/util/evlist.c | 6 ++++--
tools/perf/util/evlist.h | 7 ++++++-
tools/perf/util/parse-events.c | 35 ++++++++++++++++++-------------
3 files changed, 31 insertions(+), 17 deletions(-)
diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
index 7215c7c7b435..a1e78be6ebd1 100644
--- a/tools/perf/arch/x86/util/evlist.c
+++ b/tools/perf/arch/x86/util/evlist.c
@@ -73,9 +73,11 @@ int arch_evlist__add_default_attrs(struct evlist *evlist,
return ___evlist__add_default_attrs(evlist, attrs, nr_attrs);
}
-int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs)
+int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs, void *priv)
{
- if (topdown_sys_has_perf_metrics() &&
+ struct sort_priv *_priv = priv;
+
+ if (topdown_sys_has_perf_metrics() && _priv->topdown_metrics_in_group &&
(arch_evsel__must_be_in_group(lhs) || arch_evsel__must_be_in_group(rhs))) {
/* Ensure the topdown slots comes first. */
if (strcasestr(lhs->name, "slots") && !strcasestr(lhs->name, "uops_retired.slots"))
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index cb91dc9117a2..14c858dcf5a2 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -89,6 +89,11 @@ struct evsel_str_handler {
void *handler;
};
+struct sort_priv {
+ int force_grouped_idx;
+ bool topdown_metrics_in_group;
+};
+
struct evlist *evlist__new(void);
struct evlist *evlist__new_default(void);
struct evlist *evlist__new_dummy(void);
@@ -112,7 +117,7 @@ int arch_evlist__add_default_attrs(struct evlist *evlist,
#define evlist__add_default_attrs(evlist, array) \
arch_evlist__add_default_attrs(evlist, array, ARRAY_SIZE(array))
-int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs);
+int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs, void *priv);
int evlist__add_dummy(struct evlist *evlist);
struct evsel *evlist__add_aux_dummy(struct evlist *evlist, bool system_wide);
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 6ed0f9c5581d..a3f7173a7ae2 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1962,19 +1962,21 @@ static int evsel__compute_group_pmu_name(struct evsel *evsel,
return evsel->group_pmu_name ? 0 : -ENOMEM;
}
-__weak int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs)
+__weak int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs,
+ void *priv __maybe_unused)
{
/* Order by insertion index. */
return lhs->core.idx - rhs->core.idx;
}
-static int evlist__cmp(void *_fg_idx, const struct list_head *l, const struct list_head *r)
+static int evlist__cmp(void *_sort_priv, const struct list_head *l, const struct list_head *r)
{
const struct perf_evsel *lhs_core = container_of(l, struct perf_evsel, node);
const struct evsel *lhs = container_of(lhs_core, struct evsel, core);
const struct perf_evsel *rhs_core = container_of(r, struct perf_evsel, node);
const struct evsel *rhs = container_of(rhs_core, struct evsel, core);
- int *force_grouped_idx = _fg_idx;
+ struct sort_priv *sort_priv = _sort_priv;
+ int force_grouped_idx = sort_priv->force_grouped_idx;
int lhs_sort_idx, rhs_sort_idx, ret;
const char *lhs_pmu_name, *rhs_pmu_name;
bool lhs_has_group, rhs_has_group;
@@ -1992,8 +1994,8 @@ static int evlist__cmp(void *_fg_idx, const struct list_head *l, const struct li
lhs_sort_idx = lhs_core->leader->idx;
} else {
lhs_has_group = false;
- lhs_sort_idx = *force_grouped_idx != -1 && arch_evsel__must_be_in_group(lhs)
- ? *force_grouped_idx
+ lhs_sort_idx = force_grouped_idx != -1 && arch_evsel__must_be_in_group(lhs)
+ ? force_grouped_idx
: lhs_core->idx;
}
if (rhs_core->leader != rhs_core || rhs_core->nr_members > 1) {
@@ -2001,8 +2003,8 @@ static int evlist__cmp(void *_fg_idx, const struct list_head *l, const struct li
rhs_sort_idx = rhs_core->leader->idx;
} else {
rhs_has_group = false;
- rhs_sort_idx = *force_grouped_idx != -1 && arch_evsel__must_be_in_group(rhs)
- ? *force_grouped_idx
+ rhs_sort_idx = force_grouped_idx != -1 && arch_evsel__must_be_in_group(rhs)
+ ? force_grouped_idx
: rhs_core->idx;
}
@@ -2019,16 +2021,17 @@ static int evlist__cmp(void *_fg_idx, const struct list_head *l, const struct li
}
/* Architecture specific sorting. */
- return arch_evlist__cmp(lhs, rhs);
+ return arch_evlist__cmp(lhs, rhs, _sort_priv);
}
static int parse_events__sort_events_and_fix_groups(struct list_head *list)
{
- int idx = 0, force_grouped_idx = -1;
struct evsel *pos, *cur_leader = NULL;
struct perf_evsel *cur_leaders_grp = NULL;
bool idx_changed = false, cur_leader_force_grouped = false;
int orig_num_leaders = 0, num_leaders = 0;
+ struct sort_priv sort_priv = {-1, false};
+ int idx = 0;
int ret;
/*
@@ -2053,13 +2056,17 @@ static int parse_events__sort_events_and_fix_groups(struct list_head *list)
pos->core.idx = idx++;
/* Remember an index to sort all forced grouped events together to. */
- if (force_grouped_idx == -1 && pos == pos_leader && pos->core.nr_members < 2 &&
- arch_evsel__must_be_in_group(pos))
- force_grouped_idx = pos->core.idx;
+ if (sort_priv.force_grouped_idx == -1 && pos == pos_leader &&
+ pos->core.nr_members < 2 && arch_evsel__must_be_in_group(pos))
+ sort_priv.force_grouped_idx = pos->core.idx;
+
+ if (!sort_priv.topdown_metrics_in_group &&
+ strcasestr(pos->name, "topdown"))
+ sort_priv.topdown_metrics_in_group = true;
}
/* Sort events. */
- list_sort(&force_grouped_idx, list, evlist__cmp);
+ list_sort(&sort_priv, list, evlist__cmp);
/*
* Recompute groups, splitting for PMUs and adding groups for events
@@ -2070,7 +2077,7 @@ static int parse_events__sort_events_and_fix_groups(struct list_head *list)
const struct evsel *pos_leader = evsel__leader(pos);
const char *pos_pmu_name = pos->group_pmu_name;
const char *cur_leader_pmu_name;
- bool pos_force_grouped = force_grouped_idx != -1 &&
+ bool pos_force_grouped = sort_priv.force_grouped_idx != -1 &&
arch_evsel__must_be_in_group(pos);
/* Reset index and nr_members. */
--
2.40.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH 4/4] perf tests: Add leader sampling test in record tests
2024-07-02 22:40 [PATCH 0/4] Bug fixes on topdown metrics group leader selection Dapeng Mi
` (2 preceding siblings ...)
2024-07-02 22:40 ` [PATCH 3/4] perf parse-events: Don't move slots event when no topdown metrics event Dapeng Mi
@ 2024-07-02 22:40 ` Dapeng Mi
2024-07-02 18:07 ` Liang, Kan
3 siblings, 1 reply; 12+ messages in thread
From: Dapeng Mi @ 2024-07-02 22:40 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin,
Kan Liang
Cc: linux-perf-users, linux-kernel, Yanfei Xu, Dapeng Mi, Dapeng Mi
Add a leader sampling test to validate that event counts are captured
into the record and that the count values are consistent.
Suggested-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
---
tools/perf/tests/shell/record.sh | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
diff --git a/tools/perf/tests/shell/record.sh b/tools/perf/tests/shell/record.sh
index 3d1a7759a7b2..8e3e66780fed 100755
--- a/tools/perf/tests/shell/record.sh
+++ b/tools/perf/tests/shell/record.sh
@@ -17,6 +17,7 @@ skip_test_missing_symbol ${testsym}
err=0
perfdata=$(mktemp /tmp/__perf_test.perf.data.XXXXX)
+script_output=$(mktemp /tmp/__perf_test.perf.data.XXXXX.script)
testprog="perf test -w thloop"
cpu_pmu_dir="/sys/bus/event_source/devices/cpu*"
br_cntr_file="/caps/branch_counter_nr"
@@ -190,11 +191,38 @@ test_branch_counter() {
echo "Basic branch counter test [Success]"
}
+test_leader_sampling() {
+ echo "Basic leader sampling test"
+ if ! perf record -o "${perfdata}" -e "{branches,branches}:Su" perf test -w brstack 2> /dev/null
+ then
+ echo "Leader sampling [Failed record]"
+ err=1
+ return
+ fi
+ index=0
+ perf script -i "${perfdata}" > $script_output
+ while IFS= read -r line
+ do
+ # Check if the two branches counts are equal in each record
+ branches=$(echo $line | awk '{for(i=1;i<=NF;i++) if($i=="branches:") print $(i-1)}')
+ if [ $(($index%2)) -ne 0 ] && [ ${branches}x != ${prev_branches}x ]
+ then
+ echo "Leader sampling [Failed inconsistent branches count]"
+ err=1
+ return
+ fi
+ index=$(($index+1))
+ prev_branches=$branches
+ done < $script_output
+ echo "Basic leader sampling test [Success]"
+}
+
test_per_thread
test_register_capture
test_system_wide
test_workload
test_branch_counter
+test_leader_sampling
cleanup
exit $err
--
2.40.1
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH 1/4] perf topdown: Correct leader selection with sample_read enabled
2024-07-02 16:05 ` Liang, Kan
@ 2024-07-03 2:46 ` Mi, Dapeng
2024-07-03 13:50 ` Liang, Kan
0 siblings, 1 reply; 12+ messages in thread
From: Mi, Dapeng @ 2024-07-03 2:46 UTC (permalink / raw)
To: Liang, Kan, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin
Cc: linux-perf-users, linux-kernel, Yanfei Xu, Dapeng Mi
On 7/3/2024 12:05 AM, Liang, Kan wrote:
>
> On 2024-07-02 6:40 p.m., Dapeng Mi wrote:
>> Addresses an issue where, in the absence of a topdown metrics event
>> within a sampling group, the slots event was incorrectly bypassed as
>> the sampling leader when sample_read was enabled.
>>
>> perf record -e '{slots,branches}:S' -c 10000 -vv sleep 1
>>
>> In this case, the slots event should be the sampling leader, but in fact
>> the branches event is sampled, as the verbose output shows.
>>
>> perf_event_attr:
>> type 4 (cpu)
>> size 168
>> config 0x400 (slots)
>> sample_type IP|TID|TIME|READ|CPU|IDENTIFIER
>> read_format ID|GROUP|LOST
>> disabled 1
>> sample_id_all 1
>> exclude_guest 1
>> ------------------------------------------------------------
>> sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
>> ------------------------------------------------------------
>> perf_event_attr:
>> type 0 (PERF_TYPE_HARDWARE)
>> size 168
>> config 0x4 (PERF_COUNT_HW_BRANCH_INSTRUCTIONS)
>> { sample_period, sample_freq } 10000
>> sample_type IP|TID|TIME|READ|CPU|IDENTIFIER
>> read_format ID|GROUP|LOST
>> sample_id_all 1
>> exclude_guest 1
>>
>> The sample period of the slots event, rather than that of the branches
>> event, is reset to 0.
>>
>> This fix ensures the slots event remains the leader under these
>> conditions.
>>
>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>> ---
>> tools/perf/arch/x86/util/topdown.c | 16 ++++++++++++++--
>> 1 file changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/perf/arch/x86/util/topdown.c b/tools/perf/arch/x86/util/topdown.c
>> index 3f9a267d4501..5d7b78eb7516 100644
>> --- a/tools/perf/arch/x86/util/topdown.c
>> +++ b/tools/perf/arch/x86/util/topdown.c
>> @@ -1,6 +1,7 @@
>> // SPDX-License-Identifier: GPL-2.0
>> #include "api/fs/fs.h"
>> #include "util/evsel.h"
>> +#include "util/evlist.h"
>> #include "util/pmu.h"
>> #include "util/pmus.h"
>> #include "util/topdown.h"
>> @@ -41,11 +42,22 @@ bool topdown_sys_has_perf_metrics(void)
>> */
>> bool arch_topdown_sample_read(struct evsel *leader)
>> {
>> + struct evsel *event;
>> +
>> if (!evsel__sys_has_perf_metrics(leader))
>> return false;
>>
>> - if (leader->core.attr.config == TOPDOWN_SLOTS)
>> - return true;
>> + if (leader->core.attr.config != TOPDOWN_SLOTS)
>> + return false;
>> +
>> + /*
>> + * If slots event as leader event but no topdown metric events in group,
>> + * slots event should still sample as leader.
>> + */
>> + evlist__for_each_entry(leader->evlist, event) {
>> + if (event != leader && strcasestr(event->name, "topdown"))
> Users may use the RAW format. It may not be good enough to just check
> the event name.
>
> I recall you had complete support for this in the previous patch. Why
> drop it?
Oh, I overlooked the RAW format case. Yes, there was a complete comparison
in the previous patch, but I originally thought it was over-complicated, so
I just simplified it (following other helpers that compare the name). If we
need to consider the RAW format, the comparisons in the helpers
arch_evsel__must_be_in_group() and arch_evlist__cmp() may not be correct
either.
If we want to fix the issue thoroughly, we may have to expose two helpers
which check whether an event is the topdown slots or a metrics event and
use these two helpers to replace the current name comparison.
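A minimal sketch of what the two helpers could look like, next to
arch_topdown_sample_read() in tools/perf/arch/x86/util/topdown.c (the
helper names and the metrics config range below are assumptions for
discussion, not the final implementation):

  static bool evsel__is_topdown_slots(struct evsel *evsel)
  {
  	/* Match on the config value rather than on the event name. */
  	return evsel__sys_has_perf_metrics(evsel) &&
  	       evsel->core.attr.config == TOPDOWN_SLOTS;
  }

  static bool evsel__is_topdown_metrics(struct evsel *evsel)
  {
  	/*
  	 * Assumed range: 0x8000 is topdown-retiring per the verbose
  	 * output above; the exact bounds must match the kernel's
  	 * metric event encodings.
  	 */
  	return evsel__sys_has_perf_metrics(evsel) &&
  	       evsel->core.attr.config >= 0x8000 &&
  	       evsel->core.attr.config <= 0x8700;
  }

The name-based checks in arch_evsel__must_be_in_group() and
arch_evlist__cmp() could then call these instead of strcasestr().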
>
> Thanks,
> Kan
>
>> + return true;
>> + }
>>
>> return false;
>> }
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 3/4] perf parse-events: Don't move slots event when no topdown metrics event
2024-07-02 18:03 ` Liang, Kan
@ 2024-07-03 2:51 ` Mi, Dapeng
0 siblings, 0 replies; 12+ messages in thread
From: Mi, Dapeng @ 2024-07-03 2:51 UTC (permalink / raw)
To: Liang, Kan, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin
Cc: linux-perf-users, linux-kernel, Yanfei Xu, Dapeng Mi
On 7/3/2024 2:03 AM, Liang, Kan wrote:
>
> On 2024-07-02 6:40 p.m., Dapeng Mi wrote:
>> Currently the helper arch_evlist__cmp() would unconditionally move slots
>> event to be the leader event even though there is no topdown metrics
>> event in the group.
>>
>> perf stat -e "{instructions,slots}" -C 0 sleep 1
>> WARNING: events were regrouped to match PMUs
>>
>> Performance counter stats for 'CPU(s) 0':
>>
>> 27,581,148 slots
>> 8,390,827 instructions
>>
>> 1.045546595 seconds time elapsed
>>
>> This is overkill. It's not necessary to make the slots event the leader
>> if there is no topdown metrics event.
>>
>> Thus only regroup events when there are both topdown slots and metrics
>> events in a group.
>>
>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>> ---
>> tools/perf/arch/x86/util/evlist.c | 6 ++++--
>> tools/perf/util/evlist.h | 7 ++++++-
>> tools/perf/util/parse-events.c | 35 ++++++++++++++++++-------------
>> 3 files changed, 31 insertions(+), 17 deletions(-)
>>
>> diff --git a/tools/perf/arch/x86/util/evlist.c b/tools/perf/arch/x86/util/evlist.c
>> index 7215c7c7b435..a1e78be6ebd1 100644
>> --- a/tools/perf/arch/x86/util/evlist.c
>> +++ b/tools/perf/arch/x86/util/evlist.c
>> @@ -73,9 +73,11 @@ int arch_evlist__add_default_attrs(struct evlist *evlist,
>> return ___evlist__add_default_attrs(evlist, attrs, nr_attrs);
>> }
>>
>> -int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs)
>> +int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs, void *priv)
>> {
>> - if (topdown_sys_has_perf_metrics() &&
>> + struct sort_priv *_priv = priv;
>> +
>> + if (topdown_sys_has_perf_metrics() && _priv->topdown_metrics_in_group &&
>> (arch_evsel__must_be_in_group(lhs) || arch_evsel__must_be_in_group(rhs))) {
>> /* Ensure the topdown slots comes first. */
>> if (strcasestr(lhs->name, "slots") && !strcasestr(lhs->name, "uops_retired.slots"))
>> diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
>> index cb91dc9117a2..14c858dcf5a2 100644
>> --- a/tools/perf/util/evlist.h
>> +++ b/tools/perf/util/evlist.h
>> @@ -89,6 +89,11 @@ struct evsel_str_handler {
>> void *handler;
>> };
>>
>> +struct sort_priv {
>> + int force_grouped_idx;
>> + bool topdown_metrics_in_group;
> The topdown metrics are only available on some Intel platforms. I
> don't think we want to add such a platform-specific variable in the
> generic code.
>
> The current code just re-orders the events, it doesn't re-group them. So
> it doesn't impact the result accuracy.
> So the issue is just an annoying WARNING, right?
>
> It seems the issue has been there for more than a year. No complaints
> except for one internal test case, which can be easily fixed.
> Considering the complexity of the fix, I guess we may leave it as is.
I was also hesitating about whether to post the patch. I'm not sure if
other tools depend on the event output order. If not, it's just an annoying
WARNING. But I think no other tools depend on it since, as you said, there
have been no complaints about this.
I will drop this patch in the next version.
>
> Thanks,
> Kan
>> +};
>> +
>> struct evlist *evlist__new(void);
>> struct evlist *evlist__new_default(void);
>> struct evlist *evlist__new_dummy(void);
>> @@ -112,7 +117,7 @@ int arch_evlist__add_default_attrs(struct evlist *evlist,
>> #define evlist__add_default_attrs(evlist, array) \
>> arch_evlist__add_default_attrs(evlist, array, ARRAY_SIZE(array))
>>
>> -int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs);
>> +int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs, void *priv);
>>
>> int evlist__add_dummy(struct evlist *evlist);
>> struct evsel *evlist__add_aux_dummy(struct evlist *evlist, bool system_wide);
>> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
>> index 6ed0f9c5581d..a3f7173a7ae2 100644
>> --- a/tools/perf/util/parse-events.c
>> +++ b/tools/perf/util/parse-events.c
>> @@ -1962,19 +1962,21 @@ static int evsel__compute_group_pmu_name(struct evsel *evsel,
>> return evsel->group_pmu_name ? 0 : -ENOMEM;
>> }
>>
>> -__weak int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs)
>> +__weak int arch_evlist__cmp(const struct evsel *lhs, const struct evsel *rhs,
>> + void *priv __maybe_unused)
>> {
>> /* Order by insertion index. */
>> return lhs->core.idx - rhs->core.idx;
>> }
>>
>> -static int evlist__cmp(void *_fg_idx, const struct list_head *l, const struct list_head *r)
>> +static int evlist__cmp(void *_sort_priv, const struct list_head *l, const struct list_head *r)
>> {
>> const struct perf_evsel *lhs_core = container_of(l, struct perf_evsel, node);
>> const struct evsel *lhs = container_of(lhs_core, struct evsel, core);
>> const struct perf_evsel *rhs_core = container_of(r, struct perf_evsel, node);
>> const struct evsel *rhs = container_of(rhs_core, struct evsel, core);
>> - int *force_grouped_idx = _fg_idx;
>> + struct sort_priv *sort_priv = _sort_priv;
>> + int force_grouped_idx = sort_priv->force_grouped_idx;
>> int lhs_sort_idx, rhs_sort_idx, ret;
>> const char *lhs_pmu_name, *rhs_pmu_name;
>> bool lhs_has_group, rhs_has_group;
>> @@ -1992,8 +1994,8 @@ static int evlist__cmp(void *_fg_idx, const struct list_head *l, const struct li
>> lhs_sort_idx = lhs_core->leader->idx;
>> } else {
>> lhs_has_group = false;
>> - lhs_sort_idx = *force_grouped_idx != -1 && arch_evsel__must_be_in_group(lhs)
>> - ? *force_grouped_idx
>> + lhs_sort_idx = force_grouped_idx != -1 && arch_evsel__must_be_in_group(lhs)
>> + ? force_grouped_idx
>> : lhs_core->idx;
>> }
>> if (rhs_core->leader != rhs_core || rhs_core->nr_members > 1) {
>> @@ -2001,8 +2003,8 @@ static int evlist__cmp(void *_fg_idx, const struct list_head *l, const struct li
>> rhs_sort_idx = rhs_core->leader->idx;
>> } else {
>> rhs_has_group = false;
>> - rhs_sort_idx = *force_grouped_idx != -1 && arch_evsel__must_be_in_group(rhs)
>> - ? *force_grouped_idx
>> + rhs_sort_idx = force_grouped_idx != -1 && arch_evsel__must_be_in_group(rhs)
>> + ? force_grouped_idx
>> : rhs_core->idx;
>> }
>>
>> @@ -2019,16 +2021,17 @@ static int evlist__cmp(void *_fg_idx, const struct list_head *l, const struct li
>> }
>>
>> /* Architecture specific sorting. */
>> - return arch_evlist__cmp(lhs, rhs);
>> + return arch_evlist__cmp(lhs, rhs, _sort_priv);
>> }
>>
>> static int parse_events__sort_events_and_fix_groups(struct list_head *list)
>> {
>> - int idx = 0, force_grouped_idx = -1;
>> struct evsel *pos, *cur_leader = NULL;
>> struct perf_evsel *cur_leaders_grp = NULL;
>> bool idx_changed = false, cur_leader_force_grouped = false;
>> int orig_num_leaders = 0, num_leaders = 0;
>> + struct sort_priv sort_priv = {-1, false};
>> + int idx = 0;
>> int ret;
>>
>> /*
>> @@ -2053,13 +2056,17 @@ static int parse_events__sort_events_and_fix_groups(struct list_head *list)
>> pos->core.idx = idx++;
>>
>> /* Remember an index to sort all forced grouped events together to. */
>> - if (force_grouped_idx == -1 && pos == pos_leader && pos->core.nr_members < 2 &&
>> - arch_evsel__must_be_in_group(pos))
>> - force_grouped_idx = pos->core.idx;
>> + if (sort_priv.force_grouped_idx == -1 && pos == pos_leader &&
>> + pos->core.nr_members < 2 && arch_evsel__must_be_in_group(pos))
>> + sort_priv.force_grouped_idx = pos->core.idx;
>> +
>> + if (!sort_priv.topdown_metrics_in_group &&
>> + strcasestr(pos->name, "topdown"))
>> + sort_priv.topdown_metrics_in_group = true;
>> }
>>
>> /* Sort events. */
>> - list_sort(&force_grouped_idx, list, evlist__cmp);
>> + list_sort(&sort_priv, list, evlist__cmp);
>>
>> /*
>> * Recompute groups, splitting for PMUs and adding groups for events
>> @@ -2070,7 +2077,7 @@ static int parse_events__sort_events_and_fix_groups(struct list_head *list)
>> const struct evsel *pos_leader = evsel__leader(pos);
>> const char *pos_pmu_name = pos->group_pmu_name;
>> const char *cur_leader_pmu_name;
>> - bool pos_force_grouped = force_grouped_idx != -1 &&
>> + bool pos_force_grouped = sort_priv.force_grouped_idx != -1 &&
>> arch_evsel__must_be_in_group(pos);
>>
>> /* Reset index and nr_members. */
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 4/4] perf tests: Add leader sampling test in record tests
2024-07-02 18:07 ` Liang, Kan
@ 2024-07-03 2:53 ` Mi, Dapeng
0 siblings, 0 replies; 12+ messages in thread
From: Mi, Dapeng @ 2024-07-03 2:53 UTC (permalink / raw)
To: Liang, Kan, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin
Cc: linux-perf-users, linux-kernel, Yanfei Xu, Dapeng Mi
On 7/3/2024 2:07 AM, Liang, Kan wrote:
>
> On 2024-07-02 6:40 p.m., Dapeng Mi wrote:
>> Add a leader sampling test to validate that event counts are captured
>> into the record and that the count values are consistent.
>>
>> Suggested-by: Kan Liang <kan.liang@linux.intel.com>
>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>> ---
>> tools/perf/tests/shell/record.sh | 28 ++++++++++++++++++++++++++++
>> 1 file changed, 28 insertions(+)
>>
>> diff --git a/tools/perf/tests/shell/record.sh b/tools/perf/tests/shell/record.sh
>> index 3d1a7759a7b2..8e3e66780fed 100755
>> --- a/tools/perf/tests/shell/record.sh
>> +++ b/tools/perf/tests/shell/record.sh
>> @@ -17,6 +17,7 @@ skip_test_missing_symbol ${testsym}
>>
>> err=0
>> perfdata=$(mktemp /tmp/__perf_test.perf.data.XXXXX)
>> +script_output=$(mktemp /tmp/__perf_test.perf.data.XXXXX.script)
>> testprog="perf test -w thloop"
>> cpu_pmu_dir="/sys/bus/event_source/devices/cpu*"
>> br_cntr_file="/caps/branch_counter_nr"
>> @@ -190,11 +191,38 @@ test_branch_counter() {
>> echo "Basic branch counter test [Success]"
>> }
>>
>> +test_leader_sampling() {
>> + echo "Basic leader sampling test"
>> + if ! perf record -o "${perfdata}" -e "{branches,branches}:Su" perf test -w brstack 2> /dev/null
> I think we still need a case to verify the topdown fix you did in this
> patch set.
> Since topdown is platform-specific, you may want to skip it on platforms
> which don't support perf metrics. For example, check if the slots event
> exists in the event folder.
Yeah, I will look at how to design a test case to verify the topdown case.
>
> Thanks,
> Kan
>> + then
>> + echo "Leader sampling [Failed record]"
>> + err=1
>> + return
>> + fi
>> + index=0
>> + perf script -i "${perfdata}" > $script_output
>> + while IFS= read -r line
>> + do
>> + # Check if the two branches counts are equal in each record
>> + branches=$(echo $line | awk '{for(i=1;i<=NF;i++) if($i=="branches:") print $(i-1)}')
>> + if [ $(($index%2)) -ne 0 ] && [ ${branches}x != ${prev_branches}x ]
>> + then
>> + echo "Leader sampling [Failed inconsistent branches count]"
>> + err=1
>> + return
>> + fi
>> + index=$(($index+1))
>> + prev_branches=$branches
>> + done < $script_output
>> + echo "Basic leader sampling test [Success]"
>> +}
>> +
>> test_per_thread
>> test_register_capture
>> test_system_wide
>> test_workload
>> test_branch_counter
>> +test_leader_sampling
>>
>> cleanup
>> exit $err
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/4] perf topdown: Correct leader selection with sample_read enabled
2024-07-03 2:46 ` Mi, Dapeng
@ 2024-07-03 13:50 ` Liang, Kan
0 siblings, 0 replies; 12+ messages in thread
From: Liang, Kan @ 2024-07-03 13:50 UTC (permalink / raw)
To: Mi, Dapeng, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Ian Rogers, Adrian Hunter, Alexander Shishkin
Cc: linux-perf-users, linux-kernel, Yanfei Xu, Dapeng Mi
On 2024-07-02 10:46 p.m., Mi, Dapeng wrote:
>
> On 7/3/2024 12:05 AM, Liang, Kan wrote:
>>
>> On 2024-07-02 6:40 p.m., Dapeng Mi wrote:
>>> Addresses an issue where, in the absence of a topdown metrics event
>>> within a sampling group, the slots event was incorrectly bypassed as
>>> the sampling leader when sample_read was enabled.
>>>
>>> perf record -e '{slots,branches}:S' -c 10000 -vv sleep 1
>>>
>>> In this case, the slots event should be the sampling leader, but in fact
>>> the branches event is sampled, as the verbose output shows.
>>>
>>> perf_event_attr:
>>> type 4 (cpu)
>>> size 168
>>> config 0x400 (slots)
>>> sample_type IP|TID|TIME|READ|CPU|IDENTIFIER
>>> read_format ID|GROUP|LOST
>>> disabled 1
>>> sample_id_all 1
>>> exclude_guest 1
>>> ------------------------------------------------------------
>>> sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
>>> ------------------------------------------------------------
>>> perf_event_attr:
>>> type 0 (PERF_TYPE_HARDWARE)
>>> size 168
>>> config 0x4 (PERF_COUNT_HW_BRANCH_INSTRUCTIONS)
>>> { sample_period, sample_freq } 10000
>>> sample_type IP|TID|TIME|READ|CPU|IDENTIFIER
>>> read_format ID|GROUP|LOST
>>> sample_id_all 1
>>> exclude_guest 1
>>>
>>> The sample period of the slots event, rather than that of the branches
>>> event, is reset to 0.
>>>
>>> This fix ensures the slots event remains the leader under these
>>> conditions.
>>>
>>> Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
>>> ---
>>> tools/perf/arch/x86/util/topdown.c | 16 ++++++++++++++--
>>> 1 file changed, 14 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/tools/perf/arch/x86/util/topdown.c b/tools/perf/arch/x86/util/topdown.c
>>> index 3f9a267d4501..5d7b78eb7516 100644
>>> --- a/tools/perf/arch/x86/util/topdown.c
>>> +++ b/tools/perf/arch/x86/util/topdown.c
>>> @@ -1,6 +1,7 @@
>>> // SPDX-License-Identifier: GPL-2.0
>>> #include "api/fs/fs.h"
>>> #include "util/evsel.h"
>>> +#include "util/evlist.h"
>>> #include "util/pmu.h"
>>> #include "util/pmus.h"
>>> #include "util/topdown.h"
>>> @@ -41,11 +42,22 @@ bool topdown_sys_has_perf_metrics(void)
>>> */
>>> bool arch_topdown_sample_read(struct evsel *leader)
>>> {
>>> + struct evsel *event;
>>> +
>>> if (!evsel__sys_has_perf_metrics(leader))
>>> return false;
>>>
>>> - if (leader->core.attr.config == TOPDOWN_SLOTS)
>>> - return true;
>>> + if (leader->core.attr.config != TOPDOWN_SLOTS)
>>> + return false;
>>> +
>>> + /*
>>> + * If slots event as leader event but no topdown metric events in group,
>>> + * slots event should still sample as leader.
>>> + */
>>> + evlist__for_each_entry(leader->evlist, event) {
>>> + if (event != leader && strcasestr(event->name, "topdown"))
>> Users may use the RAW format. It may not be good enough to just check
>> the event name.
>>
>> I recall you had complete support for this in the previous patch. Why
>> drop it?
>
>
> Oh, I overlooked the RAW format case. Yes, there was a complete comparison
> in the previous patch, but I originally thought it was over-complicated, so
> I just simplified it (following other helpers that compare the name). If we
> need to consider the RAW format, the comparisons in the helpers
> arch_evsel__must_be_in_group() and arch_evlist__cmp() may not be correct
> either.
>
Right, those need to be fixed as well.
> If we want to fix the issue thoroughly, we may have to expose two helpers
> which check whether an event is the topdown slots or a metrics event and
> use these two helpers to replace the current name comparison.
Yes, you may have to add an extra patch to introduce the two helper
functions and replace the existing function.
Thanks,
Kan
>
>>
>> Thanks,
>> Kan
>>
>>> + return true;
>>> + }
>>>
>>> return false;
>>> }
^ permalink raw reply [flat|nested] 12+ messages in thread
Thread overview: 12+ messages
2024-07-02 22:40 [PATCH 0/4] Bug fixes on topdown metrics group leader selection Dapeng Mi
2024-07-02 22:40 ` [PATCH 1/4] perf topdown: Correct leader selection with sample_read enabled Dapeng Mi
2024-07-02 16:05 ` Liang, Kan
2024-07-03 2:46 ` Mi, Dapeng
2024-07-03 13:50 ` Liang, Kan
2024-07-02 22:40 ` [PATCH 2/4] perf parse-events: Don't move topdown metrics events when sorting events Dapeng Mi
2024-07-02 22:40 ` [PATCH 3/4] perf parse-events: Don't move slots event when no topdown metrics event Dapeng Mi
2024-07-02 18:03 ` Liang, Kan
2024-07-03 2:51 ` Mi, Dapeng
2024-07-02 22:40 ` [PATCH 4/4] perf tests: Add leader sampling test in record tests Dapeng Mi
2024-07-02 18:07 ` Liang, Kan
2024-07-03 2:53 ` Mi, Dapeng