[PATCH v7 1/4] Create source symlink in perf object dir

linux-perf-users.vger.kernel.org archive mirror

* [PATCH v7 1/4] Create source symlink in perf object dir
@ 2024-07-24 19:01 Andi Kleen
  2024-07-24 19:01 ` [PATCH v7 2/4] perf test: Support external tests for separate objdir Andi Kleen
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Andi Kleen @ 2024-07-24 19:01 UTC
  To: linux-perf-users; +Cc: Andi Kleen

Create a source symlink to the original source in the objdir.
This is similar to what the main kernel build script does.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Makefile.perf | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 175e4c7898f0..d46892d8223b 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -163,6 +163,8 @@ ifneq ($(OUTPUT),)
 # for flex/bison parsers.
 VPATH += $(OUTPUT)
 export VPATH
+# create symlink to the original source
+SOURCE := $(shell ln -sf $(srctree)/tools/perf $(OUTPUT)/source)
 endif
 
 ifeq ($(V),1)
-- 
2.45.2


* [PATCH v7 2/4] perf test: Support external tests for separate objdir
  2024-07-24 19:01 [PATCH v7 1/4] Create source symlink in perf object dir Andi Kleen
@ 2024-07-24 19:01 ` Andi Kleen
  2024-07-26  0:07   ` Namhyung Kim
  2024-07-24 19:01 ` [PATCH v7 3/4] perf script: Fix perf script -F +metric Andi Kleen
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2024-07-24 19:01 UTC
  To: linux-perf-users; +Cc: Andi Kleen

Extend the searching for the test files so that it works
when running perf from a separate objdir, and also when
the perf executable is symlinked.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/tests/tests-scripts.c | 31 ++++++++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/tools/perf/tests/tests-scripts.c b/tools/perf/tests/tests-scripts.c
index e2042b368269..63be17289ac3 100644
--- a/tools/perf/tests/tests-scripts.c
+++ b/tools/perf/tests/tests-scripts.c
@@ -29,16 +29,41 @@
 
 static int shell_tests__dir_fd(void)
 {
-	char path[PATH_MAX], *exec_path;
-	static const char * const devel_dirs[] = { "./tools/perf/tests/shell", "./tests/shell", };
+	struct stat st;
+	char path[PATH_MAX], path2[PATH_MAX], *exec_path;
+	static const char * const devel_dirs[] = {
+		"./tools/perf/tests/shell",
+		"./tests/shell",
+		"./source/tests/shell"
+	};
+	int fd;
+	char *p;
 
 	for (size_t i = 0; i < ARRAY_SIZE(devel_dirs); ++i) {
-		int fd = open(devel_dirs[i], O_PATH);
+		fd = open(devel_dirs[i], O_PATH);
 
 		if (fd >= 0)
 			return fd;
 	}
 
+	/* Use directory of executable */
+	if (readlink("/proc/self/exe", path2, sizeof path2) < 0)
+		return -1;
+	/* Follow another level of symlink if there */
+	if (lstat(path2, &st) == 0 && (st.st_mode & S_IFMT) == S_IFLNK) {
+		scnprintf(path, sizeof(path), path2);
+		if (readlink(path, path2, sizeof path2) < 0)
+			return -1;
+	}
+	/* Get directory */
+	p = strrchr(path2, '/');
+	if (*p)
+		p[1] = 0;
+	scnprintf(path, sizeof(path), "%s/tests/shell", path2);
+	fd = open(path, O_PATH);
+	if (fd >= 0)
+		return fd;
+
 	/* Then installed path. */
 	exec_path = get_argv_exec_path();
 	scnprintf(path, sizeof(path), "%s/tests/shell", exec_path);
-- 
2.45.2


* [PATCH v7 3/4] perf script: Fix perf script -F +metric
  2024-07-24 19:01 [PATCH v7 1/4] Create source symlink in perf object dir Andi Kleen
  2024-07-24 19:01 ` [PATCH v7 2/4] perf test: Support external tests for separate objdir Andi Kleen
@ 2024-07-24 19:01 ` Andi Kleen
  2024-07-26  0:31   ` Namhyung Kim
  2024-07-24 19:01 ` [PATCH v7 4/4] Add a test case for " Andi Kleen
  2024-07-24 20:29 ` [PATCH v7 1/4] Create source symlink in perf object dir Ian Rogers
  3 siblings, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2024-07-24 19:01 UTC
  To: linux-perf-users; +Cc: Andi Kleen

This fixes a regression with perf script -F +metric originally caused by:

commit 37cc8ad77cf81f3ffd226856c367b0e15333a738
Author: Ian Rogers <irogers@google.com>
Date:   Sun Feb 19 01:28:46 2023 -0800

    perf metric: Directly use counts rather than saved_value

In the perf script environment the evsel wouldn't allocate an aggr
values array, which led to a -1 reference because the metric
evaluation would try to reference NULL - 1 (for aggr_idx).

Give the perf script evsels a single CPU aggr setup. That's
enough because the groups are always contiguous, so no need
to store more than one CPU's worth of values.

Before:

% perf record -e '{cycles,instructions}:S' perf bench  mem memcpy
% perf script -F +metric
Segmentation fault (core dumped)

After:

% perf record -e '{cycles,instructions}:S' perf bench  mem memcpy
...
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.028 MB perf.data (90 samples) ]
% perf script -F +metric
       perf-exec 1847557 264658.180789:       3009       cycles:  ffffffff990a579a native_write_msr+0xa ([kernel.kallsyms])
       perf-exec 1847557 264658.180789:        382 instructions:  ffffffff990a579a native_write_msr+0xa ([kernel.kallsyms])
       perf-exec 1847557 264658.180789:         metric:    0.13  insn per cycle
...

Fixes: 37cc8ad77cf8 ("perf metric: Directly use counts rather ...")
Signed-off-by: Andi Kleen <ak@linux.intel.com>

----

v2: Reformat code
v3: Work around bogus warning
v4: Set up aggr map only for metrics case to keep perf stat record
working
v5: Broken version
v6: Only set up limited aggregation mode with -F +metric. Add conflict
checks with perf stat record files.
v7: Remove some unnecessary conflict checks. Fix buffer overflow. Minor cleanups.
---
 tools/perf/builtin-script.c | 42 ++++++++++++++++++++++++++++++++-----
 1 file changed, 37 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index c16224b1fef3..8058bb19a956 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -335,7 +335,6 @@ struct evsel_script {
        FILE *fp;
        u64  samples;
        /* For metric output */
-       u64  val;
        int  gnum;
 };
 
@@ -2132,13 +2131,17 @@ static void perf_sample__fprint_metric(struct perf_script *script,
 		evlist__alloc_stats(&stat_config, script->session->evlist, /*alloc_raw=*/false);
 	if (evsel_script(leader)->gnum++ == 0)
 		perf_stat__reset_shadow_stats();
-	val = sample->period * evsel->scale;
-	evsel_script(evsel)->val = val;
+	val = sample->period;
+	/*
+	 * Always use the first storage because the groups are contiguous
+	 * and there's no need to handle multiple indexes for anything
+	 */
+	evsel->stats->aggr[0].counts.val = val;
 	if (evsel_script(leader)->gnum == leader->core.nr_members) {
 		for_each_group_member (ev2, leader) {
 			perf_stat__print_shadow_stats(&stat_config, ev2,
-						      evsel_script(ev2)->val,
-						      sample->cpu,
+						      evsel->stats->aggr[0].counts.val,
+						      0,
 						      &ctx,
 						      NULL);
 		}
@@ -2325,6 +2328,20 @@ static void process_event(struct perf_script *script,
 		fflush(fp);
 }
 
+static void check_metric_conflict(void)
+{
+	int i;
+	/*
+	 * Avoid conflict with the aggregation mode used for the metric printing.
+	 */
+	for (i = 0; i < OUTPUT_TYPE_MAX; i++) {
+		if (output[i].fields & PERF_OUTPUT_METRIC) {
+			fprintf(stderr, "perf stat record files are not supported with -F metric\n");
+			exit(1);
+		}
+	}
+}
+
 static struct scripting_ops	*scripting_ops;
 
 static void __process_stat(struct evsel *counter, u64 tstamp)
@@ -2334,6 +2351,8 @@ static void __process_stat(struct evsel *counter, u64 tstamp)
 	struct perf_cpu cpu;
 	static int header_printed;
 
+	check_metric_conflict();
+
 	if (!header_printed) {
 		printf("%3s %8s %15s %15s %15s %15s %s\n",
 		       "CPU", "THREAD", "VAL", "ENA", "RUN", "TIME", "EVENT");
@@ -3725,6 +3744,8 @@ static int process_stat_config_event(struct perf_session *session __maybe_unused
 {
 	perf_event__read_stat_config(&stat_config, &event->stat_config);
 
+	check_metric_conflict();
+
 	/*
 	 * Aggregation modes are not used since post-processing scripts are
 	 * supposed to take care of such requirements
@@ -4088,6 +4109,17 @@ int cmd_script(int argc, const char **argv)
 
 	argc = parse_options_subcommand(argc, argv, options, script_subcommands, script_usage,
 			     PARSE_OPT_STOP_AT_NON_OPTION);
+	for (i = 0; i < OUTPUT_TYPE_MAX; i++) {
+		if (output[i].fields & PERF_OUTPUT_METRIC) {
+			stat_config.aggr_map = cpu_aggr_map__empty_new(1);
+			err = -ENOMEM;
+			if (!stat_config.aggr_map)
+				goto out;
+			err = 0;
+			stat_config.aggr_map->nr = 1;
+			break;
+		}
+	}
 
 	if (symbol_conf.guestmount ||
 	    symbol_conf.default_guest_vmlinux_name ||
-- 
2.45.2
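
Note on the metric output shown in the commit message: the "0.13 insn per cycle" line follows from the two sample periods printed just above it (382 instructions over 3009 cycles). The sketch below models the bookkeeping in a simplified form: one value slot per group member, and a metric emitted once every member has been seen. The struct and function names are hypothetical and the group size is fixed at two for illustration; this is not the perf implementation.

```c
#include <stdint.h>

#define GROUP_MEMBERS 2	/* cycles (leader) and instructions, as in the example */

/* Hypothetical, simplified per-group state. */
struct group_state {
	uint64_t val[GROUP_MEMBERS];	/* one slot per group member */
	int gnum;			/* members seen so far in this group */
};

/*
 * Record one sample period for `member` (0 = cycles, 1 = instructions).
 * Returns 1 and stores instructions per cycle in *ipc once the whole group
 * has been seen, 0 otherwise.
 */
int group_add_sample(struct group_state *g, int member, uint64_t period,
		     double *ipc)
{
	if (member < 0 || member >= GROUP_MEMBERS)
		return 0;
	g->val[member] = period;
	if (++g->gnum < GROUP_MEMBERS)
		return 0;
	g->gnum = 0;			/* start over for the next group */
	if (g->val[0] == 0)
		return 0;		/* no cycles counted, no ratio */
	*ipc = (double)g->val[1] / (double)g->val[0];
	return 1;
}
```

Feeding it 3009 for member 0 and then 382 for member 1 yields a ratio of about 0.127, which prints as 0.13 with a "%.2f" format. The sketch only gives a meaningful ratio when the two values describe the same interval, which is the point debated in the replies.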


* [PATCH v7 4/4] Add a test case for perf script -F +metric
  2024-07-24 19:01 [PATCH v7 1/4] Create source symlink in perf object dir Andi Kleen
  2024-07-24 19:01 ` [PATCH v7 2/4] perf test: Support external tests for separate objdir Andi Kleen
  2024-07-24 19:01 ` [PATCH v7 3/4] perf script: Fix perf script -F +metric Andi Kleen
@ 2024-07-24 19:01 ` Andi Kleen
  2024-07-26  0:32   ` Namhyung Kim
  2024-07-24 20:29 ` [PATCH v7 1/4] Create source symlink in perf object dir Ian Rogers
  3 siblings, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2024-07-24 19:01 UTC
  To: linux-perf-users; +Cc: Andi Kleen

Just a simple test

Signed-off-by: Andi Kleen <ak@linux.intel.com>

----

v2: Avoid bashisms. Use noploop
v3: Avoid false positive in shellcheck
---
 tools/perf/tests/shell/script.sh | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/tools/perf/tests/shell/script.sh b/tools/perf/tests/shell/script.sh
index c1a603653662..5e080e40b390 100755
--- a/tools/perf/tests/shell/script.sh
+++ b/tools/perf/tests/shell/script.sh
@@ -7,6 +7,7 @@ set -e
 temp_dir=$(mktemp -d /tmp/perf-test-script.XXXXXXXXXX)
 
 perfdatafile="${temp_dir}/perf.data"
+scriptoutput="${temp_dir}/script"
 db_test="${temp_dir}/db_test.py"
 
 err=0
@@ -88,8 +89,21 @@ test_parallel_perf()
 	echo "parallel-perf test [Success]"
 }
 
+test_metric()
+{
+	echo "script metric test"
+	if ! perf list | grep -q cycles ; then return ; fi
+	if ! perf list | grep -q instructions ; then return ; fi
+	perf record -e '{cycles,instructions}' -o "${perfdatafile}" perf test -w noploop
+	perf script -i "${perfdatafile}" -F +metric  > $scriptoutput
+	test "`grep -c metric $scriptoutput`" -gt 5
+	grep metric $scriptoutput | head
+	echo "script metric test [Success]"
+}
+
 test_db
 test_parallel_perf
+test_metric
 
 cleanup
 
-- 
2.45.2


* Re: [PATCH v7 1/4] Create source symlink in perf object dir
  2024-07-24 19:01 [PATCH v7 1/4] Create source symlink in perf object dir Andi Kleen
                   ` (2 preceding siblings ...)
  2024-07-24 19:01 ` [PATCH v7 4/4] Add a test case for " Andi Kleen
@ 2024-07-24 20:29 ` Ian Rogers
  2024-07-24 21:48   ` Andi Kleen
  3 siblings, 1 reply; 18+ messages in thread
From: Ian Rogers @ 2024-07-24 20:29 UTC
  To: Andi Kleen; +Cc: linux-perf-users

On Wed, Jul 24, 2024 at 12:01 PM Andi Kleen <ak@linux.intel.com> wrote:
>
> Create a source symlink to the original source in the objdir.
> This is similar to what the main kernel build script does.
>
> Signed-off-by: Andi Kleen <ak@linux.intel.com>

For all patches:
Acked-by: Ian Rogers <irogers@google.com>

For patch 3/4 the aggregation logic doesn't look to work well with the
patches I sent:
https://lore.kernel.org/lkml/20240720074552.1915993-1-irogers@google.com/
I'm not a fan of it, but I don't really understand the aggregation
metric logic here - periods of different samples from potentially
different CPUs being combined as if they are counts, zeroing of the
counts.. I'm not going to rebase my changes on these, and if later
code removes the hard coded metrics for json metrics then this code
will break at which point it is reasonable I think to disable the
test.

Thanks,
Ian


> ---
>  tools/perf/Makefile.perf | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
> index 175e4c7898f0..d46892d8223b 100644
> --- a/tools/perf/Makefile.perf
> +++ b/tools/perf/Makefile.perf
> @@ -163,6 +163,8 @@ ifneq ($(OUTPUT),)
>  # for flex/bison parsers.
>  VPATH += $(OUTPUT)
>  export VPATH
> +# create symlink to the original source
> +SOURCE := $(shell ln -sf $(srctree)/tools/perf $(OUTPUT)/source)
>  endif
>
>  ifeq ($(V),1)
> --
> 2.45.2
>
>

* Re: [PATCH v7 1/4] Create source symlink in perf object dir
  2024-07-24 20:29 ` [PATCH v7 1/4] Create source symlink in perf object dir Ian Rogers
@ 2024-07-24 21:48   ` Andi Kleen
  2024-07-24 22:31     ` Ian Rogers
  0 siblings, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2024-07-24 21:48 UTC
  To: Ian Rogers; +Cc: linux-perf-users

> For patch 3/4 the aggregation logic doesn't look to work well with the
> patches I sent:
> https://lore.kernel.org/lkml/20240720074552.1915993-1-irogers@google.com/
> I'm not a fan of it, but I don't really understand the aggregation
> metric logic here - periods of different samples from potentially

It's a single contiguous group, so it is by definition only from one CPU.

> different CPUs being combined as if they are counts, zeroing of the
> counts.. I'm not going to rebase my changes on these, and if later

On reflection, your changes went in the wrong direction
because they ignored the (useful) single-group property. Aggregation
on time really needs to be in perf report, not here.

> code removes the hard coded metrics for json metrics then this code
> will break at which point it is reasonable I think to disable the
> test.

I don't think it's up to you to unilaterally deprecate features.

But that's a good point, the json metrics got broken too :-( It worked
in the original version. You really were a wrecking ball here with
all these regressions on stuff you don't use, Ian.

I guess that needs to be fixed too, but that can be a separate patch.

-Andi

* Re: [PATCH v7 1/4] Create source symlink in perf object dir
  2024-07-24 21:48   ` Andi Kleen
@ 2024-07-24 22:31     ` Ian Rogers
  2024-07-25  7:28       ` Andi Kleen
  0 siblings, 1 reply; 18+ messages in thread
From: Ian Rogers @ 2024-07-24 22:31 UTC
  To: Andi Kleen; +Cc: linux-perf-users

On Wed, Jul 24, 2024 at 2:49 PM Andi Kleen <ak@linux.intel.com> wrote:
>
> > For patch 3/4 the aggregation logic doesn't look to work well with the
> > patches I sent:
> > https://lore.kernel.org/lkml/20240720074552.1915993-1-irogers@google.com/
> > I'm not a fan of it, but I don't really understand the aggregation
> > metric logic here - periods of different samples from potentially
>
> It's a single contiguous group, so it is by definition only from one CPU.
>
> > different CPUs being combined as if they are counts, zeroing of the
> > counts.. I'm not going to rebase my changes on these, and if later
>
> On some reflection your changes went into the wrong direction
> because it ignored the (useful) single group property. Aggregation
> on time really needs to be in perf report not here.

Still don't understand this. You are sampling, that's why you use the
period, how can both samples happen at the same time? Why can't
something sample twice in the period where 1 thing samples? You assign
to the counter not accumulate in it, but the whole gnum would be
broken in that case. Why can't samples be interleaved? What I did was
read the samples with leader samples, which is clearly a correct way
to read counts at a moment in time and compute an accurate metric. But
hey, stuff everything in whatever is aggregated at index 0 and claim
success. Fingers crossed it doesn't break with the metric display
code. It pretty much seems to me that what you have done is unusable,
but it replicates the previous unusable thing so great.

> > code removes the hard coded metrics for json metrics then this code
> > will break at which point it is reasonable I think to disable the
> > test.
>
> I don't think it's up to you to unilaterally deprecate features.

Agreed. At the same time if the code never worked except to give some
kind of broken number - this is all your test is adding.

> But that's a good point, the json metrics got broken too :-( It worked
> in the original version. You really were a wrecking ball here with
> all these regressions on stuff you don't use, Ian.

Uh, when I started working on metrics Jiri had switched the double
values to ints and broken every metric and nobody noticed. Any idea
I've broken metrics is false except for the hard coded metrics where
aside from trivial IPC the metrics were wrong by the TMA definitions.
The hard coded metrics were also broken in areas like interval,
replay, .. as counts were duplicated for no benefit. We now have 100s
of functional metrics, support on hybrid, .. but if that's bringing a
wrecking ball then so be it. I've wanted to remove the hard coded
metrics as I get constant user complaints that they fire inaccurately
(ungrouped events), for no reason (if the events hit then the metric
fires) and break the json metric output (expect output from 1 metric
but get x).

Thanks,
Ian

> I guess that needs to be fixed too, but that can be a separate patch.
>
> -Andi
>

* Re: [PATCH v7 1/4] Create source symlink in perf object dir
  2024-07-24 22:31     ` Ian Rogers
@ 2024-07-25  7:28       ` Andi Kleen
  2024-07-25  9:18         ` Ian Rogers
  0 siblings, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2024-07-25  7:28 UTC
  To: Ian Rogers; +Cc: linux-perf-users

> > On some reflection your changes went into the wrong direction
> > because it ignored the (useful) single group property. Aggregation
> > on time really needs to be in perf report not here.
> 
> Still don't understand this. You are sampling, that's why you use the
> period, how can both samples happen at the same time? Why can't

It's leader sampling ({}:S), so only one event samples and the others
in the group get collected/reset at the time the sample happens.
So all the events in the group measure the same time interval.

Think of it as a much faster and more efficient (but also more limited)
way to do perf stat -I ..., driven by the leader event.

> something sample twice in the period where 1 thing samples? You assign
> to the counter not accumulate in it, but the whole gnum would be
> broken in that case. Why can't samples be interleaved? What I did was

There's only one sample -- the leader -- each time.

-Andi
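
Note on leader sampling: with a group read, the leader's sample carries the counter values of every group member, so all values describe the same point in time. The sketch below assumes the layout documented for perf_event_open(2) when read_format is PERF_FORMAT_GROUP | PERF_FORMAT_ID and no other read_format bits are set: a u64 count followed by (value, id) pairs. The helper name group_read_value() is hypothetical and the code does not use perf's own accessors.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: find the counter value for event `id` in a raw group
 * read laid out as { u64 nr; { u64 value; u64 id; } values[nr]; }.
 * `nwords` is the number of u64 words available in `raw`.
 * Returns 0 on success, -1 if the buffer is too short or the id is absent.
 */
int group_read_value(const uint64_t *raw, size_t nwords, uint64_t id,
		     uint64_t *out)
{
	uint64_t nr;

	if (nwords < 1)
		return -1;
	nr = raw[0];
	/* Each entry takes two words; reject counts that overrun the buffer. */
	if (nr > (nwords - 1) / 2)
		return -1;
	for (uint64_t i = 0; i < nr; i++) {
		if (raw[2 + 2 * i] == id) {
			*out = raw[1 + 2 * i];
			return 0;
		}
	}
	return -1;
}
```

Assuming the values are cumulative counts, subtracting the values of two consecutive leader samples would give per-member deltas over one and the same interval, which is what a ratio such as instructions per cycle needs.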

* Re: [PATCH v7 1/4] Create source symlink in perf object dir
  2024-07-25  7:28       ` Andi Kleen
@ 2024-07-25  9:18         ` Ian Rogers
  2024-07-25 22:50           ` Andi Kleen
  0 siblings, 1 reply; 18+ messages in thread
From: Ian Rogers @ 2024-07-25  9:18 UTC
  To: Andi Kleen; +Cc: linux-perf-users

On Thu, Jul 25, 2024 at 12:28 AM Andi Kleen <ak@linux.intel.com> wrote:
>
> > > On some reflection your changes went into the wrong direction
> > > because it ignored the (useful) single group property. Aggregation
> > > on time really needs to be in perf report not here.
> >
> > Still don't understand this. You are sampling, that's why you use the
> > period, how can both samples happen at the same time? Why can't
>
> It's leader sampling ({}:S) , so only one event samples and the others
> in the group get collected/reset at the time the sample happens.
> So all the events in the group measure the same time interval.

No, here is your test:

+       perf record -e '{cycles,instructions}' -o "${perfdatafile}"
perf test -w noploop
+       perf script -i "${perfdatafile}" -F +metric  > $scriptoutput

notice that you aren't using :S. You are grouping the events which has
no meaning other than for multiplexing, which for 2 events you won't
be doing unless something else is doing a lot of things with all the
other counters. Your code is trying to spot multiple samples across
multiple evsels, squirrel values away (the period, not counts) in
counts and possibly dump metrics. The samples may come from any CPU or
thread and so your IPC numbers could be instructions on CPU0 and
cycles on CPU1. If you see repeated instruction or cycles samples then
only the value of the last one holds and so the IPC number is somehow
interwoven with the period. Like I say, the whole thing reads as
nonsense. If you were putting the counts into the correct CPU then
it'd make more sense. If you accumulated the counts instead of just
overwriting, then it'd make more sense. If you used
sample_read_group__for_each to read the counts from the group read by
leader sampling, as my patch does, then perhaps what you are trying to
do would make sense.

What you've done matches the apparently broken thing that was there
before, and you seem to think this carries great value, so I don't
have a problem with adding it back rather than a segv. It is just to
me the whole approach looks completely and utterly broken.

Thanks,
Ian

> Think of it as a much faster and more efficient (but also more limited)
> way to do perf stat -I ..., driven by the leader event.
>
> > something sample twice in the period where 1 thing samples? You assign
> > to the counter not accumulate in it, but the whole gnum would be
> > broken in that case. Why can't samples be interleaved? What I did was
>
> There's only one sample -- the leader -- each time.
>
> -Andi

* Re: [PATCH v7 1/4] Create source symlink in perf object dir
  2024-07-25  9:18         ` Ian Rogers
@ 2024-07-25 22:50           ` Andi Kleen
  0 siblings, 0 replies; 18+ messages in thread
From: Andi Kleen @ 2024-07-25 22:50 UTC
  To: Ian Rogers; +Cc: linux-perf-users

On Thu, Jul 25, 2024 at 02:18:15AM -0700, Ian Rogers wrote:
> On Thu, Jul 25, 2024 at 12:28 AM Andi Kleen <ak@linux.intel.com> wrote:
> >
> > > > On some reflection your changes went into the wrong direction
> > > > because it ignored the (useful) single group property. Aggregation
> > > > on time really needs to be in perf report not here.
> > >
> > > Still don't understand this. You are sampling, that's why you use the
> > > period, how can both samples happen at the same time? Why can't
> >
> > It's leader sampling ({}:S) , so only one event samples and the others
> > in the group get collected/reset at the time the sample happens.
> > So all the events in the group measure the same time interval.
> 
> No, here is your test:
> 
> +       perf record -e '{cycles,instructions}' -o "${perfdatafile}"
> perf test -w noploop
> +       perf script -i "${perfdatafile}" -F +metric  > $scriptoutput

True. I don't think it affects the test functionality in this case, but it should
be using :S.

> 
> notice that you aren't using :S. You are grouping the events which has
> no meaning other than for multiplexing, which for 2 events you won't

Any use of {} means they are in the same group, so always contiguous in the
perf.data as the script metric code expects.

-Andi

* Re: [PATCH v7 2/4] perf test: Support external tests for separate objdir
  2024-07-24 19:01 ` [PATCH v7 2/4] perf test: Support external tests for separate objdir Andi Kleen
@ 2024-07-26  0:07   ` Namhyung Kim
  0 siblings, 0 replies; 18+ messages in thread
From: Namhyung Kim @ 2024-07-26  0:07 UTC
  To: Andi Kleen; +Cc: linux-perf-users

Hi Andi,

On Wed, Jul 24, 2024 at 12:01:35PM -0700, Andi Kleen wrote:
> Extend the searching for the test files so that it works
> when running perf from a separate objdir, and also when
> the perf executable is symlinked.
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
>  tools/perf/tests/tests-scripts.c | 31 ++++++++++++++++++++++++++++---
>  1 file changed, 28 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/perf/tests/tests-scripts.c b/tools/perf/tests/tests-scripts.c
> index e2042b368269..63be17289ac3 100644
> --- a/tools/perf/tests/tests-scripts.c
> +++ b/tools/perf/tests/tests-scripts.c
> @@ -29,16 +29,41 @@
>  
>  static int shell_tests__dir_fd(void)
>  {
> -	char path[PATH_MAX], *exec_path;
> -	static const char * const devel_dirs[] = { "./tools/perf/tests/shell", "./tests/shell", };
> +	struct stat st;
> +	char path[PATH_MAX], path2[PATH_MAX], *exec_path;
> +	static const char * const devel_dirs[] = {
> +		"./tools/perf/tests/shell",
> +		"./tests/shell",
> +		"./source/tests/shell"
> +	};
> +	int fd;
> +	char *p;
>  
>  	for (size_t i = 0; i < ARRAY_SIZE(devel_dirs); ++i) {
> -		int fd = open(devel_dirs[i], O_PATH);
> +		fd = open(devel_dirs[i], O_PATH);
>  
>  		if (fd >= 0)
>  			return fd;
>  	}
>  
> +	/* Use directory of executable */
> +	if (readlink("/proc/self/exe", path2, sizeof path2) < 0)
> +		return -1;
> +	/* Follow another level of symlink if there */
> +	if (lstat(path2, &st) == 0 && (st.st_mode & S_IFMT) == S_IFLNK) {
> +		scnprintf(path, sizeof(path), path2);
> +		if (readlink(path, path2, sizeof path2) < 0)
> +			return -1;
> +	}
> +	/* Get directory */
> +	p = strrchr(path2, '/');
> +	if (*p)

Wouldn't it be 'if (p)' ?

Thanks,
Namhyung

> +		p[1] = 0;
> +	scnprintf(path, sizeof(path), "%s/tests/shell", path2);
> +	fd = open(path, O_PATH);
> +	if (fd >= 0)
> +		return fd;
> +
>  	/* Then installed path. */
>  	exec_path = get_argv_exec_path();
>  	scnprintf(path, sizeof(path), "%s/tests/shell", exec_path);
> -- 
> 2.45.2
> 

* Re: [PATCH v7 3/4] perf script: Fix perf script -F +metric
  2024-07-24 19:01 ` [PATCH v7 3/4] perf script: Fix perf script -F +metric Andi Kleen
@ 2024-07-26  0:31   ` Namhyung Kim
  2024-07-26  3:13     ` Ian Rogers
  2024-07-31 19:32     ` Andi Kleen
  0 siblings, 2 replies; 18+ messages in thread
From: Namhyung Kim @ 2024-07-26  0:31 UTC
  To: Andi Kleen; +Cc: linux-perf-users

On Wed, Jul 24, 2024 at 12:01:36PM -0700, Andi Kleen wrote:
> This fixes a regression with perf script -F +metric originally caused by :
> 
> commit 37cc8ad77cf81f3ffd226856c367b0e15333a738
> Author: Ian Rogers <irogers@google.com>
> Date:   Sun Feb 19 01:28:46 2023 -0800
> 
>     perf metric: Directly use counts rather than saved_value
> 
> In the perf script environment the evsel wouldn't allocate an aggr
> values array, which led to a -1 reference because the metric
> evaluation would try to reference NULL - 1 (for aggr_idx)
> 
> Give the perf script evsels a single CPU aggr setup. That's
> enough because the groups are always contiguous, so no need
> to store more than one CPU's worth of values.
> 
> Before
> 
> % perf record -e '{cycles,instructions}:S' perf bench  mem memcpy
> % perf script -F +metric
> Segmentation fault (core dumped)
> 
> After:
> 
> % perf record -e '{cycles,instructions}:S' perf bench  mem memcpy
> ...
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.028 MB perf.data (90 samples) ]
> % perf script -F +metric
>        perf-exec 1847557 264658.180789:       3009       cycles:  ffffffff990a579a native_write_msr+0xa ([kernel.kallsyms])
>        perf-exec 1847557 264658.180789:        382 instructions:  ffffffff990a579a native_write_msr+0xa ([kernel.kallsyms])
>        perf-exec 1847557 264658.180789:         metric:    0.13  insn per cycle
> ...
> 
> Fixes: 37cc8ad77cf8 ("perf metric: Directly use counts rather ...")
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> 
> ----
> 
> v2: Reformat code
> v3: Work around bogus warning
> v4: Set up aggr map only for metrics case to keep perf stat record
> working
> v5: Broken version
> v6: Only set up limited aggregation mode with -F +metric. Add conflict
> checks with perf stat record files.
> v7: Remove some unnecessary conflict checks. Fix buffer overflow. Minor cleanups.
> ---
>  tools/perf/builtin-script.c | 42 ++++++++++++++++++++++++++++++++-----
>  1 file changed, 37 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index c16224b1fef3..8058bb19a956 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -335,7 +335,6 @@ struct evsel_script {
>         FILE *fp;
>         u64  samples;
>         /* For metric output */
> -       u64  val;
>         int  gnum;
>  };
>  
> @@ -2132,13 +2131,17 @@ static void perf_sample__fprint_metric(struct perf_script *script,
>  		evlist__alloc_stats(&stat_config, script->session->evlist, /*alloc_raw=*/false);
>  	if (evsel_script(leader)->gnum++ == 0)
>  		perf_stat__reset_shadow_stats();
> -	val = sample->period * evsel->scale;
> -	evsel_script(evsel)->val = val;
> +	val = sample->period;
> +	/*
> +	 * Always use the first storage because the groups are contiguous

Without leader sampling we cannot guarantee that grouped events fire
together, right?


> +	 * and there's no need to handle multiple indexes for anything

Actually I think this is a behavior change: you changed the
aggregation mode from NONE to GLOBAL.

> +	 */
> +	evsel->stats->aggr[0].counts.val = val;
>  	if (evsel_script(leader)->gnum == leader->core.nr_members) {
>  		for_each_group_member (ev2, leader) {
>  			perf_stat__print_shadow_stats(&stat_config, ev2,
> -						      evsel_script(ev2)->val,
> -						      sample->cpu,
> +						      evsel->stats->aggr[0].counts.val,
> +						      0,

Like I said to Ian, we should pass a proper aggr_idx here, not just 0, to
support correct aggregation.  For now I think the only possible choice is
AGGR_NONE (for cpu-wide record) or AGGR_THREAD (for per-task record).
Then it should be an index into the cpu or thread map.

I think existing sample->cpu can be incorrect for cpu-wide records too
in case of non-contiguous CPU list like `perf record -C 1,3,5 ...`.

>  						      &ctx,
>  						      NULL);
>  		}
> @@ -2325,6 +2328,20 @@ static void process_event(struct perf_script *script,
>  		fflush(fp);
>  }
>  
> +static void check_metric_conflict(void)
> +{
> +	int i;
> +	/*
> +	 * Avoid conflict with the aggregation mode used for the metric printing.
> +	 */
> +	for (i = 0; i < OUTPUT_TYPE_MAX; i++) {
> +		if (output[i].fields & PERF_OUTPUT_METRIC) {
> +			fprintf(stderr, "perf stat record files are not supported with -F metric\n");
> +			exit(1);
> +		}
> +	}
> +}
> +
>  static struct scripting_ops	*scripting_ops;
>  
>  static void __process_stat(struct evsel *counter, u64 tstamp)
> @@ -2334,6 +2351,8 @@ static void __process_stat(struct evsel *counter, u64 tstamp)
>  	struct perf_cpu cpu;
>  	static int header_printed;
>  
> +	check_metric_conflict();
> +
>  	if (!header_printed) {
>  		printf("%3s %8s %15s %15s %15s %15s %s\n",
>  		       "CPU", "THREAD", "VAL", "ENA", "RUN", "TIME", "EVENT");
> @@ -3725,6 +3744,8 @@ static int process_stat_config_event(struct perf_session *session __maybe_unused
>  {
>  	perf_event__read_stat_config(&stat_config, &event->stat_config);
>  
> +	check_metric_conflict();
> +
>  	/*
>  	 * Aggregation modes are not used since post-processing scripts are
>  	 * supposed to take care of such requirements
> @@ -4088,6 +4109,17 @@ int cmd_script(int argc, const char **argv)
>  
>  	argc = parse_options_subcommand(argc, argv, options, script_subcommands, script_usage,
>  			     PARSE_OPT_STOP_AT_NON_OPTION);
> +	for (i = 0; i < OUTPUT_TYPE_MAX; i++) {
> +		if (output[i].fields & PERF_OUTPUT_METRIC) {
> +			stat_config.aggr_map = cpu_aggr_map__empty_new(1);
> +			err = -ENOMEM;
> +			if (!stat_config.aggr_map)
> +				goto out;
> +			err = 0;
> +			stat_config.aggr_map->nr = 1;

It should be the number of entries in the cpu map or thread map.

Thanks,
Namhyung


> +			break;
> +		}
> +	}
>  
>  	if (symbol_conf.guestmount ||
>  	    symbol_conf.default_guest_vmlinux_name ||
> -- 
> 2.45.2
> 


* Re: [PATCH v7 4/4] Add a test case for perf script -F +metric
  2024-07-24 19:01 ` [PATCH v7 4/4] Add a test case for " Andi Kleen
@ 2024-07-26  0:32   ` Namhyung Kim
  0 siblings, 0 replies; 18+ messages in thread
From: Namhyung Kim @ 2024-07-26  0:32 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-perf-users

On Wed, Jul 24, 2024 at 12:01:37PM -0700, Andi Kleen wrote:
> Just a simple test
> 
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> 
> ----
> 
> v2: Avoid bashisms. Use noploop
> v3: Avoid false positive in shellcheck
> ---
>  tools/perf/tests/shell/script.sh | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/tools/perf/tests/shell/script.sh b/tools/perf/tests/shell/script.sh
> index c1a603653662..5e080e40b390 100755
> --- a/tools/perf/tests/shell/script.sh
> +++ b/tools/perf/tests/shell/script.sh
> @@ -7,6 +7,7 @@ set -e
>  temp_dir=$(mktemp -d /tmp/perf-test-script.XXXXXXXXXX)
>  
>  perfdatafile="${temp_dir}/perf.data"
> +scriptoutput="${temp_dir}/script"
>  db_test="${temp_dir}/db_test.py"
>  
>  err=0
> @@ -88,8 +89,21 @@ test_parallel_perf()
>  	echo "parallel-perf test [Success]"
>  }
>  
> +test_metric()
> +{
> +	echo "script metric test"
> +	if ! perf list | grep -q cycles ; then return ; fi
> +	if ! perf list | grep -q instructions ; then return ; fi
> +	perf record -e '{cycles,instructions}' -o "${perfdatafile}" perf test -w noploop

Let's use leader sampling here (the :S modifier on the group).

Thanks,
Namhyung


> +	perf script -i "${perfdatafile}" -F +metric  > $scriptoutput
> +	test "`grep -c metric $scriptoutput`" -gt 5
> +	grep metric $scriptoutput | head
> +	echo "script metric test [Success]"
> +}
> +
>  test_db
>  test_parallel_perf
> +test_metric
>  
>  cleanup
>  
> -- 
> 2.45.2
> 


* Re: [PATCH v7 3/4] perf script: Fix perf script -F +metric
  2024-07-26  0:31   ` Namhyung Kim
@ 2024-07-26  3:13     ` Ian Rogers
  2024-07-31 19:32     ` Andi Kleen
  1 sibling, 0 replies; 18+ messages in thread
From: Ian Rogers @ 2024-07-26  3:13 UTC (permalink / raw)
  To: Namhyung Kim; +Cc: Andi Kleen, linux-perf-users

On Thu, Jul 25, 2024 at 5:31 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Wed, Jul 24, 2024 at 12:01:36PM -0700, Andi Kleen wrote:
> > This fixes a regression with perf script -F +metric originally caused by :
> >
> > commit 37cc8ad77cf81f3ffd226856c367b0e15333a738
> > Author: Ian Rogers <irogers@google.com>
> > Date:   Sun Feb 19 01:28:46 2023 -0800
> >
> >     perf metric: Directly use counts rather than saved_value
> >
> > In the perf script environment the evsel wouldn't allocate an aggr
> > values array, which led to a -1 reference because the metric
> > evaluation would try to reference NULL - 1 (for aggr_idx)
> >
> > Give the perf script evsels a single CPU aggr setup. That's
> > enough because the groups are always contiguous, so no need
> > to store more than one CPU's worth of values.
> >
> > Before
> >
> > % perf record -e '{cycles,instructions}:S' perf bench  mem memcpy
> > % perf script -F +metric
> > Segmentation fault (core dumped)
> >
> > After:
> >
> > % perf record -e '{cycles,instructions}:S' perf bench  mem memcpy
> > ...
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 0.028 MB perf.data (90 samples) ]
> > % perf script -F +metric
> >        perf-exec 1847557 264658.180789:       3009       cycles:  ffffffff990a579a native_write_msr+0xa ([kernel.kallsyms])
> >        perf-exec 1847557 264658.180789:        382 instructions:  ffffffff990a579a native_write_msr+0xa ([kernel.kallsyms])
> >        perf-exec 1847557 264658.180789:         metric:    0.13  insn per cycle
> > ...
> >
> > Fixes: 37cc8ad77cf8 ("perf metric: Directly use counts rather ...")
> > Signed-off-by: Andi Kleen <ak@linux.intel.com>
> >
> > ----
> >
> > v2: Reformat code
> > v3: Work around bogus warning
> > v4: Set up aggr map only for metrics case to keep perf stat record
> > working
> > v5: Broken version
> > v6: Only set up limited aggregation mode with -F +metric. Add conflict
> > checks with perf stat record files.
> > v7: Remove some unnecessary conflict checks. Fix buffer overflow. Minor cleanups.
> > ---
> >  tools/perf/builtin-script.c | 42 ++++++++++++++++++++++++++++++++-----
> >  1 file changed, 37 insertions(+), 5 deletions(-)
> >
> > diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> > index c16224b1fef3..8058bb19a956 100644
> > --- a/tools/perf/builtin-script.c
> > +++ b/tools/perf/builtin-script.c
> > @@ -335,7 +335,6 @@ struct evsel_script {
> >         FILE *fp;
> >         u64  samples;
> >         /* For metric output */
> > -       u64  val;
> >         int  gnum;
> >  };
> >
> > @@ -2132,13 +2131,17 @@ static void perf_sample__fprint_metric(struct perf_script *script,
> >               evlist__alloc_stats(&stat_config, script->session->evlist, /*alloc_raw=*/false);
> >       if (evsel_script(leader)->gnum++ == 0)
> >               perf_stat__reset_shadow_stats();
> > -     val = sample->period * evsel->scale;
> > -     evsel_script(evsel)->val = val;
> > +     val = sample->period;
> > +     /*
> > +      * Always use the first storage because the groups are contiguous
>
> Without leader sampling we cannot guarantee groups events fire
> together, right?

It is theoretically possible that for all groups, not just with leader
sampling, samples are created for every member when the leader fires,
with the same stack trace but with varying periods. That would seem
redundant/wasteful given leader sampling as an equivalent, and possibly
a way to get more lost samples. Perhaps Andi is assuming they fire
together because of frequency mode?

When we sched_in siblings in a group there is no difference to
scheduling in non-siblings except in the error path:
https://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/kernel/events/core.c?h=perf-tools-next#n2593

The PMU may do something special, but I don't see it when looking at
drivers/perf and the arch events directories. Perhaps there is some
common code with the logic Andi mentions. If the behavior is because
of frequency mode, then stating that assumption in at least a comment
would be good.

Thanks,
Ian

>
> > +      * and there's no need to handle multiple indexes for anything
>
> Actually I think this is a behavior change that you changed the
> aggregation mode from NONE to GLOBAL.
>
> > +      */
> > +     evsel->stats->aggr[0].counts.val = val;
> >       if (evsel_script(leader)->gnum == leader->core.nr_members) {
> >               for_each_group_member (ev2, leader) {
> >                       perf_stat__print_shadow_stats(&stat_config, ev2,
> > -                                                   evsel_script(ev2)->val,
> > -                                                   sample->cpu,
> > +                                                   evsel->stats->aggr[0].counts.val,
> > +                                                   0,
>
> Like I said to Ian, we should pass a proper aggr_idx here not just 0 to
> support correct aggregation.  For now I think only possible choice is
> AGGR_NONE (for cpu-wide record) or AGGR_THREAD (for per-task record).
> Then it should be an index to cpu or thread map.
>
> I think existing sample->cpu can be incorrect for cpu-wide records too
> in case of non-contiguous CPU list like `perf record -C 1,3,5 ...`.
>
> >                                                     &ctx,
> >                                                     NULL);
> >               }
> > @@ -2325,6 +2328,20 @@ static void process_event(struct perf_script *script,
> >               fflush(fp);
> >  }
> >
> > +static void check_metric_conflict(void)
> > +{
> > +     int i;
> > +     /*
> > +      * Avoid conflict with the aggregation mode used for the metric printing.
> > +      */
> > +     for (i = 0; i < OUTPUT_TYPE_MAX; i++) {
> > +             if (output[i].fields & PERF_OUTPUT_METRIC) {
> > +                     fprintf(stderr, "perf stat record files are not supported with -F metric\n");
> > +                     exit(1);
> > +             }
> > +     }
> > +}
> > +
> >  static struct scripting_ops  *scripting_ops;
> >
> >  static void __process_stat(struct evsel *counter, u64 tstamp)
> > @@ -2334,6 +2351,8 @@ static void __process_stat(struct evsel *counter, u64 tstamp)
> >       struct perf_cpu cpu;
> >       static int header_printed;
> >
> > +     check_metric_conflict();
> > +
> >       if (!header_printed) {
> >               printf("%3s %8s %15s %15s %15s %15s %s\n",
> >                      "CPU", "THREAD", "VAL", "ENA", "RUN", "TIME", "EVENT");
> > @@ -3725,6 +3744,8 @@ static int process_stat_config_event(struct perf_session *session __maybe_unused
> >  {
> >       perf_event__read_stat_config(&stat_config, &event->stat_config);
> >
> > +     check_metric_conflict();
> > +
> >       /*
> >        * Aggregation modes are not used since post-processing scripts are
> >        * supposed to take care of such requirements
> > @@ -4088,6 +4109,17 @@ int cmd_script(int argc, const char **argv)
> >
> >       argc = parse_options_subcommand(argc, argv, options, script_subcommands, script_usage,
> >                            PARSE_OPT_STOP_AT_NON_OPTION);
> > +     for (i = 0; i < OUTPUT_TYPE_MAX; i++) {
> > +             if (output[i].fields & PERF_OUTPUT_METRIC) {
> > +                     stat_config.aggr_map = cpu_aggr_map__empty_new(1);
> > +                     err = -ENOMEM;
> > +                     if (!stat_config.aggr_map)
> > +                             goto out;
> > +                     err = 0;
> > +                     stat_config.aggr_map->nr = 1;
>
> It should be number of entries in the cpu map or thread map.
>
> Thanks,
> Namhyung
>
>
> > +                     break;
> > +             }
> > +     }
> >
> >       if (symbol_conf.guestmount ||
> >           symbol_conf.default_guest_vmlinux_name ||
> > --
> > 2.45.2
> >
>


* Re: [PATCH v7 3/4] perf script: Fix perf script -F +metric
  2024-07-26  0:31   ` Namhyung Kim
  2024-07-26  3:13     ` Ian Rogers
@ 2024-07-31 19:32     ` Andi Kleen
  2024-08-02 18:26       ` Namhyung Kim
  1 sibling, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2024-07-31 19:32 UTC (permalink / raw)
  To: Namhyung Kim; +Cc: linux-perf-users

> > @@ -2132,13 +2131,17 @@ static void perf_sample__fprint_metric(struct perf_script *script,
> >  		evlist__alloc_stats(&stat_config, script->session->evlist, /*alloc_raw=*/false);
> >  	if (evsel_script(leader)->gnum++ == 0)
> >  		perf_stat__reset_shadow_stats();
> > -	val = sample->period * evsel->scale;
> > -	evsel_script(evsel)->val = val;
> > +	val = sample->period;
> > +	/*
> > +	 * Always use the first storage because the groups are contiguous
> 
> Without leader sampling we cannot guarantee groups events fire
> together, right?

Yes. I'm adding an explicit check for this.

> 
> 
> > +	 * and there's no need to handle multiple indexes for anything
> 
> Actually I think this is a behavior change that you changed the
> aggregation mode from NONE to GLOBAL.

Not with leader sampling, because the group's samples are contiguous and
the metric is output after the group ends, at which point all the previous
values are forgotten. The other cases don't really work anyway, as
multiple people pointed out.
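
The "contiguous group, output at its end" argument can be sketched as
follows.  This is an illustration under stated assumptions, not the code
in builtin-script.c: struct group_state, group_add_sample() and
group_ipc() are hypothetical names, the group is assumed to be
{cycles,instructions}, and samples are assumed to arrive in member order
as they do with leader sampling.

```c
#include <stdint.h>

#define MAX_MEMBERS 8

/* Hypothetical stand-in for the per-group state; not the perf structures. */
struct group_state {
	int nr_members;			/* events in the group */
	int gnum;			/* samples seen so far for the current group */
	uint64_t val[MAX_MEMBERS];	/* one slot per member, reused by every group */
};

/*
 * Record one sample period.  Returns 1 once every member of the group has
 * reported; the counter is then reset, so the next group simply overwrites
 * the stored values and nothing is aggregated across groups.
 */
static int group_add_sample(struct group_state *g, uint64_t period)
{
	if (g->nr_members <= 0 || g->nr_members > MAX_MEMBERS)
		return 0;
	g->val[g->gnum++] = period;
	if (g->gnum == g->nr_members) {
		g->gnum = 0;
		return 1;
	}
	return 0;
}

/* Instructions per cycle for a completed {cycles,instructions} group. */
static double group_ipc(const struct group_state *g)
{
	return g->val[0] ? (double)g->val[1] / (double)g->val[0] : 0.0;
}
```

Feeding in the periods from the example output in the commit message
(3009 cycles, then 382 instructions) completes the group on the second
sample and gives roughly 0.13 insn per cycle.  If samples from different
groups were interleaved, the slots would mix values, which is why this
only holds with leader sampling.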

> > +	 */
> > +	evsel->stats->aggr[0].counts.val = val;
> >  	if (evsel_script(leader)->gnum == leader->core.nr_members) {
> >  		for_each_group_member (ev2, leader) {
> >  			perf_stat__print_shadow_stats(&stat_config, ev2,
> > -						      evsel_script(ev2)->val,
> > -						      sample->cpu,
> > +						      evsel->stats->aggr[0].counts.val,
> > +						      0,
> 
> Like I said to Ian, we should pass a proper aggr_idx here not just 0 to
> support correct aggregation.  For now I think only possible choice is
> AGGR_NONE (for cpu-wide record) or AGGR_THREAD (for per-task record).
> Then it should be an index to cpu or thread map.

Given the above I think that is not needed.

Full aggregation for metrics over a longer period would belong in perf report,
not here.  perf script is only meant to produce non-aggregated metrics.

-Andi


* Re: [PATCH v7 3/4] perf script: Fix perf script -F +metric
  2024-07-31 19:32     ` Andi Kleen
@ 2024-08-02 18:26       ` Namhyung Kim
  2024-08-02 20:58         ` Andi Kleen
  0 siblings, 1 reply; 18+ messages in thread
From: Namhyung Kim @ 2024-08-02 18:26 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-perf-users

On Wed, Jul 31, 2024 at 12:32:28PM -0700, Andi Kleen wrote:
> > > @@ -2132,13 +2131,17 @@ static void perf_sample__fprint_metric(struct perf_script *script,
> > >  		evlist__alloc_stats(&stat_config, script->session->evlist, /*alloc_raw=*/false);
> > >  	if (evsel_script(leader)->gnum++ == 0)
> > >  		perf_stat__reset_shadow_stats();
> > > -	val = sample->period * evsel->scale;
> > > -	evsel_script(evsel)->val = val;
> > > +	val = sample->period;
> > > +	/*
> > > +	 * Always use the first storage because the groups are contiguous
> > 
> > Without leader sampling we cannot guarantee groups events fire
> > together, right?
> 
> Yes. I'm adding an explicit check for this.
> 
> > 
> > 
> > > +	 * and there's no need to handle multiple indexes for anything
> > 
> > Actually I think this is a behavior change that you changed the
> > aggregation mode from NONE to GLOBAL.
> 
> Not with leader sampling because the group is contiguous and output
> after its end, with all the previous values forgotten then.
> The other cases don't really work anyways as multiple people pointed
> out.

Right, but we cannot prevent people from doing that.  Maybe we can disable
metric with a warning if leader sampling is not used?  Or at least add
a comment in the code that it's only intended for that use case.

Thanks,
Namhyung

> 
> > > +	 */
> > > +	evsel->stats->aggr[0].counts.val = val;
> > >  	if (evsel_script(leader)->gnum == leader->core.nr_members) {
> > >  		for_each_group_member (ev2, leader) {
> > >  			perf_stat__print_shadow_stats(&stat_config, ev2,
> > > -						      evsel_script(ev2)->val,
> > > -						      sample->cpu,
> > > +						      evsel->stats->aggr[0].counts.val,
> > > +						      0,
> > 
> > Like I said to Ian, we should pass a proper aggr_idx here not just 0 to
> > support correct aggregation.  For now I think only possible choice is
> > AGGR_NONE (for cpu-wide record) or AGGR_THREAD (for per-task record).
> > Then it should be an index to cpu or thread map.
> 
> Given the above I think that is not needed.
> 
> Full aggregation for metrics over longer period would belong into perf report,
> not here.  perf script is only to get non aggregated metrics.
> 
> -Andi


* Re: [PATCH v7 3/4] perf script: Fix perf script -F +metric
  2024-08-02 18:26       ` Namhyung Kim
@ 2024-08-02 20:58         ` Andi Kleen
  2024-08-05 18:58           ` Namhyung Kim
  0 siblings, 1 reply; 18+ messages in thread
From: Andi Kleen @ 2024-08-02 20:58 UTC (permalink / raw)
  To: Namhyung Kim; +Cc: linux-perf-users

> > Not with leader sampling because the group is contiguous and output
> > after its end, with all the previous values forgotten then.
> > The other cases don't really work anyways as multiple people pointed
> > out.
> 
> Right, but we cannot prevent people to do that..  Maybe we can disable
> metric with a warning if leader sampling is not used?  Or at least add
> a comment in the code that it's only intended for the use case.

Yes, I did that in the version I posted yesterday. It prints a one-time
warning and disables metric output.
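
A one-time warning that also disables the feature is commonly written
with a static flag.  The following is only a sketch of that pattern, not
the code from the newer version of the patch: the flag, the function name
and the message text are all hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical flag; the real patch may track this differently. */
static bool metric_enabled = true;

/*
 * Return true if metric output may proceed.  The first time a sample
 * arrives without leader sampling, warn once and disable metric output
 * for the rest of the run; later calls return false without printing.
 */
static bool metric_check_leader_sampling(bool leader_sampling)
{
	if (!metric_enabled)
		return false;
	if (!leader_sampling) {
		/* Hypothetical message text. */
		fprintf(stderr, "metric output requires leader sampling, disabling it\n");
		metric_enabled = false;
		return false;
	}
	return true;
}
```

Because the flag is checked first, the warning is printed at most once
and metric output stays off afterwards, even if later samples do come
from a leader-sampled group.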

-Andi


* Re: [PATCH v7 3/4] perf script: Fix perf script -F +metric
  2024-08-02 20:58         ` Andi Kleen
@ 2024-08-05 18:58           ` Namhyung Kim
  0 siblings, 0 replies; 18+ messages in thread
From: Namhyung Kim @ 2024-08-05 18:58 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-perf-users

On Fri, Aug 02, 2024 at 01:58:21PM -0700, Andi Kleen wrote:
> > > Not with leader sampling because the group is contiguous and output
> > > after its end, with all the previous values forgotten then.
> > > The other cases don't really work anyways as multiple people pointed
> > > out.
> > 
> > Right, but we cannot prevent people to do that..  Maybe we can disable
> > metric with a warning if leader sampling is not used?  Or at least add
> > a comment in the code that it's only intended for the use case.
> 
> Yes I did that in the version I posted yesterday. It prints a one
> time warning and disables metric output.

Oh ok, I'll take a look at the new version.

Thanks,
Namhyung


end of thread, other threads:[~2024-08-05 18:58 UTC | newest]

Thread overview: 18+ messages
2024-07-24 19:01 [PATCH v7 1/4] Create source symlink in perf object dir Andi Kleen
2024-07-24 19:01 ` [PATCH v7 2/4] perf test: Support external tests for separate objdir Andi Kleen
2024-07-26  0:07   ` Namhyung Kim
2024-07-24 19:01 ` [PATCH v7 3/4] perf script: Fix perf script -F +metric Andi Kleen
2024-07-26  0:31   ` Namhyung Kim
2024-07-26  3:13     ` Ian Rogers
2024-07-31 19:32     ` Andi Kleen
2024-08-02 18:26       ` Namhyung Kim
2024-08-02 20:58         ` Andi Kleen
2024-08-05 18:58           ` Namhyung Kim
2024-07-24 19:01 ` [PATCH v7 4/4] Add a test case for " Andi Kleen
2024-07-26  0:32   ` Namhyung Kim
2024-07-24 20:29 ` [PATCH v7 1/4] Create source symlink in perf object dir Ian Rogers
2024-07-24 21:48   ` Andi Kleen
2024-07-24 22:31     ` Ian Rogers
2024-07-25  7:28       ` Andi Kleen
2024-07-25  9:18         ` Ian Rogers
2024-07-25 22:50           ` Andi Kleen
