linux-perf-users.vger.kernel.org archive mirror
* [PATCH v6 1/4] Create source symlink in perf object dir
@ 2024-07-23 20:48 Andi Kleen
From: Andi Kleen @ 2024-07-23 20:48 UTC
  To: linux-perf-users; +Cc: Andi Kleen

Create a source symlink to the original source in the objdir.
This is similar to what the main kernel build script does.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/Makefile.perf | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 175e4c7898f0..d46892d8223b 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -163,6 +163,8 @@ ifneq ($(OUTPUT),)
 # for flex/bison parsers.
 VPATH += $(OUTPUT)
 export VPATH
+# create symlink to the original source
+SOURCE := $(shell ln -sf $(srctree)/tools/perf $(OUTPUT)/source)
 endif
 
 ifeq ($(V),1)
-- 
2.45.2
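
A note on the `ln` invocation in this patch: with GNU `ln`, `-sf` alone follows an existing LINK_NAME that is a symlink to a directory. On a rebuild, `ln -sf $(srctree)/tools/perf $(OUTPUT)/source` may therefore create a `perf` entry inside the directory that `source` points to, instead of replacing `source` itself. Adding `-n` (`--no-dereference`) avoids this, and the top-level kernel Makefile uses `ln -fsn` for its own `source` link. The C sketch below shows the intended replace-the-link semantics with plain POSIX calls. It assumes LINKPATH is either absent or already a symlink. `replace_symlink()` and `symlink_points_to()` are hypothetical helpers, not part of perf.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <limits.h>
#include <stdio.h>	/* snprintf() in the usage example */
#include <stdlib.h>	/* mkdtemp() in the usage example */
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper behaving like `ln -sfn TARGET LINKPATH` for the
 * cases it accepts: LINKPATH absent, or LINKPATH already a symlink
 * (even one pointing at a directory). Any other existing file is left
 * alone and reported as EEXIST. Not atomic: LINKPATH briefly vanishes.
 */
static int replace_symlink(const char *target, const char *linkpath)
{
	struct stat st;

	/* lstat() does not follow the link, so we inspect the link itself. */
	if (lstat(linkpath, &st) == 0) {
		if (!S_ISLNK(st.st_mode)) {
			errno = EEXIST;
			return -1;
		}
		if (unlink(linkpath) < 0)
			return -1;
	} else if (errno != ENOENT) {
		return -1;
	}
	return symlink(target, linkpath);
}

/* Return 1 if LINKPATH is a symlink whose stored target equals EXPECTED. */
static int symlink_points_to(const char *linkpath, const char *expected)
{
	char buf[PATH_MAX];
	ssize_t n = readlink(linkpath, buf, sizeof(buf) - 1);

	if (n < 0)
		return 0;
	buf[n] = '\0';	/* readlink() does not NUL-terminate */
	return strcmp(buf, expected) == 0;
}
```

Calling `replace_symlink()` twice with two different directory targets leaves the link pointing at the second one. Nothing is created inside the first target directory.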



* [PATCH v6 2/4] perf test: Support external tests for separate objdir
@ 2024-07-23 20:48 Andi Kleen
From: Andi Kleen @ 2024-07-23 20:48 UTC
  To: linux-perf-users; +Cc: Andi Kleen

Extend the searching for the test files so that it works
when running perf from a separate objdir, and also when
the perf executable is symlinked.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
 tools/perf/tests/tests-scripts.c | 31 ++++++++++++++++++++++++++++---
 1 file changed, 28 insertions(+), 3 deletions(-)

diff --git a/tools/perf/tests/tests-scripts.c b/tools/perf/tests/tests-scripts.c
index e2042b368269..63be17289ac3 100644
--- a/tools/perf/tests/tests-scripts.c
+++ b/tools/perf/tests/tests-scripts.c
@@ -29,16 +29,41 @@
 
 static int shell_tests__dir_fd(void)
 {
-	char path[PATH_MAX], *exec_path;
-	static const char * const devel_dirs[] = { "./tools/perf/tests/shell", "./tests/shell", };
+	struct stat st;
+	char path[PATH_MAX], path2[PATH_MAX], *exec_path;
+	static const char * const devel_dirs[] = {
+		"./tools/perf/tests/shell",
+		"./tests/shell",
+		"./source/tests/shell"
+	};
+	int fd;
+	char *p;
 
 	for (size_t i = 0; i < ARRAY_SIZE(devel_dirs); ++i) {
-		int fd = open(devel_dirs[i], O_PATH);
+		fd = open(devel_dirs[i], O_PATH);
 
 		if (fd >= 0)
 			return fd;
 	}
 
+	/* Use directory of executable */
+	if (readlink("/proc/self/exe", path2, sizeof path2) < 0)
+		return -1;
+	/* Follow another level of symlink if there */
+	if (lstat(path2, &st) == 0 && (st.st_mode & S_IFMT) == S_IFLNK) {
+		scnprintf(path, sizeof(path), path2);
+		if (readlink(path, path2, sizeof path2) < 0)
+			return -1;
+	}
+	/* Get directory */
+	p = strrchr(path2, '/');
+	if (*p)
+		p[1] = 0;
+	scnprintf(path, sizeof(path), "%s/tests/shell", path2);
+	fd = open(path, O_PATH);
+	if (fd >= 0)
+		return fd;
+
 	/* Then installed path. */
 	exec_path = get_argv_exec_path();
 	scnprintf(path, sizeof(path), "%s/tests/shell", exec_path);
-- 
2.45.2
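
For comparison, here is a defensive sketch of the executable-directory fallback added in this patch. Three details seem worth handling explicitly. First, `readlink(2)` does not NUL-terminate its buffer. Second, `strrchr()` returns NULL when there is no '/', and `if (*p)` does not test for that (when `p` is non-NULL, `*p` is always '/'). Third, `scnprintf(path, sizeof(path), path2)` uses `path2` as the format string, so a '%' in the path would be misinterpreted. The sketch assumes Linux `/proc/self/exe`, which typically already names the resolved executable rather than a symlink used to start it. `exe_dir()` and `shell_tests_path()` are hypothetical names, and plain `snprintf()` stands in for perf's `scnprintf()`.

```c
#define _XOPEN_SOURCE 700
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy the directory containing the running
 * executable (with a trailing '/') into buf. Returns 0 or -1.
 */
static int exe_dir(char *buf, size_t size)
{
	ssize_t n;
	char *slash;

	if (size < 2)
		return -1;
	/* Reserve one byte: readlink() does not append a NUL. */
	n = readlink("/proc/self/exe", buf, size - 1);
	if (n < 0 || (size_t)n >= size - 1)
		return -1;	/* error, or possibly truncated */
	buf[n] = '\0';

	slash = strrchr(buf, '/');
	if (slash == NULL)	/* strrchr() returns NULL, not "" */
		return -1;
	slash[1] = '\0';	/* keep the trailing '/' */
	return 0;
}

/* Build "<exe dir>tests/shell", the path the patch's fallback opens. */
static int shell_tests_path(char *out, size_t size)
{
	char dir[PATH_MAX];
	int len;

	if (exe_dir(dir, sizeof(dir)) < 0)
		return -1;
	/* The path is an argument here, never the format string. */
	len = snprintf(out, size, "%stests/shell", dir);
	return (len < 0 || (size_t)len >= size) ? -1 : 0;
}
```

Because `exe_dir()` keeps the trailing '/', the format string omits the separator and avoids the doubled "//" the patch produces.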



* [PATCH v6 3/4] perf script: Fix perf script -F +metric
@ 2024-07-23 20:48 Andi Kleen
From: Andi Kleen @ 2024-07-23 20:48 UTC
  To: linux-perf-users; +Cc: Andi Kleen

This fixes a regression with perf script -F +metric originally caused by:

commit 37cc8ad77cf81f3ffd226856c367b0e15333a738
Author: Ian Rogers <irogers@google.com>
Date:   Sun Feb 19 01:28:46 2023 -0800

    perf metric: Directly use counts rather than saved_value

In the perf script environment the evsel wouldn't allocate an aggr
values array, which led to a crash: the metric evaluation would try
to dereference NULL - 1 (for aggr_idx).

Give the perf script evsels a single CPU aggr setup. That is enough
because the groups are always contiguous, so there is no need to
store more than one CPU's worth of values.

Before:

% perf record -e '{cycles,instructions}:S' perf bench mem memcpy
% perf script -F +metric
Segmentation fault (core dumped)

After:

% perf record -e '{cycles,instructions}:S' perf bench mem memcpy
...
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.028 MB perf.data (90 samples) ]
% perf script -F +metric
       perf-exec 1847557 264658.180789:       3009       cycles:  ffffffff990a579a native_write_msr+0xa ([kernel.kallsyms])
       perf-exec 1847557 264658.180789:        382 instructions:  ffffffff990a579a native_write_msr+0xa ([kernel.kallsyms])
       perf-exec 1847557 264658.180789:         metric:    0.13  insn per cycle
...

Fixes: 37cc8ad77cf8 ("perf metric: Directly use counts rather ...")
Signed-off-by: Andi Kleen <ak@linux.intel.com>

----

v2: Reformat code
v3: Work around bogus warning
v4: Set up aggr map only for metrics case to keep perf stat record
working
v5: Broken version
v6: Only set up limited aggregation mode with -F +metric. Add conflict
checks with perf stat record files.
---
 tools/perf/builtin-script.c | 33 ++++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index c16224b1fef3..1a4b9b3d240d 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2133,12 +2133,17 @@ static void perf_sample__fprint_metric(struct perf_script *script,
 	if (evsel_script(leader)->gnum++ == 0)
 		perf_stat__reset_shadow_stats();
 	val = sample->period * evsel->scale;
+	/*
+	 * Always use CPU 0 storage because the groups are contiguous
+	 * and there's no need to handle multiple indexes for anything
+	 */
+	evsel->stats->aggr[0].counts.val = val;
 	evsel_script(evsel)->val = val;
 	if (evsel_script(leader)->gnum == leader->core.nr_members) {
 		for_each_group_member (ev2, leader) {
 			perf_stat__print_shadow_stats(&stat_config, ev2,
 						      evsel_script(ev2)->val,
-						      sample->cpu,
+						      0,
 						      &ctx,
 						      NULL);
 		}
@@ -2325,6 +2330,20 @@ static void process_event(struct perf_script *script,
 		fflush(fp);
 }
 
+static void check_metric_conflict(void)
+{
+	int i;
+	/*
+	 * Avoid conflict with the aggregation mode used for the metric printing.
+	 */
+	for (i = 0; i < OUTPUT_TYPE_MAX; i++) {
+		if (output[i].fields & PERF_OUTPUT_METRIC) {
+			fprintf(stderr, "perf stat record files are not supported with -F metric\n");
+			exit(1);
+		}
+	}
+}
+
 static struct scripting_ops	*scripting_ops;
 
 static void __process_stat(struct evsel *counter, u64 tstamp)
@@ -2334,6 +2353,8 @@ static void __process_stat(struct evsel *counter, u64 tstamp)
 	struct perf_cpu cpu;
 	static int header_printed;
 
+	check_metric_conflict();
+
 	if (!header_printed) {
 		printf("%3s %8s %15s %15s %15s %15s %s\n",
 		       "CPU", "THREAD", "VAL", "ENA", "RUN", "TIME", "EVENT");
@@ -3725,6 +3746,8 @@ static int process_stat_config_event(struct perf_session *session __maybe_unused
 {
 	perf_event__read_stat_config(&stat_config, &event->stat_config);
 
+	check_metric_conflict();
+
 	/*
 	 * Aggregation modes are not used since post-processing scripts are
 	 * supposed to take care of such requirements
@@ -3760,6 +3783,8 @@ int process_thread_map_event(struct perf_session *session,
 	struct perf_tool *tool = session->tool;
 	struct perf_script *script = container_of(tool, struct perf_script, tool);
 
+	check_metric_conflict();
+
 	if (dump_trace)
 		perf_event__fprintf_thread_map(event, stdout);
 
@@ -3785,6 +3810,8 @@ int process_cpu_map_event(struct perf_session *session,
 	if (dump_trace)
 		perf_event__fprintf_cpu_map(event, stdout);
 
+	check_metric_conflict();
+
 	if (script->cpus) {
 		pr_warning("Extra cpu map event, ignoring.\n");
 		return 0;
@@ -4088,6 +4115,10 @@ int cmd_script(int argc, const char **argv)
 
 	argc = parse_options_subcommand(argc, argv, options, script_subcommands, script_usage,
 			     PARSE_OPT_STOP_AT_NON_OPTION);
+	for (i = 0; i < OUTPUT_TYPE_MAX; i++) {
+		if (output[i].fields & PERF_OUTPUT_METRIC)
+			stat_config.aggr_map = &(struct cpu_aggr_map){ .nr = 1 };
+	}
 
 	if (symbol_conf.guestmount ||
 	    symbol_conf.default_guest_vmlinux_name ||
-- 
2.45.2



* [PATCH v6 4/4] Add a test case for perf script -F +metric
@ 2024-07-23 20:48 Andi Kleen
From: Andi Kleen @ 2024-07-23 20:48 UTC
  To: linux-perf-users; +Cc: Andi Kleen

Just a simple test for perf script -F +metric.

Signed-off-by: Andi Kleen <ak@linux.intel.com>

----

v2: Avoid bashisms. Use noploop
v3: Avoid false positive in shellcheck
---
 tools/perf/tests/shell/script.sh | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/tools/perf/tests/shell/script.sh b/tools/perf/tests/shell/script.sh
index c1a603653662..5e080e40b390 100755
--- a/tools/perf/tests/shell/script.sh
+++ b/tools/perf/tests/shell/script.sh
@@ -7,6 +7,7 @@ set -e
 temp_dir=$(mktemp -d /tmp/perf-test-script.XXXXXXXXXX)
 
 perfdatafile="${temp_dir}/perf.data"
+scriptoutput="${temp_dir}/script"
 db_test="${temp_dir}/db_test.py"
 
 err=0
@@ -88,8 +89,21 @@ test_parallel_perf()
 	echo "parallel-perf test [Success]"
 }
 
+test_metric()
+{
+	echo "script metric test"
+	if ! perf list | grep -q cycles ; then return ; fi
+	if ! perf list | grep -q instructions ; then return ; fi
+	perf record -e '{cycles,instructions}' -o "${perfdatafile}" perf test -w noploop
+	perf script -i "${perfdatafile}" -F +metric  > $scriptoutput
+	test "`grep -c metric $scriptoutput`" -gt 5
+	grep metric $scriptoutput | head
+	echo "script metric test [Success]"
+}
+
 test_db
 test_parallel_perf
+test_metric
 
 cleanup
 
-- 
2.45.2



* Re: [PATCH v6 3/4] perf script: Fix perf script -F +metric
@ 2024-07-23 21:32 Ian Rogers
From: Ian Rogers @ 2024-07-23 21:32 UTC
  To: Andi Kleen; +Cc: linux-perf-users

On Tue, Jul 23, 2024 at 1:48 PM Andi Kleen <ak@linux.intel.com> wrote:
>
> This fixes a regression with perf script -F +metric originally caused by :
>
> commit 37cc8ad77cf81f3ffd226856c367b0e15333a738
> Author: Ian Rogers <irogers@google.com>
> Date:   Sun Feb 19 01:28:46 2023 -0800
>
>     perf metric: Directly use counts rather than saved_value
>
> In the perf script environment the evsel wouldn't allocate an aggr
> values array, which led to a -1 reference because the metric
> evaluation would try to reference NULL - 1 (for aggr_idx)
>
> Give the perf script evsels a single CPU aggr setup. That's
> enough because the groups are always contiguous, so no need
> to store more than one CPU's worth of values.

I don't follow this. Samples have CPUs but you're associating all
values with CPU0. Why not just use counts and aggregation properly?

> Before
>
> % perf record -e '{cycles,instructions}:S' perf bench  mem memcpy
> % perf script -F +metric
> Segmentation fault (core dumped)
>
> After:
>
> % perf record -e '{cycles,instructions}:S' perf bench  mem memcpy
> ...
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.028 MB perf.data (90 samples) ]
> % perf script -F +metric
>        perf-exec 1847557 264658.180789:       3009       cycles:  ffffffff990a579a native_write_msr+0xa ([kernel.kallsyms])
>        perf-exec 1847557 264658.180789:        382 instructions:  ffffffff990a579a native_write_msr+0xa ([kernel.kallsyms])
>        perf-exec 1847557 264658.180789:         metric:    0.13  insn per cycle
> ...
>
> Fixes: 37cc8ad77cf8 ("perf metric: Directly use counts rather ...")
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
>
> ----
>
> v2: Reformat code
> v3: Work around bogus warning
> v4: Set up aggr map only for metrics case to keep perf stat record
> working
> v5: Broken version
> v6: Only set up limited aggregation mode with -F +metric. Add conflict
> checks with perf stat record files.
> ---
>  tools/perf/builtin-script.c | 33 ++++++++++++++++++++++++++++++++-
>  1 file changed, 32 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index c16224b1fef3..1a4b9b3d240d 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -2133,12 +2133,17 @@ static void perf_sample__fprint_metric(struct perf_script *script,
>         if (evsel_script(leader)->gnum++ == 0)
>                 perf_stat__reset_shadow_stats();
>         val = sample->period * evsel->scale;
> +       /*
> +        * Always use CPU 0 storage because the groups are contiguous
> +        * and there's no need to handle multiple indexes for anything
> +        */
> +       evsel->stats->aggr[0].counts.val = val;

0 isn't necessarily CPU0; it is the first entry in the aggregation
cpu_aggr_map, which can be CPU1, or it can be a thread map index. The
comment is confusing an index with a CPU number.

>         evsel_script(evsel)->val = val;

Why store val twice? Yes, it is read below, but now you can also just
read it from the counts. Not that it matters in the cases that apply,
but for JSON metrics all counts are unscaled.

>         if (evsel_script(leader)->gnum == leader->core.nr_members) {
>                 for_each_group_member (ev2, leader) {
>                         perf_stat__print_shadow_stats(&stat_config, ev2,
>                                                       evsel_script(ev2)->val,
> -                                                     sample->cpu,
> +                                                     0,
>                                                       &ctx,
>                                                       NULL);
>                 }
> @@ -2325,6 +2330,20 @@ static void process_event(struct perf_script *script,
>                 fflush(fp);
>  }
>
> +static void check_metric_conflict(void)
> +{
> +       int i;
> +       /*
> +        * Avoid conflict with the aggregation mode used for the metric printing.
> +        */
> +       for (i = 0; i < OUTPUT_TYPE_MAX; i++) {
> +               if (output[i].fields & PERF_OUTPUT_METRIC) {
> +                       fprintf(stderr, "perf stat record files are not supported with -F metric\n");
> +                       exit(1);
> +               }
> +       }
> +}
> +

No idea what this is doing. What's conflicting with what?

>  static struct scripting_ops    *scripting_ops;
>
>  static void __process_stat(struct evsel *counter, u64 tstamp)
> @@ -2334,6 +2353,8 @@ static void __process_stat(struct evsel *counter, u64 tstamp)
>         struct perf_cpu cpu;
>         static int header_printed;
>
> +       check_metric_conflict();
> +
>         if (!header_printed) {
>                 printf("%3s %8s %15s %15s %15s %15s %s\n",
>                        "CPU", "THREAD", "VAL", "ENA", "RUN", "TIME", "EVENT");
> @@ -3725,6 +3746,8 @@ static int process_stat_config_event(struct perf_session *session __maybe_unused
>  {
>         perf_event__read_stat_config(&stat_config, &event->stat_config);
>
> +       check_metric_conflict();
> +
>         /*
>          * Aggregation modes are not used since post-processing scripts are
>          * supposed to take care of such requirements
> @@ -3760,6 +3783,8 @@ int process_thread_map_event(struct perf_session *session,
>         struct perf_tool *tool = session->tool;
>         struct perf_script *script = container_of(tool, struct perf_script, tool);
>
> +       check_metric_conflict();
> +
>         if (dump_trace)
>                 perf_event__fprintf_thread_map(event, stdout);
>
> @@ -3785,6 +3810,8 @@ int process_cpu_map_event(struct perf_session *session,
>         if (dump_trace)
>                 perf_event__fprintf_cpu_map(event, stdout);
>
> +       check_metric_conflict();
> +
>         if (script->cpus) {
>                 pr_warning("Extra cpu map event, ignoring.\n");
>                 return 0;
> @@ -4088,6 +4115,10 @@ int cmd_script(int argc, const char **argv)
>
>         argc = parse_options_subcommand(argc, argv, options, script_subcommands, script_usage,
>                              PARSE_OPT_STOP_AT_NON_OPTION);
> +       for (i = 0; i < OUTPUT_TYPE_MAX; i++) {
> +               if (output[i].fields & PERF_OUTPUT_METRIC)
> +                       stat_config.aggr_map = &(struct cpu_aggr_map){ .nr = 1 };

Assigning the address of a temporary rvalue to a global variable seems
wrong to the point that I'm surprised it compiles. Accessing
stat_config.aggr_map->map[0] will read beyond the end of the value,
presumably into uninitialized memory. Compiling with
EXTRA_CFLAGS="-fsanitize=address" should complain about all of this.

Thanks,
Ian

> +       }
>
>         if (symbol_conf.guestmount ||
>             symbol_conf.default_guest_vmlinux_name ||
> --
> 2.45.2
>
>
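
The objection to `&(struct cpu_aggr_map){ .nr = 1 }` has two parts that can be shown separately. First, a compound literal at block scope has automatic storage duration tied to its enclosing block (here, the body of the `if`), so keeping its address in a global is only valid while that block executes. Second, if the struct ends in a flexible array member, as the `map[0]` remark implies, the literal reserves no storage for the array, so `map[0]` is out of bounds. A minimal sketch of the usual alternative heap-allocates the header plus `nr` entries. It uses a hypothetical `struct demo_aggr_map` rather than perf's real type.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for an aggregation id and map. */
struct demo_aggr_id {
	int cpu;
};

struct demo_aggr_map {
	int nr;
	struct demo_aggr_id map[];	/* flexible array member */
};

/* Allocate a map with room for nr entries, each marked empty (-1). */
static struct demo_aggr_map *demo_aggr_map__new(int nr)
{
	struct demo_aggr_map *m;

	if (nr < 0)
		return NULL;
	/* sizeof(*m) covers only the header; the entries need their own space. */
	m = malloc(sizeof(*m) + (size_t)nr * sizeof(m->map[0]));
	if (m == NULL)
		return NULL;
	m->nr = nr;
	for (int i = 0; i < nr; i++)
		m->map[i].cpu = -1;
	return m;
}

/* Heap storage outlives the function that set it up; free it explicitly. */
static struct demo_aggr_map *global_aggr_map;
```

perf's real `struct cpu_aggr_map` also carries a reference count, which this sketch omits.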


* Re: [PATCH v6 3/4] perf script: Fix perf script -F +metric
@ 2024-07-23 23:29 Andi Kleen
From: Andi Kleen @ 2024-07-23 23:29 UTC
  To: Ian Rogers; +Cc: linux-perf-users

On Tue, Jul 23, 2024 at 02:32:33PM -0700, Ian Rogers wrote:
> On Tue, Jul 23, 2024 at 1:48 PM Andi Kleen <ak@linux.intel.com> wrote:
> >
> > This fixes a regression with perf script -F +metric originally caused by :
> >
> > commit 37cc8ad77cf81f3ffd226856c367b0e15333a738
> > Author: Ian Rogers <irogers@google.com>
> > Date:   Sun Feb 19 01:28:46 2023 -0800
> >
> >     perf metric: Directly use counts rather than saved_value
> >
> > In the perf script environment the evsel wouldn't allocate an aggr
> > values array, which led to a -1 reference because the metric
> > evaluation would try to reference NULL - 1 (for aggr_idx)
> >
> > Give the perf script evsels a single CPU aggr setup. That's
> > enough because the groups are always contiguous, so no need
> > to store more than one CPU's worth of values.
> 
> I don't follow this. Samples have CPUs but you're associating all
> values with CPU0. Why not just use counts and aggregation properly?

Why use something that is not needed?

It's not needed because the CPUs are not interleaved: the code is
just processing a single group, which only has counts from a single
CPU. And there is no need to output an extra index for the count,
because the sample already has all the context.

If it were extended to multiple groups it would be needed, but it's not
clear how useful that would be. The benefit of the feature is that you
can get the metric at a very fine-grained level: only for the time
interval since the last sample.

Doing it for multiple groups means you would do some level of
aggregation over longer time periods. You could handle more complex
metrics, but you would lose this very fine-grained benefit. If you want
that, it's probably better to use perf report's time slicing feature
instead of perf script. That one currently doesn't support metrics,
though it probably should.
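
The per-group evaluation described here works as follows: collect one value per group member, evaluate the metric only once every member of the group has reported, then reset. The following is a minimal sketch for an IPC metric over a `{cycles,instructions}` group. The struct and function names are hypothetical. They only mirror the `gnum`/`nr_members` bookkeeping visible in the patch, not perf's implementation.

```c
#include <stdbool.h>

enum member { CYCLES = 0, INSTRUCTIONS = 1, NR_MEMBERS = 2 };

/* Hypothetical per-group state, reset after every complete group. */
struct group_state {
	int gnum;			/* members seen so far in this group */
	double val[NR_MEMBERS];		/* period * scale for each member */
};

/*
 * Record one member's value. Returns true and stores
 * instructions / cycles in *ipc once the whole group has been seen.
 */
static bool group_add(struct group_state *g, enum member m, double val,
		      double *ipc)
{
	g->val[m] = val;
	if (++g->gnum < NR_MEMBERS)
		return false;
	g->gnum = 0;	/* start a fresh group for the next sample */
	if (g->val[CYCLES] == 0)
		return false;	/* avoid dividing by zero */
	*ipc = g->val[INSTRUCTIONS] / g->val[CYCLES];
	return true;
}
```

With the values from the example output in the patch (3009 cycles, 382 instructions), this gives 382 / 3009 ≈ 0.127, which prints as 0.13 with two decimals.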

> > @@ -2325,6 +2330,20 @@ static void process_event(struct perf_script *script,
> >                 fflush(fp);
> >  }
> >
> > +static void check_metric_conflict(void)
> > +{
> > +       int i;
> > +       /*
> > +        * Avoid conflict with the aggregation mode used for the metric printing.
> > +        */
> > +       for (i = 0; i < OUTPUT_TYPE_MAX; i++) {
> > +               if (output[i].fields & PERF_OUTPUT_METRIC) {
> > +                       fprintf(stderr, "perf stat record files are not supported with -F metric\n");
> > +                       exit(1);
> > +               }
> > +       }
> > +}
> > +
> 
> No idea what this is doing. What's conflicting with what?

The conflict is between the -F +metric setup and processing the STAT*
records in perf.data (as generated by perf stat record). The latter
uses AGGR_NONE, which conflicts with that setup.

> >
> > @@ -3785,6 +3810,8 @@ int process_cpu_map_event(struct perf_session *session,
> >         if (dump_trace)
> >                 perf_event__fprintf_cpu_map(event, stdout);
> >
> > +       check_metric_conflict();
> > +
> >         if (script->cpus) {
> >                 pr_warning("Extra cpu map event, ignoring.\n");
> >                 return 0;
> > @@ -4088,6 +4115,10 @@ int cmd_script(int argc, const char **argv)
> >
> >         argc = parse_options_subcommand(argc, argv, options, script_subcommands, script_usage,
> >                              PARSE_OPT_STOP_AT_NON_OPTION);
> > +       for (i = 0; i < OUTPUT_TYPE_MAX; i++) {
> > +               if (output[i].fields & PERF_OUTPUT_METRIC)
> > +                       stat_config.aggr_map = &(struct cpu_aggr_map){ .nr = 1 };
> 
> Assigning the address a temporary rval to a global variable seems
> wrong to the point I'm surprised it compiles. Accessing

AFAIK gcc keeps the local around for the function, but you're right
that it's not good code, especially with the buffer overrun.


> stat_config.aggr_map->map[0] will lead to reading beyond the end of
> the value and presumably read uninitialized memory.  Compiling with
> EXTRA_CFLAGS="-fsanitize=address" should complain about all of this.

Good point, but I doubt the address sanitizer would have caught 
it because it doesn't really track the stack.

-Andi


* Re: [PATCH v6 3/4] perf script: Fix perf script -F +metric
@ 2024-07-24  0:05 Ian Rogers
From: Ian Rogers @ 2024-07-24  0:05 UTC
  To: Andi Kleen; +Cc: linux-perf-users

On Tue, Jul 23, 2024 at 4:29 PM Andi Kleen <ak@linux.intel.com> wrote:
>
> On Tue, Jul 23, 2024 at 02:32:33PM -0700, Ian Rogers wrote:
> > On Tue, Jul 23, 2024 at 1:48 PM Andi Kleen <ak@linux.intel.com> wrote:
> > >
> > > This fixes a regression with perf script -F +metric originally caused by :
> > >
> > > commit 37cc8ad77cf81f3ffd226856c367b0e15333a738
> > > Author: Ian Rogers <irogers@google.com>
> > > Date:   Sun Feb 19 01:28:46 2023 -0800
> > >
> > >     perf metric: Directly use counts rather than saved_value
> > >
> > > In the perf script environment the evsel wouldn't allocate an aggr
> > > values array, which led to a -1 reference because the metric
> > > evaluation would try to reference NULL - 1 (for aggr_idx)
> > >
> > > Give the perf script evsels a single CPU aggr setup. That's
> > > enough because the groups are always contiguous, so no need
> > > to store more than one CPU's worth of values.
> >
> > I don't follow this. Samples have CPUs but you're associating all
> > values with CPU0. Why not just use counts and aggregation properly?
>
> Why use something that is not needed?

The main reason would be so that perf_stat_process_counter could work
and update the aggregation accordingly. Then you can dump out the
aggregation, be it per CPU, per core, per socket, per cache-level,
etc. as appropriate. If you follow the perf stat convention you are
also much less likely to be broken in the future, as perf stat would
get broken too. It is complicated spaghetti to work out how this
stuff works, but that's why I fixed it in the patch I sent out.

> It's not needed because the CPUs are not interleaved because the code
> is just processing a single group which only has counts from
> a single CPU. And there is no need to output an extra index for the
> count because the sample already has all the context.
>
> If it was extended to multiple groups it would be needed, but it's not
> clear how useful it is. The benefit of the feature is that you
> can get the metric at a very fine grained level -- only for the time interval
> since the last sample.
>
> Doing it for multiple groups means you would do some level of
> aggregation over longer time periods.. You could handle more complex metrics,
> but would lose this very fine grain benefit. If you want that it's probably
> better use perf report's time slicing feature instead of perf script.
> That one currently doesn't support metrics though, but probably it
> should.
>
> > > @@ -2325,6 +2330,20 @@ static void process_event(struct perf_script *script,
> > >                 fflush(fp);
> > >  }
> > >
> > > +static void check_metric_conflict(void)
> > > +{
> > > +       int i;
> > > +       /*
> > > +        * Avoid conflict with the aggregation mode used for the metric printing.
> > > +        */
> > > +       for (i = 0; i < OUTPUT_TYPE_MAX; i++) {
> > > +               if (output[i].fields & PERF_OUTPUT_METRIC) {
> > > +                       fprintf(stderr, "perf stat record files are not supported with -F metric\n");
> > > +                       exit(1);
> > > +               }
> > > +       }
> > > +}
> > > +
> >
> > No idea what this is doing. What's conflicting with what?
>
> The conflict is between the -F +metric setup and processing STAT*
> records in the perf.data (as generated by perf stat record)
> The later uses AGGR_NONE which is a conflict.

Or let it set the aggregation mode and just let the aggregation code
handle it when computing metrics? I think getting the STAT events
isn't typical, so this is an academic argument. I'm just thinking
about future me, wondering why on earth this exists.

Thanks,
Ian

> > >
> > > @@ -3785,6 +3810,8 @@ int process_cpu_map_event(struct perf_session *session,
> > >         if (dump_trace)
> > >                 perf_event__fprintf_cpu_map(event, stdout);
> > >
> > > +       check_metric_conflict();
> > > +
> > >         if (script->cpus) {
> > >                 pr_warning("Extra cpu map event, ignoring.\n");
> > >                 return 0;
> > > @@ -4088,6 +4115,10 @@ int cmd_script(int argc, const char **argv)
> > >
> > >         argc = parse_options_subcommand(argc, argv, options, script_subcommands, script_usage,
> > >                              PARSE_OPT_STOP_AT_NON_OPTION);
> > > +       for (i = 0; i < OUTPUT_TYPE_MAX; i++) {
> > > +               if (output[i].fields & PERF_OUTPUT_METRIC)
> > > +                       stat_config.aggr_map = &(struct cpu_aggr_map){ .nr = 1 };
> >
> > Assigning the address a temporary rval to a global variable seems
> > wrong to the point I'm surprised it compiles. Accessing
>
> AFAIK gcc keeps the local around for the function, but you're right
> it's not good code, especially with the buffer overrun.
>
>
> > stat_config.aggr_map->map[0] will lead to reading beyond the end of
> > the value and presumably read uninitialized memory.  Compiling with
> > EXTRA_CFLAGS="-fsanitize=address" should complain about all of this.
>
> Good point, but I doubt the address sanitizer would have caught
> it because it doesn't really track the stack.
>
> -Andi


* Re: [PATCH v6 3/4] perf script: Fix perf script -F +metric
@ 2024-07-24  0:36 Andi Kleen
From: Andi Kleen @ 2024-07-24  0:36 UTC
  To: Ian Rogers; +Cc: linux-perf-users

> The main reason would be so that perf_stat_process_counter could work
> and update the aggregation accordingly. Then you can dump out the
> aggregation, be it per CPU, per core, per socket, per cache-level,
> etc. as appropriate. 

But there is none. It's only for a short time on a single CPU.

> If you follow the perf stat convention you are
> also much less likely to be broken in the future, as perf stat would
> also get broken. It is complicated spaghetti to work out how this
> stuff works, but that's why I fixed in the patch I sent out.

Hopefully the regression tests will prevent future breakage.

> Or let it set the aggregation mode and just let the aggregation code
> handle it when computing metrics? I think getting the STAT events
> isn't typical, so this is an academic argument.

I don't see why users shouldn't use perf stat record. It makes
sense for any large-scale count collection. And it's tested by
the test suite, at least.

-Andi
