* [PATCH v6 1/4] Create source symlink in perf object dir
@ 2024-07-23 20:48 Andi Kleen
2024-07-23 20:48 ` [PATCH v6 2/4] perf test: Support external tests for separate objdir Andi Kleen
` (2 more replies)
0 siblings, 3 replies; 8+ messages in thread
From: Andi Kleen @ 2024-07-23 20:48 UTC (permalink / raw)
To: linux-perf-users; +Cc: Andi Kleen
Create a source symlink to the original source in the objdir.
This is similar to what the main kernel build script does.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
tools/perf/Makefile.perf | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 175e4c7898f0..d46892d8223b 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -163,6 +163,8 @@ ifneq ($(OUTPUT),)
# for flex/bison parsers.
VPATH += $(OUTPUT)
export VPATH
+# create symlink to the original source
+SOURCE := $(shell ln -sf $(srctree)/tools/perf $(OUTPUT)/source)
endif
ifeq ($(V),1)
--
2.45.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v6 2/4] perf test: Support external tests for separate objdir
2024-07-23 20:48 [PATCH v6 1/4] Create source symlink in perf object dir Andi Kleen
@ 2024-07-23 20:48 ` Andi Kleen
2024-07-23 20:48 ` [PATCH v6 3/4] perf script: Fix perf script -F +metric Andi Kleen
2024-07-23 20:48 ` [PATCH v6 4/4] Add a test case for " Andi Kleen
2 siblings, 0 replies; 8+ messages in thread
From: Andi Kleen @ 2024-07-23 20:48 UTC (permalink / raw)
To: linux-perf-users; +Cc: Andi Kleen
Extend the searching for the test files so that it works when running perf
from a separate objdir, and also when the perf executable is symlinked.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
tools/perf/tests/tests-scripts.c | 31 ++++++++++++++++++++++++++++---
1 file changed, 28 insertions(+), 3 deletions(-)
diff --git a/tools/perf/tests/tests-scripts.c b/tools/perf/tests/tests-scripts.c
index e2042b368269..63be17289ac3 100644
--- a/tools/perf/tests/tests-scripts.c
+++ b/tools/perf/tests/tests-scripts.c
@@ -29,16 +29,41 @@
 static int shell_tests__dir_fd(void)
 {
-	char path[PATH_MAX], *exec_path;
-	static const char * const devel_dirs[] = { "./tools/perf/tests/shell", "./tests/shell", };
+	struct stat st;
+	char path[PATH_MAX], path2[PATH_MAX], *exec_path;
+	static const char * const devel_dirs[] = {
+		"./tools/perf/tests/shell",
+		"./tests/shell",
+		"./source/tests/shell"
+	};
+	int fd;
+	char *p;
 	for (size_t i = 0; i < ARRAY_SIZE(devel_dirs); ++i) {
-		int fd = open(devel_dirs[i], O_PATH);
+		fd = open(devel_dirs[i], O_PATH);
 		if (fd >= 0)
 			return fd;
 	}
+	/* Use directory of executable */
+	if (readlink("/proc/self/exe", path2, sizeof path2) < 0)
+		return -1;
+	/* Follow another level of symlink if there */
+	if (lstat(path2, &st) == 0 && (st.st_mode & S_IFMT) == S_IFLNK) {
+		scnprintf(path, sizeof(path), path2);
+		if (readlink(path, path2, sizeof path2) < 0)
+			return -1;
+	}
+	/* Get directory */
+	p = strrchr(path2, '/');
+	if (*p)
+		p[1] = 0;
+	scnprintf(path, sizeof(path), "%s/tests/shell", path2);
+	fd = open(path, O_PATH);
+	if (fd >= 0)
+		return fd;
+
 	/* Then installed path. */
 	exec_path = get_argv_exec_path();
 	scnprintf(path, sizeof(path), "%s/tests/shell", exec_path);
--
2.45.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v6 3/4] perf script: Fix perf script -F +metric
2024-07-23 20:48 [PATCH v6 1/4] Create source symlink in perf object dir Andi Kleen
2024-07-23 20:48 ` [PATCH v6 2/4] perf test: Support external tests for separate objdir Andi Kleen
@ 2024-07-23 20:48 ` Andi Kleen
2024-07-23 21:32 ` Ian Rogers
2024-07-23 20:48 ` [PATCH v6 4/4] Add a test case for " Andi Kleen
2 siblings, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2024-07-23 20:48 UTC (permalink / raw)
To: linux-perf-users; +Cc: Andi Kleen
This fixes a regression with perf script -F +metric originally caused by :

commit 37cc8ad77cf81f3ffd226856c367b0e15333a738
Author: Ian Rogers <irogers@google.com>
Date: Sun Feb 19 01:28:46 2023 -0800

perf metric: Directly use counts rather than saved_value

In the perf script environment the evsel wouldn't allocate an aggr
values array, which led to a -1 reference because the metric
evaluation would try to reference NULL - 1 (for aggr_idx)

Give the perf script evsels a single CPU aggr setup. That's
enough because the groups are always contiguous, so no need
to store more than one CPU's worth of values.

Before

% perf record -e '{cycles,instructions}:S' perf bench mem memcpy
% perf script -F +metric
Segmentation fault (core dumped)

After:

% perf record -e '{cycles,instructions}:S' perf bench mem memcpy
...
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.028 MB perf.data (90 samples) ]
% perf script -F +metric
perf-exec 1847557 264658.180789: 3009 cycles: ffffffff990a579a native_write_msr+0xa ([kernel.kallsyms])
perf-exec 1847557 264658.180789: 382 instructions: ffffffff990a579a native_write_msr+0xa ([kernel.kallsyms])
perf-exec 1847557 264658.180789: metric: 0.13 insn per cycle
...

Fixes: 37cc8ad77cf8 ("perf metric: Directly use counts rather ...")
Signed-off-by: Andi Kleen <ak@linux.intel.com>

----

v2: Reformat code
v3: Work around bogus warning
v4: Set up aggr map only for metrics case to keep perf stat record
working
v5: Broken version
v6: Only set up limited aggregation mode with -F +metric. Add conflict
checks with perf stat record files.
---
tools/perf/builtin-script.c | 33 ++++++++++++++++++++++++++++++++-
1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index c16224b1fef3..1a4b9b3d240d 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2133,12 +2133,17 @@ static void perf_sample__fprint_metric(struct perf_script *script,
 	if (evsel_script(leader)->gnum++ == 0)
 		perf_stat__reset_shadow_stats();
 	val = sample->period * evsel->scale;
+	/*
+	 * Always use CPU 0 storage because the groups are contiguous
+	 * and there's no need to handle multiple indexes for anything
+	 */
+	evsel->stats->aggr[0].counts.val = val;
 	evsel_script(evsel)->val = val;
 	if (evsel_script(leader)->gnum == leader->core.nr_members) {
 		for_each_group_member (ev2, leader) {
 			perf_stat__print_shadow_stats(&stat_config, ev2,
 						      evsel_script(ev2)->val,
-						      sample->cpu,
+						      0,
 						      &ctx,
 						      NULL);
 		}
@@ -2325,6 +2330,20 @@ static void process_event(struct perf_script *script,
 	fflush(fp);
 }

+static void check_metric_conflict(void)
+{
+	int i;
+	/*
+	 * Avoid conflict with the aggregation mode used for the metric printing.
+	 */
+	for (i = 0; i < OUTPUT_TYPE_MAX; i++) {
+		if (output[i].fields & PERF_OUTPUT_METRIC) {
+			fprintf(stderr, "perf stat record files are not supported with -F metric\n");
+			exit(1);
+		}
+	}
+}
+
 static struct scripting_ops *scripting_ops;

 static void __process_stat(struct evsel *counter, u64 tstamp)
@@ -2334,6 +2353,8 @@ static void __process_stat(struct evsel *counter, u64 tstamp)
 	struct perf_cpu cpu;
 	static int header_printed;

+	check_metric_conflict();
+
 	if (!header_printed) {
 		printf("%3s %8s %15s %15s %15s %15s %s\n",
 		       "CPU", "THREAD", "VAL", "ENA", "RUN", "TIME", "EVENT");
@@ -3725,6 +3746,8 @@ static int process_stat_config_event(struct perf_session *session __maybe_unused
 {
 	perf_event__read_stat_config(&stat_config, &event->stat_config);

+	check_metric_conflict();
+
 	/*
 	 * Aggregation modes are not used since post-processing scripts are
 	 * supposed to take care of such requirements
@@ -3760,6 +3783,8 @@ int process_thread_map_event(struct perf_session *session,
 	struct perf_tool *tool = session->tool;
 	struct perf_script *script = container_of(tool, struct perf_script, tool);

+	check_metric_conflict();
+
 	if (dump_trace)
 		perf_event__fprintf_thread_map(event, stdout);

@@ -3785,6 +3810,8 @@ int process_cpu_map_event(struct perf_session *session,
 	if (dump_trace)
 		perf_event__fprintf_cpu_map(event, stdout);

+	check_metric_conflict();
+
 	if (script->cpus) {
 		pr_warning("Extra cpu map event, ignoring.\n");
 		return 0;
@@ -4088,6 +4115,10 @@ int cmd_script(int argc, const char **argv)

 	argc = parse_options_subcommand(argc, argv, options, script_subcommands, script_usage,
 					PARSE_OPT_STOP_AT_NON_OPTION);
+	for (i = 0; i < OUTPUT_TYPE_MAX; i++) {
+		if (output[i].fields & PERF_OUTPUT_METRIC)
+			stat_config.aggr_map = &(struct cpu_aggr_map){ .nr = 1 };
+	}

 	if (symbol_conf.guestmount ||
 	    symbol_conf.default_guest_vmlinux_name ||
--
2.45.2
^ permalink raw reply related [flat|nested] 8+ messages in thread
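For readers checking the sample output in the commit message above: the metric line is the ratio of the two counts in the same sample group, 382 instructions over 3009 cycles, which is about 0.127 and prints as 0.13. The sketch below only reproduces that arithmetic. The function names are hypothetical and this is not how perf computes or formats the value internally.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Illustrative helper (hypothetical name, not a perf function):
 * instructions per cycle for one sample group.
 */
static double demo_insn_per_cycle(uint64_t instructions, uint64_t cycles)
{
	/* A group with zero cycles has no meaningful ratio. */
	if (cycles == 0)
		return 0.0;
	return (double)instructions / (double)cycles;
}

/* Render the ratio with two decimals, as in the sample output. */
static int demo_format_metric(char *buf, size_t size,
			      uint64_t instructions, uint64_t cycles)
{
	assert(buf != NULL && size > 0);
	return snprintf(buf, size, "%.2f insn per cycle",
			demo_insn_per_cycle(instructions, cycles));
}
```

Because both counts come from one group read at one sample, the ratio covers only the interval since the previous sample, which is the fine-grained property discussed in the replies.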
* Re: [PATCH v6 3/4] perf script: Fix perf script -F +metric
2024-07-23 20:48 ` [PATCH v6 3/4] perf script: Fix perf script -F +metric Andi Kleen
@ 2024-07-23 21:32 ` Ian Rogers
2024-07-23 23:29 ` Andi Kleen
0 siblings, 1 reply; 8+ messages in thread
From: Ian Rogers @ 2024-07-23 21:32 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-perf-users
On Tue, Jul 23, 2024 at 1:48 PM Andi Kleen <ak@linux.intel.com> wrote:
>
> This fixes a regression with perf script -F +metric originally caused by :
>
> commit 37cc8ad77cf81f3ffd226856c367b0e15333a738
> Author: Ian Rogers <irogers@google.com>
> Date: Sun Feb 19 01:28:46 2023 -0800
>
> perf metric: Directly use counts rather than saved_value
>
> In the perf script environment the evsel wouldn't allocate an aggr
> values array, which led to a -1 reference because the metric
> evaluation would try to reference NULL - 1 (for aggr_idx)
>
> Give the perf script evsels a single CPU aggr setup. That's
> enough because the groups are always contiguous, so no need
> to store more than one CPU's worth of values.

I don't follow this. Samples have CPUs but you're associating all
values with CPU0. Why not just use counts and aggregation properly?

> Before
>
> % perf record -e '{cycles,instructions}:S' perf bench mem memcpy
> % perf script -F +metric
> Segmentation fault (core dumped)
>
> After:
>
> % perf record -e '{cycles,instructions}:S' perf bench mem memcpy
> ...
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.028 MB perf.data (90 samples) ]
> % perf script -F +metric
> perf-exec 1847557 264658.180789: 3009 cycles: ffffffff990a579a native_write_msr+0xa ([kernel.kallsyms])
> perf-exec 1847557 264658.180789: 382 instructions: ffffffff990a579a native_write_msr+0xa ([kernel.kallsyms])
> perf-exec 1847557 264658.180789: metric: 0.13 insn per cycle
> ...
>
> Fixes: 37cc8ad77cf8 ("perf metric: Directly use counts rather ...")
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
>
> ----
>
> v2: Reformat code
> v3: Work around bogus warning
> v4: Set up aggr map only for metrics case to keep perf stat record
> working
> v5: Broken version
> v6: Only set up limited aggregation mode with -F +metric. Add conflict
> checks with perf stat record files.
> ---
> tools/perf/builtin-script.c | 33 ++++++++++++++++++++++++++++++++-
> 1 file changed, 32 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
> index c16224b1fef3..1a4b9b3d240d 100644
> --- a/tools/perf/builtin-script.c
> +++ b/tools/perf/builtin-script.c
> @@ -2133,12 +2133,17 @@ static void perf_sample__fprint_metric(struct perf_script *script,
>  	if (evsel_script(leader)->gnum++ == 0)
>  		perf_stat__reset_shadow_stats();
>  	val = sample->period * evsel->scale;
> +	/*
> +	 * Always use CPU 0 storage because the groups are contiguous
> +	 * and there's no need to handle multiple indexes for anything
> +	 */
> +	evsel->stats->aggr[0].counts.val = val;

0 isn't necessarily CPU0; it is the first entry in the aggregation
cpu_aggr_map, which can be CPU1, or it can be a thread map index. The
comment is confusing an index with a CPU number.

>  	evsel_script(evsel)->val = val;

Why store val twice? Yes, it is read below, but now you can also just
read it from the counts. Not that it matters in the cases that apply,
but for json metrics all counts are unscaled.

>  	if (evsel_script(leader)->gnum == leader->core.nr_members) {
>  		for_each_group_member (ev2, leader) {
>  			perf_stat__print_shadow_stats(&stat_config, ev2,
>  						      evsel_script(ev2)->val,
> -						      sample->cpu,
> +						      0,
>  						      &ctx,
>  						      NULL);
>  		}
> @@ -2325,6 +2330,20 @@ static void process_event(struct perf_script *script,
>  	fflush(fp);
>  }
>
> +static void check_metric_conflict(void)
> +{
> +	int i;
> +	/*
> +	 * Avoid conflict with the aggregation mode used for the metric printing.
> +	 */
> +	for (i = 0; i < OUTPUT_TYPE_MAX; i++) {
> +		if (output[i].fields & PERF_OUTPUT_METRIC) {
> +			fprintf(stderr, "perf stat record files are not supported with -F metric\n");
> +			exit(1);
> +		}
> +	}
> +}
> +

No idea what this is doing. What's conflicting with what?

>  static struct scripting_ops *scripting_ops;
>
>  static void __process_stat(struct evsel *counter, u64 tstamp)
> @@ -2334,6 +2353,8 @@ static void __process_stat(struct evsel *counter, u64 tstamp)
>  	struct perf_cpu cpu;
>  	static int header_printed;
>
> +	check_metric_conflict();
> +
>  	if (!header_printed) {
>  		printf("%3s %8s %15s %15s %15s %15s %s\n",
>  		       "CPU", "THREAD", "VAL", "ENA", "RUN", "TIME", "EVENT");
> @@ -3725,6 +3746,8 @@ static int process_stat_config_event(struct perf_session *session __maybe_unused
>  {
>  	perf_event__read_stat_config(&stat_config, &event->stat_config);
>
> +	check_metric_conflict();
> +
>  	/*
>  	 * Aggregation modes are not used since post-processing scripts are
>  	 * supposed to take care of such requirements
> @@ -3760,6 +3783,8 @@ int process_thread_map_event(struct perf_session *session,
>  	struct perf_tool *tool = session->tool;
>  	struct perf_script *script = container_of(tool, struct perf_script, tool);
>
> +	check_metric_conflict();
> +
>  	if (dump_trace)
>  		perf_event__fprintf_thread_map(event, stdout);
>
> @@ -3785,6 +3810,8 @@ int process_cpu_map_event(struct perf_session *session,
>  	if (dump_trace)
>  		perf_event__fprintf_cpu_map(event, stdout);
>
> +	check_metric_conflict();
> +
>  	if (script->cpus) {
>  		pr_warning("Extra cpu map event, ignoring.\n");
>  		return 0;
> @@ -4088,6 +4115,10 @@ int cmd_script(int argc, const char **argv)
>
>  	argc = parse_options_subcommand(argc, argv, options, script_subcommands, script_usage,
>  					PARSE_OPT_STOP_AT_NON_OPTION);
> +	for (i = 0; i < OUTPUT_TYPE_MAX; i++) {
> +		if (output[i].fields & PERF_OUTPUT_METRIC)
> +			stat_config.aggr_map = &(struct cpu_aggr_map){ .nr = 1 };

Assigning the address a temporary rval to a global variable seems
wrong to the point I'm surprised it compiles. Accessing
stat_config.aggr_map->map[0] will lead to reading beyond the end of
the value and presumably read uninitialized memory. Compiling with
EXTRA_CFLAGS="-fsanitize=address" should complain about all of this.

Thanks,
Ian

> +	}
>
>  	if (symbol_conf.guestmount ||
>  	    symbol_conf.default_guest_vmlinux_name ||
> --
> 2.45.2
>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
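To illustrate the storage concern raised in the review above: in standard C a compound literal inside a function has automatic storage tied to its enclosing block, so a pointer kept in a global outlives the object. If the structure also ends in a flexible array member, as the review's remark about map[0] implies, the literal reserves no room for any element. The sketch below shows one conventional alternative using a stand-in structure. demo_aggr_map and demo_aggr_map_new are hypothetical names, and the layout is an assumption for illustration, not the perf definition.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Stand-in for a map type with a trailing flexible array member.
 * Hypothetical layout, chosen only to mirror the shape discussed
 * in the review.
 */
struct demo_aggr_map {
	int nr;
	int map[];	/* flexible array member: needs explicit storage */
};

/*
 * Allocate a zeroed map with room for nr entries on the heap, so the
 * object stays valid after the allocating function returns and
 * map[0] .. map[nr - 1] are all in bounds.
 */
static struct demo_aggr_map *demo_aggr_map_new(int nr)
{
	struct demo_aggr_map *m;

	assert(nr > 0);
	m = calloc(1, sizeof(*m) + (size_t)nr * sizeof(m->map[0]));
	if (m)
		m->nr = nr;
	return m;
}
```

A pointer returned by demo_aggr_map_new(1) can be stored in a long-lived variable and released with free() at teardown. A static object with a fixed-size array would avoid the allocation at the cost of a hard-coded capacity.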
* Re: [PATCH v6 3/4] perf script: Fix perf script -F +metric
2024-07-23 21:32 ` Ian Rogers
@ 2024-07-23 23:29 ` Andi Kleen
2024-07-24 0:05 ` Ian Rogers
0 siblings, 1 reply; 8+ messages in thread
From: Andi Kleen @ 2024-07-23 23:29 UTC (permalink / raw)
To: Ian Rogers; +Cc: linux-perf-users
On Tue, Jul 23, 2024 at 02:32:33PM -0700, Ian Rogers wrote:
> On Tue, Jul 23, 2024 at 1:48 PM Andi Kleen <ak@linux.intel.com> wrote:
> >
> > This fixes a regression with perf script -F +metric originally caused by :
> >
> > commit 37cc8ad77cf81f3ffd226856c367b0e15333a738
> > Author: Ian Rogers <irogers@google.com>
> > Date: Sun Feb 19 01:28:46 2023 -0800
> >
> > perf metric: Directly use counts rather than saved_value
> >
> > In the perf script environment the evsel wouldn't allocate an aggr
> > values array, which led to a -1 reference because the metric
> > evaluation would try to reference NULL - 1 (for aggr_idx)
> >
> > Give the perf script evsels a single CPU aggr setup. That's
> > enough because the groups are always contiguous, so no need
> > to store more than one CPU's worth of values.
>
> I don't follow this. Samples have CPUs but you're associating all
> values with CPU0. Why not just use counts and aggregation properly?

Why use something that is not needed?

It's not needed because the CPUs are not interleaved because the code
is just processing a single group which only has counts from
a single CPU. And there is no need to output an extra index for the
count because the sample already has all the context.

If it was extended to multiple groups it would be needed, but it's not
clear how useful it is. The benefit of the feature is that you
can get the metric at a very fine grained level -- only for the time interval
since the last sample.

Doing it for multiple groups means you would do some level of
aggregation over longer time periods.. You could handle more complex metrics,
but would lose this very fine grain benefit. If you want that it's probably
better use perf report's time slicing feature instead of perf script.
That one currently doesn't support metrics though, but probably it
should.

> > @@ -2325,6 +2330,20 @@ static void process_event(struct perf_script *script,
> >  	fflush(fp);
> >  }
> >
> > +static void check_metric_conflict(void)
> > +{
> > +	int i;
> > +	/*
> > +	 * Avoid conflict with the aggregation mode used for the metric printing.
> > +	 */
> > +	for (i = 0; i < OUTPUT_TYPE_MAX; i++) {
> > +		if (output[i].fields & PERF_OUTPUT_METRIC) {
> > +			fprintf(stderr, "perf stat record files are not supported with -F metric\n");
> > +			exit(1);
> > +		}
> > +	}
> > +}
> > +
>
> No idea what this is doing. What's conflicting with what?

The conflict is between the -F +metric setup and processing STAT*
records in the perf.data (as generated by perf stat record)
The later uses AGGR_NONE which is a conflict.

> >
> > @@ -3785,6 +3810,8 @@ int process_cpu_map_event(struct perf_session *session,
> >  	if (dump_trace)
> >  		perf_event__fprintf_cpu_map(event, stdout);
> >
> > +	check_metric_conflict();
> > +
> >  	if (script->cpus) {
> >  		pr_warning("Extra cpu map event, ignoring.\n");
> >  		return 0;
> > @@ -4088,6 +4115,10 @@ int cmd_script(int argc, const char **argv)
> >
> >  	argc = parse_options_subcommand(argc, argv, options, script_subcommands, script_usage,
> >  					PARSE_OPT_STOP_AT_NON_OPTION);
> > +	for (i = 0; i < OUTPUT_TYPE_MAX; i++) {
> > +		if (output[i].fields & PERF_OUTPUT_METRIC)
> > +			stat_config.aggr_map = &(struct cpu_aggr_map){ .nr = 1 };
>
> Assigning the address a temporary rval to a global variable seems
> wrong to the point I'm surprised it compiles. Accessing

AFAIK gcc keeps the local around for the function, but you're right
it's not good code, especially with the buffer overrun.

> stat_config.aggr_map->map[0] will lead to reading beyond the end of
> the value and presumably read uninitialized memory. Compiling with
> EXTRA_CFLAGS="-fsanitize=address" should complain about all of this.

Good point, but I doubt the address sanitizer would have caught
it because it doesn't really track the stack.

-Andi
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v6 3/4] perf script: Fix perf script -F +metric
2024-07-23 23:29 ` Andi Kleen
@ 2024-07-24 0:05 ` Ian Rogers
2024-07-24 0:36 ` Andi Kleen
0 siblings, 1 reply; 8+ messages in thread
From: Ian Rogers @ 2024-07-24 0:05 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-perf-users
On Tue, Jul 23, 2024 at 4:29 PM Andi Kleen <ak@linux.intel.com> wrote:
>
> On Tue, Jul 23, 2024 at 02:32:33PM -0700, Ian Rogers wrote:
> > On Tue, Jul 23, 2024 at 1:48 PM Andi Kleen <ak@linux.intel.com> wrote:
> > >
> > > This fixes a regression with perf script -F +metric originally caused by :
> > >
> > > commit 37cc8ad77cf81f3ffd226856c367b0e15333a738
> > > Author: Ian Rogers <irogers@google.com>
> > > Date: Sun Feb 19 01:28:46 2023 -0800
> > >
> > > perf metric: Directly use counts rather than saved_value
> > >
> > > In the perf script environment the evsel wouldn't allocate an aggr
> > > values array, which led to a -1 reference because the metric
> > > evaluation would try to reference NULL - 1 (for aggr_idx)
> > >
> > > Give the perf script evsels a single CPU aggr setup. That's
> > > enough because the groups are always contiguous, so no need
> > > to store more than one CPU's worth of values.
> >
> > I don't follow this. Samples have CPUs but you're associating all
> > values with CPU0. Why not just use counts and aggregation properly?
>
> Why use something that is not needed?

The main reason would be so that perf_stat_process_counter could work
and update the aggregation accordingly. Then you can dump out the
aggregation, be it per CPU, per core, per socket, per cache-level,
etc. as appropriate. If you follow the perf stat convention you are
also much less likely to be broken in the future, as perf stat would
also get broken. It is complicated spaghetti to work out how this
stuff works, but that's why I fixed in the patch I sent out.

> It's not needed because the CPUs are not interleaved because the code
> is just processing a single group which only has counts from
> a single CPU. And there is no need to output an extra index for the
> count because the sample already has all the context.
>
> If it was extended to multiple groups it would be needed, but it's not
> clear how useful it is. The benefit of the feature is that you
> can get the metric at a very fine grained level -- only for the time interval
> since the last sample.
>
> Doing it for multiple groups means you would do some level of
> aggregation over longer time periods.. You could handle more complex metrics,
> but would lose this very fine grain benefit. If you want that it's probably
> better use perf report's time slicing feature instead of perf script.
> That one currently doesn't support metrics though, but probably it
> should.
>
> > > @@ -2325,6 +2330,20 @@ static void process_event(struct perf_script *script,
> > >  	fflush(fp);
> > >  }
> > >
> > > +static void check_metric_conflict(void)
> > > +{
> > > +	int i;
> > > +	/*
> > > +	 * Avoid conflict with the aggregation mode used for the metric printing.
> > > +	 */
> > > +	for (i = 0; i < OUTPUT_TYPE_MAX; i++) {
> > > +		if (output[i].fields & PERF_OUTPUT_METRIC) {
> > > +			fprintf(stderr, "perf stat record files are not supported with -F metric\n");
> > > +			exit(1);
> > > +		}
> > > +	}
> > > +}
> > > +
> >
> > No idea what this is doing. What's conflicting with what?
>
> The conflict is between the -F +metric setup and processing STAT*
> records in the perf.data (as generated by perf stat record)
> The later uses AGGR_NONE which is a conflict.

Or let it set the aggregation mode and just let the aggregation code
handle it when computing metrics? I think getting the STAT events
isn't typical, so this is an academic argument. I'm just thinking
about future me, wondering why on earth this exists.

Thanks,
Ian

> > >
> > > @@ -3785,6 +3810,8 @@ int process_cpu_map_event(struct perf_session *session,
> > >  	if (dump_trace)
> > >  		perf_event__fprintf_cpu_map(event, stdout);
> > >
> > > +	check_metric_conflict();
> > > +
> > >  	if (script->cpus) {
> > >  		pr_warning("Extra cpu map event, ignoring.\n");
> > >  		return 0;
> > > @@ -4088,6 +4115,10 @@ int cmd_script(int argc, const char **argv)
> > >
> > >  	argc = parse_options_subcommand(argc, argv, options, script_subcommands, script_usage,
> > >  					PARSE_OPT_STOP_AT_NON_OPTION);
> > > +	for (i = 0; i < OUTPUT_TYPE_MAX; i++) {
> > > +		if (output[i].fields & PERF_OUTPUT_METRIC)
> > > +			stat_config.aggr_map = &(struct cpu_aggr_map){ .nr = 1 };
> >
> > Assigning the address a temporary rval to a global variable seems
> > wrong to the point I'm surprised it compiles. Accessing
>
> AFAIK gcc keeps the local around for the function, but you're right
> it's not good code, especially with the buffer overrun.
>
> > stat_config.aggr_map->map[0] will lead to reading beyond the end of
> > the value and presumably read uninitialized memory. Compiling with
> > EXTRA_CFLAGS="-fsanitize=address" should complain about all of this.
>
> Good point, but I doubt the address sanitizer would have caught
> it because it doesn't really track the stack.
>
> -Andi
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH v6 3/4] perf script: Fix perf script -F +metric
2024-07-24 0:05 ` Ian Rogers
@ 2024-07-24 0:36 ` Andi Kleen
0 siblings, 0 replies; 8+ messages in thread
From: Andi Kleen @ 2024-07-24 0:36 UTC (permalink / raw)
To: Ian Rogers; +Cc: linux-perf-users
> The main reason would be so that perf_stat_process_counter could work
> and update the aggregation accordingly. Then you can dump out the
> aggregation, be it per CPU, per core, per socket, per cache-level,
> etc. as appropriate.

But there is none. It's only for a short time on a single CPU.

> If you follow the perf stat convention you are
> also much less likely to be broken in the future, as perf stat would
> also get broken. It is complicated spaghetti to work out how this
> stuff works, but that's why I fixed in the patch I sent out.

Hopefully the regression tests will prevent future breakage.

> Or let it set the aggregation mode and just let the aggregation code
> handle it when computing metrics? I think getting the STAT events
> isn't typical, so this is an academic argument.

I don't see why users shouldn't use perf stat record. It makes sense
for any large scale count collections. And it's tested by the test suite
at least.

-Andi
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCH v6 4/4] Add a test case for perf script -F +metric
2024-07-23 20:48 [PATCH v6 1/4] Create source symlink in perf object dir Andi Kleen
2024-07-23 20:48 ` [PATCH v6 2/4] perf test: Support external tests for separate objdir Andi Kleen
2024-07-23 20:48 ` [PATCH v6 3/4] perf script: Fix perf script -F +metric Andi Kleen
@ 2024-07-23 20:48 ` Andi Kleen
2 siblings, 0 replies; 8+ messages in thread
From: Andi Kleen @ 2024-07-23 20:48 UTC (permalink / raw)
To: linux-perf-users; +Cc: Andi Kleen
Just a simple test
Signed-off-by: Andi Kleen <ak@linux.intel.com>
----
v2: Avoid bashisms. Use noploop
v3: Avoid false positive in shellcheck
---
tools/perf/tests/shell/script.sh | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/tools/perf/tests/shell/script.sh b/tools/perf/tests/shell/script.sh
index c1a603653662..5e080e40b390 100755
--- a/tools/perf/tests/shell/script.sh
+++ b/tools/perf/tests/shell/script.sh
@@ -7,6 +7,7 @@ set -e
 temp_dir=$(mktemp -d /tmp/perf-test-script.XXXXXXXXXX)
 perfdatafile="${temp_dir}/perf.data"
+scriptoutput="${temp_dir}/script"
 db_test="${temp_dir}/db_test.py"
 err=0
@@ -88,8 +89,21 @@ test_parallel_perf()
 	echo "parallel-perf test [Success]"
 }
+test_metric()
+{
+	echo "script metric test"
+	if ! perf list | grep -q cycles ; then return ; fi
+	if ! perf list | grep -q instructions ; then return ; fi
+	perf record -e '{cycles,instructions}' -o "${perfdatafile}" perf test -w noploop
+	perf script -i "${perfdatafile}" -F +metric > $scriptoutput
+	test "`grep -c metric $scriptoutput`" -gt 5
+	grep metric $scriptoutput | head
+	echo "script metric test [Success]"
+}
+
 test_db
 test_parallel_perf
+test_metric
 cleanup
--
2.45.2
^ permalink raw reply related [flat|nested] 8+ messages in thread