* [PATCH v1 00/12] perf tests: Enhancements, speedups, and flakiness fixes
@ 2026-06-16 1:27 Ian Rogers
2026-06-16 1:27 ` [PATCH v1 01/12] perf parse-events: Restrict core PMU bypass to --cputype option Ian Rogers
` (12 more replies)
0 siblings, 13 replies; 43+ messages in thread
From: Ian Rogers @ 2026-06-16 1:27 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark,
Thomas Falcon, Leo Yan, Thomas Richter, linux-kernel,
linux-perf-users
This patch series introduces a comprehensive set of improvements to the perf
test suite, focusing on execution speed, reliability, and correctness.
It contains the following key updates:
1. Speed optimizations for shell tests (lock contention, off-cpu profiling,
metrics validation) by scaling down workload durations and using lighter
system-wide defaults.
2. Fixes for test flakiness in branch stack sampling, BPF counters on hybrid
systems, Python JIT dump profiling, and trace record/replay.
3. A new robust retry helper (perf_record_with_retry) to ensure reliable
recording of intermittent events in slow or virtualized environments.
4. Support for sub-second durations in noploop and thloop workloads to
accelerate tests that only require brief sampling windows.
5. Fixes for terminal width truncation during parallel test execution, ensuring
the progress output does not wrap or corrupt the screen when test names
exceed the terminal width (including skipped tests).
6. Restrictions on core PMU bypass parsing to correctly enforce the
--pmu-filter option.
The combined changes drastically reduce the total runtime of the test suite
while improving the stability of hardware-dependent and sampling-based tests.
Ian Rogers (12):
perf parse-events: Restrict core PMU bypass to --cputype option
perf test: Truncate test description to fit terminal width
perf tests workloads: Support sub-second durations in noploop and
thloop
perf tests: Add robust record retry helper and use subsecond workloads
perf tests: Skip metrics validation if system-wide recording lacks
permission
perf tests: Fix Python JIT dump profiling test failure
perf tests: Fix flakiness in trace record and replay test
perf tests: Fix flakiness in BPF counters test on hybrid systems
perf tests: Fix flakiness in branch stack sampling tests
perf tests: Speed up off-cpu profiling tests
perf tests: Speed up lock contention analysis shell test
perf tests: Speed up metrics checking shell tests
tools/perf/builtin-stat.c | 2 +
tools/perf/tests/builtin-test.c | 95 +++++++---
tools/perf/tests/parse-events.c | 11 +-
tools/perf/tests/pmu-events.c | 6 +-
tools/perf/tests/shell/jitdump-python.sh | 66 ++++---
tools/perf/tests/shell/kvm.sh | 61 ++++---
.../tests/shell/lib/perf_metric_validation.py | 11 +-
tools/perf/tests/shell/lib/perf_record.sh | 52 ++++++
tools/perf/tests/shell/lock_contention.sh | 30 +--
tools/perf/tests/shell/pipe_test.sh | 4 +-
tools/perf/tests/shell/record.sh | 172 +++++++++---------
tools/perf/tests/shell/record_lbr.sh | 50 +++--
tools/perf/tests/shell/record_offcpu.sh | 12 +-
tools/perf/tests/shell/stat_all_metrics.sh | 65 ++++---
tools/perf/tests/shell/stat_all_pfm.sh | 2 +-
tools/perf/tests/shell/stat_bpf_counters.sh | 27 ++-
tools/perf/tests/shell/stat_metrics_values.sh | 9 +-
tools/perf/tests/shell/test_brstack.sh | 6 +-
tools/perf/tests/shell/trace_record_replay.sh | 18 +-
tools/perf/tests/workloads/noploop.c | 15 +-
tools/perf/tests/workloads/thloop.c | 14 +-
tools/perf/util/metricgroup.c | 4 +-
tools/perf/util/parse-events.c | 22 ++-
tools/perf/util/parse-events.h | 17 +-
24 files changed, 494 insertions(+), 277 deletions(-)
create mode 100644 tools/perf/tests/shell/lib/perf_record.sh
--
2.54.0.1136.gdb2ca164c4-goog
^ permalink raw reply [flat|nested] 43+ messages in thread* [PATCH v1 01/12] perf parse-events: Restrict core PMU bypass to --cputype option 2026-06-16 1:27 [PATCH v1 00/12] perf tests: Enhancements, speedups, and flakiness fixes Ian Rogers @ 2026-06-16 1:27 ` Ian Rogers 2026-06-16 1:27 ` [PATCH v1 02/12] perf test: Truncate test description to fit terminal width Ian Rogers ` (11 subsequent siblings) 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 1:27 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark, Thomas Falcon, Leo Yan, Thomas Richter, linux-kernel, linux-perf-users Commit b1c5efbfd92e ("perf parse-events: Remove hard coded legacy hardware and cache parsing") introduced a bypass to PMU filtering to prevent uncore PMUs from being filtered out during event parsing, which was required for resolving `duration_time` and `uncore_freq` when running with `--cputype`. However, this bypass was active whenever `pmu_filter` was set, which also incorrectly bypassed filtering for the `--pmu-filter` option. Introduce a `cputype_filter` boolean flag in `parse_events_state` and `parse_events_option_args` to distinguish filtering initiated by `--cputype` from that initiated by `--pmu-filter`. Restrict the core-only check in `parse_events__filter_pmu()` to when `cputype_filter` is true. Fixes: b1c5efbfd92e ("perf parse-events: Remove hard coded legacy hardware and cache parsing") Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/builtin-stat.c | 2 ++ tools/perf/tests/parse-events.c | 11 +++++++---- tools/perf/tests/pmu-events.c | 6 ++++-- tools/perf/util/metricgroup.c | 4 +++- tools/perf/util/parse-events.c | 22 ++++++++++++++-------- tools/perf/util/parse-events.h | 17 +++++++++++------ 6 files changed, 41 insertions(+), 21 deletions(-) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index a04466ea3b0a..cc682740dccb 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -1210,6 +1210,7 @@ static int parse_cputype(const struct option *opt, return -1; } parse_events_option_args.pmu_filter = pmu->name; + parse_events_option_args.cputype_filter = true; return 0; } @@ -1226,6 +1227,7 @@ static int parse_pmu_filter(const struct option *opt, } parse_events_option_args.pmu_filter = str; + parse_events_option_args.cputype_filter = false; return 0; } diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c index 05c3e899b425..2cbe81b9c886 100644 --- a/tools/perf/tests/parse-events.c +++ b/tools/perf/tests/parse-events.c @@ -2556,8 +2556,10 @@ static int test_event(const struct evlist_test *e) return TEST_FAIL; } parse_events_error__init(&err); - ret = __parse_events(evlist, e->name, /*pmu_filter=*/NULL, &err, /*fake_pmu=*/false, - /*warn_if_reordered=*/true, /*fake_tp=*/true); + ret = __parse_events(evlist, e->name, /*pmu_filter=*/NULL, + /*cputype_filter=*/false, &err, /*fake_pmu=*/false, + /*warn_if_reordered=*/true, + /*fake_tp=*/true); if (ret) { pr_debug("failed to parse event '%s', err %d\n", e->name, ret); parse_events_error__print(&err, e->name); @@ -2584,8 +2586,9 @@ static int test_event_fake_pmu(const char *str) return -ENOMEM; parse_events_error__init(&err); - ret = __parse_events(evlist, str, /*pmu_filter=*/NULL, &err, - /*fake_pmu=*/true, /*warn_if_reordered=*/true, + ret = __parse_events(evlist, str, /*pmu_filter=*/NULL, + /*cputype_filter=*/false, &err, /*fake_pmu=*/true, + /*warn_if_reordered=*/true, /*fake_tp=*/true); if (ret) { pr_debug("failed to parse event '%s', err %d\n", diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c index fd5630f0a13c..962bc101e967 100644 --- a/tools/perf/tests/pmu-events.c +++ b/tools/perf/tests/pmu-events.c @@ -794,8 +794,10 @@ static int check_parse_id(const char *id, struct parse_events_error *error) for (cur = strchr(dup, '@') ; cur; cur = strchr(++cur, '@')) *cur = '/'; - ret = __parse_events(evlist, dup, /*pmu_filter=*/NULL, error, /*fake_pmu=*/true, - /*warn_if_reordered=*/true, /*fake_tp=*/false); + ret = __parse_events(evlist, dup, /*pmu_filter=*/NULL, + /*cputype_filter=*/false, error, /*fake_pmu=*/true, + /*warn_if_reordered=*/true, + /*fake_tp=*/false); free(dup); evlist__delete(evlist); diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c index c2ce3e53aaee..e9854d7bf07a 100644 --- a/tools/perf/util/metricgroup.c +++ b/tools/perf/util/metricgroup.c @@ -1317,7 +1317,9 @@ static int parse_ids(bool metric_no_merge, bool fake_pmu, pr_debug("Parsing metric events '%s'\n", events.buf); parse_events_error__init(&parse_error); ret = __parse_events(parsed_evlist, events.buf, filter_pmu, - &parse_error, fake_pmu, /*warn_if_reordered=*/false, + /*cputype_filter=*/filter_pmu != NULL, + &parse_error, fake_pmu, + /*warn_if_reordered=*/false, /*fake_tp=*/false); if (ret) { parse_events_error__print(&parse_error, events.buf); diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c index 943569e82b82..9f64d5197b8a 100644 --- a/tools/perf/util/parse-events.c +++ b/tools/perf/util/parse-events.c @@ -429,6 +429,9 @@ bool parse_events__filter_pmu(const struct parse_events_state *parse_state, if (parse_state->pmu_filter == NULL) return false; + if (parse_state->cputype_filter && !pmu->is_core) + return false; + return perf_pmu__wildcard_match(pmu, parse_state->pmu_filter) == 0; } @@ -2288,18 +2291,20 @@ static int parse_events__sort_events_and_fix_groups(struct list_head *list) return (idx_changed || num_leaders != orig_num_leaders) ? 1 : 0; } -int __parse_events(struct evlist *evlist, const char *str, const char *pmu_filter, +int __parse_events(struct evlist *evlist, const char *str, + const char *pmu_filter, bool cputype_filter, struct parse_events_error *err, bool fake_pmu, bool warn_if_reordered, bool fake_tp) { struct parse_events_state parse_state = { - .list = LIST_HEAD_INIT(parse_state.list), - .idx = evlist->core.nr_entries, - .error = err, - .stoken = PE_START_EVENTS, + .list = LIST_HEAD_INIT(parse_state.list), + .idx = evlist->core.nr_entries, + .error = err, + .stoken = PE_START_EVENTS, .fake_pmu = fake_pmu, - .fake_tp = fake_tp, + .fake_tp = fake_tp, .pmu_filter = pmu_filter, + .cputype_filter = cputype_filter, .match_legacy_cache_terms = true, }; int ret, ret2; @@ -2518,8 +2523,9 @@ int parse_events_option(const struct option *opt, const char *str, int ret; parse_events_error__init(&err); - ret = __parse_events(*args->evlistp, str, args->pmu_filter, &err, - /*fake_pmu=*/false, /*warn_if_reordered=*/true, + ret = __parse_events(*args->evlistp, str, args->pmu_filter, + args->cputype_filter, &err, /*fake_pmu=*/false, + /*warn_if_reordered=*/true, /*fake_tp=*/false); if (ret) { diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h index 3577ab213730..b14c832b03a1 100644 --- a/tools/perf/util/parse-events.h +++ b/tools/perf/util/parse-events.h @@ -26,20 +26,23 @@ const char *event_type(size_t type); struct parse_events_option_args { struct evlist **evlistp; const char *pmu_filter; + bool cputype_filter; }; int parse_events_option(const struct option *opt, const char *str, int unset); int parse_events_option_new_evlist(const struct option *opt, const char *str, int unset); -__attribute__((nonnull(1, 2, 4))) -int __parse_events(struct evlist *evlist, const char *str, const char *pmu_filter, - struct parse_events_error *error, bool fake_pmu, - bool warn_if_reordered, bool fake_tp); +__attribute__((nonnull(1, 2, 5))) int +__parse_events(struct evlist *evlist, const char *str, const char *pmu_filter, + bool cputype_filter, struct parse_events_error *error, + bool fake_pmu, bool warn_if_reordered, bool fake_tp); __attribute__((nonnull(1, 2, 3))) static inline int parse_events(struct evlist *evlist, const char *str, struct parse_events_error *err) { - return __parse_events(evlist, str, /*pmu_filter=*/NULL, err, /*fake_pmu=*/false, - /*warn_if_reordered=*/true, /*fake_tp=*/false); + return __parse_events(evlist, str, /*pmu_filter=*/NULL, + /*cputype_filter=*/false, err, /*fake_pmu=*/false, + /*warn_if_reordered=*/true, + /*fake_tp=*/false); } int parse_event(struct evlist *evlist, const char *str); @@ -161,6 +164,8 @@ struct parse_events_state { bool fake_tp; /* If non-null, when wildcard matching only match the given PMU. */ const char *pmu_filter; + /* If true, the pmu_filter was set by --cputype option. */ + bool cputype_filter; /* Should PE_LEGACY_NAME tokens be generated for config terms? */ bool match_legacy_cache_terms; /* Were multiple PMUs scanned to find events? */ -- 2.54.0.1136.gdb2ca164c4-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v1 02/12] perf test: Truncate test description to fit terminal width 2026-06-16 1:27 [PATCH v1 00/12] perf tests: Enhancements, speedups, and flakiness fixes Ian Rogers 2026-06-16 1:27 ` [PATCH v1 01/12] perf parse-events: Restrict core PMU bypass to --cputype option Ian Rogers @ 2026-06-16 1:27 ` Ian Rogers 2026-06-16 1:27 ` [PATCH v1 03/12] perf tests workloads: Support sub-second durations in noploop and thloop Ian Rogers ` (10 subsequent siblings) 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 1:27 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark, Thomas Falcon, Leo Yan, Thomas Richter, linux-kernel, linux-perf-users The parallel test harness uses the carriage return delete escape sequence `PERF_COLOR_DELETE_LINE` ("\033[A\33[2K\r") to erase and update the "Running (X active)" progress lines. However, if a test description is longer than the terminal width, the line wraps around. When this happens, the cursor up escape sequence `\033[A` only moves the cursor to the last wrapped row, leaving the top half of the description printed on the previous line. This leads to name duplication and output corruption spilling over multiple rows on consoles narrower than the maximum description length (e.g., 101 columns wide). Fix this by dynamically querying the terminal width using `get_term_dimensions` and truncating the printed test descriptions using the `%-*.*s` printf format. We reserve 35 characters for prefix, status, and spacing metrics to guarantee the progress line never wraps. Fixes: 0e036dcad4e6 ("perf test: Display number of active running tests") Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/builtin-test.c | 95 ++++++++++++++++++++++++--------- 1 file changed, 71 insertions(+), 24 deletions(-) diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c index afc06cec4954..51484e84d7c1 100644 --- a/tools/perf/tests/builtin-test.c +++ b/tools/perf/tests/builtin-test.c @@ -10,35 +10,39 @@ #ifdef HAVE_BACKTRACE_SUPPORT #include <execinfo.h> #endif -#include <poll.h> -#include <unistd.h> #include <setjmp.h> -#include <string.h> #include <stdlib.h> -#include <sys/types.h> +#include <string.h> + #include <dirent.h> -#include <sys/wait.h> +#include <linux/kernel.h> +#include <linux/string.h> +#include <linux/zalloc.h> +#include <poll.h> +#include <sys/ioctl.h> #include <sys/stat.h> #include <sys/time.h> +#include <sys/types.h> +#include <sys/wait.h> +#include <unistd.h> + +#include <subcmd/exec-cmd.h> +#include <subcmd/parse-options.h> +#include <subcmd/run-command.h> + #include "builtin.h" +#include "color.h" #include "config.h" +#include "debug.h" #include "hist.h" #include "intlist.h" -#include "tests.h" -#include "debug.h" -#include "color.h" -#include <subcmd/parse-options.h> -#include <subcmd/run-command.h> #include "string2.h" #include "symbol.h" +#include "tests-scripts.h" +#include "tests.h" #include "util/rlimit.h" #include "util/strbuf.h" -#include <linux/kernel.h> -#include <linux/string.h> -#include <subcmd/exec-cmd.h> -#include <linux/zalloc.h> - -#include "tests-scripts.h" +#include "util/term.h" static const char *junit_filename; static struct strbuf junit_xml_buf = STRBUF_INIT; @@ -413,19 +417,52 @@ static char *xml_escape(const char *str) return res ? res : strdup(""); } +static int get_max_desc_width(int width) +{ + struct winsize ws; + int cols = 80; + int max_desc_width; + + get_term_dimensions(&ws); + if (ws.ws_col > 0) + cols = ws.ws_col; + + /* + * Limit description width to fit on a single line. We subtract 35 + * columns of headroom to allocate space for: + * - The suite index prefix: e.g. " 10.100:" (9 characters). + * - The colon separator and spaces: " : " (3 characters). + * - The longest status results: e.g. "Skip (some metrics failed)" (26 characters) + * or "Running (XX active)" (20 characters). + * + * A minimum description width of 10 is enforced to ensure names are + * legible even on very narrow consoles. + */ + max_desc_width = cols - 35; + if (max_desc_width < 10) + max_desc_width = 10; + + return width > max_desc_width ? max_desc_width : width; +} + static int print_test_result(struct test_suite *t, int curr_suite, int curr_test_case, int result, int width, int running, const char *err_output, double elapsed) { + width = get_max_desc_width(width); + if (test_suite__num_test_cases(t) > 1) { char prefix[32]; int len = snprintf(prefix, sizeof(prefix), "%3d.%1d:", curr_suite + 1, curr_test_case + 1); int subw = len >= 4 ? width + 4 - len : width; - pr_info("%s %-*s:", prefix, subw, test_description(t, curr_test_case)); - } else - pr_info("%3d: %-*s:", curr_suite + 1, width, test_description(t, curr_test_case)); + pr_info("%s %-*.*s:", prefix, subw, subw, + test_description(t, curr_test_case)); + } else { + pr_info("%3d: %-*.*s:", curr_suite + 1, width, width, + test_description(t, curr_test_case)); + } switch (result) { case TEST_RUNNING: @@ -695,6 +732,7 @@ static void finish_test(struct child_test **child_tests, int running_test, int c int ret; struct timespec end_time; double elapsed; + width = get_max_desc_width(width); if (child_test == NULL) { /* Test wasn't started. */ @@ -709,7 +747,8 @@ static void finish_test(struct child_test **child_tests, int running_test, int c * sub test names. */ if (test_suite__num_test_cases(t) > 1 && curr_test_case == 0) - pr_info("%3d: %-*s:\n", curr_suite + 1, width, test_description(t, -1)); + pr_info("%3d: %-*.*s:\n", curr_suite + 1, width, width, + test_description(t, -1)); /* * Busy loop reading from the child's stdout/stderr that are set to be @@ -917,6 +956,8 @@ static int finish_tests_parallel(struct child_test **child_tests, size_t num_tes int last_suite_printed = -1; sigset_t set, oldset; + width = get_max_desc_width(width); + sigemptyset(&set); sigaddset(&set, SIGINT); sigaddset(&set, SIGTERM); @@ -985,8 +1026,11 @@ static int finish_tests_parallel(struct child_test **child_tests, size_t num_tes if (next_child) { if (test_suite__num_test_cases(next_child->test) > 1 && last_suite_printed != next_child->suite_num) { - pr_info("%3d: %-*s:\n", next_child->suite_num + 1, width, - test_description(next_child->test, -1)); + pr_info("%3d: %-*.*s:\n", + next_child->suite_num + 1, + width, width, + test_description( + next_child->test, -1)); last_suite_printed = next_child->suite_num; } print_test_result(next_child->test, next_child->suite_num, @@ -1049,7 +1093,8 @@ static int finish_tests_parallel(struct child_test **child_tests, size_t num_tes if (test_suite__num_test_cases(child->test) > 1 && last_suite_printed != child->suite_num) { - pr_info("%3d: %-*s:\n", child->suite_num + 1, width, + pr_info("%3d: %-*.*s:\n", child->suite_num + 1, + width, width, test_description(child->test, -1)); last_suite_printed = child->suite_num; } @@ -1296,7 +1341,9 @@ static int __cmd_test(struct test_suite **suites, int argc, const char *argv[], if (intlist__find(skiplist, curr_suite + 1)) { if (pass == 1) { - pr_info("%3d: %-*s:", curr_suite + 1, width, + int max_width = get_max_desc_width(width); + + pr_info("%3d: %-*.*s:", curr_suite + 1, max_width, max_width, test_description(*t, -1)); color_fprintf(stderr, PERF_COLOR_YELLOW, " Skip (user override)\n"); -- 2.54.0.1136.gdb2ca164c4-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v1 03/12] perf tests workloads: Support sub-second durations in noploop and thloop 2026-06-16 1:27 [PATCH v1 00/12] perf tests: Enhancements, speedups, and flakiness fixes Ian Rogers 2026-06-16 1:27 ` [PATCH v1 01/12] perf parse-events: Restrict core PMU bypass to --cputype option Ian Rogers 2026-06-16 1:27 ` [PATCH v1 02/12] perf test: Truncate test description to fit terminal width Ian Rogers @ 2026-06-16 1:27 ` Ian Rogers 2026-06-16 1:27 ` [PATCH v1 04/12] perf tests: Add robust record retry helper and use subsecond workloads Ian Rogers ` (9 subsequent siblings) 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 1:27 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark, Thomas Falcon, Leo Yan, Thomas Richter, linux-kernel, linux-perf-users Currently, the noploop and thloop workloads only support sleep durations in integer seconds because they parse the argument using atoi() and use alarm() for timer signaling. To support much shorter execution times in tests (speeding up test suites and allowing faster retries), change the input parsing to use atof() for double floating-point seconds. Use ualarm() for fractional durations less than 1.0 seconds, and fall back to alarm() for durations of 1.0 second or more. Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/workloads/noploop.c | 15 ++++++++++++--- tools/perf/tests/workloads/thloop.c | 14 +++++++++----- 2 files changed, 21 insertions(+), 8 deletions(-) diff --git a/tools/perf/tests/workloads/noploop.c b/tools/perf/tests/workloads/noploop.c index 656e472e6188..3fcba5ceaa3d 100644 --- a/tools/perf/tests/workloads/noploop.c +++ b/tools/perf/tests/workloads/noploop.c @@ -15,15 +15,24 @@ static void sighandler(int sig __maybe_unused) static int noploop(int argc, const char **argv) { - int sec = 1; + double sec = 1.0; pthread_setname_np(pthread_self(), "perf-noploop"); if (argc > 0) - sec = atoi(argv[0]); + sec = atof(argv[0]); + + if (sec <= 0.0) { + fprintf(stderr, "Error: seconds (%f) must be > 0\n", sec); + return 1; + } signal(SIGINT, sighandler); signal(SIGALRM, sighandler); - alarm(sec); + + if (sec < 1.0) + ualarm((useconds_t)(sec * 1000000.0), 0); + else + alarm((unsigned int)sec); while (!done) continue; diff --git a/tools/perf/tests/workloads/thloop.c b/tools/perf/tests/workloads/thloop.c index bd8168f883fb..bed3047fe9c2 100644 --- a/tools/perf/tests/workloads/thloop.c +++ b/tools/perf/tests/workloads/thloop.c @@ -31,14 +31,15 @@ static void *thfunc(void *arg) static int thloop(int argc, const char **argv) { - int nt = 2, sec = 1, err = 1; + int nt = 2, err = 1; + double sec = 1.0; pthread_t *thread_list = NULL; if (argc > 0) - sec = atoi(argv[0]); + sec = atof(argv[0]); - if (sec <= 0) { - fprintf(stderr, "Error: seconds (%d) must be >= 1\n", sec); + if (sec <= 0.0) { + fprintf(stderr, "Error: seconds (%f) must be > 0\n", sec); return 1; } @@ -67,7 +68,10 @@ static int thloop(int argc, const char **argv) goto out; } } - alarm(sec); + if (sec < 1.0) + ualarm((useconds_t)(sec * 1000000.0), 0); + else + alarm((unsigned int)sec); test_loop(); err = 0; out: -- 2.54.0.1136.gdb2ca164c4-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v1 04/12] perf tests: Add robust record retry helper and use subsecond workloads 2026-06-16 1:27 [PATCH v1 00/12] perf tests: Enhancements, speedups, and flakiness fixes Ian Rogers ` (2 preceding siblings ...) 2026-06-16 1:27 ` [PATCH v1 03/12] perf tests workloads: Support sub-second durations in noploop and thloop Ian Rogers @ 2026-06-16 1:27 ` Ian Rogers 2026-06-16 1:27 ` [PATCH v1 05/12] perf tests: Skip metrics validation if system-wide recording lacks permission Ian Rogers ` (8 subsequent siblings) 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 1:27 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark, Thomas Falcon, Leo Yan, Thomas Richter, linux-kernel, linux-perf-users Introduce `perf_record_with_retry` and `perf_record_cleanup` in a shared library `tests/shell/lib/perf_record.sh` to prevent record test failures caused by transient recording or workload delays. Update `record.sh`, `record_lbr.sh`, `pipe_test.sh`, `kvm.sh`, and `stat_all_pfm.sh` to use this robust record retry logic. These tests now start with very short durations (e.g. 0.01 seconds) and scale up if the initial recording failed to capture samples, significantly improving test execution speed on success while remaining resilient to slow systems. Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/shell/kvm.sh | 61 +++++--- tools/perf/tests/shell/lib/perf_record.sh | 47 ++++++ tools/perf/tests/shell/pipe_test.sh | 4 +- tools/perf/tests/shell/record.sh | 172 +++++++++++----------- tools/perf/tests/shell/record_lbr.sh | 50 +++++-- tools/perf/tests/shell/stat_all_pfm.sh | 2 +- 6 files changed, 208 insertions(+), 128 deletions(-) create mode 100644 tools/perf/tests/shell/lib/perf_record.sh diff --git a/tools/perf/tests/shell/kvm.sh b/tools/perf/tests/shell/kvm.sh index f88e859025c4..255250ddba5c 100755 --- a/tools/perf/tests/shell/kvm.sh +++ b/tools/perf/tests/shell/kvm.sh @@ -39,17 +39,26 @@ skip() { test_kvm_stat() { echo "Testing perf kvm stat" - echo "Recording kvm events for pid ${qemu_pid}..." - if ! perf kvm stat record -p "${qemu_pid}" -o "${perfdata}" sleep 1; then - echo "Failed to record kvm events" - err=1 - return - fi + local duration + local success=false + for duration in 1 2 4 8; do + echo "Recording kvm events for pid ${qemu_pid} (duration ${duration}s)..." + rm -f "${perfdata}" "${perfdata}".old + if ! perf kvm stat record -p "${qemu_pid}" -o "${perfdata}" sleep ${duration} >/dev/null 2>&1; then + echo "perf kvm stat record failed, retrying..." + continue + fi - echo "Reporting kvm events..." - if ! perf kvm -i "${perfdata}" stat report 2>&1 | grep -q "VM-EXIT"; then + if [ -e "${perfdata}" ] && perf kvm -i "${perfdata}" stat report 2>&1 | grep -q "VM-EXIT"; then + success=true + break + fi + echo "No VM-EXIT events found, retrying..." + done + + if [ "$success" = false ]; then echo "Failed to find VM-EXIT in report" - perf kvm -i "${perfdata}" stat report 2>&1 + perf kvm -i "${perfdata}" stat report 2>&1 || true err=1 return fi @@ -60,22 +69,28 @@ test_kvm_stat() { test_kvm_record_report() { echo "Testing perf kvm record/report" - echo "Recording kvm profile for pid ${qemu_pid}..." - # Use --host to avoid needing guest symbols/mounts for this simple test - # We just want to verify the command runs and produces data - # We run in background and kill it because 'perf kvm record' appends options - # after the command, which breaks 'sleep' (e.g. it gets '-e cycles'). - perf kvm --host record -p "${qemu_pid}" -o "${perfdata}" & - rec_pid=$! - sleep 1 - kill -INT "${rec_pid}" - wait "${rec_pid}" || true + local duration + local success=false + for duration in 1 2 4 8; do + echo "Recording kvm profile for pid ${qemu_pid} (duration ${duration}s)..." + rm -f "${perfdata}" "${perfdata}".old + + perf kvm --host record -p "${qemu_pid}" -o "${perfdata}" & + local rec_pid=$! + sleep ${duration} + kill -INT "${rec_pid}" + wait "${rec_pid}" || true + + if [ -e "${perfdata}" ] && perf kvm -i "${perfdata}" report --stdio 2>&1 | grep -q "Event count"; then + success=true + break + fi + echo "No samples or report failed, retrying..." + done - echo "Reporting kvm profile..." - # Check for some standard output from report - if ! perf kvm -i "${perfdata}" report --stdio 2>&1 | grep -q "Event count"; then + if [ "$success" = false ]; then echo "Failed to report kvm profile" - perf kvm -i "${perfdata}" report --stdio 2>&1 + perf kvm -i "${perfdata}" report --stdio 2>&1 || true err=1 return fi diff --git a/tools/perf/tests/shell/lib/perf_record.sh b/tools/perf/tests/shell/lib/perf_record.sh new file mode 100644 index 000000000000..c5de25244770 --- /dev/null +++ b/tools/perf/tests/shell/lib/perf_record.sh @@ -0,0 +1,47 @@ +# SPDX-License-Identifier: GPL-2.0 + +perf_record_with_retry() { + local perfdata="$1" + local check_cmd="$2" + local testprog_base="$3" + shift 3 + + local logfile + logfile="/tmp/__perf_record_retry.$(id -u).$BASHPID.log" + + # Save the e flag state and disable it + local save_e + if [[ $- == *e* ]]; then + save_e="set -e" + else + save_e="set +e" + fi + set +e + + local duration + local first_run=true + local ret=1 + for duration in 0.01 0.1 0.3 1.0 2.0; do + rm -f "${perfdata}" "${perfdata}".old + perf record "$@" -o "${perfdata}" ${testprog_base} ${duration} > "$logfile" 2>&1 + local record_exit=$? + + if [ "$first_run" = true ] && [ $record_exit -ne 0 ]; then + ret=2 + break + fi + first_run=false + + if [ -e "${perfdata}" ] && eval "${check_cmd}"; then + ret=0 + break + fi + done + + eval "$save_e" + return $ret +} + +perf_record_cleanup() { + rm -f /tmp/__perf_record_retry.$(id -u).$BASHPID.log +} diff --git a/tools/perf/tests/shell/pipe_test.sh b/tools/perf/tests/shell/pipe_test.sh index e459aa99a951..ce68d850c983 100755 --- a/tools/perf/tests/shell/pipe_test.sh +++ b/tools/perf/tests/shell/pipe_test.sh @@ -12,8 +12,8 @@ skip_test_missing_symbol ${sym} data=$(mktemp /tmp/perf.data.XXXXXX) data2=$(mktemp /tmp/perf.data2.XXXXXX) -prog="perf test -w noploop" -[ "$(uname -m)" = "s390x" ] && prog="$prog 3" +prog="perf test -w noploop 0.1" +[ "$(uname -m)" = "s390x" ] && prog="perf test -w noploop 3" err=0 set -e diff --git a/tools/perf/tests/shell/record.sh b/tools/perf/tests/shell/record.sh index 7cb81cf3444a..382e7ee02eb8 100755 --- a/tools/perf/tests/shell/record.sh +++ b/tools/perf/tests/shell/record.sh @@ -1,10 +1,13 @@ #!/bin/bash -# perf record tests (exclusive) +# perf record tests # SPDX-License-Identifier: GPL-2.0 set -e shelldir=$(dirname "$0") +. "${shelldir}"/lib/perf_record.sh + + # shellcheck source=lib/waiting.sh . "${shelldir}"/lib/waiting.sh @@ -39,6 +42,7 @@ cleanup() { rm -f "${perfdata}" rm -f "${perfdata}".old rm -f "${script_output}" + perf_record_cleanup trap - EXIT TERM INT } @@ -50,22 +54,19 @@ trap_cleanup() { } trap trap_cleanup EXIT TERM INT +check_per_thread() { + perf report -i "${perfdata}" -q | grep -q "${testsym}" +} + test_per_thread() { echo "Basic --per-thread mode test" - if ! perf record -o /dev/null --quiet ${testprog} 2> /dev/null - then + local ret=0 + perf_record_with_retry "${perfdata}" "check_per_thread" "perf test -w thloop" --per-thread || ret=$? + if [ $ret -eq 2 ]; then echo "Per-thread record [Skipped event not supported]" return - fi - if ! perf record --per-thread -o "${perfdata}" ${testprog} 2> /dev/null - then - echo "Per-thread record [Failed record]" - err=1 - return - fi - if ! perf report -i "${perfdata}" -q | grep -q "${testsym}" - then - echo "Per-thread record [Failed missing output]" + elif [ $ret -eq 1 ]; then + echo "Per-thread record [Failed record or missing output]" err=1 return fi @@ -96,6 +97,10 @@ test_per_thread() { echo "Basic --per-thread mode test [Success]" } +check_register_capture() { + perf script -F ip,sym,iregs -i "${perfdata}" 2>/dev/null | grep -q "DI:" +} + test_register_capture() { echo "Register capture test" if ! perf list pmu | grep -q 'br_inst_retired.near_call' @@ -108,11 +113,12 @@ test_register_capture() { echo "Register capture test [Skipped missing registers]" return fi - if ! perf record -o - --intr-regs=di,r8,dx,cx -e br_inst_retired.near_call \ - -c 1000 --per-thread ${testprog} 2> /dev/null \ - | perf script -F ip,sym,iregs -i - 2> /dev/null \ - | grep -q "DI:" - then + + local ret=0 + perf_record_with_retry "${perfdata}" "check_register_capture" "perf test -w thloop" \ + --intr-regs=di,r8,dx,cx -e br_inst_retired.near_call -c 1000 --per-thread || ret=$? + + if [ $ret -ne 0 ]; then echo "Register capture test [Failed missing output]" err=1 return @@ -120,65 +126,65 @@ test_register_capture() { echo "Register capture test [Success]" } +check_system_wide() { + perf report -i "${perfdata}" -q | grep -q "${testsym}" +} + test_system_wide() { echo "Basic --system-wide mode test" - if ! perf record -aB --synth=no -o "${perfdata}" ${testprog} 2> /dev/null - then + local ret=0 + perf_record_with_retry "${perfdata}" "check_system_wide" "perf test -w thloop" -aB --synth=no || ret=$? + if [ $ret -eq 2 ]; then echo "System-wide record [Skipped not supported]" return - fi - if ! perf report -i "${perfdata}" -q | grep -q "${testsym}" - then + elif [ $ret -eq 1 ]; then echo "System-wide record [Failed missing output]" err=1 return fi - if ! perf record -aB --synth=no -e cpu-clock,cs --threads=cpu \ - -o "${perfdata}" ${testprog} 2> /dev/null - then - echo "System-wide record [Failed record --threads option]" - err=1 - return - fi - if ! perf report -i "${perfdata}" -q | grep -q "${testsym}" - then - echo "System-wide record [Failed --threads missing output]" + + ret=0 + perf_record_with_retry "${perfdata}" "check_system_wide" "perf test -w thloop" \ + -aB --synth=no -e cpu-clock,cs --threads=cpu || ret=$? + if [ $ret -ne 0 ]; then + echo "System-wide record [Failed record --threads option or missing output]" err=1 return fi echo "Basic --system-wide mode test [Success]" } +check_workload() { + perf report -i "${perfdata}" -q | grep -q "${testsym}" +} + test_workload() { echo "Basic target workload test" - if ! perf record -o "${perfdata}" ${testprog} 2> /dev/null - then - echo "Workload record [Failed record]" + local ret=0 + perf_record_with_retry "${perfdata}" "check_workload" "perf test -w thloop" || ret=$? + if [ $ret -ne 0 ]; then + echo "Workload record [Failed record or missing output]" err=1 return fi - if ! perf report -i "${perfdata}" -q | grep -q "${testsym}" - then - echo "Workload record [Failed missing output]" - err=1 - return - fi - if ! perf record -e cpu-clock,cs --threads=package \ - -o "${perfdata}" ${testprog} 2> /dev/null - then - echo "Workload record [Failed record --threads option]" - err=1 - return - fi - if ! perf report -i "${perfdata}" -q | grep -q "${testsym}" - then - echo "Workload record [Failed --threads missing output]" + + ret=0 + perf_record_with_retry "${perfdata}" "check_workload" "perf test -w thloop" \ + -e cpu-clock,cs --threads=package || ret=$? + if [ $ret -ne 0 ]; then + echo "Workload record [Failed record --threads option or missing output]" err=1 return fi echo "Basic target workload test [Success]" } +check_branch_counter() { + perf report -i "${perfdata}" -D -q 2>/dev/null | grep -q "$br_cntr_output" && \ + perf script -i "${perfdata}" -F +brstackinsn,+brcntr 2>/dev/null | \ + grep -q "$br_cntr_script_output" +} + test_branch_counter() { echo "Branch counter test" # Check if the branch counter feature is supported @@ -190,67 +196,61 @@ test_branch_counter() { return fi done - if ! perf record -o "${perfdata}" -e "{branches:p,instructions}" -j any,counter ${testprog} 2> /dev/null - then - echo "Branch counter record test [Failed record]" - err=1 - return - fi - if ! perf report -i "${perfdata}" -D -q | grep -q "$br_cntr_output" - then - echo "Branch counter report test [Failed missing output]" - err=1 - return - fi - if ! perf script -i "${perfdata}" -F +brstackinsn,+brcntr | grep -q "$br_cntr_script_output" - then - echo " Branch counter script test [Failed missing output]" + local ret=0 + perf_record_with_retry "${perfdata}" "check_branch_counter" "perf test -w thloop" \ + -e "{branches:p,instructions}" -j any,counter || ret=$? + if [ $ret -ne 0 ]; then + echo "Branch counter test [Failed record or missing output]" err=1 return fi echo "Branch counter test [Success]" } +check_cgroup() { + perf report -i "${perfdata}" -D 2>/dev/null | grep -q "CGROUP" && \ + perf script -i "${perfdata}" -F cgroup 2>/dev/null | grep -q -v "unknown" +} + test_cgroup() { echo "Cgroup sampling test" - if ! perf record -aB --synth=cgroup --all-cgroups -o "${perfdata}" ${testprog} 2> /dev/null - then + local ret=0 + perf_record_with_retry "${perfdata}" "check_cgroup" "perf test -w thloop" \ + -aB --synth=cgroup --all-cgroups || ret=$? + if [ $ret -eq 2 ]; then echo "Cgroup sampling [Skipped not supported]" return - fi - if ! perf report -i "${perfdata}" -D | grep -q "CGROUP" - then + elif [ $ret -eq 1 ]; then echo "Cgroup sampling [Failed missing output]" err=1 return fi - if ! perf script -i "${perfdata}" -F cgroup | grep -q -v "unknown" - then - echo "Cgroup sampling [Failed cannot resolve cgroup names]" - err=1 - return - fi echo "Cgroup sampling test [Success]" } +check_uid() { + perf report -i "${perfdata}" -q | grep -q "${testsym}" +} + test_uid() { echo "Uid sampling test" - if ! perf record -aB --synth=no --uid "$(id -u)" -o "${perfdata}" ${testprog} \ - > "${script_output}" 2>&1 - then - if grep -q "libbpf.*EPERM" "${script_output}" + local logfile + logfile="/tmp/__perf_record_retry.$(id -u).$BASHPID.log" + local ret=0 + perf_record_with_retry "${perfdata}" "check_uid" "perf test -w thloop" \ + -aB --synth=no --uid "$(id -u)" || ret=$? + if [ $ret -eq 2 ]; then + if grep -q -E "libbpf.*EPERM|Access to performance monitoring|Permission denied|Failure to open any events" \ + "$logfile" then echo "Uid sampling [Skipped permissions]" return else echo "Uid sampling [Failed to record]" err=1 - # cat "${script_output}" return fi - fi - if ! perf report -i "${perfdata}" -q | grep -q "${testsym}" - then + elif [ $ret -eq 1 ]; then echo "Uid sampling [Failed missing output]" err=1 return diff --git a/tools/perf/tests/shell/record_lbr.sh b/tools/perf/tests/shell/record_lbr.sh index 78a02e90ece1..78899a54589c 100755 --- a/tools/perf/tests/shell/record_lbr.sh +++ b/tools/perf/tests/shell/record_lbr.sh @@ -1,9 +1,12 @@ #!/bin/bash -# perf record LBR tests (exclusive) +# perf record LBR tests # SPDX-License-Identifier: GPL-2.0 set -e +shelldir=$(dirname "$0") +. "${shelldir}"/lib/perf_record.sh + ParanoidAndNotRoot() { [ "$(id -u)" != 0 ] && [ "$(cat /proc/sys/kernel/perf_event_paranoid)" -gt $1 ] } @@ -22,6 +25,7 @@ cleanup() { rm -rf "${perfdata}" rm -rf "${perfdata}".old rm -rf "${perfdata}".txt + perf_record_cleanup trap - EXIT TERM INT } @@ -34,22 +38,28 @@ trap_cleanup() { trap trap_cleanup EXIT TERM INT +check_lbr_callgraph() { + perf report --stitch-lbr -i "${perfdata}" > "${perfdata}".txt 2>&1 +} + lbr_callgraph_test() { test="LBR callgraph" echo "$test" - if ! perf record -e cycles --call-graph lbr -o "${perfdata}" perf test -w thloop - then + set +e + perf_record_with_retry "${perfdata}" "check_lbr_callgraph" "perf test -w thloop" \ + -e cycles --call-graph lbr + local ret=$? + set -e + + if [ $ret -eq 2 ]; then echo "$test [Failed support missing]" if [ $err -eq 0 ] then err=2 fi return - fi - - if ! perf report --stitch-lbr -i "${perfdata}" > "${perfdata}".txt - then + elif [ $ret -eq 1 ]; then cat "${perfdata}".txt echo "$test [Failed in perf report]" err=1 @@ -59,6 +69,12 @@ lbr_callgraph_test() { echo "$test [Success]" } +check_lbr_samples() { + local out + out=$(perf report -D -i "${perfdata}" 2> /dev/null | grep -A1 'PERF_RECORD_SAMPLE') + [ "$(echo "$out" | grep -c 'PERF_RECORD_SAMPLE' || true)" -gt 0 ] +} + lbr_test() { local branch_flags=$1 local test="LBR $2 test" @@ -70,25 +86,27 @@ lbr_test() { local r echo "$test" - if ! perf record -e cycles $branch_flags -o "${perfdata}" perf test -w thloop - then + set +e + perf_record_with_retry "${perfdata}" "check_lbr_samples" "perf test -w thloop" \ + -e cycles $branch_flags + local ret=$? + set -e + + if [ $ret -eq 2 ]; then echo "$test [Failed support missing]" - perf record -e cycles $branch_flags -o "${perfdata}" perf test -w thloop || true if [ $err -eq 0 ] then err=2 fi return - fi - - out=$(perf report -D -i "${perfdata}" 2> /dev/null | grep -A1 'PERF_RECORD_SAMPLE') - sam_nr=$(echo "$out" | grep -c 'PERF_RECORD_SAMPLE' || true) - if [ $sam_nr -eq 0 ] - then + elif [ $ret -eq 1 ]; then echo "$test [Failed no samples captured]" err=1 return fi + + out=$(perf report -D -i "${perfdata}" 2> /dev/null | grep -A1 'PERF_RECORD_SAMPLE') + sam_nr=$(echo "$out" | grep -c 'PERF_RECORD_SAMPLE' || true) echo "$test: $sam_nr samples" bs_nr=$(echo "$out" | grep -c 'branch stack: nr:' || true) diff --git a/tools/perf/tests/shell/stat_all_pfm.sh b/tools/perf/tests/shell/stat_all_pfm.sh index c08c186af2c4..ec262c6b1674 100755 --- a/tools/perf/tests/shell/stat_all_pfm.sh +++ b/tools/perf/tests/shell/stat_all_pfm.sh @@ -32,7 +32,7 @@ do then # We failed to see the event and it is supported. Possibly the workload was # too small so retry with something longer. - result=$(perf stat --pfm-events "$p" perf bench internals synthesize 2>&1) + result=$(perf stat --pfm-events "$p" perf test -w noploop 0.1 2>&1) x=$? if test "$x" -ne "0" then -- 2.54.0.1136.gdb2ca164c4-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v1 05/12] perf tests: Skip metrics validation if system-wide recording lacks permission 2026-06-16 1:27 [PATCH v1 00/12] perf tests: Enhancements, speedups, and flakiness fixes Ian Rogers ` (3 preceding siblings ...) 2026-06-16 1:27 ` [PATCH v1 04/12] perf tests: Add robust record retry helper and use subsecond workloads Ian Rogers @ 2026-06-16 1:27 ` Ian Rogers 2026-06-16 1:27 ` [PATCH v1 06/12] perf tests: Fix Python JIT dump profiling test failure Ian Rogers ` (7 subsequent siblings) 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 1:27 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark, Thomas Falcon, Leo Yan, Thomas Richter, linux-kernel, linux-perf-users The metrics value validation test requires system-wide recording (`-a`), which can fail on systems without root permissions or where paranoid levels restrict tracing. Add a check to skip the test if `-a` is not supported. Also fix false negatives during validation by updating parse error string patterns and resolving issues in metric list generation. Fixes: 3ad7092f5145 ("perf test: Add metric value validation test") Signed-off-by: Ian Rogers <irogers@google.com> --- .../tests/shell/lib/perf_metric_validation.py | 11 ++-- tools/perf/tests/shell/stat_all_metrics.sh | 63 ++++++++++--------- tools/perf/tests/shell/stat_metrics_values.sh | 7 +++ 3 files changed, 48 insertions(+), 33 deletions(-) diff --git a/tools/perf/tests/shell/lib/perf_metric_validation.py b/tools/perf/tests/shell/lib/perf_metric_validation.py index dea8ef1977bf..3d52f94f22b9 100644 --- a/tools/perf/tests/shell/lib/perf_metric_validation.py +++ b/tools/perf/tests/shell/lib/perf_metric_validation.py @@ -383,10 +383,13 @@ class Validator: wl = workload.split() command.extend(wl) print(" ".join(command)) - cmd = subprocess.run(command, stderr=subprocess.PIPE, encoding='utf-8') - data = [x+'}' for x in cmd.stderr.split('}\n') if x] - if data[0][0] != '{': - data[0] = data[0][data[0].find('{'):] + cmd = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, encoding='utf-8') + lines = cmd.stderr.splitlines() + cmd.stdout.splitlines() + data = [] + for line in lines: + line = line.strip() + if line.startswith('{') and line.endswith('}'): + data.append(line) return data def collect_perf(self, workload: str): diff --git a/tools/perf/tests/shell/stat_all_metrics.sh b/tools/perf/tests/shell/stat_all_metrics.sh index b582d23f28c9..5ee869452ee7 100755 --- a/tools/perf/tests/shell/stat_all_metrics.sh +++ b/tools/perf/tests/shell/stat_all_metrics.sh @@ -12,38 +12,53 @@ system_wide_flag="-a" if ParanoidAndNotRoot 0 then system_wide_flag="" - test_prog="perf test -w noploop" + test_prog="perf test -w noploop 0.01" fi +check_metric() { + local output="$1" + local status="$2" + local metric="$3" + + if [[ $status -ne 0 || ! "$output" =~ ${metric:0:50} ]]; then + return 1 + fi + + if [[ "$output" =~ "<not counted>" || "$output" =~ "<not supported>" ]]; then + return 1 + fi + + return 0 +} + skip=0 err=3 for m in $(perf list --raw-dump metrics); do echo "Testing $m" result=$(perf stat -M "$m" $system_wide_flag -- $test_prog 2>&1) result_err=$? - if [[ $result_err -eq 0 && "$result" =~ ${m:0:50} ]] - then - # No error result and metric shown. + + if check_metric "$result" $result_err "$m"; then if [[ "$err" -ne 1 ]] then err=0 fi continue fi - if [[ "$result" =~ "Cannot resolve IDs for" || "$result" =~ "No supported events found" ]] - then - if [[ $(perf list --raw-dump $m) == "Default"* ]] + + result=$(perf stat -M "$m" $system_wide_flag -- perf test -w noploop 0.1 2>&1) + result_err=$? + + if check_metric "$result" $result_err "$m"; then + if [[ "$err" -ne 1 ]] then - echo "[Ignored $m] failed but as a Default metric this can be expected" - echo $result - continue + err=0 fi - echo "[Failed $m] Metric contains missing events" - echo $result - err=1 # Fail continue - elif [[ "$result" =~ \ - "Access to performance monitoring and observability operations is limited" ]] + fi + + # If retry also failed, determine if we skip, ignore, or fail + if [[ "$result" =~ "Access to performance monitoring and observability operations is limited" ]] then echo "[Skipped $m] Permission failure" echo $result @@ -61,7 +76,9 @@ for m in $(perf list --raw-dump metrics); do skip=1 fi continue - elif [[ "$result" =~ "<not supported>" ]] + elif [[ "$result" =~ "<not supported>" || \ + "$result" =~ "Cannot resolve IDs for" || \ + "$result" =~ "No supported events found" ]] then if [[ $(perf list --raw-dump $m) == "Default"* ]] then @@ -105,19 +122,7 @@ for m in $(perf list --raw-dump metrics); do continue fi - # Failed, possibly the workload was too small so retry with something longer. - result=$(perf stat -M "$m" $system_wide_flag -- perf bench internals synthesize 2>&1) - result_err=$? - if [[ $result_err -eq 0 && "$result" =~ ${m:0:50} ]] - then - # No error result and metric shown. - if [[ "$err" -ne 1 ]] - then - err=0 - fi - continue - fi - echo "[Failed $m] has non-zero error '$result_err' or not printed in:" + echo "[Failed $m] has non-zero error '$result_err' or not printed/counted in:" echo "$result" err=1 done diff --git a/tools/perf/tests/shell/stat_metrics_values.sh b/tools/perf/tests/shell/stat_metrics_values.sh index 30566f0b5427..76f1e99d1273 100755 --- a/tools/perf/tests/shell/stat_metrics_values.sh +++ b/tools/perf/tests/shell/stat_metrics_values.sh @@ -8,6 +8,13 @@ shelldir=$(dirname "$0") grep -q GenuineIntel /proc/cpuinfo || { echo Skipping non-Intel; exit 2; } +# Skip if no permission to record system-wide events +if ! perf stat -a -e instructions sleep 0.01 >/dev/null 2>&1; then + echo "Skipping: no permission to record system-wide events (-a)" + exit 2 +fi + + pythonvalidator=$(dirname $0)/lib/perf_metric_validation.py rulefile=$(dirname $0)/lib/perf_metric_validation_rules.json tmpdir=$(mktemp -d /tmp/__perf_test.program.XXXXX) -- 2.54.0.1136.gdb2ca164c4-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v1 06/12] perf tests: Fix Python JIT dump profiling test failure 2026-06-16 1:27 [PATCH v1 00/12] perf tests: Enhancements, speedups, and flakiness fixes Ian Rogers ` (4 preceding siblings ...) 2026-06-16 1:27 ` [PATCH v1 05/12] perf tests: Skip metrics validation if system-wide recording lacks permission Ian Rogers @ 2026-06-16 1:27 ` Ian Rogers 2026-06-16 1:27 ` [PATCH v1 07/12] perf tests: Fix flakiness in trace record and replay test Ian Rogers ` (6 subsequent siblings) 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 1:27 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark, Thomas Falcon, Leo Yan, Thomas Richter, linux-kernel, linux-perf-users The `python profiling with jitdump` test failed due to: 1. Target PID extraction resolving to duplicate space-separated values, which broke the buildid-cache loops. 2. The default workload duration being too short to capture JIT stack trampoline samples, resulting in 0 matching JIT symbols. Fix the PID parsing by sorting and retrieving a unique single-line value. Implement a robust retry loop starting at 1M python loop iterations and scaling up to 100M iterations until JIT symbols are successfully captured and verified. Fixes: c9cd0c7e529e ("perf test: Add python JIT dump test") Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/shell/jitdump-python.sh | 66 +++++++++++++++--------- 1 file changed, 41 insertions(+), 25 deletions(-) diff --git a/tools/perf/tests/shell/jitdump-python.sh b/tools/perf/tests/shell/jitdump-python.sh index ae86203b14a2..6f3d1edd6f04 100755 --- a/tools/perf/tests/shell/jitdump-python.sh +++ b/tools/perf/tests/shell/jitdump-python.sh @@ -20,7 +20,10 @@ PERF_DATA=$(mktemp /tmp/__perf_test.perf.data.XXXXXX) cleanup() { echo "Cleaning up files..." - rm -f ${PERF_DATA} ${PERF_DATA}.jit /tmp/jit-${PID}.dump /tmp/jitted-${PID}-*.so 2> /dev/null + rm -f ${PERF_DATA} ${PERF_DATA}.jit 2> /dev/null + for p in ${ALL_PIDS}; do + rm -f /tmp/jit-${p}.dump /tmp/jitted-${p}-*.so 2> /dev/null + done trap - EXIT TERM INT } @@ -33,9 +36,11 @@ trap_cleanup() { trap trap_cleanup EXIT TERM INT -echo "Run python with -Xperf_jit" -cat <<EOF | perf record -k 1 -g --call-graph dwarf -o "${PERF_DATA}" \ - -- ${PYTHON} -Xperf_jit +ALL_PIDS="" +NUM=0 +for iterations in 1000000 10000000 50000000 100000000; do + echo "Running with $iterations iterations..." + cat <<EOF | perf record -k 1 -g --call-graph dwarf -o "${PERF_DATA}" -- ${PYTHON} -Xperf_jit def foo(n): result = 0 for _ in range(n): @@ -49,29 +54,40 @@ def baz(n): bar(n) if __name__ == "__main__": - baz(1000000) + baz($iterations) EOF -# extract PID of the target process from the data -_PID=$(perf report -i "${PERF_DATA}" -F pid -q -g none | cut -d: -f1 -s) -PID=$(echo -n $_PID) # remove newlines - -echo "Generate JIT-ed DSOs using perf inject" -DEBUGINFOD_URLS='' perf inject -i "${PERF_DATA}" -j -o "${PERF_DATA}.jit" - -echo "Add JIT-ed DSOs to the build-ID cache" -for F in /tmp/jitted-${PID}-*.so; do - perf buildid-cache -a "${F}" -done - -echo "Check the symbol containing the function/module name" -NUM=$(perf report -i "${PERF_DATA}.jit" -s sym | grep -cE 'py::(foo|bar|baz):<stdin>') - -echo "Found ${NUM} matching lines" - -echo "Remove JIT-ed DSOs from the build-ID cache" -for F in /tmp/jitted-${PID}-*.so; do - perf buildid-cache -r "${F}" + # extract PID of the target process from the data + PID=$(perf report -i "${PERF_DATA}" --stdio -F pid -q -g none | \ + cut -d: -f1 -s | sort -u | head -n 1 | tr -d ' ') + if [ -z "${PID}" ]; then + echo "Failed to get PID, retrying..." + continue + fi + ALL_PIDS="${ALL_PIDS} ${PID}" + + echo "Generate JIT-ed DSOs using perf inject" + DEBUGINFOD_URLS='' perf inject -i "${PERF_DATA}" -j -o "${PERF_DATA}.jit" + + echo "Add JIT-ed DSOs to the build-ID cache" + for F in /tmp/jitted-${PID}-*.so; do + perf buildid-cache -a "${F}" + done + + echo "Check the symbol containing the function/module name" + NUM=$(perf report -i "${PERF_DATA}.jit" -s sym --stdio | grep -cE 'py::(foo|bar|baz):<stdin>') + + echo "Remove JIT-ed DSOs from the build-ID cache" + for F in /tmp/jitted-${PID}-*.so; do + perf buildid-cache -r "${F}" + done + rm -f /tmp/jitted-${PID}-*.so /tmp/jit-${PID}.dump 2>/dev/null + + if [ "${NUM}" -gt 0 ]; then + echo "Success: found ${NUM} matching lines" + break + fi + echo "No matching lines found, retrying with more iterations..." done cleanup -- 2.54.0.1136.gdb2ca164c4-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v1 07/12] perf tests: Fix flakiness in trace record and replay test 2026-06-16 1:27 [PATCH v1 00/12] perf tests: Enhancements, speedups, and flakiness fixes Ian Rogers ` (5 preceding siblings ...) 2026-06-16 1:27 ` [PATCH v1 06/12] perf tests: Fix Python JIT dump profiling test failure Ian Rogers @ 2026-06-16 1:27 ` Ian Rogers 2026-06-16 1:27 ` [PATCH v1 08/12] perf tests: Fix flakiness in BPF counters test on hybrid systems Ian Rogers ` (5 subsequent siblings) 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 1:27 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark, Thomas Falcon, Leo Yan, Thomas Richter, linux-kernel, linux-perf-users The `perf trace record and replay` test fails intermittently on slow or virtualized hosts because the default recording workload (`sleep 1`) occasionally completes without scheduling the target `nanosleep` or `clock_nanosleep` system calls inside the recorded sample window, resulting in the error: `Failed: cannot find *nanosleep syscall`. Generalize the `perf_record_with_retry` helper in `tests/shell/lib/perf_record.sh` to support a custom record command prefix via the `PERF_RECORD_CMD` environment variable (defaulting to "perf record"). Update `trace_record_replay.sh` to use this robust retry loop running with `PERF_RECORD_CMD="perf trace record"` and a base workload of `sleep`. The test will automatically retry with scaled sleep durations (from 0.01s up to 2.0s) until the required `nanosleep` event is successfully captured. Fixes: 15bcfb96d0dd ("perf test: Add trace record and replay test") Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/shell/lib/perf_record.sh | 7 ++++++- tools/perf/tests/shell/trace_record_replay.sh | 18 ++++++++++++++---- 2 files changed, 20 insertions(+), 5 deletions(-) diff --git a/tools/perf/tests/shell/lib/perf_record.sh b/tools/perf/tests/shell/lib/perf_record.sh index c5de25244770..5c7feb294ebb 100644 --- a/tools/perf/tests/shell/lib/perf_record.sh +++ b/tools/perf/tests/shell/lib/perf_record.sh @@ -21,9 +21,14 @@ perf_record_with_retry() { local duration local first_run=true local ret=1 + local cmd_prefix="perf record" + if [ -n "${PERF_RECORD_CMD}" ]; then + cmd_prefix="${PERF_RECORD_CMD}" + fi + for duration in 0.01 0.1 0.3 1.0 2.0; do rm -f "${perfdata}" "${perfdata}".old - perf record "$@" -o "${perfdata}" ${testprog_base} ${duration} > "$logfile" 2>&1 + ${cmd_prefix} "$@" -o "${perfdata}" ${testprog_base} ${duration} > "$logfile" 2>&1 local record_exit=$? if [ "$first_run" = true ] && [ $record_exit -ne 0 ]; then diff --git a/tools/perf/tests/shell/trace_record_replay.sh b/tools/perf/tests/shell/trace_record_replay.sh index 88d30a03dcec..f27e32b18697 100755 --- a/tools/perf/tests/shell/trace_record_replay.sh +++ b/tools/perf/tests/shell/trace_record_replay.sh @@ -6,16 +6,26 @@ # shellcheck source=lib/probe.sh . "$(dirname $0)"/lib/probe.sh +# shellcheck source=lib/perf_record.sh +. "$(dirname $0)"/lib/perf_record.sh skip_if_no_perf_trace || exit 2 [ "$(id -u)" = 0 ] || exit 2 file=$(mktemp /tmp/temporary_file.XXXXX) -perf trace record -o ${file} sleep 1 || exit 1 -if ! perf trace -i ${file} 2>&1 | grep nanosleep; then +check_nanosleep() { + perf trace -i "${file}" 2>&1 | grep -q nanosleep +} + +PERF_RECORD_CMD="perf trace record" perf_record_with_retry "${file}" "check_nanosleep" "sleep" +err=$? + +perf_record_cleanup +rm -f ${file} + +if [ $err -ne 0 ]; then echo "Failed: cannot find *nanosleep syscall" exit 1 fi - -rm -f ${file} +exit 0 -- 2.54.0.1136.gdb2ca164c4-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v1 08/12] perf tests: Fix flakiness in BPF counters test on hybrid systems 2026-06-16 1:27 [PATCH v1 00/12] perf tests: Enhancements, speedups, and flakiness fixes Ian Rogers ` (6 preceding siblings ...) 2026-06-16 1:27 ` [PATCH v1 07/12] perf tests: Fix flakiness in trace record and replay test Ian Rogers @ 2026-06-16 1:27 ` Ian Rogers 2026-06-16 1:27 ` [PATCH v1 09/12] perf tests: Fix flakiness in branch stack sampling tests Ian Rogers ` (4 subsequent siblings) 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 1:27 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark, Thomas Falcon, Leo Yan, Thomas Richter, linux-kernel, linux-perf-users The `perf stat --bpf-counters test` fails intermittently on hybrid architectures or systems with dynamic frequency scaling (DVFS). This happens because the test workload (`sqrtloop`) runs for a fixed 1-second duration, and the CPU frequency can scale dynamically between idle and maximum frequency. As the first run runs on a cold CPU and the second run runs on a warmed-up CPU (or vice versa), the number of instructions executed in 1 second differs by up to 2.2x, violating the comparison tolerance. Also, when running as root, BPF tracepoints and scheduling programs trigger frequently. Since standard `perf stat -e instructions` measures both user and kernel space instructions, it counts BPF helper and program execution overheads, whereas the BPF counters themselves do not self-measure. This introduces a large kernel-space instruction count discrepancy between standard and BPF counters. Fix these issues by: 1. Switching the workload to a strictly deterministic, iteration-based workload: `awk 'BEGIN { for (i=0; i<10000000; i++) sum+=i }'`. We pin the workload to a single random allowed CPU using `taskset -c $CPU` via a bash array. 2. Restricting the counted event to user-space only (`instructions:u` or `/u`). 3. Tightening the comparison tolerance from 20% to 15%. These modifications isolate the measurements to user-space instructions of the deterministic loop, which executes a virtually identical number of instructions on both runs (with less than 0.001% variation), eliminating Dynamic Frequency Scaling (DVFS), kernel scheduling noise, and BPF helper self-measurement overheads. Fixes: 2c0cb9f56020 ("perf test: Add a shell test for 'perf stat --bpf-counters' new option") Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/shell/stat_bpf_counters.sh | 27 +++++++++++++-------- 1 file changed, 17 insertions(+), 10 deletions(-) diff --git a/tools/perf/tests/shell/stat_bpf_counters.sh b/tools/perf/tests/shell/stat_bpf_counters.sh index 35463358b273..2f01608c95e3 100755 --- a/tools/perf/tests/shell/stat_bpf_counters.sh +++ b/tools/perf/tests/shell/stat_bpf_counters.sh @@ -4,21 +4,25 @@ set -e -workload="perf test -w sqrtloop" +CPU=0 +if command -v shuf >/dev/null 2>&1; then + CPU=$(shuf -i 0-$(($(nproc) - 1)) -n 1 2>/dev/null || echo 0) +fi +workload=(taskset -c "$CPU" awk 'BEGIN { for (i=0; i<10000000; i++) sum+=i }') -# check whether $2 is within +/- 20% of $1 +# check whether $2 is within +/- 15% of $1 compare_number() { first_num=$1 second_num=$2 - # upper bound is first_num * 120% - upper=$(expr $first_num + $first_num / 5 ) - # lower bound is first_num * 80% - lower=$(expr $first_num - $first_num / 5 ) + # upper bound is first_num * 115% + upper=$(expr $first_num + $first_num \* 3 / 20 ) + # lower bound is first_num * 85% + lower=$(expr $first_num - $first_num \* 3 / 20 ) if [ $second_num -gt $upper ] || [ $second_num -lt $lower ]; then - echo "The difference between $first_num and $second_num are greater than 20%." + echo "The difference between $first_num and $second_num are greater than 15%." exit 1 fi } @@ -41,11 +45,12 @@ check_counts() test_bpf_counters() { printf "Testing --bpf-counters " - base_instructions=$(perf stat --no-big-num -e instructions -- $workload 2>&1 | \ + base_instructions=$(perf stat --no-big-num -e instructions:u -- "${workload[@]}" 2>&1 | \ awk -v i=0 -v c=0 '/instructions/ { \ if ($1 != "<not") { i++; c += $1 } \ } END { if (i > 0) printf "%.0f", c; else print "<not" }') - bpf_instructions=$(perf stat --no-big-num --bpf-counters -e instructions -- $workload 2>&1 | \ + bpf_instructions=$(perf stat --no-big-num --bpf-counters -e instructions:u \ + -- "${workload[@]}" 2>&1 | \ awk -v i=0 -v c=0 '/instructions/ { \ if ($1 != "<not") { i++; c += $1 } \ } END { if (i > 0) printf "%.0f", c; else print "<not" }') @@ -57,7 +62,9 @@ test_bpf_counters() test_bpf_modifier() { printf "Testing bpf event modifier " - stat_output=$(perf stat --no-big-num -e instructions/name=base_instructions/,instructions/name=bpf_instructions/b -- $workload 2>&1) + stat_output=$(perf stat --no-big-num \ + -e instructions/name=base_instructions/u,instructions/name=bpf_instructions/bu \ + -- "${workload[@]}" 2>&1) base_instructions=$(echo "$stat_output"| \ awk -v i=0 -v c=0 '/base_instructions/ { \ if ($1 != "<not") { i++; c += $1 } \ -- 2.54.0.1136.gdb2ca164c4-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v1 09/12] perf tests: Fix flakiness in branch stack sampling tests 2026-06-16 1:27 [PATCH v1 00/12] perf tests: Enhancements, speedups, and flakiness fixes Ian Rogers ` (7 preceding siblings ...) 2026-06-16 1:27 ` [PATCH v1 08/12] perf tests: Fix flakiness in BPF counters test on hybrid systems Ian Rogers @ 2026-06-16 1:27 ` Ian Rogers 2026-06-16 1:27 ` [PATCH v1 10/12] perf tests: Speed up off-cpu profiling tests Ian Rogers ` (3 subsequent siblings) 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 1:27 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark, Thomas Falcon, Leo Yan, Thomas Richter, linux-kernel, linux-perf-users The branch stack sampling test (test 130) runs short iteration-based workloads to verify syscall, kernel, and trap branch stack sampling. Specifically, `test_syscall()` and `test_kernel_branches()` run `perf bench syscall basic` with loop counts of 8000 and 1000, and `test_trap_eret_branches()` runs `traploop` with 1000 iterations. Because these loop limits are extremely small, the total benchmark runtimes last only a few milliseconds (or less). Under high load, virtualization, or coarse sampling conditions, PMU cycle sampling fails to capture enough samples inside the brief benchmark loops. This leads to false negatives where the script output lacks the expected syscall, kernel, or trap branch entries (e.g. "ERROR: Branches missing getppid[^ ]*/SYSCALL/"). Fix this by increasing the workload loop counts to 100,000 across all three test sections. Running 100,000 loops still finishes virtually instantaneously (less than 0.1 seconds), but generates enough iterations to guarantee robust branch stack capture. Fixes: b55878c90ab9 ("perf test: Add test for branch stack sampling") Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/shell/test_brstack.sh | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tools/perf/tests/shell/test_brstack.sh b/tools/perf/tests/shell/test_brstack.sh index eb5837f82e39..be46a68daf76 100755 --- a/tools/perf/tests/shell/test_brstack.sh +++ b/tools/perf/tests/shell/test_brstack.sh @@ -112,7 +112,7 @@ test_trap_eret_branches() { start_err=$err err=0 perf record -o $TMPDIR/perf.data --branch-filter any,save_type,u,k -- \ - perf test -w traploop 1000 > "$TMPDIR/record.txt" 2>&1 + perf test -w traploop 100000 > "$TMPDIR/record.txt" 2>&1 perf script -i $TMPDIR/perf.data --fields brstacksym | \ tr ' ' '\n' > $TMPDIR/perf.script @@ -137,7 +137,7 @@ test_kernel_branches() { start_err=$err err=0 perf record -o $TMPDIR/perf.data --branch-filter any,k -- \ - perf bench syscall basic --loop 1000 > "$TMPDIR/record.txt" 2>&1 + perf bench syscall basic --loop 100000 > "$TMPDIR/record.txt" 2>&1 perf script -i $TMPDIR/perf.data --fields brstack | \ tr ' ' '\n' > $TMPDIR/perf.script @@ -209,7 +209,7 @@ test_syscall() { err=0 perf record -o $TMPDIR/perf.data --branch-filter \ any_call,save_type,u,k -c 10007 -- \ - perf bench syscall basic --loop 8000 > "$TMPDIR/record.txt" 2>&1 + perf bench syscall basic --loop 100000 > "$TMPDIR/record.txt" 2>&1 perf script -i $TMPDIR/perf.data --fields brstacksym | \ tr ' ' '\n' > $TMPDIR/perf.script -- 2.54.0.1136.gdb2ca164c4-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v1 10/12] perf tests: Speed up off-cpu profiling tests 2026-06-16 1:27 [PATCH v1 00/12] perf tests: Enhancements, speedups, and flakiness fixes Ian Rogers ` (8 preceding siblings ...) 2026-06-16 1:27 ` [PATCH v1 09/12] perf tests: Fix flakiness in branch stack sampling tests Ian Rogers @ 2026-06-16 1:27 ` Ian Rogers 2026-06-16 1:27 ` [PATCH v1 11/12] perf tests: Speed up lock contention analysis shell test Ian Rogers ` (2 subsequent siblings) 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 1:27 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark, Thomas Falcon, Leo Yan, Thomas Richter, linux-kernel, linux-perf-users The off-cpu profiling test suite runs multiple recording commands with a default workload of `sleep 1` to test the off-cpu threshold configurations (specifically, above 999ms and below 1200ms). This adds a mandatory 3.0 seconds of sleep overhead. Optimize this by scaling down the thresholds and workload durations by a factor of 10: - Use `sleep 0.1` as the workload duration. - Change the above-threshold test to use `--off-cpu-thresh 99` and `sleep 0.1`. - Change the below-threshold test to use `--off-cpu-thresh 120` and `sleep 0.1`. - Update the awk period check in the above-threshold test to look for a period greater than 99,000,000 ns (99ms) instead of 999,000,000 ns (999ms). This reduces raw test sleep overhead from 3.0s down to 0.3s, yielding a ~2.7 second speedup for this test. Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/shell/record_offcpu.sh | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/tools/perf/tests/shell/record_offcpu.sh b/tools/perf/tests/shell/record_offcpu.sh index 860a2d6f4b75..39319437b8f9 100755 --- a/tools/perf/tests/shell/record_offcpu.sh +++ b/tools/perf/tests/shell/record_offcpu.sh @@ -45,7 +45,7 @@ test_offcpu_priv() { test_offcpu_basic() { echo "Basic off-cpu test" - if ! perf record --off-cpu -e dummy -o ${perfdata} sleep 1 2> /dev/null + if ! perf record --off-cpu -e dummy -o ${perfdata} sleep 0.1 2> /dev/null then echo "Basic off-cpu test [Failed record]" err=1 @@ -98,8 +98,8 @@ test_offcpu_child() { test_offcpu_above_thresh() { echo "${test_above_thresh}" - # collect direct off-cpu samples for tasks blocked for more than 999ms - if ! perf record -e dummy --off-cpu --off-cpu-thresh 999 -o ${perfdata} -- sleep 1 2> /dev/null + # collect direct off-cpu samples for tasks blocked for more than 99ms + if ! perf record -e dummy --off-cpu --off-cpu-thresh 99 -o ${perfdata} -- sleep 0.1 2> /dev/null then echo "${test_above_thresh} [Failed record]" err=1 @@ -115,7 +115,7 @@ test_offcpu_above_thresh() { fi # there should only be one direct sample, and its period should be higher than off-cpu-thresh if ! perf script --time "0, ${dummy_timestamp}" -i ${perfdata} -F period | \ - awk '{ if (int($1) > 999000000) exit 0; else exit 1; }' + awk '{ if (int($1) > 99000000) exit 0; else exit 1; }' then echo "${test_above_thresh} [Failed off-cpu time too short]" err=1 @@ -128,8 +128,8 @@ test_offcpu_above_thresh() { test_offcpu_below_thresh() { echo "${test_below_thresh}" - # collect direct off-cpu samples for tasks blocked for more than 1.2s - if ! perf record -e dummy --off-cpu --off-cpu-thresh 1200 -o ${perfdata} -- sleep 1 2> /dev/null + # collect direct off-cpu samples for tasks blocked for more than 120ms + if ! perf record -e dummy --off-cpu --off-cpu-thresh 120 -o ${perfdata} -- sleep 0.1 2> /dev/null then echo "${test_below_thresh} [Failed record]" err=1 -- 2.54.0.1136.gdb2ca164c4-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v1 11/12] perf tests: Speed up lock contention analysis shell test 2026-06-16 1:27 [PATCH v1 00/12] perf tests: Enhancements, speedups, and flakiness fixes Ian Rogers ` (9 preceding siblings ...) 2026-06-16 1:27 ` [PATCH v1 10/12] perf tests: Speed up off-cpu profiling tests Ian Rogers @ 2026-06-16 1:27 ` Ian Rogers 2026-06-16 1:27 ` [PATCH v1 12/12] perf tests: Speed up metrics checking shell tests Ian Rogers 2026-06-16 6:13 ` [PATCH v2 00/12] perf tests: Enhance robustness, speed up execution, and fix flakiness Ian Rogers 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 1:27 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark, Thomas Falcon, Leo Yan, Thomas Richter, linux-kernel, linux-perf-users The lock contention analysis test suite (`lock_contention.sh`) performs a series of 13 separate profiling checks to verify various aggregation and filtering parameters of `perf lock contention`. Each of these checks runs the `perf bench sched messaging` messaging benchmark as its workload. By default, `sched messaging` runs 10 groups of 40 processes (400 processes total) generating substantial task scheduling, context switching, and IPC message passing. When traced system-wide for lock events, the tracing overhead (handling millions of lock acquisitions and releases) slows execution down significantly, causing the test suite to take over 80 seconds. Optimize this by introducing a scaled-down messaging benchmark workload: `perf bench sched messaging -g 1 -p`. Running 1 group (40 processes) takes only 0.01 seconds natively (instead of 0.08 seconds), drastically reduces the sheer volume of lock acquire/release trace events, and reduces CPU context switching during tracing while still generating sufficient lock events to fully exercise the BPF/record filters. Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/shell/lock_contention.sh | 30 +++++++++++++---------- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/tools/perf/tests/shell/lock_contention.sh b/tools/perf/tests/shell/lock_contention.sh index 52e8b9db9fbd..f589d387f6b3 100755 --- a/tools/perf/tests/shell/lock_contention.sh +++ b/tools/perf/tests/shell/lock_contention.sh @@ -9,6 +9,10 @@ perfdata=$(mktemp /tmp/__perf_test.perf.data.XXXXX) result=$(mktemp /tmp/__perf_test.result.XXXXX) errout=$(mktemp /tmp/__perf_test.errout.XXXXX) +# Workload to generate lock contention. +# Using 1 group (-g 1) keeps runtime low while generating sufficient lock events. +msg_workload="perf bench sched messaging -g 1 -p" + cleanup() { rm -f ${perfdata} rm -f ${result} @@ -50,7 +54,7 @@ check() { test_record() { echo "Testing perf lock record and perf lock contention" - perf lock record -o ${perfdata} -- perf bench sched messaging -p > /dev/null 2>&1 + perf lock record -o ${perfdata} -- ${msg_workload} > /dev/null 2>&1 # the output goes to the stderr and we expect only 1 output (-E 1) perf lock contention -i ${perfdata} -E 1 -q 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then @@ -70,7 +74,7 @@ test_bpf() fi # the perf lock contention output goes to the stderr - perf lock con -a -b -E 1 -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b -E 1 -q -- ${msg_workload} > /dev/null 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then echo "[Fail] BPF result count is not 1:" "$(cat "${result}" | wc -l)" err=1 @@ -81,7 +85,7 @@ test_bpf() test_record_concurrent() { echo "Testing perf lock record and perf lock contention at the same time" - perf lock record -o- -- perf bench sched messaging -p 2> ${errout} | \ + perf lock record -o- -- ${msg_workload} 2> ${errout} | \ perf lock contention -i- -E 1 -q 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then echo "[Fail] Recorded result count is not 1:" "$(cat "${result}" | wc -l)" @@ -107,7 +111,7 @@ test_aggr_task() fi # the perf lock contention output goes to the stderr - perf lock con -a -b -t -E 1 -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b -t -E 1 -q -- ${msg_workload} > /dev/null 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then echo "[Fail] BPF result count is not 1:" "$(cat "${result}" | wc -l)" err=1 @@ -130,7 +134,7 @@ test_aggr_addr() fi # the perf lock contention output goes to the stderr - perf lock con -a -b -l -E 1 -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b -l -E 1 -q -- ${msg_workload} > /dev/null 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then echo "[Fail] BPF result count is not 1:" "$(cat "${result}" | wc -l)" err=1 @@ -148,7 +152,7 @@ test_aggr_cgroup() fi # the perf lock contention output goes to the stderr - perf lock con -a -b --lock-cgroup -E 1 -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b --lock-cgroup -E 1 -q -- ${msg_workload} > /dev/null 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then echo "[Fail] BPF result count is not 1:" "$(cat "${result}" | wc -l)" err=1 @@ -170,7 +174,7 @@ test_type_filter() return fi - perf lock con -a -b -Y spinlock -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b -Y spinlock -q -- ${msg_workload} > /dev/null 2> ${result} if [ "$(grep -c -v spinlock "${result}")" != "0" ]; then echo "[Fail] BPF result should not have non-spinlocks:" "$(cat "${result}")" err=1 @@ -202,7 +206,7 @@ test_lock_filter() return fi - perf lock con -a -b -L tasklist_lock -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b -L tasklist_lock -q -- ${msg_workload} > /dev/null 2> ${result} if [ "$(grep -c -v "${test_lock_filter_type}" "${result}")" != "0" ]; then echo "[Fail] BPF result should not have non-${test_lock_filter_type} locks:" "$(cat "${result}")" err=1 @@ -241,7 +245,7 @@ test_stack_filter() return fi - perf lock con -a -b -S unix_stream -E 1 -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b -S unix_stream -E 1 -q -- ${msg_workload} > /dev/null 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then echo "[Fail] BPF result should have a lock from unix_stream:" "$(cat "${result}")" err=1 @@ -269,7 +273,7 @@ test_aggr_task_stack_filter() return fi - perf lock con -a -b -t -S unix_stream -E 1 -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b -t -S unix_stream -E 1 -q -- ${msg_workload} > /dev/null 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then echo "[Fail] BPF result should have a task from unix_stream:" "$(cat "${result}")" err=1 @@ -285,7 +289,7 @@ test_cgroup_filter() return fi - perf lock con -a -b --lock-cgroup -E 1 -F wait_total -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b --lock-cgroup -E 1 -F wait_total -q -- ${msg_workload} > /dev/null 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then echo "[Fail] BPF result should have a cgroup result:" "$(cat "${result}")" err=1 @@ -293,7 +297,7 @@ test_cgroup_filter() fi cgroup=$(cat "${result}" | awk '{ print $3 }') - perf lock con -a -b --lock-cgroup -E 1 -G "${cgroup}" -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b --lock-cgroup -E 1 -G "${cgroup}" -q -- ${msg_workload} > /dev/null 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then echo "[Fail] BPF result should have a result with cgroup filter:" "$(cat "${cgroup}")" err=1 @@ -328,7 +332,7 @@ test_csv_output() fi # the perf lock contention output goes to the stderr - perf lock con -a -b -E 1 -x , --output ${result} -- perf bench sched messaging -p > /dev/null 2>&1 + perf lock con -a -b -E 1 -x , --output ${result} -- ${msg_workload} > /dev/null 2>&1 output=$(grep -v "^#" ${result} | tr -d -c , | wc -c) if [ "${header}" != "${output}" ]; then echo "[Fail] BPF result does not match the number of commas: ${header} != ${output}" -- 2.54.0.1136.gdb2ca164c4-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v1 12/12] perf tests: Speed up metrics checking shell tests 2026-06-16 1:27 [PATCH v1 00/12] perf tests: Enhancements, speedups, and flakiness fixes Ian Rogers ` (10 preceding siblings ...) 2026-06-16 1:27 ` [PATCH v1 11/12] perf tests: Speed up lock contention analysis shell test Ian Rogers @ 2026-06-16 1:27 ` Ian Rogers 2026-06-16 6:13 ` [PATCH v2 00/12] perf tests: Enhance robustness, speed up execution, and fix flakiness Ian Rogers 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 1:27 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Jiri Olsa, Ian Rogers, Adrian Hunter, James Clark, Thomas Falcon, Leo Yan, Thomas Richter, linux-kernel, linux-perf-users Optimize the execution of the metric validation and metric listing shell test suites: 1. `stat_metrics_values.sh`: The Python metric validator runs the `perf bench futex hash` workload for each validated metric relationship. Reduce the benchmark runtime limit from `-r 2` (2 seconds) to `-r 1` (1 second). This cuts the workload duration in half while still generating sufficient PMU events to satisfy non-zero threshold metric validations. 2. `stat_all_metrics.sh`: The metric checking test runs `perf stat` sequentially across all 433+ listed metrics. Change the default workload for system-wide runs from `sleep 0.01` to `true`. This avoids the 10ms sleep delay on each sequential metric invocation, saving over 4 seconds of total wall time during full test suite runs. Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/shell/stat_all_metrics.sh | 2 +- tools/perf/tests/shell/stat_metrics_values.sh | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/perf/tests/shell/stat_all_metrics.sh b/tools/perf/tests/shell/stat_all_metrics.sh index 5ee869452ee7..066e6cfd427b 100755 --- a/tools/perf/tests/shell/stat_all_metrics.sh +++ b/tools/perf/tests/shell/stat_all_metrics.sh @@ -7,7 +7,7 @@ ParanoidAndNotRoot() [ "$(id -u)" != 0 ] && [ "$(cat /proc/sys/kernel/perf_event_paranoid)" -gt $1 ] } -test_prog="sleep 0.01" +test_prog="true" system_wide_flag="-a" if ParanoidAndNotRoot 0 then diff --git a/tools/perf/tests/shell/stat_metrics_values.sh b/tools/perf/tests/shell/stat_metrics_values.sh index 76f1e99d1273..86c5c7f70933 100755 --- a/tools/perf/tests/shell/stat_metrics_values.sh +++ b/tools/perf/tests/shell/stat_metrics_values.sh @@ -18,7 +18,7 @@ fi pythonvalidator=$(dirname $0)/lib/perf_metric_validation.py rulefile=$(dirname $0)/lib/perf_metric_validation_rules.json tmpdir=$(mktemp -d /tmp/__perf_test.program.XXXXX) -workload="perf bench futex hash -r 2 -s" +workload="perf bench futex hash -r 1 -s" # Add -debug, save data file and full rule file echo "Launch python validation script $pythonvalidator" -- 2.54.0.1136.gdb2ca164c4-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v2 00/12] perf tests: Enhance robustness, speed up execution, and fix flakiness 2026-06-16 1:27 [PATCH v1 00/12] perf tests: Enhancements, speedups, and flakiness fixes Ian Rogers ` (11 preceding siblings ...) 2026-06-16 1:27 ` [PATCH v1 12/12] perf tests: Speed up metrics checking shell tests Ian Rogers @ 2026-06-16 6:13 ` Ian Rogers 2026-06-16 6:13 ` [PATCH v2 01/12] perf parse-events: Restrict core PMU bypass to --cputype option Ian Rogers ` (12 more replies) 12 siblings, 13 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 6:13 UTC (permalink / raw) To: irogers, acme, namhyung Cc: adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht This patch series introduces several robustness and performance improvements to the perf test suite, alongside fixes for long-standing flakiness. Key changes across the series: - Introduces robust retry logic (`perf_record_with_retry`) to prevent spurious test failures due to transient `EPERM` issues during system-wide background profiling. - Significantly speeds up test execution across multiple scripts (such as kvm, record, trace, off-cpu, and lock contention) by supporting sub-second durations in the `noploop` and `thloop` workloads, reducing unnecessary wait times. - Fixes flakiness in the BPF counters hybrid test by parsing `taskset` to determine valid CPUs rather than relying on `shuf`, which can fail under cgroups or missing binaries. - Fixes JIT dump file leaks by actively recording the injected Python PID to guarantee cleanup when zero samples are generated. - Decouples format alignments in `builtin-test.c` to gracefully truncate test descriptions without losing alignment or failing under non-TTY environments. - Restricts uncore PMU bypass in event parsing strictly to `--cputype` to prevent unintended metric evaluation impacts. v2 changes: - Plumbed `bool cputype_filter` in metric parsing (patch 1) to explicitly restrict the uncore bypass to `--cputype`. - Fixed output truncation on file redirection in `builtin-test.c` (patch 2) by guarding `get_term_dimensions()` with `isatty(1)`, and decoupled the format minimum padding from maximum precision limits to avoid hard-truncating long descriptions unnecessarily. Restored the 35-character headroom comment. - Clamped `ualarm()` microseconds calculation to `> 0 ? usecs : 1` (patch 3) to prevent an infinite loop DoS if the float parses to 0. - Hardened `perf_record.sh` (patch 4): used secure `mktemp` pathing, replaced unsafe `rm -f ${perfdata}` with `.old` rotation, and fixed bash globbing in the cleanup trap. - Handled the JIT dump Python PID leak (patch 6) by injecting `os.getpid()` output to a `${PERF_DATA}.pid` tracker file so the trap can robustly clean it up even if `perf report` is empty. - Avoided 32-bit `expr` math overflow in `stat_bpf_counters.sh` (patch 8) by ensuring division occurs before multiplication (`x / 20 * 3`). - Lowered off-cpu `sleep 0.1` thresholds (patch 10) to 50ms (above) and 500ms (below) for much higher robustness against jitter. - Fixed a severe performance regression in `stat_all_metrics.sh` (patch 5) caught by the sashiko reviewer, where missing structural events ("Cannot resolve IDs", "<not supported>") were unconditionally retrying with a 0.1s penalty. The script now correctly parses these errors and skips before the retry. Ian Rogers (12): perf parse-events: Restrict core PMU bypass to --cputype option perf test: Truncate test description to fit terminal width perf tests workloads: Support sub-second durations in noploop and thloop perf tests: Add robust record retry helper and use subsecond workloads perf tests: Skip metrics validation if system-wide recording lacks permission perf tests: Fix Python JIT dump profiling test failure perf tests: Fix flakiness in trace record and replay test perf tests: Fix flakiness in BPF counters test on hybrid systems perf tests: Fix flakiness in branch stack sampling tests perf tests: Speed up off-cpu profiling tests perf tests: Speed up lock contention analysis shell test perf tests: Speed up metrics checking shell tests tools/perf/builtin-script.c | 1 + tools/perf/builtin-stat.c | 12 +- tools/perf/tests/builtin-test.c | 110 ++++++++--- tools/perf/tests/expand-cgroup.c | 2 +- tools/perf/tests/parse-events.c | 11 +- tools/perf/tests/parse-metric.c | 2 +- tools/perf/tests/pmu-events.c | 8 +- tools/perf/tests/shell/jitdump-python.sh | 76 +++++--- tools/perf/tests/shell/kvm.sh | 61 ++++--- .../tests/shell/lib/perf_metric_validation.py | 11 +- tools/perf/tests/shell/lib/perf_record.sh | 52 ++++++ tools/perf/tests/shell/lock_contention.sh | 30 +-- tools/perf/tests/shell/pipe_test.sh | 4 +- tools/perf/tests/shell/record.sh | 172 +++++++++--------- tools/perf/tests/shell/record_lbr.sh | 50 +++-- tools/perf/tests/shell/record_offcpu.sh | 12 +- tools/perf/tests/shell/stat_all_metrics.sh | 77 +++++--- tools/perf/tests/shell/stat_all_pfm.sh | 2 +- tools/perf/tests/shell/stat_bpf_counters.sh | 28 ++- tools/perf/tests/shell/stat_metrics_values.sh | 9 +- tools/perf/tests/shell/test_brstack.sh | 6 +- tools/perf/tests/shell/trace_record_replay.sh | 18 +- tools/perf/tests/workloads/noploop.c | 17 +- tools/perf/tests/workloads/thloop.c | 16 +- tools/perf/util/metricgroup.c | 23 ++- tools/perf/util/metricgroup.h | 4 +- tools/perf/util/parse-events.c | 22 ++- tools/perf/util/parse-events.h | 17 +- tools/perf/util/python.c | 2 +- 29 files changed, 560 insertions(+), 295 deletions(-) create mode 100644 tools/perf/tests/shell/lib/perf_record.sh -- 2.54.0.1136.gdb2ca164c4-goog ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 01/12] perf parse-events: Restrict core PMU bypass to --cputype option 2026-06-16 6:13 ` [PATCH v2 00/12] perf tests: Enhance robustness, speed up execution, and fix flakiness Ian Rogers @ 2026-06-16 6:13 ` Ian Rogers 2026-06-16 15:14 ` Arnaldo Carvalho de Melo 2026-06-16 15:17 ` Arnaldo Carvalho de Melo 2026-06-16 6:13 ` [PATCH v2 02/12] perf test: Truncate test description to fit terminal width Ian Rogers ` (11 subsequent siblings) 12 siblings, 2 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 6:13 UTC (permalink / raw) To: irogers, acme, namhyung Cc: adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht Commit b1c5efbfd92e ("perf parse-events: Remove hard coded legacy hardware and cache parsing") introduced a bypass to PMU filtering to prevent uncore PMUs from being filtered out during event parsing, which was required for resolving `duration_time` and `uncore_freq` when running with `--cputype`. However, this bypass was active whenever `pmu_filter` was set, which also incorrectly bypassed filtering for the `--pmu-filter` option. Introduce a `cputype_filter` boolean flag in `parse_events_state` and `parse_events_option_args` to distinguish filtering initiated by `--cputype` from that initiated by `--pmu-filter`. Restrict the core-only check in `parse_events__filter_pmu()` to when `cputype_filter` is true. Fixes: b1c5efbfd92e ("perf parse-events: Remove hard coded legacy hardware and cache parsing") Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/builtin-script.c | 1 + tools/perf/builtin-stat.c | 12 +++++++----- tools/perf/tests/expand-cgroup.c | 2 +- tools/perf/tests/parse-events.c | 11 +++++++---- tools/perf/tests/parse-metric.c | 2 +- tools/perf/tests/pmu-events.c | 8 +++++--- tools/perf/util/metricgroup.c | 23 +++++++++++++++-------- tools/perf/util/metricgroup.h | 4 +++- tools/perf/util/parse-events.c | 22 ++++++++++++++-------- tools/perf/util/parse-events.h | 17 +++++++++++------ tools/perf/util/python.c | 2 +- 11 files changed, 66 insertions(+), 38 deletions(-) diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index 9ac29bdc3cd5..6bd4d8627704 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -2174,6 +2174,7 @@ static int script_find_metrics(const struct pmu_metric *pm, struct evsel *metric_evsel; int ret = metricgroup__parse_groups(metric_evlist, /*pmu=*/"all", + /*cputype_filter=*/false, pm->metric_name, /*metric_no_group=*/false, /*metric_no_merge=*/false, diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index a04466ea3b0a..2505bc604e53 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -1210,6 +1210,7 @@ static int parse_cputype(const struct option *opt, return -1; } parse_events_option_args.pmu_filter = pmu->name; + parse_events_option_args.cputype_filter = true; return 0; } @@ -1226,6 +1227,7 @@ static int parse_pmu_filter(const struct option *opt, } parse_events_option_args.pmu_filter = str; + parse_events_option_args.cputype_filter = false; return 0; } @@ -1999,7 +2001,7 @@ static int add_default_events(void) ret = -1; goto out; } - ret = metricgroup__parse_groups(evlist, pmu, "transaction", + ret = metricgroup__parse_groups(evlist, pmu, false, "transaction", stat_config.metric_no_group, stat_config.metric_no_merge, stat_config.metric_no_threshold, @@ -2036,7 +2038,7 @@ static int add_default_events(void) if (!force_metric_only) stat_config.metric_only = true; - ret = metricgroup__parse_groups(evlist, pmu, "smi", + ret = metricgroup__parse_groups(evlist, pmu, false, "smi", stat_config.metric_no_group, stat_config.metric_no_merge, stat_config.metric_no_threshold, @@ -2073,7 +2075,7 @@ static int add_default_events(void) } str[8] = stat_config.topdown_level + '0'; if (metricgroup__parse_groups(evlist, - pmu, str, + pmu, false, str, /*metric_no_group=*/false, /*metric_no_merge=*/false, /*metric_no_threshold=*/true, @@ -2112,7 +2114,7 @@ static int add_default_events(void) ret = -ENOMEM; break; } - if (metricgroup__parse_groups(metric_evlist, pmu, default_metricgroup_names[i], + if (metricgroup__parse_groups(metric_evlist, pmu, false, default_metricgroup_names[i], /*metric_no_group=*/false, /*metric_no_merge=*/false, /*metric_no_threshold=*/true, @@ -2848,7 +2850,7 @@ int cmd_stat(int argc, const char **argv) */ if (metrics) { const char *pmu = parse_events_option_args.pmu_filter ?: "all"; - int ret = metricgroup__parse_groups(evsel_list, pmu, metrics, + int ret = metricgroup__parse_groups(evsel_list, pmu, parse_events_option_args.cputype_filter, metrics, stat_config.metric_no_group, stat_config.metric_no_merge, stat_config.metric_no_threshold, diff --git a/tools/perf/tests/expand-cgroup.c b/tools/perf/tests/expand-cgroup.c index dd547f2f77cc..be6d7feeb38e 100644 --- a/tools/perf/tests/expand-cgroup.c +++ b/tools/perf/tests/expand-cgroup.c @@ -179,7 +179,7 @@ static int expand_metric_events(void) TEST_ASSERT_VAL("failed to get evlist", evlist); pme_test = find_core_metrics_table("testarch", "testcpu"); - ret = metricgroup__parse_groups_test(evlist, pme_test, metric_str); + ret = metricgroup__parse_groups_test(evlist, pme_test, metric_str, false); if (ret < 0) { pr_debug("failed to parse '%s' metric\n", metric_str); goto out; diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c index 05c3e899b425..2cbe81b9c886 100644 --- a/tools/perf/tests/parse-events.c +++ b/tools/perf/tests/parse-events.c @@ -2556,8 +2556,10 @@ static int test_event(const struct evlist_test *e) return TEST_FAIL; } parse_events_error__init(&err); - ret = __parse_events(evlist, e->name, /*pmu_filter=*/NULL, &err, /*fake_pmu=*/false, - /*warn_if_reordered=*/true, /*fake_tp=*/true); + ret = __parse_events(evlist, e->name, /*pmu_filter=*/NULL, + /*cputype_filter=*/false, &err, /*fake_pmu=*/false, + /*warn_if_reordered=*/true, + /*fake_tp=*/true); if (ret) { pr_debug("failed to parse event '%s', err %d\n", e->name, ret); parse_events_error__print(&err, e->name); @@ -2584,8 +2586,9 @@ static int test_event_fake_pmu(const char *str) return -ENOMEM; parse_events_error__init(&err); - ret = __parse_events(evlist, str, /*pmu_filter=*/NULL, &err, - /*fake_pmu=*/true, /*warn_if_reordered=*/true, + ret = __parse_events(evlist, str, /*pmu_filter=*/NULL, + /*cputype_filter=*/false, &err, /*fake_pmu=*/true, + /*warn_if_reordered=*/true, /*fake_tp=*/true); if (ret) { pr_debug("failed to parse event '%s', err %d\n", diff --git a/tools/perf/tests/parse-metric.c b/tools/perf/tests/parse-metric.c index 7c7f489a5eb0..fd9d0cae580e 100644 --- a/tools/perf/tests/parse-metric.c +++ b/tools/perf/tests/parse-metric.c @@ -92,7 +92,7 @@ static int __compute_metric(const char *name, struct value *vals, /* Parse the metric into metric_events list. */ pme_test = find_core_metrics_table("testarch", "testcpu"); - err = metricgroup__parse_groups_test(evlist, pme_test, name); + err = metricgroup__parse_groups_test(evlist, pme_test, name, false); if (err) goto out; diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c index fd5630f0a13c..df8806d8a8fd 100644 --- a/tools/perf/tests/pmu-events.c +++ b/tools/perf/tests/pmu-events.c @@ -794,8 +794,10 @@ static int check_parse_id(const char *id, struct parse_events_error *error) for (cur = strchr(dup, '@') ; cur; cur = strchr(++cur, '@')) *cur = '/'; - ret = __parse_events(evlist, dup, /*pmu_filter=*/NULL, error, /*fake_pmu=*/true, - /*warn_if_reordered=*/true, /*fake_tp=*/false); + ret = __parse_events(evlist, dup, /*pmu_filter=*/NULL, + /*cputype_filter=*/false, error, /*fake_pmu=*/true, + /*warn_if_reordered=*/true, + /*fake_tp=*/false); free(dup); evlist__delete(evlist); @@ -871,7 +873,7 @@ static int test__parsing_callback(const struct pmu_metric *pm, perf_evlist__set_maps(&evlist->core, cpus, NULL); - err = metricgroup__parse_groups_test(evlist, table, pm->metric_name); + err = metricgroup__parse_groups_test(evlist, table, pm->metric_name, false); if (err) { if (is_expected_broken_metric(pm)) { (*failures)--; diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c index c2ce3e53aaee..9c9b04883841 100644 --- a/tools/perf/util/metricgroup.c +++ b/tools/perf/util/metricgroup.c @@ -1262,7 +1262,8 @@ static int parse_ids(bool metric_no_merge, bool fake_pmu, struct expr_parse_ctx *ids, const char *modifier, bool group_events, const bool tool_events[TOOL_PMU__EVENT_MAX], struct evlist **out_evlist, - const char *filter_pmu) + const char *filter_pmu, + bool cputype_filter) { struct parse_events_error parse_error; struct evlist *parsed_evlist; @@ -1317,7 +1318,9 @@ static int parse_ids(bool metric_no_merge, bool fake_pmu, pr_debug("Parsing metric events '%s'\n", events.buf); parse_events_error__init(&parse_error); ret = __parse_events(parsed_evlist, events.buf, filter_pmu, - &parse_error, fake_pmu, /*warn_if_reordered=*/false, + cputype_filter, + &parse_error, fake_pmu, + /*warn_if_reordered=*/false, /*fake_tp=*/false); if (ret) { parse_events_error__print(&parse_error, events.buf); @@ -1382,7 +1385,7 @@ static struct evsel *pick_display_evsel(struct list_head *metric_list, } static int parse_groups(struct evlist *perf_evlist, - const char *pmu, const char *str, + const char *pmu, bool cputype_filter, const char *str, bool metric_no_group, bool metric_no_merge, bool metric_no_threshold, @@ -1420,7 +1423,8 @@ static int parse_groups(struct evlist *perf_evlist, /*group_events=*/false, tool_events, &combined_evlist, - (pmu && strcmp(pmu, "all") == 0) ? NULL : pmu); + (pmu && strcmp(pmu, "all") == 0) ? NULL : pmu, + cputype_filter); } if (combined) expr__ctx_free(combined); @@ -1476,7 +1480,8 @@ static int parse_groups(struct evlist *perf_evlist, if (!metric_evlist) { ret = parse_ids(metric_no_merge, fake_pmu, m->pctx, m->modifier, m->group_events, tool_events, &m->evlist, - (pmu && strcmp(pmu, "all") == 0) ? NULL : pmu); + (pmu && strcmp(pmu, "all") == 0) ? NULL : pmu, + cputype_filter); if (ret) goto out; @@ -1557,6 +1562,7 @@ static int parse_groups(struct evlist *perf_evlist, int metricgroup__parse_groups(struct evlist *perf_evlist, const char *pmu, + bool cputype_filter, const char *str, bool metric_no_group, bool metric_no_merge, @@ -1570,16 +1576,17 @@ int metricgroup__parse_groups(struct evlist *perf_evlist, if (hardware_aware_grouping) pr_debug("Use hardware aware grouping instead of traditional metric grouping method\n"); - return parse_groups(perf_evlist, pmu, str, metric_no_group, metric_no_merge, + return parse_groups(perf_evlist, pmu, cputype_filter, str, metric_no_group, metric_no_merge, metric_no_threshold, user_requested_cpu_list, system_wide, /*fake_pmu=*/false, table); } int metricgroup__parse_groups_test(struct evlist *evlist, const struct pmu_metrics_table *table, - const char *str) + const char *str, + bool cputype_filter) { - return parse_groups(evlist, "all", str, + return parse_groups(evlist, "all", cputype_filter, str, /*metric_no_group=*/false, /*metric_no_merge=*/false, /*metric_no_threshold=*/false, diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h index 4be6bfc13c46..6a66f14dd01b 100644 --- a/tools/perf/util/metricgroup.h +++ b/tools/perf/util/metricgroup.h @@ -71,6 +71,7 @@ struct metric_event *metricgroup__lookup(struct rblist *metric_events, bool create); int metricgroup__parse_groups(struct evlist *perf_evlist, const char *pmu, + bool cputype_filter, const char *str, bool metric_no_group, bool metric_no_merge, @@ -80,7 +81,8 @@ int metricgroup__parse_groups(struct evlist *perf_evlist, bool hardware_aware_grouping); int metricgroup__parse_groups_test(struct evlist *evlist, const struct pmu_metrics_table *table, - const char *str); + const char *str, + bool cputype_filter); int metricgroup__for_each_metric(const struct pmu_metrics_table *table, pmu_metric_iter_fn fn, void *data); diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c index 943569e82b82..9f64d5197b8a 100644 --- a/tools/perf/util/parse-events.c +++ b/tools/perf/util/parse-events.c @@ -429,6 +429,9 @@ bool parse_events__filter_pmu(const struct parse_events_state *parse_state, if (parse_state->pmu_filter == NULL) return false; + if (parse_state->cputype_filter && !pmu->is_core) + return false; + return perf_pmu__wildcard_match(pmu, parse_state->pmu_filter) == 0; } @@ -2288,18 +2291,20 @@ static int parse_events__sort_events_and_fix_groups(struct list_head *list) return (idx_changed || num_leaders != orig_num_leaders) ? 1 : 0; } -int __parse_events(struct evlist *evlist, const char *str, const char *pmu_filter, +int __parse_events(struct evlist *evlist, const char *str, + const char *pmu_filter, bool cputype_filter, struct parse_events_error *err, bool fake_pmu, bool warn_if_reordered, bool fake_tp) { struct parse_events_state parse_state = { - .list = LIST_HEAD_INIT(parse_state.list), - .idx = evlist->core.nr_entries, - .error = err, - .stoken = PE_START_EVENTS, + .list = LIST_HEAD_INIT(parse_state.list), + .idx = evlist->core.nr_entries, + .error = err, + .stoken = PE_START_EVENTS, .fake_pmu = fake_pmu, - .fake_tp = fake_tp, + .fake_tp = fake_tp, .pmu_filter = pmu_filter, + .cputype_filter = cputype_filter, .match_legacy_cache_terms = true, }; int ret, ret2; @@ -2518,8 +2523,9 @@ int parse_events_option(const struct option *opt, const char *str, int ret; parse_events_error__init(&err); - ret = __parse_events(*args->evlistp, str, args->pmu_filter, &err, - /*fake_pmu=*/false, /*warn_if_reordered=*/true, + ret = __parse_events(*args->evlistp, str, args->pmu_filter, + args->cputype_filter, &err, /*fake_pmu=*/false, + /*warn_if_reordered=*/true, /*fake_tp=*/false); if (ret) { diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h index 3577ab213730..b14c832b03a1 100644 --- a/tools/perf/util/parse-events.h +++ b/tools/perf/util/parse-events.h @@ -26,20 +26,23 @@ const char *event_type(size_t type); struct parse_events_option_args { struct evlist **evlistp; const char *pmu_filter; + bool cputype_filter; }; int parse_events_option(const struct option *opt, const char *str, int unset); int parse_events_option_new_evlist(const struct option *opt, const char *str, int unset); -__attribute__((nonnull(1, 2, 4))) -int __parse_events(struct evlist *evlist, const char *str, const char *pmu_filter, - struct parse_events_error *error, bool fake_pmu, - bool warn_if_reordered, bool fake_tp); +__attribute__((nonnull(1, 2, 5))) int +__parse_events(struct evlist *evlist, const char *str, const char *pmu_filter, + bool cputype_filter, struct parse_events_error *error, + bool fake_pmu, bool warn_if_reordered, bool fake_tp); __attribute__((nonnull(1, 2, 3))) static inline int parse_events(struct evlist *evlist, const char *str, struct parse_events_error *err) { - return __parse_events(evlist, str, /*pmu_filter=*/NULL, err, /*fake_pmu=*/false, - /*warn_if_reordered=*/true, /*fake_tp=*/false); + return __parse_events(evlist, str, /*pmu_filter=*/NULL, + /*cputype_filter=*/false, err, /*fake_pmu=*/false, + /*warn_if_reordered=*/true, + /*fake_tp=*/false); } int parse_event(struct evlist *evlist, const char *str); @@ -161,6 +164,8 @@ struct parse_events_state { bool fake_tp; /* If non-null, when wildcard matching only match the given PMU. */ const char *pmu_filter; + /* If true, the pmu_filter was set by --cputype option. */ + bool cputype_filter; /* Should PE_LEGACY_NAME tokens be generated for config terms? */ bool match_legacy_cache_terms; /* Were multiple PMUs scanned to find events? */ diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c index cc1019d29a5d..fd9095aa4430 100644 --- a/tools/perf/util/python.c +++ b/tools/perf/util/python.c @@ -2104,7 +2104,7 @@ static PyObject *pyrf__parse_metrics(PyObject *self, PyObject *args) cpus = pcpus ? ((struct pyrf_cpu_map *)pcpus)->cpus : NULL; evlist__init(&evlist, cpus, threads); - ret = metricgroup__parse_groups(&evlist, pmu ?: "all", input, + ret = metricgroup__parse_groups(&evlist, pmu ?: "all", false, input, /*metric_no_group=*/ false, /*metric_no_merge=*/ false, /*metric_no_threshold=*/ true, -- 2.54.0.1136.gdb2ca164c4-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v2 01/12] perf parse-events: Restrict core PMU bypass to --cputype option 2026-06-16 6:13 ` [PATCH v2 01/12] perf parse-events: Restrict core PMU bypass to --cputype option Ian Rogers @ 2026-06-16 15:14 ` Arnaldo Carvalho de Melo 2026-06-16 15:17 ` Arnaldo Carvalho de Melo 1 sibling, 0 replies; 43+ messages in thread From: Arnaldo Carvalho de Melo @ 2026-06-16 15:14 UTC (permalink / raw) To: Ian Rogers Cc: namhyung, adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht On Mon, Jun 15, 2026 at 11:13:53PM -0700, Ian Rogers wrote: > Commit b1c5efbfd92e ("perf parse-events: Remove hard coded legacy hardware > and cache parsing") introduced a bypass to PMU filtering to prevent uncore PMUs > from being filtered out during event parsing, which was required for resolving > `duration_time` and `uncore_freq` when running with `--cputype`. However, this > bypass was active whenever `pmu_filter` was set, which also incorrectly bypassed > filtering for the `--pmu-filter` option. > > Introduce a `cputype_filter` boolean flag in `parse_events_state` and > `parse_events_option_args` to distinguish filtering initiated by `--cputype` > from that initiated by `--pmu-filter`. Restrict the core-only check in > `parse_events__filter_pmu()` to when `cputype_filter` is true. > > Fixes: b1c5efbfd92e ("perf parse-events: Remove hard coded legacy hardware and cache parsing") > Signed-off-by: Ian Rogers <irogers@google.com> > --- > tools/perf/builtin-script.c | 1 + > tools/perf/builtin-stat.c | 12 +++++++----- > tools/perf/tests/expand-cgroup.c | 2 +- > tools/perf/tests/parse-events.c | 11 +++++++---- > tools/perf/tests/parse-metric.c | 2 +- > tools/perf/tests/pmu-events.c | 8 +++++--- > tools/perf/util/metricgroup.c | 23 +++++++++++++++-------- > tools/perf/util/metricgroup.h | 4 +++- > tools/perf/util/parse-events.c | 22 ++++++++++++++-------- > tools/perf/util/parse-events.h | 17 +++++++++++------ > tools/perf/util/python.c | 2 +- > 11 files changed, 66 insertions(+), 38 deletions(-) > > diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c > index 9ac29bdc3cd5..6bd4d8627704 100644 > --- a/tools/perf/builtin-script.c > +++ b/tools/perf/builtin-script.c > @@ -2174,6 +2174,7 @@ static int script_find_metrics(const struct pmu_metric *pm, > struct evsel *metric_evsel; > int ret = metricgroup__parse_groups(metric_evlist, > /*pmu=*/"all", > + /*cputype_filter=*/false, > pm->metric_name, > /*metric_no_group=*/false, > /*metric_no_merge=*/false, > diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c > index a04466ea3b0a..2505bc604e53 100644 > --- a/tools/perf/builtin-stat.c > +++ b/tools/perf/builtin-stat.c > @@ -1210,6 +1210,7 @@ static int parse_cputype(const struct option *opt, > return -1; > } > parse_events_option_args.pmu_filter = pmu->name; > + parse_events_option_args.cputype_filter = true; > > return 0; > } > @@ -1226,6 +1227,7 @@ static int parse_pmu_filter(const struct option *opt, > } > > parse_events_option_args.pmu_filter = str; > + parse_events_option_args.cputype_filter = false; > return 0; > } > > @@ -1999,7 +2001,7 @@ static int add_default_events(void) > ret = -1; > goto out; > } > - ret = metricgroup__parse_groups(evlist, pmu, "transaction", > + ret = metricgroup__parse_groups(evlist, pmu, false, "transaction", Missing /*parm_name=*/false ? I'll turn this into a rule for my patches :-) > stat_config.metric_no_group, > stat_config.metric_no_merge, > stat_config.metric_no_threshold, > @@ -2036,7 +2038,7 @@ static int add_default_events(void) > if (!force_metric_only) > stat_config.metric_only = true; > > - ret = metricgroup__parse_groups(evlist, pmu, "smi", > + ret = metricgroup__parse_groups(evlist, pmu, false, "smi", Ditto > stat_config.metric_no_group, > stat_config.metric_no_merge, > stat_config.metric_no_threshold, > @@ -2073,7 +2075,7 @@ static int add_default_events(void) > } > str[8] = stat_config.topdown_level + '0'; > if (metricgroup__parse_groups(evlist, > - pmu, str, > + pmu, false, str, ditto > /*metric_no_group=*/false, > /*metric_no_merge=*/false, > /*metric_no_threshold=*/true, > @@ -2112,7 +2114,7 @@ static int add_default_events(void) > ret = -ENOMEM; > break; > } > - if (metricgroup__parse_groups(metric_evlist, pmu, default_metricgroup_names[i], > + if (metricgroup__parse_groups(metric_evlist, pmu, false, default_metricgroup_names[i], > /*metric_no_group=*/false, > /*metric_no_merge=*/false, > /*metric_no_threshold=*/true, > @@ -2848,7 +2850,7 @@ int cmd_stat(int argc, const char **argv) > */ > if (metrics) { > const char *pmu = parse_events_option_args.pmu_filter ?: "all"; > - int ret = metricgroup__parse_groups(evsel_list, pmu, metrics, > + int ret = metricgroup__parse_groups(evsel_list, pmu, parse_events_option_args.cputype_filter, metrics, > stat_config.metric_no_group, > stat_config.metric_no_merge, > stat_config.metric_no_threshold, > diff --git a/tools/perf/tests/expand-cgroup.c b/tools/perf/tests/expand-cgroup.c > index dd547f2f77cc..be6d7feeb38e 100644 > --- a/tools/perf/tests/expand-cgroup.c > +++ b/tools/perf/tests/expand-cgroup.c > @@ -179,7 +179,7 @@ static int expand_metric_events(void) > TEST_ASSERT_VAL("failed to get evlist", evlist); > > pme_test = find_core_metrics_table("testarch", "testcpu"); > - ret = metricgroup__parse_groups_test(evlist, pme_test, metric_str); > + ret = metricgroup__parse_groups_test(evlist, pme_test, metric_str, false); > if (ret < 0) { > pr_debug("failed to parse '%s' metric\n", metric_str); > goto out; > diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c > index 05c3e899b425..2cbe81b9c886 100644 > --- a/tools/perf/tests/parse-events.c > +++ b/tools/perf/tests/parse-events.c > @@ -2556,8 +2556,10 @@ static int test_event(const struct evlist_test *e) > return TEST_FAIL; > } > parse_events_error__init(&err); > - ret = __parse_events(evlist, e->name, /*pmu_filter=*/NULL, &err, /*fake_pmu=*/false, > - /*warn_if_reordered=*/true, /*fake_tp=*/true); > + ret = __parse_events(evlist, e->name, /*pmu_filter=*/NULL, > + /*cputype_filter=*/false, &err, /*fake_pmu=*/false, > + /*warn_if_reordered=*/true, > + /*fake_tp=*/true); > if (ret) { > pr_debug("failed to parse event '%s', err %d\n", e->name, ret); > parse_events_error__print(&err, e->name); > @@ -2584,8 +2586,9 @@ static int test_event_fake_pmu(const char *str) > return -ENOMEM; > > parse_events_error__init(&err); > - ret = __parse_events(evlist, str, /*pmu_filter=*/NULL, &err, > - /*fake_pmu=*/true, /*warn_if_reordered=*/true, > + ret = __parse_events(evlist, str, /*pmu_filter=*/NULL, > + /*cputype_filter=*/false, &err, /*fake_pmu=*/true, > + /*warn_if_reordered=*/true, > /*fake_tp=*/true); > if (ret) { > pr_debug("failed to parse event '%s', err %d\n", > diff --git a/tools/perf/tests/parse-metric.c b/tools/perf/tests/parse-metric.c > index 7c7f489a5eb0..fd9d0cae580e 100644 > --- a/tools/perf/tests/parse-metric.c > +++ b/tools/perf/tests/parse-metric.c > @@ -92,7 +92,7 @@ static int __compute_metric(const char *name, struct value *vals, > > /* Parse the metric into metric_events list. */ > pme_test = find_core_metrics_table("testarch", "testcpu"); > - err = metricgroup__parse_groups_test(evlist, pme_test, name); > + err = metricgroup__parse_groups_test(evlist, pme_test, name, false); > if (err) > goto out; > > diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c > index fd5630f0a13c..df8806d8a8fd 100644 > --- a/tools/perf/tests/pmu-events.c > +++ b/tools/perf/tests/pmu-events.c > @@ -794,8 +794,10 @@ static int check_parse_id(const char *id, struct parse_events_error *error) > for (cur = strchr(dup, '@') ; cur; cur = strchr(++cur, '@')) > *cur = '/'; > > - ret = __parse_events(evlist, dup, /*pmu_filter=*/NULL, error, /*fake_pmu=*/true, > - /*warn_if_reordered=*/true, /*fake_tp=*/false); > + ret = __parse_events(evlist, dup, /*pmu_filter=*/NULL, > + /*cputype_filter=*/false, error, /*fake_pmu=*/true, > + /*warn_if_reordered=*/true, > + /*fake_tp=*/false); > free(dup); > > evlist__delete(evlist); > @@ -871,7 +873,7 @@ static int test__parsing_callback(const struct pmu_metric *pm, > > perf_evlist__set_maps(&evlist->core, cpus, NULL); > > - err = metricgroup__parse_groups_test(evlist, table, pm->metric_name); > + err = metricgroup__parse_groups_test(evlist, table, pm->metric_name, false); > if (err) { > if (is_expected_broken_metric(pm)) { > (*failures)--; > diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c > index c2ce3e53aaee..9c9b04883841 100644 > --- a/tools/perf/util/metricgroup.c > +++ b/tools/perf/util/metricgroup.c > @@ -1262,7 +1262,8 @@ static int parse_ids(bool metric_no_merge, bool fake_pmu, > struct expr_parse_ctx *ids, const char *modifier, > bool group_events, const bool tool_events[TOOL_PMU__EVENT_MAX], > struct evlist **out_evlist, > - const char *filter_pmu) > + const char *filter_pmu, > + bool cputype_filter) > { > struct parse_events_error parse_error; > struct evlist *parsed_evlist; > @@ -1317,7 +1318,9 @@ static int parse_ids(bool metric_no_merge, bool fake_pmu, > pr_debug("Parsing metric events '%s'\n", events.buf); > parse_events_error__init(&parse_error); > ret = __parse_events(parsed_evlist, events.buf, filter_pmu, > - &parse_error, fake_pmu, /*warn_if_reordered=*/false, > + cputype_filter, > + &parse_error, fake_pmu, > + /*warn_if_reordered=*/false, > /*fake_tp=*/false); > if (ret) { > parse_events_error__print(&parse_error, events.buf); > @@ -1382,7 +1385,7 @@ static struct evsel *pick_display_evsel(struct list_head *metric_list, > } > > static int parse_groups(struct evlist *perf_evlist, > - const char *pmu, const char *str, > + const char *pmu, bool cputype_filter, const char *str, > bool metric_no_group, > bool metric_no_merge, > bool metric_no_threshold, > @@ -1420,7 +1423,8 @@ static int parse_groups(struct evlist *perf_evlist, > /*group_events=*/false, > tool_events, > &combined_evlist, > - (pmu && strcmp(pmu, "all") == 0) ? NULL : pmu); > + (pmu && strcmp(pmu, "all") == 0) ? NULL : pmu, > + cputype_filter); > } > if (combined) > expr__ctx_free(combined); > @@ -1476,7 +1480,8 @@ static int parse_groups(struct evlist *perf_evlist, > if (!metric_evlist) { > ret = parse_ids(metric_no_merge, fake_pmu, m->pctx, m->modifier, > m->group_events, tool_events, &m->evlist, > - (pmu && strcmp(pmu, "all") == 0) ? NULL : pmu); > + (pmu && strcmp(pmu, "all") == 0) ? NULL : pmu, > + cputype_filter); > if (ret) > goto out; > > @@ -1557,6 +1562,7 @@ static int parse_groups(struct evlist *perf_evlist, > > int metricgroup__parse_groups(struct evlist *perf_evlist, > const char *pmu, > + bool cputype_filter, > const char *str, > bool metric_no_group, > bool metric_no_merge, > @@ -1570,16 +1576,17 @@ int metricgroup__parse_groups(struct evlist *perf_evlist, > if (hardware_aware_grouping) > pr_debug("Use hardware aware grouping instead of traditional metric grouping method\n"); > > - return parse_groups(perf_evlist, pmu, str, metric_no_group, metric_no_merge, > + return parse_groups(perf_evlist, pmu, cputype_filter, str, metric_no_group, metric_no_merge, > metric_no_threshold, user_requested_cpu_list, system_wide, > /*fake_pmu=*/false, table); > } > > int metricgroup__parse_groups_test(struct evlist *evlist, > const struct pmu_metrics_table *table, > - const char *str) > + const char *str, > + bool cputype_filter) > { > - return parse_groups(evlist, "all", str, > + return parse_groups(evlist, "all", cputype_filter, str, > /*metric_no_group=*/false, > /*metric_no_merge=*/false, > /*metric_no_threshold=*/false, > diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h > index 4be6bfc13c46..6a66f14dd01b 100644 > --- a/tools/perf/util/metricgroup.h > +++ b/tools/perf/util/metricgroup.h > @@ -71,6 +71,7 @@ struct metric_event *metricgroup__lookup(struct rblist *metric_events, > bool create); > int metricgroup__parse_groups(struct evlist *perf_evlist, > const char *pmu, > + bool cputype_filter, > const char *str, > bool metric_no_group, > bool metric_no_merge, > @@ -80,7 +81,8 @@ int metricgroup__parse_groups(struct evlist *perf_evlist, > bool hardware_aware_grouping); > int metricgroup__parse_groups_test(struct evlist *evlist, > const struct pmu_metrics_table *table, > - const char *str); > + const char *str, > + bool cputype_filter); > > int metricgroup__for_each_metric(const struct pmu_metrics_table *table, pmu_metric_iter_fn fn, > void *data); > diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c > index 943569e82b82..9f64d5197b8a 100644 > --- a/tools/perf/util/parse-events.c > +++ b/tools/perf/util/parse-events.c > @@ -429,6 +429,9 @@ bool parse_events__filter_pmu(const struct parse_events_state *parse_state, > if (parse_state->pmu_filter == NULL) > return false; > > + if (parse_state->cputype_filter && !pmu->is_core) > + return false; > + > return perf_pmu__wildcard_match(pmu, parse_state->pmu_filter) == 0; > } > > @@ -2288,18 +2291,20 @@ static int parse_events__sort_events_and_fix_groups(struct list_head *list) > return (idx_changed || num_leaders != orig_num_leaders) ? 1 : 0; > } > > -int __parse_events(struct evlist *evlist, const char *str, const char *pmu_filter, > +int __parse_events(struct evlist *evlist, const char *str, > + const char *pmu_filter, bool cputype_filter, > struct parse_events_error *err, bool fake_pmu, > bool warn_if_reordered, bool fake_tp) > { > struct parse_events_state parse_state = { > - .list = LIST_HEAD_INIT(parse_state.list), > - .idx = evlist->core.nr_entries, > - .error = err, > - .stoken = PE_START_EVENTS, > + .list = LIST_HEAD_INIT(parse_state.list), > + .idx = evlist->core.nr_entries, > + .error = err, > + .stoken = PE_START_EVENTS, > .fake_pmu = fake_pmu, > - .fake_tp = fake_tp, > + .fake_tp = fake_tp, > .pmu_filter = pmu_filter, > + .cputype_filter = cputype_filter, > .match_legacy_cache_terms = true, > }; > int ret, ret2; > @@ -2518,8 +2523,9 @@ int parse_events_option(const struct option *opt, const char *str, > int ret; > > parse_events_error__init(&err); > - ret = __parse_events(*args->evlistp, str, args->pmu_filter, &err, > - /*fake_pmu=*/false, /*warn_if_reordered=*/true, > + ret = __parse_events(*args->evlistp, str, args->pmu_filter, > + args->cputype_filter, &err, /*fake_pmu=*/false, > + /*warn_if_reordered=*/true, > /*fake_tp=*/false); > > if (ret) { > diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h > index 3577ab213730..b14c832b03a1 100644 > --- a/tools/perf/util/parse-events.h > +++ b/tools/perf/util/parse-events.h > @@ -26,20 +26,23 @@ const char *event_type(size_t type); > struct parse_events_option_args { > struct evlist **evlistp; > const char *pmu_filter; > + bool cputype_filter; > }; > int parse_events_option(const struct option *opt, const char *str, int unset); > int parse_events_option_new_evlist(const struct option *opt, const char *str, int unset); > -__attribute__((nonnull(1, 2, 4))) > -int __parse_events(struct evlist *evlist, const char *str, const char *pmu_filter, > - struct parse_events_error *error, bool fake_pmu, > - bool warn_if_reordered, bool fake_tp); > +__attribute__((nonnull(1, 2, 5))) int > +__parse_events(struct evlist *evlist, const char *str, const char *pmu_filter, > + bool cputype_filter, struct parse_events_error *error, > + bool fake_pmu, bool warn_if_reordered, bool fake_tp); > > __attribute__((nonnull(1, 2, 3))) > static inline int parse_events(struct evlist *evlist, const char *str, > struct parse_events_error *err) > { > - return __parse_events(evlist, str, /*pmu_filter=*/NULL, err, /*fake_pmu=*/false, > - /*warn_if_reordered=*/true, /*fake_tp=*/false); > + return __parse_events(evlist, str, /*pmu_filter=*/NULL, > + /*cputype_filter=*/false, err, /*fake_pmu=*/false, > + /*warn_if_reordered=*/true, > + /*fake_tp=*/false); > } > > int parse_event(struct evlist *evlist, const char *str); > @@ -161,6 +164,8 @@ struct parse_events_state { > bool fake_tp; > /* If non-null, when wildcard matching only match the given PMU. */ > const char *pmu_filter; > + /* If true, the pmu_filter was set by --cputype option. */ > + bool cputype_filter; > /* Should PE_LEGACY_NAME tokens be generated for config terms? */ > bool match_legacy_cache_terms; > /* Were multiple PMUs scanned to find events? */ > diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c > index cc1019d29a5d..fd9095aa4430 100644 > --- a/tools/perf/util/python.c > +++ b/tools/perf/util/python.c > @@ -2104,7 +2104,7 @@ static PyObject *pyrf__parse_metrics(PyObject *self, PyObject *args) > cpus = pcpus ? ((struct pyrf_cpu_map *)pcpus)->cpus : NULL; > > evlist__init(&evlist, cpus, threads); > - ret = metricgroup__parse_groups(&evlist, pmu ?: "all", input, > + ret = metricgroup__parse_groups(&evlist, pmu ?: "all", false, input, > /*metric_no_group=*/ false, > /*metric_no_merge=*/ false, > /*metric_no_threshold=*/ true, > -- > 2.54.0.1136.gdb2ca164c4-goog ^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH v2 01/12] perf parse-events: Restrict core PMU bypass to --cputype option 2026-06-16 6:13 ` [PATCH v2 01/12] perf parse-events: Restrict core PMU bypass to --cputype option Ian Rogers 2026-06-16 15:14 ` Arnaldo Carvalho de Melo @ 2026-06-16 15:17 ` Arnaldo Carvalho de Melo 1 sibling, 0 replies; 43+ messages in thread From: Arnaldo Carvalho de Melo @ 2026-06-16 15:17 UTC (permalink / raw) To: Ian Rogers Cc: namhyung, adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht On Mon, Jun 15, 2026 at 11:13:53PM -0700, Ian Rogers wrote: > Commit b1c5efbfd92e ("perf parse-events: Remove hard coded legacy hardware > and cache parsing") introduced a bypass to PMU filtering to prevent uncore PMUs > from being filtered out during event parsing, which was required for resolving > `duration_time` and `uncore_freq` when running with `--cputype`. However, this > bypass was active whenever `pmu_filter` was set, which also incorrectly bypassed > filtering for the `--pmu-filter` option. > > Introduce a `cputype_filter` boolean flag in `parse_events_state` and > `parse_events_option_args` to distinguish filtering initiated by `--cputype` > from that initiated by `--pmu-filter`. Restrict the core-only check in > `parse_events__filter_pmu()` to when `cputype_filter` is true. > > Fixes: b1c5efbfd92e ("perf parse-events: Remove hard coded legacy hardware and cache parsing") > Signed-off-by: Ian Rogers <irogers@google.com> > --- > tools/perf/builtin-script.c | 1 + > tools/perf/builtin-stat.c | 12 +++++++----- > tools/perf/tests/expand-cgroup.c | 2 +- > tools/perf/tests/parse-events.c | 11 +++++++---- > tools/perf/tests/parse-metric.c | 2 +- > tools/perf/tests/pmu-events.c | 8 +++++--- > tools/perf/util/metricgroup.c | 23 +++++++++++++++-------- > tools/perf/util/metricgroup.h | 4 +++- > tools/perf/util/parse-events.c | 22 ++++++++++++++-------- > tools/perf/util/parse-events.h | 17 +++++++++++------ > tools/perf/util/python.c | 2 +- > 11 files changed, 66 insertions(+), 38 deletions(-) > > diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c > index 9ac29bdc3cd5..6bd4d8627704 100644 > --- a/tools/perf/builtin-script.c > +++ b/tools/perf/builtin-script.c > @@ -2174,6 +2174,7 @@ static int script_find_metrics(const struct pmu_metric *pm, > struct evsel *metric_evsel; > int ret = metricgroup__parse_groups(metric_evlist, > /*pmu=*/"all", > + /*cputype_filter=*/false, > pm->metric_name, > /*metric_no_group=*/false, > /*metric_no_merge=*/false, > diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c > index a04466ea3b0a..2505bc604e53 100644 > --- a/tools/perf/builtin-stat.c > +++ b/tools/perf/builtin-stat.c > @@ -1210,6 +1210,7 @@ static int parse_cputype(const struct option *opt, > return -1; > } > parse_events_option_args.pmu_filter = pmu->name; > + parse_events_option_args.cputype_filter = true; > > return 0; > } > @@ -1226,6 +1227,7 @@ static int parse_pmu_filter(const struct option *opt, > } > > parse_events_option_args.pmu_filter = str; > + parse_events_option_args.cputype_filter = false; > return 0; > } > > @@ -1999,7 +2001,7 @@ static int add_default_events(void) > ret = -1; > goto out; > } > - ret = metricgroup__parse_groups(evlist, pmu, "transaction", > + ret = metricgroup__parse_groups(evlist, pmu, false, "transaction", > stat_config.metric_no_group, > stat_config.metric_no_merge, > stat_config.metric_no_threshold, > @@ -2036,7 +2038,7 @@ static int add_default_events(void) > if (!force_metric_only) > stat_config.metric_only = true; > > - ret = metricgroup__parse_groups(evlist, pmu, "smi", > + ret = metricgroup__parse_groups(evlist, pmu, false, "smi", > stat_config.metric_no_group, > stat_config.metric_no_merge, > stat_config.metric_no_threshold, > @@ -2073,7 +2075,7 @@ static int add_default_events(void) > } > str[8] = stat_config.topdown_level + '0'; > if (metricgroup__parse_groups(evlist, > - pmu, str, > + pmu, false, str, > /*metric_no_group=*/false, > /*metric_no_merge=*/false, > /*metric_no_threshold=*/true, > @@ -2112,7 +2114,7 @@ static int add_default_events(void) > ret = -ENOMEM; > break; > } > - if (metricgroup__parse_groups(metric_evlist, pmu, default_metricgroup_names[i], > + if (metricgroup__parse_groups(metric_evlist, pmu, false, default_metricgroup_names[i], > /*metric_no_group=*/false, > /*metric_no_merge=*/false, > /*metric_no_threshold=*/true, > @@ -2848,7 +2850,7 @@ int cmd_stat(int argc, const char **argv) > */ > if (metrics) { > const char *pmu = parse_events_option_args.pmu_filter ?: "all"; > - int ret = metricgroup__parse_groups(evsel_list, pmu, metrics, > + int ret = metricgroup__parse_groups(evsel_list, pmu, parse_events_option_args.cputype_filter, metrics, > stat_config.metric_no_group, > stat_config.metric_no_merge, > stat_config.metric_no_threshold, > diff --git a/tools/perf/tests/expand-cgroup.c b/tools/perf/tests/expand-cgroup.c > index dd547f2f77cc..be6d7feeb38e 100644 > --- a/tools/perf/tests/expand-cgroup.c > +++ b/tools/perf/tests/expand-cgroup.c > @@ -179,7 +179,7 @@ static int expand_metric_events(void) > TEST_ASSERT_VAL("failed to get evlist", evlist); > > pme_test = find_core_metrics_table("testarch", "testcpu"); > - ret = metricgroup__parse_groups_test(evlist, pme_test, metric_str); > + ret = metricgroup__parse_groups_test(evlist, pme_test, metric_str, false); > if (ret < 0) { > pr_debug("failed to parse '%s' metric\n", metric_str); > goto out; > diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c > index 05c3e899b425..2cbe81b9c886 100644 > --- a/tools/perf/tests/parse-events.c > +++ b/tools/perf/tests/parse-events.c > @@ -2556,8 +2556,10 @@ static int test_event(const struct evlist_test *e) > return TEST_FAIL; > } > parse_events_error__init(&err); > - ret = __parse_events(evlist, e->name, /*pmu_filter=*/NULL, &err, /*fake_pmu=*/false, > - /*warn_if_reordered=*/true, /*fake_tp=*/true); > + ret = __parse_events(evlist, e->name, /*pmu_filter=*/NULL, > + /*cputype_filter=*/false, &err, /*fake_pmu=*/false, > + /*warn_if_reordered=*/true, > + /*fake_tp=*/true); > if (ret) { > pr_debug("failed to parse event '%s', err %d\n", e->name, ret); > parse_events_error__print(&err, e->name); > @@ -2584,8 +2586,9 @@ static int test_event_fake_pmu(const char *str) > return -ENOMEM; > > parse_events_error__init(&err); > - ret = __parse_events(evlist, str, /*pmu_filter=*/NULL, &err, > - /*fake_pmu=*/true, /*warn_if_reordered=*/true, > + ret = __parse_events(evlist, str, /*pmu_filter=*/NULL, > + /*cputype_filter=*/false, &err, /*fake_pmu=*/true, > + /*warn_if_reordered=*/true, > /*fake_tp=*/true); > if (ret) { > pr_debug("failed to parse event '%s', err %d\n", > diff --git a/tools/perf/tests/parse-metric.c b/tools/perf/tests/parse-metric.c > index 7c7f489a5eb0..fd9d0cae580e 100644 > --- a/tools/perf/tests/parse-metric.c > +++ b/tools/perf/tests/parse-metric.c > @@ -92,7 +92,7 @@ static int __compute_metric(const char *name, struct value *vals, > > /* Parse the metric into metric_events list. */ > pme_test = find_core_metrics_table("testarch", "testcpu"); > - err = metricgroup__parse_groups_test(evlist, pme_test, name); > + err = metricgroup__parse_groups_test(evlist, pme_test, name, false); > if (err) > goto out; > > diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c > index fd5630f0a13c..df8806d8a8fd 100644 > --- a/tools/perf/tests/pmu-events.c > +++ b/tools/perf/tests/pmu-events.c > @@ -794,8 +794,10 @@ static int check_parse_id(const char *id, struct parse_events_error *error) > for (cur = strchr(dup, '@') ; cur; cur = strchr(++cur, '@')) > *cur = '/'; > > - ret = __parse_events(evlist, dup, /*pmu_filter=*/NULL, error, /*fake_pmu=*/true, > - /*warn_if_reordered=*/true, /*fake_tp=*/false); > + ret = __parse_events(evlist, dup, /*pmu_filter=*/NULL, > + /*cputype_filter=*/false, error, /*fake_pmu=*/true, > + /*warn_if_reordered=*/true, > + /*fake_tp=*/false); > free(dup); > > evlist__delete(evlist); > @@ -871,7 +873,7 @@ static int test__parsing_callback(const struct pmu_metric *pm, > > perf_evlist__set_maps(&evlist->core, cpus, NULL); > > - err = metricgroup__parse_groups_test(evlist, table, pm->metric_name); > + err = metricgroup__parse_groups_test(evlist, table, pm->metric_name, false); > if (err) { > if (is_expected_broken_metric(pm)) { > (*failures)--; > diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c > index c2ce3e53aaee..9c9b04883841 100644 > --- a/tools/perf/util/metricgroup.c > +++ b/tools/perf/util/metricgroup.c > @@ -1262,7 +1262,8 @@ static int parse_ids(bool metric_no_merge, bool fake_pmu, > struct expr_parse_ctx *ids, const char *modifier, > bool group_events, const bool tool_events[TOOL_PMU__EVENT_MAX], > struct evlist **out_evlist, > - const char *filter_pmu) > + const char *filter_pmu, > + bool cputype_filter) > { > struct parse_events_error parse_error; > struct evlist *parsed_evlist; > @@ -1317,7 +1318,9 @@ static int parse_ids(bool metric_no_merge, bool fake_pmu, > pr_debug("Parsing metric events '%s'\n", events.buf); > parse_events_error__init(&parse_error); > ret = __parse_events(parsed_evlist, events.buf, filter_pmu, > - &parse_error, fake_pmu, /*warn_if_reordered=*/false, > + cputype_filter, > + &parse_error, fake_pmu, > + /*warn_if_reordered=*/false, > /*fake_tp=*/false); > if (ret) { > parse_events_error__print(&parse_error, events.buf); > @@ -1382,7 +1385,7 @@ static struct evsel *pick_display_evsel(struct list_head *metric_list, > } > > static int parse_groups(struct evlist *perf_evlist, > - const char *pmu, const char *str, > + const char *pmu, bool cputype_filter, const char *str, > bool metric_no_group, > bool metric_no_merge, > bool metric_no_threshold, > @@ -1420,7 +1423,8 @@ static int parse_groups(struct evlist *perf_evlist, > /*group_events=*/false, > tool_events, > &combined_evlist, > - (pmu && strcmp(pmu, "all") == 0) ? NULL : pmu); > + (pmu && strcmp(pmu, "all") == 0) ? NULL : pmu, > + cputype_filter); > } > if (combined) > expr__ctx_free(combined); > @@ -1476,7 +1480,8 @@ static int parse_groups(struct evlist *perf_evlist, > if (!metric_evlist) { > ret = parse_ids(metric_no_merge, fake_pmu, m->pctx, m->modifier, > m->group_events, tool_events, &m->evlist, > - (pmu && strcmp(pmu, "all") == 0) ? NULL : pmu); > + (pmu && strcmp(pmu, "all") == 0) ? NULL : pmu, > + cputype_filter); > if (ret) > goto out; > > @@ -1557,6 +1562,7 @@ static int parse_groups(struct evlist *perf_evlist, > > int metricgroup__parse_groups(struct evlist *perf_evlist, > const char *pmu, > + bool cputype_filter, > const char *str, > bool metric_no_group, > bool metric_no_merge, > @@ -1570,16 +1576,17 @@ int metricgroup__parse_groups(struct evlist *perf_evlist, > if (hardware_aware_grouping) > pr_debug("Use hardware aware grouping instead of traditional metric grouping method\n"); > > - return parse_groups(perf_evlist, pmu, str, metric_no_group, metric_no_merge, > + return parse_groups(perf_evlist, pmu, cputype_filter, str, metric_no_group, metric_no_merge, > metric_no_threshold, user_requested_cpu_list, system_wide, > /*fake_pmu=*/false, table); > } > > int metricgroup__parse_groups_test(struct evlist *evlist, > const struct pmu_metrics_table *table, > - const char *str) > + const char *str, > + bool cputype_filter) > { > - return parse_groups(evlist, "all", str, > + return parse_groups(evlist, "all", cputype_filter, str, > /*metric_no_group=*/false, > /*metric_no_merge=*/false, > /*metric_no_threshold=*/false, > diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h > index 4be6bfc13c46..6a66f14dd01b 100644 > --- a/tools/perf/util/metricgroup.h > +++ b/tools/perf/util/metricgroup.h > @@ -71,6 +71,7 @@ struct metric_event *metricgroup__lookup(struct rblist *metric_events, > bool create); > int metricgroup__parse_groups(struct evlist *perf_evlist, > const char *pmu, > + bool cputype_filter, > const char *str, > bool metric_no_group, > bool metric_no_merge, > @@ -80,7 +81,8 @@ int metricgroup__parse_groups(struct evlist *perf_evlist, > bool hardware_aware_grouping); > int metricgroup__parse_groups_test(struct evlist *evlist, > const struct pmu_metrics_table *table, > - const char *str); > + const char *str, > + bool cputype_filter); > > int metricgroup__for_each_metric(const struct pmu_metrics_table *table, pmu_metric_iter_fn fn, > void *data); > diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c > index 943569e82b82..9f64d5197b8a 100644 > --- a/tools/perf/util/parse-events.c > +++ b/tools/perf/util/parse-events.c > @@ -429,6 +429,9 @@ bool parse_events__filter_pmu(const struct parse_events_state *parse_state, > if (parse_state->pmu_filter == NULL) > return false; > > + if (parse_state->cputype_filter && !pmu->is_core) > + return false; > + > return perf_pmu__wildcard_match(pmu, parse_state->pmu_filter) == 0; > } > > @@ -2288,18 +2291,20 @@ static int parse_events__sort_events_and_fix_groups(struct list_head *list) > return (idx_changed || num_leaders != orig_num_leaders) ? 1 : 0; > } > > -int __parse_events(struct evlist *evlist, const char *str, const char *pmu_filter, > +int __parse_events(struct evlist *evlist, const char *str, > + const char *pmu_filter, bool cputype_filter, > struct parse_events_error *err, bool fake_pmu, > bool warn_if_reordered, bool fake_tp) > { > struct parse_events_state parse_state = { > - .list = LIST_HEAD_INIT(parse_state.list), > - .idx = evlist->core.nr_entries, > - .error = err, > - .stoken = PE_START_EVENTS, > + .list = LIST_HEAD_INIT(parse_state.list), > + .idx = evlist->core.nr_entries, > + .error = err, > + .stoken = PE_START_EVENTS, Nit: usually we touch unchanged lines to make it align, this time it was in reverse? Why not keep it aligned? > .fake_pmu = fake_pmu, > - .fake_tp = fake_tp, > + .fake_tp = fake_tp, > .pmu_filter = pmu_filter, > + .cputype_filter = cputype_filter, > .match_legacy_cache_terms = true, > }; > int ret, ret2; > @@ -2518,8 +2523,9 @@ int parse_events_option(const struct option *opt, const char *str, > int ret; > > parse_events_error__init(&err); > - ret = __parse_events(*args->evlistp, str, args->pmu_filter, &err, > - /*fake_pmu=*/false, /*warn_if_reordered=*/true, > + ret = __parse_events(*args->evlistp, str, args->pmu_filter, > + args->cputype_filter, &err, /*fake_pmu=*/false, > + /*warn_if_reordered=*/true, > /*fake_tp=*/false); > > if (ret) { > diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h > index 3577ab213730..b14c832b03a1 100644 > --- a/tools/perf/util/parse-events.h > +++ b/tools/perf/util/parse-events.h > @@ -26,20 +26,23 @@ const char *event_type(size_t type); > struct parse_events_option_args { > struct evlist **evlistp; > const char *pmu_filter; > + bool cputype_filter; > }; > int parse_events_option(const struct option *opt, const char *str, int unset); > int parse_events_option_new_evlist(const struct option *opt, const char *str, int unset); > -__attribute__((nonnull(1, 2, 4))) > -int __parse_events(struct evlist *evlist, const char *str, const char *pmu_filter, > - struct parse_events_error *error, bool fake_pmu, > - bool warn_if_reordered, bool fake_tp); > +__attribute__((nonnull(1, 2, 5))) int > +__parse_events(struct evlist *evlist, const char *str, const char *pmu_filter, > + bool cputype_filter, struct parse_events_error *error, > + bool fake_pmu, bool warn_if_reordered, bool fake_tp); > > __attribute__((nonnull(1, 2, 3))) > static inline int parse_events(struct evlist *evlist, const char *str, > struct parse_events_error *err) > { > - return __parse_events(evlist, str, /*pmu_filter=*/NULL, err, /*fake_pmu=*/false, > - /*warn_if_reordered=*/true, /*fake_tp=*/false); > + return __parse_events(evlist, str, /*pmu_filter=*/NULL, > + /*cputype_filter=*/false, err, /*fake_pmu=*/false, > + /*warn_if_reordered=*/true, > + /*fake_tp=*/false); > } > > int parse_event(struct evlist *evlist, const char *str); > @@ -161,6 +164,8 @@ struct parse_events_state { > bool fake_tp; > /* If non-null, when wildcard matching only match the given PMU. */ > const char *pmu_filter; > + /* If true, the pmu_filter was set by --cputype option. */ > + bool cputype_filter; > /* Should PE_LEGACY_NAME tokens be generated for config terms? */ > bool match_legacy_cache_terms; > /* Were multiple PMUs scanned to find events? */ > diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c > index cc1019d29a5d..fd9095aa4430 100644 > --- a/tools/perf/util/python.c > +++ b/tools/perf/util/python.c > @@ -2104,7 +2104,7 @@ static PyObject *pyrf__parse_metrics(PyObject *self, PyObject *args) > cpus = pcpus ? ((struct pyrf_cpu_map *)pcpus)->cpus : NULL; > > evlist__init(&evlist, cpus, threads); > - ret = metricgroup__parse_groups(&evlist, pmu ?: "all", input, > + ret = metricgroup__parse_groups(&evlist, pmu ?: "all", false, input, > /*metric_no_group=*/ false, > /*metric_no_merge=*/ false, > /*metric_no_threshold=*/ true, > -- > 2.54.0.1136.gdb2ca164c4-goog ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 02/12] perf test: Truncate test description to fit terminal width 2026-06-16 6:13 ` [PATCH v2 00/12] perf tests: Enhance robustness, speed up execution, and fix flakiness Ian Rogers 2026-06-16 6:13 ` [PATCH v2 01/12] perf parse-events: Restrict core PMU bypass to --cputype option Ian Rogers @ 2026-06-16 6:13 ` Ian Rogers 2026-06-16 15:25 ` Arnaldo Carvalho de Melo 2026-06-16 6:13 ` [PATCH v2 03/12] perf tests workloads: Support sub-second durations in noploop and thloop Ian Rogers ` (10 subsequent siblings) 12 siblings, 1 reply; 43+ messages in thread From: Ian Rogers @ 2026-06-16 6:13 UTC (permalink / raw) To: irogers, acme, namhyung Cc: adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht The parallel test harness uses the carriage return delete escape sequence `PERF_COLOR_DELETE_LINE` ("\033[A\33[2K\r") to erase and update the "Running (X active)" progress lines. However, if a test description is longer than the terminal width, the line wraps around. When this happens, the cursor up escape sequence `\033[A` only moves the cursor to the last wrapped row, leaving the top half of the description printed on the previous line. This leads to name duplication and output corruption spilling over multiple rows on consoles narrower than the maximum description length (e.g., 101 columns wide). Fix this by dynamically querying the terminal width using `get_term_dimensions` and truncating the printed test descriptions using the `%-*.*s` printf format. We reserve 35 characters for prefix, status, and spacing metrics to guarantee the progress line never wraps. Fixes: 0e036dcad4e6 ("perf test: Display number of active running tests") Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/builtin-test.c | 110 ++++++++++++++++++++++++-------- 1 file changed, 85 insertions(+), 25 deletions(-) diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c index afc06cec4954..65a2dc50ef55 100644 --- a/tools/perf/tests/builtin-test.c +++ b/tools/perf/tests/builtin-test.c @@ -10,35 +10,39 @@ #ifdef HAVE_BACKTRACE_SUPPORT #include <execinfo.h> #endif -#include <poll.h> -#include <unistd.h> #include <setjmp.h> -#include <string.h> #include <stdlib.h> -#include <sys/types.h> +#include <string.h> + #include <dirent.h> -#include <sys/wait.h> +#include <linux/kernel.h> +#include <linux/string.h> +#include <linux/zalloc.h> +#include <poll.h> +#include <sys/ioctl.h> #include <sys/stat.h> #include <sys/time.h> +#include <sys/types.h> +#include <sys/wait.h> +#include <unistd.h> + +#include <subcmd/exec-cmd.h> +#include <subcmd/parse-options.h> +#include <subcmd/run-command.h> + #include "builtin.h" +#include "color.h" #include "config.h" +#include "debug.h" #include "hist.h" #include "intlist.h" -#include "tests.h" -#include "debug.h" -#include "color.h" -#include <subcmd/parse-options.h> -#include <subcmd/run-command.h> #include "string2.h" #include "symbol.h" +#include "tests-scripts.h" +#include "tests.h" #include "util/rlimit.h" #include "util/strbuf.h" -#include <linux/kernel.h> -#include <linux/string.h> -#include <subcmd/exec-cmd.h> -#include <linux/zalloc.h> - -#include "tests-scripts.h" +#include "util/term.h" static const char *junit_filename; static struct strbuf junit_xml_buf = STRBUF_INIT; @@ -413,19 +417,64 @@ static char *xml_escape(const char *str) return res ? res : strdup(""); } +static int get_term_width(void) +{ + struct winsize ws; + int cols = 80; + int term_width; + + if (!isatty(1)) + return 10000; + + get_term_dimensions(&ws); + if (ws.ws_col > 0) + cols = ws.ws_col; + + /* + * Limit description width to fit on a single line. We subtract 35 + * columns of headroom to allocate space for: + * - The suite index prefix: e.g. " 10.100:" (9 characters). + * - The colon separator and spaces: " : " (3 characters). + * - The longest status results: e.g. "Skip (some metrics failed)" (26 characters) + * or "Running (XX active)" (20 characters). + * + * A minimum description width of 10 is enforced to ensure names are + * legible even on very narrow consoles. + */ + term_width = cols - 35; + if (term_width < 10) + term_width = 10; + + return term_width; +} + +static int get_max_desc_width(int width) +{ + int term_width = get_term_width(); + + return width > term_width ? term_width : width; +} + static int print_test_result(struct test_suite *t, int curr_suite, int curr_test_case, int result, int width, int running, const char *err_output, double elapsed) { + int pad_width = get_max_desc_width(width); + int term_width = get_term_width(); + if (test_suite__num_test_cases(t) > 1) { char prefix[32]; int len = snprintf(prefix, sizeof(prefix), "%3d.%1d:", curr_suite + 1, curr_test_case + 1); - int subw = len >= 4 ? width + 4 - len : width; + int pad = len >= 4 ? pad_width + 4 - len : pad_width; + int trunc = len >= 4 ? term_width + 4 - len : term_width; - pr_info("%s %-*s:", prefix, subw, test_description(t, curr_test_case)); - } else - pr_info("%3d: %-*s:", curr_suite + 1, width, test_description(t, curr_test_case)); + pr_info("%s %-*.*s:", prefix, pad, trunc, + test_description(t, curr_test_case)); + } else { + pr_info("%3d: %-*.*s:", curr_suite + 1, pad_width, term_width, + test_description(t, curr_test_case)); + } switch (result) { case TEST_RUNNING: @@ -695,6 +744,7 @@ static void finish_test(struct child_test **child_tests, int running_test, int c int ret; struct timespec end_time; double elapsed; + width = get_max_desc_width(width); if (child_test == NULL) { /* Test wasn't started. */ @@ -709,7 +759,8 @@ static void finish_test(struct child_test **child_tests, int running_test, int c * sub test names. */ if (test_suite__num_test_cases(t) > 1 && curr_test_case == 0) - pr_info("%3d: %-*s:\n", curr_suite + 1, width, test_description(t, -1)); + pr_info("%3d: %-*.*s:\n", curr_suite + 1, width, width, + test_description(t, -1)); /* * Busy loop reading from the child's stdout/stderr that are set to be @@ -917,6 +968,8 @@ static int finish_tests_parallel(struct child_test **child_tests, size_t num_tes int last_suite_printed = -1; sigset_t set, oldset; + width = get_max_desc_width(width); + sigemptyset(&set); sigaddset(&set, SIGINT); sigaddset(&set, SIGTERM); @@ -985,8 +1038,11 @@ static int finish_tests_parallel(struct child_test **child_tests, size_t num_tes if (next_child) { if (test_suite__num_test_cases(next_child->test) > 1 && last_suite_printed != next_child->suite_num) { - pr_info("%3d: %-*s:\n", next_child->suite_num + 1, width, - test_description(next_child->test, -1)); + pr_info("%3d: %-*.*s:\n", + next_child->suite_num + 1, + width, width, + test_description( + next_child->test, -1)); last_suite_printed = next_child->suite_num; } print_test_result(next_child->test, next_child->suite_num, @@ -1049,7 +1105,8 @@ static int finish_tests_parallel(struct child_test **child_tests, size_t num_tes if (test_suite__num_test_cases(child->test) > 1 && last_suite_printed != child->suite_num) { - pr_info("%3d: %-*s:\n", child->suite_num + 1, width, + pr_info("%3d: %-*.*s:\n", child->suite_num + 1, + width, width, test_description(child->test, -1)); last_suite_printed = child->suite_num; } @@ -1296,7 +1353,10 @@ static int __cmd_test(struct test_suite **suites, int argc, const char *argv[], if (intlist__find(skiplist, curr_suite + 1)) { if (pass == 1) { - pr_info("%3d: %-*s:", curr_suite + 1, width, + int pad_width = get_max_desc_width(width); + int term_width = get_term_width(); + + pr_info("%3d: %-*.*s:", curr_suite + 1, pad_width, term_width, test_description(*t, -1)); color_fprintf(stderr, PERF_COLOR_YELLOW, " Skip (user override)\n"); -- 2.54.0.1136.gdb2ca164c4-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH v2 02/12] perf test: Truncate test description to fit terminal width 2026-06-16 6:13 ` [PATCH v2 02/12] perf test: Truncate test description to fit terminal width Ian Rogers @ 2026-06-16 15:25 ` Arnaldo Carvalho de Melo 0 siblings, 0 replies; 43+ messages in thread From: Arnaldo Carvalho de Melo @ 2026-06-16 15:25 UTC (permalink / raw) To: Ian Rogers Cc: namhyung, adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht On Mon, Jun 15, 2026 at 11:13:54PM -0700, Ian Rogers wrote: > The parallel test harness uses the carriage return delete escape sequence > `PERF_COLOR_DELETE_LINE` ("\033[A\33[2K\r") to erase and update the > "Running (X active)" progress lines. > > However, if a test description is longer than the terminal width, the line > wraps around. When this happens, the cursor up escape sequence `\033[A` only > moves the cursor to the last wrapped row, leaving the top half of the description > printed on the previous line. This leads to name duplication and output corruption > spilling over multiple rows on consoles narrower than the maximum description length > (e.g., 101 columns wide). > > Fix this by dynamically querying the terminal width using `get_term_dimensions` > and truncating the printed test descriptions using the `%-*.*s` printf format. > We reserve 35 characters for prefix, status, and spacing metrics to guarantee > the progress line never wraps. > > Fixes: 0e036dcad4e6 ("perf test: Display number of active running tests") > Signed-off-by: Ian Rogers <irogers@google.com> > --- > tools/perf/tests/builtin-test.c | 110 ++++++++++++++++++++++++-------- > 1 file changed, 85 insertions(+), 25 deletions(-) > > diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c > index afc06cec4954..65a2dc50ef55 100644 > --- a/tools/perf/tests/builtin-test.c > +++ b/tools/perf/tests/builtin-test.c > @@ -10,35 +10,39 @@ > #ifdef HAVE_BACKTRACE_SUPPORT > #include <execinfo.h> > #endif > -#include <poll.h> > -#include <unistd.h> > #include <setjmp.h> > -#include <string.h> > #include <stdlib.h> > -#include <sys/types.h> > +#include <string.h> > + > #include <dirent.h> > -#include <sys/wait.h> > +#include <linux/kernel.h> > +#include <linux/string.h> > +#include <linux/zalloc.h> > +#include <poll.h> > +#include <sys/ioctl.h> > #include <sys/stat.h> > #include <sys/time.h> > +#include <sys/types.h> > +#include <sys/wait.h> > +#include <unistd.h> > + > +#include <subcmd/exec-cmd.h> > +#include <subcmd/parse-options.h> > +#include <subcmd/run-command.h> > + > #include "builtin.h" > +#include "color.h" > #include "config.h" > +#include "debug.h" > #include "hist.h" > #include "intlist.h" > -#include "tests.h" > -#include "debug.h" > -#include "color.h" > -#include <subcmd/parse-options.h> > -#include <subcmd/run-command.h> > #include "string2.h" > #include "symbol.h" > +#include "tests-scripts.h" > +#include "tests.h" > #include "util/rlimit.h" > #include "util/strbuf.h" > -#include <linux/kernel.h> > -#include <linux/string.h> > -#include <subcmd/exec-cmd.h> > -#include <linux/zalloc.h> > - > -#include "tests-scripts.h" > +#include "util/term.h" > > static const char *junit_filename; > static struct strbuf junit_xml_buf = STRBUF_INIT; > @@ -413,19 +417,64 @@ static char *xml_escape(const char *str) > return res ? res : strdup(""); > } > > +static int get_term_width(void) > +{ > + struct winsize ws; > + int cols = 80; > + int term_width; > + A comment here to explain this 10000 magic number? > + if (!isatty(1)) > + return 10000; > + > + get_term_dimensions(&ws); > + if (ws.ws_col > 0) > + cols = ws.ws_col; > + > + /* > + * Limit description width to fit on a single line. We subtract 35 > + * columns of headroom to allocate space for: > + * - The suite index prefix: e.g. " 10.100:" (9 characters). > + * - The colon separator and spaces: " : " (3 characters). > + * - The longest status results: e.g. "Skip (some metrics failed)" (26 characters) > + * or "Running (XX active)" (20 characters). > + * > + * A minimum description width of 10 is enforced to ensure names are > + * legible even on very narrow consoles. > + */ > + term_width = cols - 35; > + if (term_width < 10) > + term_width = 10; > + > + return term_width; > +} > + > +static int get_max_desc_width(int width) > +{ > + int term_width = get_term_width(); > + > + return width > term_width ? term_width : width; > +} > + > static int print_test_result(struct test_suite *t, int curr_suite, int curr_test_case, > int result, int width, int running, > const char *err_output, double elapsed) > { > + int pad_width = get_max_desc_width(width); > + int term_width = get_term_width(); > + > if (test_suite__num_test_cases(t) > 1) { > char prefix[32]; > int len = snprintf(prefix, sizeof(prefix), "%3d.%1d:", > curr_suite + 1, curr_test_case + 1); > - int subw = len >= 4 ? width + 4 - len : width; > + int pad = len >= 4 ? pad_width + 4 - len : pad_width; > + int trunc = len >= 4 ? term_width + 4 - len : term_width; > > - pr_info("%s %-*s:", prefix, subw, test_description(t, curr_test_case)); > - } else > - pr_info("%3d: %-*s:", curr_suite + 1, width, test_description(t, curr_test_case)); > + pr_info("%s %-*.*s:", prefix, pad, trunc, > + test_description(t, curr_test_case)); > + } else { > + pr_info("%3d: %-*.*s:", curr_suite + 1, pad_width, term_width, > + test_description(t, curr_test_case)); > + } > > switch (result) { > case TEST_RUNNING: > @@ -695,6 +744,7 @@ static void finish_test(struct child_test **child_tests, int running_test, int c > int ret; > struct timespec end_time; > double elapsed; > + width = get_max_desc_width(width); > > if (child_test == NULL) { > /* Test wasn't started. */ > @@ -709,7 +759,8 @@ static void finish_test(struct child_test **child_tests, int running_test, int c > * sub test names. > */ > if (test_suite__num_test_cases(t) > 1 && curr_test_case == 0) > - pr_info("%3d: %-*s:\n", curr_suite + 1, width, test_description(t, -1)); > + pr_info("%3d: %-*.*s:\n", curr_suite + 1, width, width, > + test_description(t, -1)); > > /* > * Busy loop reading from the child's stdout/stderr that are set to be > @@ -917,6 +968,8 @@ static int finish_tests_parallel(struct child_test **child_tests, size_t num_tes > int last_suite_printed = -1; > sigset_t set, oldset; > > + width = get_max_desc_width(width); > + > sigemptyset(&set); > sigaddset(&set, SIGINT); > sigaddset(&set, SIGTERM); > @@ -985,8 +1038,11 @@ static int finish_tests_parallel(struct child_test **child_tests, size_t num_tes > if (next_child) { > if (test_suite__num_test_cases(next_child->test) > 1 && > last_suite_printed != next_child->suite_num) { > - pr_info("%3d: %-*s:\n", next_child->suite_num + 1, width, > - test_description(next_child->test, -1)); > + pr_info("%3d: %-*.*s:\n", > + next_child->suite_num + 1, > + width, width, > + test_description( > + next_child->test, -1)); > last_suite_printed = next_child->suite_num; > } > print_test_result(next_child->test, next_child->suite_num, > @@ -1049,7 +1105,8 @@ static int finish_tests_parallel(struct child_test **child_tests, size_t num_tes > > if (test_suite__num_test_cases(child->test) > 1 && > last_suite_printed != child->suite_num) { > - pr_info("%3d: %-*s:\n", child->suite_num + 1, width, > + pr_info("%3d: %-*.*s:\n", child->suite_num + 1, > + width, width, > test_description(child->test, -1)); > last_suite_printed = child->suite_num; > } > @@ -1296,7 +1353,10 @@ static int __cmd_test(struct test_suite **suites, int argc, const char *argv[], > > if (intlist__find(skiplist, curr_suite + 1)) { > if (pass == 1) { > - pr_info("%3d: %-*s:", curr_suite + 1, width, > + int pad_width = get_max_desc_width(width); > + int term_width = get_term_width(); > + > + pr_info("%3d: %-*.*s:", curr_suite + 1, pad_width, term_width, > test_description(*t, -1)); > color_fprintf(stderr, PERF_COLOR_YELLOW, > " Skip (user override)\n"); > -- > 2.54.0.1136.gdb2ca164c4-goog ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v2 03/12] perf tests workloads: Support sub-second durations in noploop and thloop 2026-06-16 6:13 ` [PATCH v2 00/12] perf tests: Enhance robustness, speed up execution, and fix flakiness Ian Rogers 2026-06-16 6:13 ` [PATCH v2 01/12] perf parse-events: Restrict core PMU bypass to --cputype option Ian Rogers 2026-06-16 6:13 ` [PATCH v2 02/12] perf test: Truncate test description to fit terminal width Ian Rogers @ 2026-06-16 6:13 ` Ian Rogers 2026-06-16 6:13 ` [PATCH v2 04/12] perf tests: Add robust record retry helper and use subsecond workloads Ian Rogers ` (9 subsequent siblings) 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 6:13 UTC (permalink / raw) To: irogers, acme, namhyung Cc: adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht Currently, the noploop and thloop workloads only support sleep durations in integer seconds because they parse the argument using atoi() and use alarm() for timer signaling. To support much shorter execution times in tests (speeding up test suites and allowing faster retries), change the input parsing to use atof() for double floating-point seconds. Use ualarm() for fractional durations less than 1.0 seconds, and fall back to alarm() for durations of 1.0 second or more. Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/workloads/noploop.c | 17 ++++++++++++++--- tools/perf/tests/workloads/thloop.c | 16 +++++++++++----- 2 files changed, 25 insertions(+), 8 deletions(-) diff --git a/tools/perf/tests/workloads/noploop.c b/tools/perf/tests/workloads/noploop.c index 656e472e6188..2a6636c020f6 100644 --- a/tools/perf/tests/workloads/noploop.c +++ b/tools/perf/tests/workloads/noploop.c @@ -15,15 +15,26 @@ static void sighandler(int sig __maybe_unused) static int noploop(int argc, const char **argv) { - int sec = 1; + double sec = 1.0; pthread_setname_np(pthread_self(), "perf-noploop"); if (argc > 0) - sec = atoi(argv[0]); + sec = atof(argv[0]); + + if (sec <= 0.0) { + fprintf(stderr, "Error: seconds (%f) must be > 0\n", sec); + return 1; + } signal(SIGINT, sighandler); signal(SIGALRM, sighandler); - alarm(sec); + + if (sec < 1.0) { + useconds_t usecs = (useconds_t)(sec * 1000000.0); + + ualarm(usecs > 0 ? usecs : 1, 0); + } else + alarm((unsigned int)sec); while (!done) continue; diff --git a/tools/perf/tests/workloads/thloop.c b/tools/perf/tests/workloads/thloop.c index bd8168f883fb..cf1ad6f92509 100644 --- a/tools/perf/tests/workloads/thloop.c +++ b/tools/perf/tests/workloads/thloop.c @@ -31,14 +31,15 @@ static void *thfunc(void *arg) static int thloop(int argc, const char **argv) { - int nt = 2, sec = 1, err = 1; + int nt = 2, err = 1; + double sec = 1.0; pthread_t *thread_list = NULL; if (argc > 0) - sec = atoi(argv[0]); + sec = atof(argv[0]); - if (sec <= 0) { - fprintf(stderr, "Error: seconds (%d) must be >= 1\n", sec); + if (sec <= 0.0) { + fprintf(stderr, "Error: seconds (%f) must be > 0\n", sec); return 1; } @@ -67,7 +68,12 @@ static int thloop(int argc, const char **argv) goto out; } } - alarm(sec); + if (sec < 1.0) { + useconds_t usecs = (useconds_t)(sec * 1000000.0); + + ualarm(usecs > 0 ? usecs : 1, 0); + } else + alarm((unsigned int)sec); test_loop(); err = 0; out: -- 2.54.0.1136.gdb2ca164c4-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v2 04/12] perf tests: Add robust record retry helper and use subsecond workloads 2026-06-16 6:13 ` [PATCH v2 00/12] perf tests: Enhance robustness, speed up execution, and fix flakiness Ian Rogers ` (2 preceding siblings ...) 2026-06-16 6:13 ` [PATCH v2 03/12] perf tests workloads: Support sub-second durations in noploop and thloop Ian Rogers @ 2026-06-16 6:13 ` Ian Rogers 2026-06-16 6:13 ` [PATCH v2 05/12] perf tests: Skip metrics validation if system-wide recording lacks permission Ian Rogers ` (8 subsequent siblings) 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 6:13 UTC (permalink / raw) To: irogers, acme, namhyung Cc: adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht Introduce `perf_record_with_retry` and `perf_record_cleanup` in a shared library `tests/shell/lib/perf_record.sh` to prevent record test failures caused by transient recording or workload delays. Update `record.sh`, `record_lbr.sh`, `pipe_test.sh`, `kvm.sh`, and `stat_all_pfm.sh` to use this robust record retry logic. These tests now start with very short durations (e.g. 0.01 seconds) and scale up if the initial recording failed to capture samples, significantly improving test execution speed on success while remaining resilient to slow systems. Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/shell/kvm.sh | 61 +++++--- tools/perf/tests/shell/lib/perf_record.sh | 47 ++++++ tools/perf/tests/shell/pipe_test.sh | 4 +- tools/perf/tests/shell/record.sh | 172 +++++++++++----------- tools/perf/tests/shell/record_lbr.sh | 50 +++++-- tools/perf/tests/shell/stat_all_pfm.sh | 2 +- 6 files changed, 208 insertions(+), 128 deletions(-) create mode 100644 tools/perf/tests/shell/lib/perf_record.sh diff --git a/tools/perf/tests/shell/kvm.sh b/tools/perf/tests/shell/kvm.sh index f88e859025c4..255250ddba5c 100755 --- a/tools/perf/tests/shell/kvm.sh +++ b/tools/perf/tests/shell/kvm.sh @@ -39,17 +39,26 @@ skip() { test_kvm_stat() { echo "Testing perf kvm stat" - echo "Recording kvm events for pid ${qemu_pid}..." - if ! perf kvm stat record -p "${qemu_pid}" -o "${perfdata}" sleep 1; then - echo "Failed to record kvm events" - err=1 - return - fi + local duration + local success=false + for duration in 1 2 4 8; do + echo "Recording kvm events for pid ${qemu_pid} (duration ${duration}s)..." + rm -f "${perfdata}" "${perfdata}".old + if ! perf kvm stat record -p "${qemu_pid}" -o "${perfdata}" sleep ${duration} >/dev/null 2>&1; then + echo "perf kvm stat record failed, retrying..." + continue + fi - echo "Reporting kvm events..." - if ! perf kvm -i "${perfdata}" stat report 2>&1 | grep -q "VM-EXIT"; then + if [ -e "${perfdata}" ] && perf kvm -i "${perfdata}" stat report 2>&1 | grep -q "VM-EXIT"; then + success=true + break + fi + echo "No VM-EXIT events found, retrying..." + done + + if [ "$success" = false ]; then echo "Failed to find VM-EXIT in report" - perf kvm -i "${perfdata}" stat report 2>&1 + perf kvm -i "${perfdata}" stat report 2>&1 || true err=1 return fi @@ -60,22 +69,28 @@ test_kvm_stat() { test_kvm_record_report() { echo "Testing perf kvm record/report" - echo "Recording kvm profile for pid ${qemu_pid}..." - # Use --host to avoid needing guest symbols/mounts for this simple test - # We just want to verify the command runs and produces data - # We run in background and kill it because 'perf kvm record' appends options - # after the command, which breaks 'sleep' (e.g. it gets '-e cycles'). - perf kvm --host record -p "${qemu_pid}" -o "${perfdata}" & - rec_pid=$! - sleep 1 - kill -INT "${rec_pid}" - wait "${rec_pid}" || true + local duration + local success=false + for duration in 1 2 4 8; do + echo "Recording kvm profile for pid ${qemu_pid} (duration ${duration}s)..." + rm -f "${perfdata}" "${perfdata}".old + + perf kvm --host record -p "${qemu_pid}" -o "${perfdata}" & + local rec_pid=$! + sleep ${duration} + kill -INT "${rec_pid}" + wait "${rec_pid}" || true + + if [ -e "${perfdata}" ] && perf kvm -i "${perfdata}" report --stdio 2>&1 | grep -q "Event count"; then + success=true + break + fi + echo "No samples or report failed, retrying..." + done - echo "Reporting kvm profile..." - # Check for some standard output from report - if ! perf kvm -i "${perfdata}" report --stdio 2>&1 | grep -q "Event count"; then + if [ "$success" = false ]; then echo "Failed to report kvm profile" - perf kvm -i "${perfdata}" report --stdio 2>&1 + perf kvm -i "${perfdata}" report --stdio 2>&1 || true err=1 return fi diff --git a/tools/perf/tests/shell/lib/perf_record.sh b/tools/perf/tests/shell/lib/perf_record.sh new file mode 100644 index 000000000000..fe5721427e58 --- /dev/null +++ b/tools/perf/tests/shell/lib/perf_record.sh @@ -0,0 +1,47 @@ +# SPDX-License-Identifier: GPL-2.0 + +perf_record_with_retry() { + local perfdata="$1" + local check_cmd="$2" + local testprog_base="$3" + shift 3 + + local logfile + logfile=$(mktemp /tmp/__perf_record_retry.XXXXXX.log) + + # Save the e flag state and disable it + local save_e + if [[ $- == *e* ]]; then + save_e="set -e" + else + save_e="set +e" + fi + set +e + + local duration + local first_run=true + local ret=1 + for duration in 0.01 0.1 0.3 1.0 2.0; do + rm -f "${perfdata}".old + perf record "$@" -o "${perfdata}" ${testprog_base} ${duration} > "$logfile" 2>&1 + local record_exit=$? + + if [ "$first_run" = true ] && [ $record_exit -ne 0 ]; then + ret=2 + break + fi + first_run=false + + if [ -e "${perfdata}" ] && eval "${check_cmd}"; then + ret=0 + break + fi + done + + eval "$save_e" + return $ret +} + +perf_record_cleanup() { + rm -f /tmp/__perf_record_retry.*.log +} diff --git a/tools/perf/tests/shell/pipe_test.sh b/tools/perf/tests/shell/pipe_test.sh index e459aa99a951..ce68d850c983 100755 --- a/tools/perf/tests/shell/pipe_test.sh +++ b/tools/perf/tests/shell/pipe_test.sh @@ -12,8 +12,8 @@ skip_test_missing_symbol ${sym} data=$(mktemp /tmp/perf.data.XXXXXX) data2=$(mktemp /tmp/perf.data2.XXXXXX) -prog="perf test -w noploop" -[ "$(uname -m)" = "s390x" ] && prog="$prog 3" +prog="perf test -w noploop 0.1" +[ "$(uname -m)" = "s390x" ] && prog="perf test -w noploop 3" err=0 set -e diff --git a/tools/perf/tests/shell/record.sh b/tools/perf/tests/shell/record.sh index 7cb81cf3444a..46eb449ad33a 100755 --- a/tools/perf/tests/shell/record.sh +++ b/tools/perf/tests/shell/record.sh @@ -1,10 +1,13 @@ #!/bin/bash -# perf record tests (exclusive) # SPDX-License-Identifier: GPL-2.0 +# perf record tests set -e shelldir=$(dirname "$0") +. "${shelldir}"/lib/perf_record.sh + + # shellcheck source=lib/waiting.sh . "${shelldir}"/lib/waiting.sh @@ -39,6 +42,7 @@ cleanup() { rm -f "${perfdata}" rm -f "${perfdata}".old rm -f "${script_output}" + perf_record_cleanup trap - EXIT TERM INT } @@ -50,22 +54,19 @@ trap_cleanup() { } trap trap_cleanup EXIT TERM INT +check_per_thread() { + perf report -i "${perfdata}" -q | grep -q "${testsym}" +} + test_per_thread() { echo "Basic --per-thread mode test" - if ! perf record -o /dev/null --quiet ${testprog} 2> /dev/null - then + local ret=0 + perf_record_with_retry "${perfdata}" "check_per_thread" "perf test -w thloop" --per-thread || ret=$? + if [ $ret -eq 2 ]; then echo "Per-thread record [Skipped event not supported]" return - fi - if ! perf record --per-thread -o "${perfdata}" ${testprog} 2> /dev/null - then - echo "Per-thread record [Failed record]" - err=1 - return - fi - if ! perf report -i "${perfdata}" -q | grep -q "${testsym}" - then - echo "Per-thread record [Failed missing output]" + elif [ $ret -eq 1 ]; then + echo "Per-thread record [Failed record or missing output]" err=1 return fi @@ -96,6 +97,10 @@ test_per_thread() { echo "Basic --per-thread mode test [Success]" } +check_register_capture() { + perf script -F ip,sym,iregs -i "${perfdata}" 2>/dev/null | grep -q "DI:" +} + test_register_capture() { echo "Register capture test" if ! perf list pmu | grep -q 'br_inst_retired.near_call' @@ -108,11 +113,12 @@ test_register_capture() { echo "Register capture test [Skipped missing registers]" return fi - if ! perf record -o - --intr-regs=di,r8,dx,cx -e br_inst_retired.near_call \ - -c 1000 --per-thread ${testprog} 2> /dev/null \ - | perf script -F ip,sym,iregs -i - 2> /dev/null \ - | grep -q "DI:" - then + + local ret=0 + perf_record_with_retry "${perfdata}" "check_register_capture" "perf test -w thloop" \ + --intr-regs=di,r8,dx,cx -e br_inst_retired.near_call -c 1000 --per-thread || ret=$? + + if [ $ret -ne 0 ]; then echo "Register capture test [Failed missing output]" err=1 return @@ -120,65 +126,65 @@ test_register_capture() { echo "Register capture test [Success]" } +check_system_wide() { + perf report -i "${perfdata}" -q | grep -q "${testsym}" +} + test_system_wide() { echo "Basic --system-wide mode test" - if ! perf record -aB --synth=no -o "${perfdata}" ${testprog} 2> /dev/null - then + local ret=0 + perf_record_with_retry "${perfdata}" "check_system_wide" "perf test -w thloop" -aB --synth=no || ret=$? + if [ $ret -eq 2 ]; then echo "System-wide record [Skipped not supported]" return - fi - if ! perf report -i "${perfdata}" -q | grep -q "${testsym}" - then + elif [ $ret -eq 1 ]; then echo "System-wide record [Failed missing output]" err=1 return fi - if ! perf record -aB --synth=no -e cpu-clock,cs --threads=cpu \ - -o "${perfdata}" ${testprog} 2> /dev/null - then - echo "System-wide record [Failed record --threads option]" - err=1 - return - fi - if ! perf report -i "${perfdata}" -q | grep -q "${testsym}" - then - echo "System-wide record [Failed --threads missing output]" + + ret=0 + perf_record_with_retry "${perfdata}" "check_system_wide" "perf test -w thloop" \ + -aB --synth=no -e cpu-clock,cs --threads=cpu || ret=$? + if [ $ret -ne 0 ]; then + echo "System-wide record [Failed record --threads option or missing output]" err=1 return fi echo "Basic --system-wide mode test [Success]" } +check_workload() { + perf report -i "${perfdata}" -q | grep -q "${testsym}" +} + test_workload() { echo "Basic target workload test" - if ! perf record -o "${perfdata}" ${testprog} 2> /dev/null - then - echo "Workload record [Failed record]" + local ret=0 + perf_record_with_retry "${perfdata}" "check_workload" "perf test -w thloop" || ret=$? + if [ $ret -ne 0 ]; then + echo "Workload record [Failed record or missing output]" err=1 return fi - if ! perf report -i "${perfdata}" -q | grep -q "${testsym}" - then - echo "Workload record [Failed missing output]" - err=1 - return - fi - if ! perf record -e cpu-clock,cs --threads=package \ - -o "${perfdata}" ${testprog} 2> /dev/null - then - echo "Workload record [Failed record --threads option]" - err=1 - return - fi - if ! perf report -i "${perfdata}" -q | grep -q "${testsym}" - then - echo "Workload record [Failed --threads missing output]" + + ret=0 + perf_record_with_retry "${perfdata}" "check_workload" "perf test -w thloop" \ + -e cpu-clock,cs --threads=package || ret=$? + if [ $ret -ne 0 ]; then + echo "Workload record [Failed record --threads option or missing output]" err=1 return fi echo "Basic target workload test [Success]" } +check_branch_counter() { + perf report -i "${perfdata}" -D -q 2>/dev/null | grep -q "$br_cntr_output" && \ + perf script -i "${perfdata}" -F +brstackinsn,+brcntr 2>/dev/null | \ + grep -q "$br_cntr_script_output" +} + test_branch_counter() { echo "Branch counter test" # Check if the branch counter feature is supported @@ -190,67 +196,61 @@ test_branch_counter() { return fi done - if ! perf record -o "${perfdata}" -e "{branches:p,instructions}" -j any,counter ${testprog} 2> /dev/null - then - echo "Branch counter record test [Failed record]" - err=1 - return - fi - if ! perf report -i "${perfdata}" -D -q | grep -q "$br_cntr_output" - then - echo "Branch counter report test [Failed missing output]" - err=1 - return - fi - if ! perf script -i "${perfdata}" -F +brstackinsn,+brcntr | grep -q "$br_cntr_script_output" - then - echo " Branch counter script test [Failed missing output]" + local ret=0 + perf_record_with_retry "${perfdata}" "check_branch_counter" "perf test -w thloop" \ + -e "{branches:p,instructions}" -j any,counter || ret=$? + if [ $ret -ne 0 ]; then + echo "Branch counter test [Failed record or missing output]" err=1 return fi echo "Branch counter test [Success]" } +check_cgroup() { + perf report -i "${perfdata}" -D 2>/dev/null | grep -q "CGROUP" && \ + perf script -i "${perfdata}" -F cgroup 2>/dev/null | grep -q -v "unknown" +} + test_cgroup() { echo "Cgroup sampling test" - if ! perf record -aB --synth=cgroup --all-cgroups -o "${perfdata}" ${testprog} 2> /dev/null - then + local ret=0 + perf_record_with_retry "${perfdata}" "check_cgroup" "perf test -w thloop" \ + -aB --synth=cgroup --all-cgroups || ret=$? + if [ $ret -eq 2 ]; then echo "Cgroup sampling [Skipped not supported]" return - fi - if ! perf report -i "${perfdata}" -D | grep -q "CGROUP" - then + elif [ $ret -eq 1 ]; then echo "Cgroup sampling [Failed missing output]" err=1 return fi - if ! perf script -i "${perfdata}" -F cgroup | grep -q -v "unknown" - then - echo "Cgroup sampling [Failed cannot resolve cgroup names]" - err=1 - return - fi echo "Cgroup sampling test [Success]" } +check_uid() { + perf report -i "${perfdata}" -q | grep -q "${testsym}" +} + test_uid() { echo "Uid sampling test" - if ! perf record -aB --synth=no --uid "$(id -u)" -o "${perfdata}" ${testprog} \ - > "${script_output}" 2>&1 - then - if grep -q "libbpf.*EPERM" "${script_output}" + local logfile + logfile="/tmp/__perf_record_retry.$(id -u).$BASHPID.log" + local ret=0 + perf_record_with_retry "${perfdata}" "check_uid" "perf test -w thloop" \ + -aB --synth=no --uid "$(id -u)" || ret=$? + if [ $ret -eq 2 ]; then + if grep -q -E "libbpf.*EPERM|Access to performance monitoring|Permission denied|Failure to open any events" \ + "$logfile" then echo "Uid sampling [Skipped permissions]" return else echo "Uid sampling [Failed to record]" err=1 - # cat "${script_output}" return fi - fi - if ! perf report -i "${perfdata}" -q | grep -q "${testsym}" - then + elif [ $ret -eq 1 ]; then echo "Uid sampling [Failed missing output]" err=1 return diff --git a/tools/perf/tests/shell/record_lbr.sh b/tools/perf/tests/shell/record_lbr.sh index 78a02e90ece1..8d51afeb437b 100755 --- a/tools/perf/tests/shell/record_lbr.sh +++ b/tools/perf/tests/shell/record_lbr.sh @@ -1,9 +1,12 @@ #!/bin/bash -# perf record LBR tests (exclusive) # SPDX-License-Identifier: GPL-2.0 +# perf record LBR tests set -e +shelldir=$(dirname "$0") +. "${shelldir}"/lib/perf_record.sh + ParanoidAndNotRoot() { [ "$(id -u)" != 0 ] && [ "$(cat /proc/sys/kernel/perf_event_paranoid)" -gt $1 ] } @@ -22,6 +25,7 @@ cleanup() { rm -rf "${perfdata}" rm -rf "${perfdata}".old rm -rf "${perfdata}".txt + perf_record_cleanup trap - EXIT TERM INT } @@ -34,22 +38,28 @@ trap_cleanup() { trap trap_cleanup EXIT TERM INT +check_lbr_callgraph() { + perf report --stitch-lbr -i "${perfdata}" > "${perfdata}".txt 2>&1 +} + lbr_callgraph_test() { test="LBR callgraph" echo "$test" - if ! perf record -e cycles --call-graph lbr -o "${perfdata}" perf test -w thloop - then + set +e + perf_record_with_retry "${perfdata}" "check_lbr_callgraph" "perf test -w thloop" \ + -e cycles --call-graph lbr + local ret=$? + set -e + + if [ $ret -eq 2 ]; then echo "$test [Failed support missing]" if [ $err -eq 0 ] then err=2 fi return - fi - - if ! perf report --stitch-lbr -i "${perfdata}" > "${perfdata}".txt - then + elif [ $ret -eq 1 ]; then cat "${perfdata}".txt echo "$test [Failed in perf report]" err=1 @@ -59,6 +69,12 @@ lbr_callgraph_test() { echo "$test [Success]" } +check_lbr_samples() { + local out + out=$(perf report -D -i "${perfdata}" 2> /dev/null | grep -A1 'PERF_RECORD_SAMPLE') + [ "$(echo "$out" | grep -c 'PERF_RECORD_SAMPLE' || true)" -gt 0 ] +} + lbr_test() { local branch_flags=$1 local test="LBR $2 test" @@ -70,25 +86,27 @@ lbr_test() { local r echo "$test" - if ! perf record -e cycles $branch_flags -o "${perfdata}" perf test -w thloop - then + set +e + perf_record_with_retry "${perfdata}" "check_lbr_samples" "perf test -w thloop" \ + -e cycles $branch_flags + local ret=$? + set -e + + if [ $ret -eq 2 ]; then echo "$test [Failed support missing]" - perf record -e cycles $branch_flags -o "${perfdata}" perf test -w thloop || true if [ $err -eq 0 ] then err=2 fi return - fi - - out=$(perf report -D -i "${perfdata}" 2> /dev/null | grep -A1 'PERF_RECORD_SAMPLE') - sam_nr=$(echo "$out" | grep -c 'PERF_RECORD_SAMPLE' || true) - if [ $sam_nr -eq 0 ] - then + elif [ $ret -eq 1 ]; then echo "$test [Failed no samples captured]" err=1 return fi + + out=$(perf report -D -i "${perfdata}" 2> /dev/null | grep -A1 'PERF_RECORD_SAMPLE') + sam_nr=$(echo "$out" | grep -c 'PERF_RECORD_SAMPLE' || true) echo "$test: $sam_nr samples" bs_nr=$(echo "$out" | grep -c 'branch stack: nr:' || true) diff --git a/tools/perf/tests/shell/stat_all_pfm.sh b/tools/perf/tests/shell/stat_all_pfm.sh index c08c186af2c4..ec262c6b1674 100755 --- a/tools/perf/tests/shell/stat_all_pfm.sh +++ b/tools/perf/tests/shell/stat_all_pfm.sh @@ -32,7 +32,7 @@ do then # We failed to see the event and it is supported. Possibly the workload was # too small so retry with something longer. - result=$(perf stat --pfm-events "$p" perf bench internals synthesize 2>&1) + result=$(perf stat --pfm-events "$p" perf test -w noploop 0.1 2>&1) x=$? if test "$x" -ne "0" then -- 2.54.0.1136.gdb2ca164c4-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v2 05/12] perf tests: Skip metrics validation if system-wide recording lacks permission 2026-06-16 6:13 ` [PATCH v2 00/12] perf tests: Enhance robustness, speed up execution, and fix flakiness Ian Rogers ` (3 preceding siblings ...) 2026-06-16 6:13 ` [PATCH v2 04/12] perf tests: Add robust record retry helper and use subsecond workloads Ian Rogers @ 2026-06-16 6:13 ` Ian Rogers 2026-06-16 6:13 ` [PATCH v2 06/12] perf tests: Fix Python JIT dump profiling test failure Ian Rogers ` (7 subsequent siblings) 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 6:13 UTC (permalink / raw) To: irogers, acme, namhyung Cc: adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht The metrics value validation test requires system-wide recording (`-a`), which can fail on systems without root permissions or where paranoid levels restrict tracing. Add a check to skip the test if `-a` is not supported. Also fix false negatives during validation by updating parse error string patterns and resolving issues in metric list generation. Fixes: 3ad7092f5145 ("perf test: Add metric value validation test") Signed-off-by: Ian Rogers <irogers@google.com> --- .../tests/shell/lib/perf_metric_validation.py | 11 ++- tools/perf/tests/shell/stat_all_metrics.sh | 75 ++++++++++++------- tools/perf/tests/shell/stat_metrics_values.sh | 7 ++ 3 files changed, 60 insertions(+), 33 deletions(-) diff --git a/tools/perf/tests/shell/lib/perf_metric_validation.py b/tools/perf/tests/shell/lib/perf_metric_validation.py index dea8ef1977bf..3d52f94f22b9 100644 --- a/tools/perf/tests/shell/lib/perf_metric_validation.py +++ b/tools/perf/tests/shell/lib/perf_metric_validation.py @@ -383,10 +383,13 @@ class Validator: wl = workload.split() command.extend(wl) print(" ".join(command)) - cmd = subprocess.run(command, stderr=subprocess.PIPE, encoding='utf-8') - data = [x+'}' for x in cmd.stderr.split('}\n') if x] - if data[0][0] != '{': - data[0] = data[0][data[0].find('{'):] + cmd = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, encoding='utf-8') + lines = cmd.stderr.splitlines() + cmd.stdout.splitlines() + data = [] + for line in lines: + line = line.strip() + if line.startswith('{') and line.endswith('}'): + data.append(line) return data def collect_perf(self, workload: str): diff --git a/tools/perf/tests/shell/stat_all_metrics.sh b/tools/perf/tests/shell/stat_all_metrics.sh index b582d23f28c9..feeb34c6fa6d 100755 --- a/tools/perf/tests/shell/stat_all_metrics.sh +++ b/tools/perf/tests/shell/stat_all_metrics.sh @@ -12,38 +12,65 @@ system_wide_flag="-a" if ParanoidAndNotRoot 0 then system_wide_flag="" - test_prog="perf test -w noploop" + test_prog="perf test -w noploop 0.01" fi +check_metric() { + local output="$1" + local status="$2" + local metric="$3" + + if [[ $status -ne 0 || ! "$output" =~ ${metric:0:50} ]]; then + return 1 + fi + + if [[ "$output" =~ "<not counted>" || "$output" =~ "<not supported>" ]]; then + return 1 + fi + + return 0 +} + skip=0 err=3 for m in $(perf list --raw-dump metrics); do echo "Testing $m" result=$(perf stat -M "$m" $system_wide_flag -- $test_prog 2>&1) result_err=$? - if [[ $result_err -eq 0 && "$result" =~ ${m:0:50} ]] - then - # No error result and metric shown. + + if check_metric "$result" $result_err "$m"; then if [[ "$err" -ne 1 ]] then err=0 fi continue fi - if [[ "$result" =~ "Cannot resolve IDs for" || "$result" =~ "No supported events found" ]] + + if [[ "$result" =~ "Access to performance monitoring and observability operations is limited" || \ + "$result" =~ "in per-thread mode, enable system wide" || \ + "$result" =~ "<not supported>" || \ + "$result" =~ "Cannot resolve IDs for" || \ + "$result" =~ "No supported events found" || \ + "$result" =~ "FP_ARITH" || \ + "$result" =~ "AMX" || \ + "$result" =~ "PMM" ]] then - if [[ $(perf list --raw-dump $m) == "Default"* ]] - then - echo "[Ignored $m] failed but as a Default metric this can be expected" - echo $result + true + else + result=$(perf stat -M "$m" $system_wide_flag -- perf test -w noploop 0.1 2>&1) + result_err=$? + + if check_metric "$result" $result_err "$m"; then + if [[ "$err" -ne 1 ]] + then + err=0 + fi continue fi - echo "[Failed $m] Metric contains missing events" - echo $result - err=1 # Fail - continue - elif [[ "$result" =~ \ - "Access to performance monitoring and observability operations is limited" ]] + fi + + # If retry also failed, determine if we skip, ignore, or fail + if [[ "$result" =~ "Access to performance monitoring and observability operations is limited" ]] then echo "[Skipped $m] Permission failure" echo $result @@ -61,7 +88,9 @@ for m in $(perf list --raw-dump metrics); do skip=1 fi continue - elif [[ "$result" =~ "<not supported>" ]] + elif [[ "$result" =~ "<not supported>" || \ + "$result" =~ "Cannot resolve IDs for" || \ + "$result" =~ "No supported events found" ]] then if [[ $(perf list --raw-dump $m) == "Default"* ]] then @@ -105,19 +134,7 @@ for m in $(perf list --raw-dump metrics); do continue fi - # Failed, possibly the workload was too small so retry with something longer. - result=$(perf stat -M "$m" $system_wide_flag -- perf bench internals synthesize 2>&1) - result_err=$? - if [[ $result_err -eq 0 && "$result" =~ ${m:0:50} ]] - then - # No error result and metric shown. - if [[ "$err" -ne 1 ]] - then - err=0 - fi - continue - fi - echo "[Failed $m] has non-zero error '$result_err' or not printed in:" + echo "[Failed $m] has non-zero error '$result_err' or not printed/counted in:" echo "$result" err=1 done diff --git a/tools/perf/tests/shell/stat_metrics_values.sh b/tools/perf/tests/shell/stat_metrics_values.sh index 30566f0b5427..76f1e99d1273 100755 --- a/tools/perf/tests/shell/stat_metrics_values.sh +++ b/tools/perf/tests/shell/stat_metrics_values.sh @@ -8,6 +8,13 @@ shelldir=$(dirname "$0") grep -q GenuineIntel /proc/cpuinfo || { echo Skipping non-Intel; exit 2; } +# Skip if no permission to record system-wide events +if ! perf stat -a -e instructions sleep 0.01 >/dev/null 2>&1; then + echo "Skipping: no permission to record system-wide events (-a)" + exit 2 +fi + + pythonvalidator=$(dirname $0)/lib/perf_metric_validation.py rulefile=$(dirname $0)/lib/perf_metric_validation_rules.json tmpdir=$(mktemp -d /tmp/__perf_test.program.XXXXX) -- 2.54.0.1136.gdb2ca164c4-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v2 06/12] perf tests: Fix Python JIT dump profiling test failure 2026-06-16 6:13 ` [PATCH v2 00/12] perf tests: Enhance robustness, speed up execution, and fix flakiness Ian Rogers ` (4 preceding siblings ...) 2026-06-16 6:13 ` [PATCH v2 05/12] perf tests: Skip metrics validation if system-wide recording lacks permission Ian Rogers @ 2026-06-16 6:13 ` Ian Rogers 2026-06-16 6:13 ` [PATCH v2 07/12] perf tests: Fix flakiness in trace record and replay test Ian Rogers ` (6 subsequent siblings) 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 6:13 UTC (permalink / raw) To: irogers, acme, namhyung Cc: adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht The `python profiling with jitdump` test failed due to: 1. Target PID extraction resolving to duplicate space-separated values, which broke the buildid-cache loops. 2. The default workload duration being too short to capture JIT stack trampoline samples, resulting in 0 matching JIT symbols. Fix the PID parsing by sorting and retrieving a unique single-line value. Implement a robust retry loop starting at 1M python loop iterations and scaling up to 100M iterations until JIT symbols are successfully captured and verified. Fixes: c9cd0c7e529e ("perf test: Add python JIT dump test") Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/shell/jitdump-python.sh | 76 ++++++++++++++++-------- 1 file changed, 51 insertions(+), 25 deletions(-) diff --git a/tools/perf/tests/shell/jitdump-python.sh b/tools/perf/tests/shell/jitdump-python.sh index ae86203b14a2..dd4a401cd245 100755 --- a/tools/perf/tests/shell/jitdump-python.sh +++ b/tools/perf/tests/shell/jitdump-python.sh @@ -20,7 +20,10 @@ PERF_DATA=$(mktemp /tmp/__perf_test.perf.data.XXXXXX) cleanup() { echo "Cleaning up files..." - rm -f ${PERF_DATA} ${PERF_DATA}.jit /tmp/jit-${PID}.dump /tmp/jitted-${PID}-*.so 2> /dev/null + rm -f ${PERF_DATA} ${PERF_DATA}.jit ${PERF_DATA}.pid 2> /dev/null + for p in ${ALL_PIDS}; do + rm -f /tmp/jit-${p}.dump /tmp/jitted-${p}-*.so 2> /dev/null + done trap - EXIT TERM INT } @@ -33,9 +36,16 @@ trap_cleanup() { trap trap_cleanup EXIT TERM INT -echo "Run python with -Xperf_jit" -cat <<EOF | perf record -k 1 -g --call-graph dwarf -o "${PERF_DATA}" \ - -- ${PYTHON} -Xperf_jit +ALL_PIDS="" +NUM=0 +for iterations in 1000000 10000000 50000000 100000000; do + echo "Running with $iterations iterations..." + rm -f "${PERF_DATA}.pid" + cat <<EOF | perf record -k 1 -g --call-graph dwarf -o "${PERF_DATA}" -- ${PYTHON} -Xperf_jit +import os +with open("${PERF_DATA}.pid", "w") as f: + f.write(str(os.getpid())) + def foo(n): result = 0 for _ in range(n): @@ -49,29 +59,45 @@ def baz(n): bar(n) if __name__ == "__main__": - baz(1000000) + baz($iterations) EOF -# extract PID of the target process from the data -_PID=$(perf report -i "${PERF_DATA}" -F pid -q -g none | cut -d: -f1 -s) -PID=$(echo -n $_PID) # remove newlines - -echo "Generate JIT-ed DSOs using perf inject" -DEBUGINFOD_URLS='' perf inject -i "${PERF_DATA}" -j -o "${PERF_DATA}.jit" - -echo "Add JIT-ed DSOs to the build-ID cache" -for F in /tmp/jitted-${PID}-*.so; do - perf buildid-cache -a "${F}" -done - -echo "Check the symbol containing the function/module name" -NUM=$(perf report -i "${PERF_DATA}.jit" -s sym | grep -cE 'py::(foo|bar|baz):<stdin>') - -echo "Found ${NUM} matching lines" - -echo "Remove JIT-ed DSOs from the build-ID cache" -for F in /tmp/jitted-${PID}-*.so; do - perf buildid-cache -r "${F}" + if [ -f "${PERF_DATA}.pid" ]; then + REAL_PID=$(cat "${PERF_DATA}.pid") + ALL_PIDS="${ALL_PIDS} ${REAL_PID}" + fi + + # extract PID of the target process from the data + PID=$(perf report -i "${PERF_DATA}" --stdio -F pid -q -g none | \ + cut -d: -f1 -s | sort -u | head -n 1 | tr -d ' ') + if [ -z "${PID}" ]; then + echo "Failed to get PID, retrying..." + continue + fi + ALL_PIDS="${ALL_PIDS} ${PID}" + + echo "Generate JIT-ed DSOs using perf inject" + DEBUGINFOD_URLS='' perf inject -i "${PERF_DATA}" -j -o "${PERF_DATA}.jit" + + echo "Add JIT-ed DSOs to the build-ID cache" + for F in /tmp/jitted-${PID}-*.so; do + perf buildid-cache -a "${F}" + done + + echo "Check the symbol containing the function/module name" + NUM=$(perf report -i "${PERF_DATA}.jit" -s sym --stdio | grep -cE 'py::(foo|bar|baz):<stdin>') + + echo "Remove JIT-ed DSOs from the build-ID cache" + for F in /tmp/jitted-${PID}-*.so; do + perf buildid-cache -r "${F}" + done + rm -f /tmp/jitted-${PID}-*.so /tmp/jit-${PID}.dump 2>/dev/null + + if [ "${NUM}" -gt 0 ]; then + echo "Success: found ${NUM} matching lines" + break + fi + echo "No matching lines found, retrying with more iterations..." done cleanup -- 2.54.0.1136.gdb2ca164c4-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v2 07/12] perf tests: Fix flakiness in trace record and replay test 2026-06-16 6:13 ` [PATCH v2 00/12] perf tests: Enhance robustness, speed up execution, and fix flakiness Ian Rogers ` (5 preceding siblings ...) 2026-06-16 6:13 ` [PATCH v2 06/12] perf tests: Fix Python JIT dump profiling test failure Ian Rogers @ 2026-06-16 6:13 ` Ian Rogers 2026-06-16 6:14 ` [PATCH v2 08/12] perf tests: Fix flakiness in BPF counters test on hybrid systems Ian Rogers ` (5 subsequent siblings) 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 6:13 UTC (permalink / raw) To: irogers, acme, namhyung Cc: adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht The `perf trace record and replay` test fails intermittently on slow or virtualized hosts because the default recording workload (`sleep 1`) occasionally completes without scheduling the target `nanosleep` or `clock_nanosleep` system calls inside the recorded sample window, resulting in the error: `Failed: cannot find *nanosleep syscall`. Generalize the `perf_record_with_retry` helper in `tests/shell/lib/perf_record.sh` to support a custom record command prefix via the `PERF_RECORD_CMD` environment variable (defaulting to "perf record"). Update `trace_record_replay.sh` to use this robust retry loop running with `PERF_RECORD_CMD="perf trace record"` and a base workload of `sleep`. The test will automatically retry with scaled sleep durations (from 0.01s up to 2.0s) until the required `nanosleep` event is successfully captured. Fixes: 15bcfb96d0dd ("perf test: Add trace record and replay test") Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/shell/lib/perf_record.sh | 7 ++++++- tools/perf/tests/shell/trace_record_replay.sh | 18 ++++++++++++++---- 2 files changed, 20 insertions(+), 5 deletions(-) diff --git a/tools/perf/tests/shell/lib/perf_record.sh b/tools/perf/tests/shell/lib/perf_record.sh index fe5721427e58..2c705840d554 100644 --- a/tools/perf/tests/shell/lib/perf_record.sh +++ b/tools/perf/tests/shell/lib/perf_record.sh @@ -21,9 +21,14 @@ perf_record_with_retry() { local duration local first_run=true local ret=1 + local cmd_prefix="perf record" + if [ -n "${PERF_RECORD_CMD}" ]; then + cmd_prefix="${PERF_RECORD_CMD}" + fi + for duration in 0.01 0.1 0.3 1.0 2.0; do rm -f "${perfdata}".old - perf record "$@" -o "${perfdata}" ${testprog_base} ${duration} > "$logfile" 2>&1 + ${cmd_prefix} "$@" -o "${perfdata}" ${testprog_base} ${duration} > "$logfile" 2>&1 local record_exit=$? if [ "$first_run" = true ] && [ $record_exit -ne 0 ]; then diff --git a/tools/perf/tests/shell/trace_record_replay.sh b/tools/perf/tests/shell/trace_record_replay.sh index 88d30a03dcec..f27e32b18697 100755 --- a/tools/perf/tests/shell/trace_record_replay.sh +++ b/tools/perf/tests/shell/trace_record_replay.sh @@ -6,16 +6,26 @@ # shellcheck source=lib/probe.sh . "$(dirname $0)"/lib/probe.sh +# shellcheck source=lib/perf_record.sh +. "$(dirname $0)"/lib/perf_record.sh skip_if_no_perf_trace || exit 2 [ "$(id -u)" = 0 ] || exit 2 file=$(mktemp /tmp/temporary_file.XXXXX) -perf trace record -o ${file} sleep 1 || exit 1 -if ! perf trace -i ${file} 2>&1 | grep nanosleep; then +check_nanosleep() { + perf trace -i "${file}" 2>&1 | grep -q nanosleep +} + +PERF_RECORD_CMD="perf trace record" perf_record_with_retry "${file}" "check_nanosleep" "sleep" +err=$? + +perf_record_cleanup +rm -f ${file} + +if [ $err -ne 0 ]; then echo "Failed: cannot find *nanosleep syscall" exit 1 fi - -rm -f ${file} +exit 0 -- 2.54.0.1136.gdb2ca164c4-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v2 08/12] perf tests: Fix flakiness in BPF counters test on hybrid systems 2026-06-16 6:13 ` [PATCH v2 00/12] perf tests: Enhance robustness, speed up execution, and fix flakiness Ian Rogers ` (6 preceding siblings ...) 2026-06-16 6:13 ` [PATCH v2 07/12] perf tests: Fix flakiness in trace record and replay test Ian Rogers @ 2026-06-16 6:14 ` Ian Rogers 2026-06-16 6:14 ` [PATCH v2 09/12] perf tests: Fix flakiness in branch stack sampling tests Ian Rogers ` (4 subsequent siblings) 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 6:14 UTC (permalink / raw) To: irogers, acme, namhyung Cc: adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht The `perf stat --bpf-counters test` fails intermittently on hybrid architectures or systems with dynamic frequency scaling (DVFS). This happens because the test workload (`sqrtloop`) runs for a fixed 1-second duration, and the CPU frequency can scale dynamically between idle and maximum frequency. As the first run runs on a cold CPU and the second run runs on a warmed-up CPU (or vice versa), the number of instructions executed in 1 second differs by up to 2.2x, violating the comparison tolerance. Also, when running as root, BPF tracepoints and scheduling programs trigger frequently. Since standard `perf stat -e instructions` measures both user and kernel space instructions, it counts BPF helper and program execution overheads, whereas the BPF counters themselves do not self-measure. This introduces a large kernel-space instruction count discrepancy between standard and BPF counters. Fix these issues by: 1. Switching the workload to a strictly deterministic, iteration-based workload: `awk 'BEGIN { for (i=0; i<10000000; i++) sum+=i }'`. We pin the workload to a single random allowed CPU using `taskset -c $CPU` via a bash array. 2. Restricting the counted event to user-space only (`instructions:u` or `/u`). 3. Tightening the comparison tolerance from 20% to 15%. These modifications isolate the measurements to user-space instructions of the deterministic loop, which executes a virtually identical number of instructions on both runs (with less than 0.001% variation), eliminating Dynamic Frequency Scaling (DVFS), kernel scheduling noise, and BPF helper self-measurement overheads. Fixes: 2c0cb9f56020 ("perf test: Add a shell test for 'perf stat --bpf-counters' new option") Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/shell/stat_bpf_counters.sh | 28 +++++++++++++-------- 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/tools/perf/tests/shell/stat_bpf_counters.sh b/tools/perf/tests/shell/stat_bpf_counters.sh index 35463358b273..11de77ee38ad 100755 --- a/tools/perf/tests/shell/stat_bpf_counters.sh +++ b/tools/perf/tests/shell/stat_bpf_counters.sh @@ -4,21 +4,26 @@ set -e -workload="perf test -w sqrtloop" +# Get the first allowed CPU +CPU=$(taskset -c -p $$ | awk -F': ' '{print $2}' | awk -F'[,-]' '{print $1}') +if [ -z "$CPU" ]; then + CPU=0 +fi +workload=(taskset -c "$CPU" awk 'BEGIN { for (i=0; i<10000000; i++) sum+=i }') -# check whether $2 is within +/- 20% of $1 +# check whether $2 is within +/- 15% of $1 compare_number() { first_num=$1 second_num=$2 - # upper bound is first_num * 120% - upper=$(expr $first_num + $first_num / 5 ) - # lower bound is first_num * 80% - lower=$(expr $first_num - $first_num / 5 ) + # upper bound is first_num * 115% + upper=$(expr $first_num + $first_num / 20 \* 3 ) + # lower bound is first_num * 85% + lower=$(expr $first_num - $first_num / 20 \* 3 ) if [ $second_num -gt $upper ] || [ $second_num -lt $lower ]; then - echo "The difference between $first_num and $second_num are greater than 20%." + echo "The difference between $first_num and $second_num are greater than 15%." exit 1 fi } @@ -41,11 +46,12 @@ check_counts() test_bpf_counters() { printf "Testing --bpf-counters " - base_instructions=$(perf stat --no-big-num -e instructions -- $workload 2>&1 | \ + base_instructions=$(perf stat --no-big-num -e instructions:u -- "${workload[@]}" 2>&1 | \ awk -v i=0 -v c=0 '/instructions/ { \ if ($1 != "<not") { i++; c += $1 } \ } END { if (i > 0) printf "%.0f", c; else print "<not" }') - bpf_instructions=$(perf stat --no-big-num --bpf-counters -e instructions -- $workload 2>&1 | \ + bpf_instructions=$(perf stat --no-big-num --bpf-counters -e instructions:u \ + -- "${workload[@]}" 2>&1 | \ awk -v i=0 -v c=0 '/instructions/ { \ if ($1 != "<not") { i++; c += $1 } \ } END { if (i > 0) printf "%.0f", c; else print "<not" }') @@ -57,7 +63,9 @@ test_bpf_counters() test_bpf_modifier() { printf "Testing bpf event modifier " - stat_output=$(perf stat --no-big-num -e instructions/name=base_instructions/,instructions/name=bpf_instructions/b -- $workload 2>&1) + stat_output=$(perf stat --no-big-num \ + -e instructions/name=base_instructions/u,instructions/name=bpf_instructions/bu \ + -- "${workload[@]}" 2>&1) base_instructions=$(echo "$stat_output"| \ awk -v i=0 -v c=0 '/base_instructions/ { \ if ($1 != "<not") { i++; c += $1 } \ -- 2.54.0.1136.gdb2ca164c4-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v2 09/12] perf tests: Fix flakiness in branch stack sampling tests 2026-06-16 6:13 ` [PATCH v2 00/12] perf tests: Enhance robustness, speed up execution, and fix flakiness Ian Rogers ` (7 preceding siblings ...) 2026-06-16 6:14 ` [PATCH v2 08/12] perf tests: Fix flakiness in BPF counters test on hybrid systems Ian Rogers @ 2026-06-16 6:14 ` Ian Rogers 2026-06-16 6:14 ` [PATCH v2 10/12] perf tests: Speed up off-cpu profiling tests Ian Rogers ` (3 subsequent siblings) 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 6:14 UTC (permalink / raw) To: irogers, acme, namhyung Cc: adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht The branch stack sampling test (test 130) runs short iteration-based workloads to verify syscall, kernel, and trap branch stack sampling. Specifically, `test_syscall()` and `test_kernel_branches()` run `perf bench syscall basic` with loop counts of 8000 and 1000, and `test_trap_eret_branches()` runs `traploop` with 1000 iterations. Because these loop limits are extremely small, the total benchmark runtimes last only a few milliseconds (or less). Under high load, virtualization, or coarse sampling conditions, PMU cycle sampling fails to capture enough samples inside the brief benchmark loops. This leads to false negatives where the script output lacks the expected syscall, kernel, or trap branch entries (e.g. "ERROR: Branches missing getppid[^ ]*/SYSCALL/"). Fix this by increasing the workload loop counts to 100,000 across all three test sections. Running 100,000 loops still finishes virtually instantaneously (less than 0.1 seconds), but generates enough iterations to guarantee robust branch stack capture. Fixes: b55878c90ab9 ("perf test: Add test for branch stack sampling") Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/shell/test_brstack.sh | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tools/perf/tests/shell/test_brstack.sh b/tools/perf/tests/shell/test_brstack.sh index eb5837f82e39..be46a68daf76 100755 --- a/tools/perf/tests/shell/test_brstack.sh +++ b/tools/perf/tests/shell/test_brstack.sh @@ -112,7 +112,7 @@ test_trap_eret_branches() { start_err=$err err=0 perf record -o $TMPDIR/perf.data --branch-filter any,save_type,u,k -- \ - perf test -w traploop 1000 > "$TMPDIR/record.txt" 2>&1 + perf test -w traploop 100000 > "$TMPDIR/record.txt" 2>&1 perf script -i $TMPDIR/perf.data --fields brstacksym | \ tr ' ' '\n' > $TMPDIR/perf.script @@ -137,7 +137,7 @@ test_kernel_branches() { start_err=$err err=0 perf record -o $TMPDIR/perf.data --branch-filter any,k -- \ - perf bench syscall basic --loop 1000 > "$TMPDIR/record.txt" 2>&1 + perf bench syscall basic --loop 100000 > "$TMPDIR/record.txt" 2>&1 perf script -i $TMPDIR/perf.data --fields brstack | \ tr ' ' '\n' > $TMPDIR/perf.script @@ -209,7 +209,7 @@ test_syscall() { err=0 perf record -o $TMPDIR/perf.data --branch-filter \ any_call,save_type,u,k -c 10007 -- \ - perf bench syscall basic --loop 8000 > "$TMPDIR/record.txt" 2>&1 + perf bench syscall basic --loop 100000 > "$TMPDIR/record.txt" 2>&1 perf script -i $TMPDIR/perf.data --fields brstacksym | \ tr ' ' '\n' > $TMPDIR/perf.script -- 2.54.0.1136.gdb2ca164c4-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v2 10/12] perf tests: Speed up off-cpu profiling tests 2026-06-16 6:13 ` [PATCH v2 00/12] perf tests: Enhance robustness, speed up execution, and fix flakiness Ian Rogers ` (8 preceding siblings ...) 2026-06-16 6:14 ` [PATCH v2 09/12] perf tests: Fix flakiness in branch stack sampling tests Ian Rogers @ 2026-06-16 6:14 ` Ian Rogers 2026-06-16 6:14 ` [PATCH v2 11/12] perf tests: Speed up lock contention analysis shell test Ian Rogers ` (2 subsequent siblings) 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 6:14 UTC (permalink / raw) To: irogers, acme, namhyung Cc: adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht The off-cpu profiling test suite runs multiple recording commands with a default workload of `sleep 1` to test the off-cpu threshold configurations (specifically, above 999ms and below 1200ms). This adds a mandatory 3.0 seconds of sleep overhead. Optimize this by scaling down the thresholds and workload durations by a factor of 10: - Use `sleep 0.1` as the workload duration. - Change the above-threshold test to use `--off-cpu-thresh 99` and `sleep 0.1`. - Change the below-threshold test to use `--off-cpu-thresh 120` and `sleep 0.1`. - Update the awk period check in the above-threshold test to look for a period greater than 99,000,000 ns (99ms) instead of 999,000,000 ns (999ms). This reduces raw test sleep overhead from 3.0s down to 0.3s, yielding a ~2.7 second speedup for this test. Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/shell/record_offcpu.sh | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/tools/perf/tests/shell/record_offcpu.sh b/tools/perf/tests/shell/record_offcpu.sh index 860a2d6f4b75..8ee2112b7778 100755 --- a/tools/perf/tests/shell/record_offcpu.sh +++ b/tools/perf/tests/shell/record_offcpu.sh @@ -45,7 +45,7 @@ test_offcpu_priv() { test_offcpu_basic() { echo "Basic off-cpu test" - if ! perf record --off-cpu -e dummy -o ${perfdata} sleep 1 2> /dev/null + if ! perf record --off-cpu -e dummy -o ${perfdata} sleep 0.1 2> /dev/null then echo "Basic off-cpu test [Failed record]" err=1 @@ -98,8 +98,8 @@ test_offcpu_child() { test_offcpu_above_thresh() { echo "${test_above_thresh}" - # collect direct off-cpu samples for tasks blocked for more than 999ms - if ! perf record -e dummy --off-cpu --off-cpu-thresh 999 -o ${perfdata} -- sleep 1 2> /dev/null + # collect direct off-cpu samples for tasks blocked for more than 50ms + if ! perf record -e dummy --off-cpu --off-cpu-thresh 50 -o ${perfdata} -- sleep 0.1 2> /dev/null then echo "${test_above_thresh} [Failed record]" err=1 @@ -115,7 +115,7 @@ test_offcpu_above_thresh() { fi # there should only be one direct sample, and its period should be higher than off-cpu-thresh if ! perf script --time "0, ${dummy_timestamp}" -i ${perfdata} -F period | \ - awk '{ if (int($1) > 999000000) exit 0; else exit 1; }' + awk '{ if (int($1) > 50000000) exit 0; else exit 1; }' then echo "${test_above_thresh} [Failed off-cpu time too short]" err=1 @@ -128,8 +128,8 @@ test_offcpu_above_thresh() { test_offcpu_below_thresh() { echo "${test_below_thresh}" - # collect direct off-cpu samples for tasks blocked for more than 1.2s - if ! perf record -e dummy --off-cpu --off-cpu-thresh 1200 -o ${perfdata} -- sleep 1 2> /dev/null + # collect direct off-cpu samples for tasks blocked for more than 500ms + if ! perf record -e dummy --off-cpu --off-cpu-thresh 500 -o ${perfdata} -- sleep 0.1 2> /dev/null then echo "${test_below_thresh} [Failed record]" err=1 -- 2.54.0.1136.gdb2ca164c4-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v2 11/12] perf tests: Speed up lock contention analysis shell test 2026-06-16 6:13 ` [PATCH v2 00/12] perf tests: Enhance robustness, speed up execution, and fix flakiness Ian Rogers ` (9 preceding siblings ...) 2026-06-16 6:14 ` [PATCH v2 10/12] perf tests: Speed up off-cpu profiling tests Ian Rogers @ 2026-06-16 6:14 ` Ian Rogers 2026-06-16 6:14 ` [PATCH v2 12/12] perf tests: Speed up metrics checking shell tests Ian Rogers 2026-06-16 16:48 ` [PATCH v3 00/13] perf tests: Robustness and performance improvements Ian Rogers 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 6:14 UTC (permalink / raw) To: irogers, acme, namhyung Cc: adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht The lock contention analysis test suite (`lock_contention.sh`) performs a series of 13 separate profiling checks to verify various aggregation and filtering parameters of `perf lock contention`. Each of these checks runs the `perf bench sched messaging` messaging benchmark as its workload. By default, `sched messaging` runs 10 groups of 40 processes (400 processes total) generating substantial task scheduling, context switching, and IPC message passing. When traced system-wide for lock events, the tracing overhead (handling millions of lock acquisitions and releases) slows execution down significantly, causing the test suite to take over 80 seconds. Optimize this by introducing a scaled-down messaging benchmark workload: `perf bench sched messaging -g 1 -p`. Running 1 group (40 processes) takes only 0.01 seconds natively (instead of 0.08 seconds), drastically reduces the sheer volume of lock acquire/release trace events, and reduces CPU context switching during tracing while still generating sufficient lock events to fully exercise the BPF/record filters. Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/shell/lock_contention.sh | 30 +++++++++++++---------- 1 file changed, 17 insertions(+), 13 deletions(-) diff --git a/tools/perf/tests/shell/lock_contention.sh b/tools/perf/tests/shell/lock_contention.sh index 52e8b9db9fbd..f589d387f6b3 100755 --- a/tools/perf/tests/shell/lock_contention.sh +++ b/tools/perf/tests/shell/lock_contention.sh @@ -9,6 +9,10 @@ perfdata=$(mktemp /tmp/__perf_test.perf.data.XXXXX) result=$(mktemp /tmp/__perf_test.result.XXXXX) errout=$(mktemp /tmp/__perf_test.errout.XXXXX) +# Workload to generate lock contention. +# Using 1 group (-g 1) keeps runtime low while generating sufficient lock events. +msg_workload="perf bench sched messaging -g 1 -p" + cleanup() { rm -f ${perfdata} rm -f ${result} @@ -50,7 +54,7 @@ check() { test_record() { echo "Testing perf lock record and perf lock contention" - perf lock record -o ${perfdata} -- perf bench sched messaging -p > /dev/null 2>&1 + perf lock record -o ${perfdata} -- ${msg_workload} > /dev/null 2>&1 # the output goes to the stderr and we expect only 1 output (-E 1) perf lock contention -i ${perfdata} -E 1 -q 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then @@ -70,7 +74,7 @@ test_bpf() fi # the perf lock contention output goes to the stderr - perf lock con -a -b -E 1 -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b -E 1 -q -- ${msg_workload} > /dev/null 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then echo "[Fail] BPF result count is not 1:" "$(cat "${result}" | wc -l)" err=1 @@ -81,7 +85,7 @@ test_bpf() test_record_concurrent() { echo "Testing perf lock record and perf lock contention at the same time" - perf lock record -o- -- perf bench sched messaging -p 2> ${errout} | \ + perf lock record -o- -- ${msg_workload} 2> ${errout} | \ perf lock contention -i- -E 1 -q 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then echo "[Fail] Recorded result count is not 1:" "$(cat "${result}" | wc -l)" @@ -107,7 +111,7 @@ test_aggr_task() fi # the perf lock contention output goes to the stderr - perf lock con -a -b -t -E 1 -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b -t -E 1 -q -- ${msg_workload} > /dev/null 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then echo "[Fail] BPF result count is not 1:" "$(cat "${result}" | wc -l)" err=1 @@ -130,7 +134,7 @@ test_aggr_addr() fi # the perf lock contention output goes to the stderr - perf lock con -a -b -l -E 1 -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b -l -E 1 -q -- ${msg_workload} > /dev/null 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then echo "[Fail] BPF result count is not 1:" "$(cat "${result}" | wc -l)" err=1 @@ -148,7 +152,7 @@ test_aggr_cgroup() fi # the perf lock contention output goes to the stderr - perf lock con -a -b --lock-cgroup -E 1 -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b --lock-cgroup -E 1 -q -- ${msg_workload} > /dev/null 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then echo "[Fail] BPF result count is not 1:" "$(cat "${result}" | wc -l)" err=1 @@ -170,7 +174,7 @@ test_type_filter() return fi - perf lock con -a -b -Y spinlock -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b -Y spinlock -q -- ${msg_workload} > /dev/null 2> ${result} if [ "$(grep -c -v spinlock "${result}")" != "0" ]; then echo "[Fail] BPF result should not have non-spinlocks:" "$(cat "${result}")" err=1 @@ -202,7 +206,7 @@ test_lock_filter() return fi - perf lock con -a -b -L tasklist_lock -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b -L tasklist_lock -q -- ${msg_workload} > /dev/null 2> ${result} if [ "$(grep -c -v "${test_lock_filter_type}" "${result}")" != "0" ]; then echo "[Fail] BPF result should not have non-${test_lock_filter_type} locks:" "$(cat "${result}")" err=1 @@ -241,7 +245,7 @@ test_stack_filter() return fi - perf lock con -a -b -S unix_stream -E 1 -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b -S unix_stream -E 1 -q -- ${msg_workload} > /dev/null 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then echo "[Fail] BPF result should have a lock from unix_stream:" "$(cat "${result}")" err=1 @@ -269,7 +273,7 @@ test_aggr_task_stack_filter() return fi - perf lock con -a -b -t -S unix_stream -E 1 -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b -t -S unix_stream -E 1 -q -- ${msg_workload} > /dev/null 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then echo "[Fail] BPF result should have a task from unix_stream:" "$(cat "${result}")" err=1 @@ -285,7 +289,7 @@ test_cgroup_filter() return fi - perf lock con -a -b --lock-cgroup -E 1 -F wait_total -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b --lock-cgroup -E 1 -F wait_total -q -- ${msg_workload} > /dev/null 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then echo "[Fail] BPF result should have a cgroup result:" "$(cat "${result}")" err=1 @@ -293,7 +297,7 @@ test_cgroup_filter() fi cgroup=$(cat "${result}" | awk '{ print $3 }') - perf lock con -a -b --lock-cgroup -E 1 -G "${cgroup}" -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b --lock-cgroup -E 1 -G "${cgroup}" -q -- ${msg_workload} > /dev/null 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then echo "[Fail] BPF result should have a result with cgroup filter:" "$(cat "${cgroup}")" err=1 @@ -328,7 +332,7 @@ test_csv_output() fi # the perf lock contention output goes to the stderr - perf lock con -a -b -E 1 -x , --output ${result} -- perf bench sched messaging -p > /dev/null 2>&1 + perf lock con -a -b -E 1 -x , --output ${result} -- ${msg_workload} > /dev/null 2>&1 output=$(grep -v "^#" ${result} | tr -d -c , | wc -c) if [ "${header}" != "${output}" ]; then echo "[Fail] BPF result does not match the number of commas: ${header} != ${output}" -- 2.54.0.1136.gdb2ca164c4-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v2 12/12] perf tests: Speed up metrics checking shell tests 2026-06-16 6:13 ` [PATCH v2 00/12] perf tests: Enhance robustness, speed up execution, and fix flakiness Ian Rogers ` (10 preceding siblings ...) 2026-06-16 6:14 ` [PATCH v2 11/12] perf tests: Speed up lock contention analysis shell test Ian Rogers @ 2026-06-16 6:14 ` Ian Rogers 2026-06-16 16:48 ` [PATCH v3 00/13] perf tests: Robustness and performance improvements Ian Rogers 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 6:14 UTC (permalink / raw) To: irogers, acme, namhyung Cc: adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht Optimize the execution of the metric validation and metric listing shell test suites: 1. `stat_metrics_values.sh`: The Python metric validator runs the `perf bench futex hash` workload for each validated metric relationship. Reduce the benchmark runtime limit from `-r 2` (2 seconds) to `-r 1` (1 second). This cuts the workload duration in half while still generating sufficient PMU events to satisfy non-zero threshold metric validations. 2. `stat_all_metrics.sh`: The metric checking test runs `perf stat` sequentially across all 433+ listed metrics. Change the default workload for system-wide runs from `sleep 0.01` to `true`. This avoids the 10ms sleep delay on each sequential metric invocation, saving over 4 seconds of total wall time during full test suite runs. Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/shell/stat_all_metrics.sh | 2 +- tools/perf/tests/shell/stat_metrics_values.sh | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/perf/tests/shell/stat_all_metrics.sh b/tools/perf/tests/shell/stat_all_metrics.sh index feeb34c6fa6d..df5fa27b21ae 100755 --- a/tools/perf/tests/shell/stat_all_metrics.sh +++ b/tools/perf/tests/shell/stat_all_metrics.sh @@ -7,7 +7,7 @@ ParanoidAndNotRoot() [ "$(id -u)" != 0 ] && [ "$(cat /proc/sys/kernel/perf_event_paranoid)" -gt $1 ] } -test_prog="sleep 0.01" +test_prog="true" system_wide_flag="-a" if ParanoidAndNotRoot 0 then diff --git a/tools/perf/tests/shell/stat_metrics_values.sh b/tools/perf/tests/shell/stat_metrics_values.sh index 76f1e99d1273..86c5c7f70933 100755 --- a/tools/perf/tests/shell/stat_metrics_values.sh +++ b/tools/perf/tests/shell/stat_metrics_values.sh @@ -18,7 +18,7 @@ fi pythonvalidator=$(dirname $0)/lib/perf_metric_validation.py rulefile=$(dirname $0)/lib/perf_metric_validation_rules.json tmpdir=$(mktemp -d /tmp/__perf_test.program.XXXXX) -workload="perf bench futex hash -r 2 -s" +workload="perf bench futex hash -r 1 -s" # Add -debug, save data file and full rule file echo "Launch python validation script $pythonvalidator" -- 2.54.0.1136.gdb2ca164c4-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v3 00/13] perf tests: Robustness and performance improvements 2026-06-16 6:13 ` [PATCH v2 00/12] perf tests: Enhance robustness, speed up execution, and fix flakiness Ian Rogers ` (11 preceding siblings ...) 2026-06-16 6:14 ` [PATCH v2 12/12] perf tests: Speed up metrics checking shell tests Ian Rogers @ 2026-06-16 16:48 ` Ian Rogers 2026-06-16 16:48 ` [PATCH v3 01/13] perf parse-events: Restrict core PMU bypass to --cputype option Ian Rogers ` (12 more replies) 12 siblings, 13 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 16:48 UTC (permalink / raw) To: irogers, acme, namhyung Cc: adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht This patch series introduces several robustness and performance improvements to the perf test suite, alongside fixes for long-standing flakiness. Key changes across the series: - Introduces robust retry logic (`perf_record_with_retry`) to prevent spurious test failures due to transient `EPERM` issues during system-wide background profiling, safely extracting error messages from the PERF_RECORD_LOGS array. - Significantly speeds up test execution across multiple scripts (such as kvm, record, trace, off-cpu, and lock contention) by supporting sub-second durations in the `noploop` and `thloop` workloads, reducing unnecessary wait times. - Fixes flakiness in the BPF counters hybrid test by parsing `taskset` to determine valid CPUs rather than relying on `shuf`, which can fail under cgroups or missing binaries. - Fixes JIT dump file leaks and TOCTOU vulnerabilities by securely staging with `mktemp -d` to guarantee cleanup. - Decouples format alignments in `builtin-test.c` to gracefully truncate test descriptions without losing alignment or failing under non-TTY environments. - Restricts uncore PMU bypass in event parsing strictly to `--cputype` to prevent unintended metric evaluation impacts. - Injects trace output into JUnit XML `<skipped>` blocks to aid debugging. Changes in v3: - Fix line length warnings in checkpatch.pl. - Fix wrap commit descriptions in checkpatch.pl. - Re-aligned parse_events_state struct initializations. - Added explanatory comment for the 10000 magic number in get_term_width(). - Added inline comments (e.g. /*cputype_filter=*/false) to boolean literal arguments. - Fixed patch 4 to securely read from PERF_RECORD_LOGS array instead of unsafe wildcard cleanup. - Fixed patch 6 TOCTOU vulnerability via mktemp -d staging directory. - Implemented patch 13 to inject trace output into JUnit XML <skipped> blocks for debugging. - Add Assisted-by tags to all commits. Changes in v2: - Drop the sleep 0 patch from v1. - Introduce 'perf_record_with_retry' helper to encapsulate the retry logic and cleanly output error messages (patch 4). - Fix issue with `perf record` lacking sufficient permissions on some setups (patch 5). - Improve subsecond duration support in noploop/thloop (patch 3). - Reduce test durations to avoid excessive overall runtimes. Ian Rogers (13): perf parse-events: Restrict core PMU bypass to --cputype option perf test: Truncate test description to fit terminal width perf tests workloads: Support sub-second durations in noploop and thloop perf tests: Add robust record retry helper and use subsecond workloads perf tests: Skip metrics validation if system-wide recording lacks permission perf tests: Fix Python JIT dump profiling test failure perf tests: Fix flakiness in trace record and replay test perf tests: Fix flakiness in BPF counters test on hybrid systems perf tests: Fix flakiness in branch stack sampling tests perf tests: Speed up off-cpu profiling tests perf tests: Speed up lock contention analysis shell test perf tests: Speed up metrics checking shell tests perf tests: Include error output for skipped tests in JUnit XML tools/perf/builtin-script.c | 1 + tools/perf/builtin-stat.c | 20 +- tools/perf/tests/builtin-test.c | 140 ++++++++++---- tools/perf/tests/expand-cgroup.c | 3 +- tools/perf/tests/parse-events.c | 11 +- tools/perf/tests/parse-metric.c | 3 +- tools/perf/tests/pmu-events.c | 10 +- tools/perf/tests/shell/jitdump-python.sh | 79 +++++--- tools/perf/tests/shell/kvm.sh | 64 ++++--- .../tests/shell/lib/perf_metric_validation.py | 11 +- tools/perf/tests/shell/lib/perf_record.sh | 58 ++++++ tools/perf/tests/shell/lock_contention.sh | 32 ++-- tools/perf/tests/shell/pipe_test.sh | 4 +- tools/perf/tests/shell/record.sh | 173 +++++++++--------- tools/perf/tests/shell/record_lbr.sh | 50 +++-- tools/perf/tests/shell/record_offcpu.sh | 12 +- tools/perf/tests/shell/stat_all_metrics.sh | 77 +++++--- tools/perf/tests/shell/stat_all_pfm.sh | 2 +- tools/perf/tests/shell/stat_bpf_counters.sh | 28 ++- tools/perf/tests/shell/stat_metrics_values.sh | 9 +- tools/perf/tests/shell/test_brstack.sh | 6 +- tools/perf/tests/shell/trace_record_replay.sh | 18 +- tools/perf/tests/workloads/noploop.c | 17 +- tools/perf/tests/workloads/thloop.c | 16 +- tools/perf/util/metricgroup.c | 26 ++- tools/perf/util/metricgroup.h | 4 +- tools/perf/util/parse-events.c | 30 +-- tools/perf/util/parse-events.h | 17 +- tools/perf/util/python.c | 3 +- 29 files changed, 615 insertions(+), 309 deletions(-) create mode 100644 tools/perf/tests/shell/lib/perf_record.sh -- 2.54.0.1189.g8c84645362-goog ^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH v3 01/13] perf parse-events: Restrict core PMU bypass to --cputype option 2026-06-16 16:48 ` [PATCH v3 00/13] perf tests: Robustness and performance improvements Ian Rogers @ 2026-06-16 16:48 ` Ian Rogers 2026-06-16 16:48 ` [PATCH v3 02/13] perf test: Truncate test description to fit terminal width Ian Rogers ` (11 subsequent siblings) 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 16:48 UTC (permalink / raw) To: irogers, acme, namhyung Cc: adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht Commit b1c5efbfd92e ("perf parse-events: Remove hard coded legacy hardware and cache parsing") introduced a bypass to PMU filtering to prevent uncore PMUs from being filtered out during event parsing, which was required for resolving `duration_time` and `uncore_freq` when running with `--cputype`. However, this bypass was active whenever `pmu_filter` was set, which also incorrectly bypassed filtering for the `--pmu-filter` option. Introduce a `cputype_filter` boolean flag in `parse_events_state` and `parse_events_option_args` to distinguish filtering initiated by `--cputype` from that initiated by `--pmu-filter`. Restrict the core-only check in `parse_events__filter_pmu()` to when `cputype_filter` is true. Fixes: b1c5efbfd92e ("perf parse-events: Remove hard coded legacy hardware and cache parsing") Assisted-by: Antigravity:gemini-3.1-pro Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/builtin-script.c | 1 + tools/perf/builtin-stat.c | 20 +++++++++++++++----- tools/perf/tests/expand-cgroup.c | 3 ++- tools/perf/tests/parse-events.c | 11 +++++++---- tools/perf/tests/parse-metric.c | 3 ++- tools/perf/tests/pmu-events.c | 10 +++++++--- tools/perf/util/metricgroup.c | 26 ++++++++++++++++++-------- tools/perf/util/metricgroup.h | 4 +++- tools/perf/util/parse-events.c | 30 ++++++++++++++++++------------ tools/perf/util/parse-events.h | 17 +++++++++++------ tools/perf/util/python.c | 3 ++- 11 files changed, 86 insertions(+), 42 deletions(-) diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index 9ac29bdc3cd5..6bd4d8627704 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -2174,6 +2174,7 @@ static int script_find_metrics(const struct pmu_metric *pm, struct evsel *metric_evsel; int ret = metricgroup__parse_groups(metric_evlist, /*pmu=*/"all", + /*cputype_filter=*/false, pm->metric_name, /*metric_no_group=*/false, /*metric_no_merge=*/false, diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index a04466ea3b0a..5e73e4b128a1 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -1210,6 +1210,7 @@ static int parse_cputype(const struct option *opt, return -1; } parse_events_option_args.pmu_filter = pmu->name; + parse_events_option_args.cputype_filter = true; return 0; } @@ -1226,6 +1227,7 @@ static int parse_pmu_filter(const struct option *opt, } parse_events_option_args.pmu_filter = str; + parse_events_option_args.cputype_filter = false; return 0; } @@ -1999,7 +2001,9 @@ static int add_default_events(void) ret = -1; goto out; } - ret = metricgroup__parse_groups(evlist, pmu, "transaction", + ret = metricgroup__parse_groups(evlist, pmu, + parse_events_option_args.cputype_filter, + "transaction", stat_config.metric_no_group, stat_config.metric_no_merge, stat_config.metric_no_threshold, @@ -2036,7 +2040,9 @@ static int add_default_events(void) if (!force_metric_only) stat_config.metric_only = true; - ret = metricgroup__parse_groups(evlist, pmu, "smi", + ret = metricgroup__parse_groups(evlist, pmu, + parse_events_option_args.cputype_filter, + "smi", stat_config.metric_no_group, stat_config.metric_no_merge, stat_config.metric_no_threshold, @@ -2073,7 +2079,7 @@ static int add_default_events(void) } str[8] = stat_config.topdown_level + '0'; if (metricgroup__parse_groups(evlist, - pmu, str, + pmu, parse_events_option_args.cputype_filter, str, /*metric_no_group=*/false, /*metric_no_merge=*/false, /*metric_no_threshold=*/true, @@ -2112,7 +2118,9 @@ static int add_default_events(void) ret = -ENOMEM; break; } - if (metricgroup__parse_groups(metric_evlist, pmu, default_metricgroup_names[i], + if (metricgroup__parse_groups(metric_evlist, pmu, + parse_events_option_args.cputype_filter, + default_metricgroup_names[i], /*metric_no_group=*/false, /*metric_no_merge=*/false, /*metric_no_threshold=*/true, @@ -2848,7 +2856,9 @@ int cmd_stat(int argc, const char **argv) */ if (metrics) { const char *pmu = parse_events_option_args.pmu_filter ?: "all"; - int ret = metricgroup__parse_groups(evsel_list, pmu, metrics, + int ret = metricgroup__parse_groups(evsel_list, pmu, + parse_events_option_args.cputype_filter, + metrics, stat_config.metric_no_group, stat_config.metric_no_merge, stat_config.metric_no_threshold, diff --git a/tools/perf/tests/expand-cgroup.c b/tools/perf/tests/expand-cgroup.c index dd547f2f77cc..19bc273c2984 100644 --- a/tools/perf/tests/expand-cgroup.c +++ b/tools/perf/tests/expand-cgroup.c @@ -179,7 +179,8 @@ static int expand_metric_events(void) TEST_ASSERT_VAL("failed to get evlist", evlist); pme_test = find_core_metrics_table("testarch", "testcpu"); - ret = metricgroup__parse_groups_test(evlist, pme_test, metric_str); + ret = metricgroup__parse_groups_test(evlist, pme_test, metric_str, + /*cputype_filter=*/false); if (ret < 0) { pr_debug("failed to parse '%s' metric\n", metric_str); goto out; diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c index 05c3e899b425..2cbe81b9c886 100644 --- a/tools/perf/tests/parse-events.c +++ b/tools/perf/tests/parse-events.c @@ -2556,8 +2556,10 @@ static int test_event(const struct evlist_test *e) return TEST_FAIL; } parse_events_error__init(&err); - ret = __parse_events(evlist, e->name, /*pmu_filter=*/NULL, &err, /*fake_pmu=*/false, - /*warn_if_reordered=*/true, /*fake_tp=*/true); + ret = __parse_events(evlist, e->name, /*pmu_filter=*/NULL, + /*cputype_filter=*/false, &err, /*fake_pmu=*/false, + /*warn_if_reordered=*/true, + /*fake_tp=*/true); if (ret) { pr_debug("failed to parse event '%s', err %d\n", e->name, ret); parse_events_error__print(&err, e->name); @@ -2584,8 +2586,9 @@ static int test_event_fake_pmu(const char *str) return -ENOMEM; parse_events_error__init(&err); - ret = __parse_events(evlist, str, /*pmu_filter=*/NULL, &err, - /*fake_pmu=*/true, /*warn_if_reordered=*/true, + ret = __parse_events(evlist, str, /*pmu_filter=*/NULL, + /*cputype_filter=*/false, &err, /*fake_pmu=*/true, + /*warn_if_reordered=*/true, /*fake_tp=*/true); if (ret) { pr_debug("failed to parse event '%s', err %d\n", diff --git a/tools/perf/tests/parse-metric.c b/tools/perf/tests/parse-metric.c index 7c7f489a5eb0..eb456dd8cd9a 100644 --- a/tools/perf/tests/parse-metric.c +++ b/tools/perf/tests/parse-metric.c @@ -92,7 +92,8 @@ static int __compute_metric(const char *name, struct value *vals, /* Parse the metric into metric_events list. */ pme_test = find_core_metrics_table("testarch", "testcpu"); - err = metricgroup__parse_groups_test(evlist, pme_test, name); + err = metricgroup__parse_groups_test(evlist, pme_test, name, + /*cputype_filter=*/false); if (err) goto out; diff --git a/tools/perf/tests/pmu-events.c b/tools/perf/tests/pmu-events.c index fd5630f0a13c..4febba3ab495 100644 --- a/tools/perf/tests/pmu-events.c +++ b/tools/perf/tests/pmu-events.c @@ -794,8 +794,10 @@ static int check_parse_id(const char *id, struct parse_events_error *error) for (cur = strchr(dup, '@') ; cur; cur = strchr(++cur, '@')) *cur = '/'; - ret = __parse_events(evlist, dup, /*pmu_filter=*/NULL, error, /*fake_pmu=*/true, - /*warn_if_reordered=*/true, /*fake_tp=*/false); + ret = __parse_events(evlist, dup, /*pmu_filter=*/NULL, + /*cputype_filter=*/false, error, /*fake_pmu=*/true, + /*warn_if_reordered=*/true, + /*fake_tp=*/false); free(dup); evlist__delete(evlist); @@ -871,7 +873,9 @@ static int test__parsing_callback(const struct pmu_metric *pm, perf_evlist__set_maps(&evlist->core, cpus, NULL); - err = metricgroup__parse_groups_test(evlist, table, pm->metric_name); + err = metricgroup__parse_groups_test(evlist, table, + pm->metric_name, + /*cputype_filter=*/false); if (err) { if (is_expected_broken_metric(pm)) { (*failures)--; diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c index c2ce3e53aaee..7584b7a32853 100644 --- a/tools/perf/util/metricgroup.c +++ b/tools/perf/util/metricgroup.c @@ -1262,7 +1262,8 @@ static int parse_ids(bool metric_no_merge, bool fake_pmu, struct expr_parse_ctx *ids, const char *modifier, bool group_events, const bool tool_events[TOOL_PMU__EVENT_MAX], struct evlist **out_evlist, - const char *filter_pmu) + const char *filter_pmu, + bool cputype_filter) { struct parse_events_error parse_error; struct evlist *parsed_evlist; @@ -1317,7 +1318,9 @@ static int parse_ids(bool metric_no_merge, bool fake_pmu, pr_debug("Parsing metric events '%s'\n", events.buf); parse_events_error__init(&parse_error); ret = __parse_events(parsed_evlist, events.buf, filter_pmu, - &parse_error, fake_pmu, /*warn_if_reordered=*/false, + cputype_filter, + &parse_error, fake_pmu, + /*warn_if_reordered=*/false, /*fake_tp=*/false); if (ret) { parse_events_error__print(&parse_error, events.buf); @@ -1382,7 +1385,7 @@ static struct evsel *pick_display_evsel(struct list_head *metric_list, } static int parse_groups(struct evlist *perf_evlist, - const char *pmu, const char *str, + const char *pmu, bool cputype_filter, const char *str, bool metric_no_group, bool metric_no_merge, bool metric_no_threshold, @@ -1420,7 +1423,8 @@ static int parse_groups(struct evlist *perf_evlist, /*group_events=*/false, tool_events, &combined_evlist, - (pmu && strcmp(pmu, "all") == 0) ? NULL : pmu); + (pmu && strcmp(pmu, "all") == 0) ? NULL : pmu, + cputype_filter); } if (combined) expr__ctx_free(combined); @@ -1476,7 +1480,8 @@ static int parse_groups(struct evlist *perf_evlist, if (!metric_evlist) { ret = parse_ids(metric_no_merge, fake_pmu, m->pctx, m->modifier, m->group_events, tool_events, &m->evlist, - (pmu && strcmp(pmu, "all") == 0) ? NULL : pmu); + (pmu && strcmp(pmu, "all") == 0) ? NULL : pmu, + cputype_filter); if (ret) goto out; @@ -1543,6 +1548,7 @@ static int parse_groups(struct evlist *perf_evlist, if (combined_evlist) { evlist__splice_list_tail(perf_evlist, &combined_evlist->core.entries); evlist__delete(combined_evlist); + combined_evlist = NULL; } list_for_each_entry(m, &metric_list, nd) { @@ -1551,12 +1557,15 @@ static int parse_groups(struct evlist *perf_evlist, } out: + if (combined_evlist) + evlist__delete(combined_evlist); metricgroup__free_metrics(&metric_list); return ret; } int metricgroup__parse_groups(struct evlist *perf_evlist, const char *pmu, + bool cputype_filter, const char *str, bool metric_no_group, bool metric_no_merge, @@ -1570,16 +1579,17 @@ int metricgroup__parse_groups(struct evlist *perf_evlist, if (hardware_aware_grouping) pr_debug("Use hardware aware grouping instead of traditional metric grouping method\n"); - return parse_groups(perf_evlist, pmu, str, metric_no_group, metric_no_merge, + return parse_groups(perf_evlist, pmu, cputype_filter, str, metric_no_group, metric_no_merge, metric_no_threshold, user_requested_cpu_list, system_wide, /*fake_pmu=*/false, table); } int metricgroup__parse_groups_test(struct evlist *evlist, const struct pmu_metrics_table *table, - const char *str) + const char *str, + bool cputype_filter) { - return parse_groups(evlist, "all", str, + return parse_groups(evlist, "all", cputype_filter, str, /*metric_no_group=*/false, /*metric_no_merge=*/false, /*metric_no_threshold=*/false, diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h index 4be6bfc13c46..6a66f14dd01b 100644 --- a/tools/perf/util/metricgroup.h +++ b/tools/perf/util/metricgroup.h @@ -71,6 +71,7 @@ struct metric_event *metricgroup__lookup(struct rblist *metric_events, bool create); int metricgroup__parse_groups(struct evlist *perf_evlist, const char *pmu, + bool cputype_filter, const char *str, bool metric_no_group, bool metric_no_merge, @@ -80,7 +81,8 @@ int metricgroup__parse_groups(struct evlist *perf_evlist, bool hardware_aware_grouping); int metricgroup__parse_groups_test(struct evlist *evlist, const struct pmu_metrics_table *table, - const char *str); + const char *str, + bool cputype_filter); int metricgroup__for_each_metric(const struct pmu_metrics_table *table, pmu_metric_iter_fn fn, void *data); diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c index 943569e82b82..c62453f97b9e 100644 --- a/tools/perf/util/parse-events.c +++ b/tools/perf/util/parse-events.c @@ -429,6 +429,9 @@ bool parse_events__filter_pmu(const struct parse_events_state *parse_state, if (parse_state->pmu_filter == NULL) return false; + if (parse_state->cputype_filter && !pmu->is_core) + return false; + return perf_pmu__wildcard_match(pmu, parse_state->pmu_filter) == 0; } @@ -2288,18 +2291,20 @@ static int parse_events__sort_events_and_fix_groups(struct list_head *list) return (idx_changed || num_leaders != orig_num_leaders) ? 1 : 0; } -int __parse_events(struct evlist *evlist, const char *str, const char *pmu_filter, +int __parse_events(struct evlist *evlist, const char *str, + const char *pmu_filter, bool cputype_filter, struct parse_events_error *err, bool fake_pmu, bool warn_if_reordered, bool fake_tp) { struct parse_events_state parse_state = { - .list = LIST_HEAD_INIT(parse_state.list), - .idx = evlist->core.nr_entries, - .error = err, - .stoken = PE_START_EVENTS, - .fake_pmu = fake_pmu, - .fake_tp = fake_tp, - .pmu_filter = pmu_filter, + .list = LIST_HEAD_INIT(parse_state.list), + .idx = evlist->core.nr_entries, + .error = err, + .stoken = PE_START_EVENTS, + .fake_pmu = fake_pmu, + .fake_tp = fake_tp, + .pmu_filter = pmu_filter, + .cputype_filter = cputype_filter, .match_legacy_cache_terms = true, }; int ret, ret2; @@ -2312,8 +2317,8 @@ int __parse_events(struct evlist *evlist, const char *str, const char *pmu_filte } ret2 = parse_events__sort_events_and_fix_groups(&parse_state.list); - if (ret2 < 0) - return ret; + if (ret2 < 0 && !ret) + ret = ret2; /* * Add list to the evlist even with errors to allow callers to clean up. @@ -2518,8 +2523,9 @@ int parse_events_option(const struct option *opt, const char *str, int ret; parse_events_error__init(&err); - ret = __parse_events(*args->evlistp, str, args->pmu_filter, &err, - /*fake_pmu=*/false, /*warn_if_reordered=*/true, + ret = __parse_events(*args->evlistp, str, args->pmu_filter, + args->cputype_filter, &err, /*fake_pmu=*/false, + /*warn_if_reordered=*/true, /*fake_tp=*/false); if (ret) { diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h index 3577ab213730..b14c832b03a1 100644 --- a/tools/perf/util/parse-events.h +++ b/tools/perf/util/parse-events.h @@ -26,20 +26,23 @@ const char *event_type(size_t type); struct parse_events_option_args { struct evlist **evlistp; const char *pmu_filter; + bool cputype_filter; }; int parse_events_option(const struct option *opt, const char *str, int unset); int parse_events_option_new_evlist(const struct option *opt, const char *str, int unset); -__attribute__((nonnull(1, 2, 4))) -int __parse_events(struct evlist *evlist, const char *str, const char *pmu_filter, - struct parse_events_error *error, bool fake_pmu, - bool warn_if_reordered, bool fake_tp); +__attribute__((nonnull(1, 2, 5))) int +__parse_events(struct evlist *evlist, const char *str, const char *pmu_filter, + bool cputype_filter, struct parse_events_error *error, + bool fake_pmu, bool warn_if_reordered, bool fake_tp); __attribute__((nonnull(1, 2, 3))) static inline int parse_events(struct evlist *evlist, const char *str, struct parse_events_error *err) { - return __parse_events(evlist, str, /*pmu_filter=*/NULL, err, /*fake_pmu=*/false, - /*warn_if_reordered=*/true, /*fake_tp=*/false); + return __parse_events(evlist, str, /*pmu_filter=*/NULL, + /*cputype_filter=*/false, err, /*fake_pmu=*/false, + /*warn_if_reordered=*/true, + /*fake_tp=*/false); } int parse_event(struct evlist *evlist, const char *str); @@ -161,6 +164,8 @@ struct parse_events_state { bool fake_tp; /* If non-null, when wildcard matching only match the given PMU. */ const char *pmu_filter; + /* If true, the pmu_filter was set by --cputype option. */ + bool cputype_filter; /* Should PE_LEGACY_NAME tokens be generated for config terms? */ bool match_legacy_cache_terms; /* Were multiple PMUs scanned to find events? */ diff --git a/tools/perf/util/python.c b/tools/perf/util/python.c index cc1019d29a5d..63067ade4cbd 100644 --- a/tools/perf/util/python.c +++ b/tools/perf/util/python.c @@ -2104,7 +2104,8 @@ static PyObject *pyrf__parse_metrics(PyObject *self, PyObject *args) cpus = pcpus ? ((struct pyrf_cpu_map *)pcpus)->cpus : NULL; evlist__init(&evlist, cpus, threads); - ret = metricgroup__parse_groups(&evlist, pmu ?: "all", input, + ret = metricgroup__parse_groups(&evlist, pmu ?: "all", + /*cputype_filter=*/false, input, /*metric_no_group=*/ false, /*metric_no_merge=*/ false, /*metric_no_threshold=*/ true, -- 2.54.0.1189.g8c84645362-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v3 02/13] perf test: Truncate test description to fit terminal width 2026-06-16 16:48 ` [PATCH v3 00/13] perf tests: Robustness and performance improvements Ian Rogers 2026-06-16 16:48 ` [PATCH v3 01/13] perf parse-events: Restrict core PMU bypass to --cputype option Ian Rogers @ 2026-06-16 16:48 ` Ian Rogers 2026-06-16 16:48 ` [PATCH v3 03/13] perf tests workloads: Support sub-second durations in noploop and thloop Ian Rogers ` (10 subsequent siblings) 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 16:48 UTC (permalink / raw) To: irogers, acme, namhyung Cc: adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht The parallel test harness uses the carriage return delete escape sequence `PERF_COLOR_DELETE_LINE` ("\033[A\33[2K\r") to erase and update the "Running (X active)" progress lines. However, if a test description is longer than the terminal width, the line wraps around. When this happens, the cursor up escape sequence `\033[A` only moves the cursor to the last wrapped row, leaving the top half of the description printed on the previous line. This leads to name duplication and output corruption spilling over multiple rows on consoles narrower than the maximum description length (e.g., 101 columns wide). Fix this by dynamically querying the terminal width using `get_term_dimensions` and truncating the printed test descriptions using the `%-*.*s` printf format. We reserve 35 characters for prefix, status, and spacing metrics to guarantee the progress line never wraps. Fixes: 0e036dcad4e6 ("perf test: Display number of active running tests") Assisted-by: Antigravity:gemini-3.1-pro Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/builtin-test.c | 130 ++++++++++++++++++++++++-------- 1 file changed, 98 insertions(+), 32 deletions(-) diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c index afc06cec4954..1dcaeb8505b7 100644 --- a/tools/perf/tests/builtin-test.c +++ b/tools/perf/tests/builtin-test.c @@ -10,35 +10,39 @@ #ifdef HAVE_BACKTRACE_SUPPORT #include <execinfo.h> #endif -#include <poll.h> -#include <unistd.h> #include <setjmp.h> -#include <string.h> #include <stdlib.h> -#include <sys/types.h> +#include <string.h> + #include <dirent.h> -#include <sys/wait.h> +#include <linux/kernel.h> +#include <linux/string.h> +#include <linux/zalloc.h> +#include <poll.h> +#include <sys/ioctl.h> #include <sys/stat.h> #include <sys/time.h> +#include <sys/types.h> +#include <sys/wait.h> +#include <unistd.h> + +#include <subcmd/exec-cmd.h> +#include <subcmd/parse-options.h> +#include <subcmd/run-command.h> + #include "builtin.h" +#include "color.h" #include "config.h" +#include "debug.h" #include "hist.h" #include "intlist.h" -#include "tests.h" -#include "debug.h" -#include "color.h" -#include <subcmd/parse-options.h> -#include <subcmd/run-command.h> #include "string2.h" #include "symbol.h" +#include "tests-scripts.h" +#include "tests.h" #include "util/rlimit.h" #include "util/strbuf.h" -#include <linux/kernel.h> -#include <linux/string.h> -#include <subcmd/exec-cmd.h> -#include <linux/zalloc.h> - -#include "tests-scripts.h" +#include "util/term.h" static const char *junit_filename; static struct strbuf junit_xml_buf = STRBUF_INIT; @@ -413,23 +417,73 @@ static char *xml_escape(const char *str) return res ? res : strdup(""); } +static int get_term_width(void) +{ + struct winsize ws; + int cols = 80; + int term_width; + + /* + * If output is redirected to a file or piped, we don't need to wrap + * or truncate at all. Use a massive virtually infinite terminal width + * so descriptions are printed in full. + */ + if (!isatty(fileno(debug_file()))) + return 10000; + + get_term_dimensions(&ws); + if (ws.ws_col > 0) + cols = ws.ws_col; + + /* + * Limit description width to fit on a single line. We subtract 35 + * columns of headroom to allocate space for: + * - The suite index prefix: e.g. " 10.100:" (9 characters). + * - The colon separator and spaces: " : " (3 characters). + * - The longest status results: e.g. "Skip (some metrics failed)" (26 characters) + * or "Running (XX active)" (20 characters). + * + * A minimum description width of 10 is enforced to ensure names are + * legible even on very narrow consoles. + */ + term_width = cols - 35; + if (term_width < 10) + term_width = 10; + + return term_width; +} + +static int get_max_desc_width(int width) +{ + int term_width = get_term_width(); + + return width > term_width ? term_width : width; +} + static int print_test_result(struct test_suite *t, int curr_suite, int curr_test_case, int result, int width, int running, const char *err_output, double elapsed) { + int pad_width = get_max_desc_width(width); + int term_width = get_term_width(); + if (test_suite__num_test_cases(t) > 1) { char prefix[32]; int len = snprintf(prefix, sizeof(prefix), "%3d.%1d:", curr_suite + 1, curr_test_case + 1); - int subw = len >= 4 ? width + 4 - len : width; + int pad = len >= 4 ? pad_width + 4 - len : pad_width; + int trunc = len >= 4 ? term_width + 4 - len : term_width; - pr_info("%s %-*s:", prefix, subw, test_description(t, curr_test_case)); - } else - pr_info("%3d: %-*s:", curr_suite + 1, width, test_description(t, curr_test_case)); + pr_info("%s %-*.*s:", prefix, pad, trunc, + test_description(t, curr_test_case)); + } else { + pr_info("%3d: %-*.*s:", curr_suite + 1, pad_width, term_width, + test_description(t, curr_test_case)); + } switch (result) { case TEST_RUNNING: - color_fprintf(stderr, PERF_COLOR_YELLOW, " Running (%d active)\n", running); + color_fprintf(debug_file(), PERF_COLOR_YELLOW, " Running (%d active)\n", running); break; case TEST_OK: if (test_suite__num_test_cases(t) > 1) @@ -443,9 +497,9 @@ static int print_test_result(struct test_suite *t, int curr_suite, int curr_test summary_tests_skipped++; if (reason) - color_fprintf(stderr, PERF_COLOR_YELLOW, " Skip (%s)\n", reason); + color_fprintf(debug_file(), PERF_COLOR_YELLOW, " Skip (%s)\n", reason); else - color_fprintf(stderr, PERF_COLOR_YELLOW, " Skip\n"); + color_fprintf(debug_file(), PERF_COLOR_YELLOW, " Skip\n"); } break; case TEST_FAIL: @@ -459,7 +513,7 @@ static int print_test_result(struct test_suite *t, int curr_suite, int curr_test strbuf_addf_safe(&summary_failed_tests_buf, " %3d: %s\n", curr_suite + 1, test_description(t, curr_test_case)); - color_fprintf(stderr, PERF_COLOR_RED, " FAILED!\n"); + color_fprintf(debug_file(), PERF_COLOR_RED, " FAILED!\n"); break; } @@ -695,6 +749,7 @@ static void finish_test(struct child_test **child_tests, int running_test, int c int ret; struct timespec end_time; double elapsed; + width = get_max_desc_width(width); if (child_test == NULL) { /* Test wasn't started. */ @@ -709,7 +764,8 @@ static void finish_test(struct child_test **child_tests, int running_test, int c * sub test names. */ if (test_suite__num_test_cases(t) > 1 && curr_test_case == 0) - pr_info("%3d: %-*s:\n", curr_suite + 1, width, test_description(t, -1)); + pr_info("%3d: %-*.*s:\n", curr_suite + 1, width, width, + test_description(t, -1)); /* * Busy loop reading from the child's stdout/stderr that are set to be @@ -917,6 +973,8 @@ static int finish_tests_parallel(struct child_test **child_tests, size_t num_tes int last_suite_printed = -1; sigset_t set, oldset; + width = get_max_desc_width(width); + sigemptyset(&set); sigaddset(&set, SIGINT); sigaddset(&set, SIGTERM); @@ -985,8 +1043,11 @@ static int finish_tests_parallel(struct child_test **child_tests, size_t num_tes if (next_child) { if (test_suite__num_test_cases(next_child->test) > 1 && last_suite_printed != next_child->suite_num) { - pr_info("%3d: %-*s:\n", next_child->suite_num + 1, width, - test_description(next_child->test, -1)); + pr_info("%3d: %-*.*s:\n", + next_child->suite_num + 1, + width, width, + test_description( + next_child->test, -1)); last_suite_printed = next_child->suite_num; } print_test_result(next_child->test, next_child->suite_num, @@ -1049,7 +1110,8 @@ static int finish_tests_parallel(struct child_test **child_tests, size_t num_tes if (test_suite__num_test_cases(child->test) > 1 && last_suite_printed != child->suite_num) { - pr_info("%3d: %-*s:\n", child->suite_num + 1, width, + pr_info("%3d: %-*.*s:\n", child->suite_num + 1, + width, width, test_description(child->test, -1)); last_suite_printed = child->suite_num; } @@ -1173,12 +1235,12 @@ static void print_tests_summary(void) pr_info("Passed subtests : %u\n", summary_subtests_passed); pr_info("Skipped tests : %u\n", summary_tests_skipped); if (summary_tests_failed > 0) { - color_fprintf(stderr, PERF_COLOR_RED, "Failed tests : %u\n", + color_fprintf(debug_file(), PERF_COLOR_RED, "Failed tests : %u\n", summary_tests_failed); pr_info("List of failed tests:\n"); pr_info("%s", summary_failed_tests_buf.buf); } else { - color_fprintf(stderr, PERF_COLOR_GREEN, "Failed tests : 0\n"); + color_fprintf(debug_file(), PERF_COLOR_GREEN, "Failed tests : 0\n"); } if (junit_filename) { @@ -1296,9 +1358,13 @@ static int __cmd_test(struct test_suite **suites, int argc, const char *argv[], if (intlist__find(skiplist, curr_suite + 1)) { if (pass == 1) { - pr_info("%3d: %-*s:", curr_suite + 1, width, + int pad_width = get_max_desc_width(width); + int term_width = get_term_width(); + + pr_info("%3d: %-*.*s:", curr_suite + 1, + pad_width, term_width, test_description(*t, -1)); - color_fprintf(stderr, PERF_COLOR_YELLOW, + color_fprintf(debug_file(), PERF_COLOR_YELLOW, " Skip (user override)\n"); summary_tests_skipped++; if (junit_filename) { -- 2.54.0.1189.g8c84645362-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v3 03/13] perf tests workloads: Support sub-second durations in noploop and thloop 2026-06-16 16:48 ` [PATCH v3 00/13] perf tests: Robustness and performance improvements Ian Rogers 2026-06-16 16:48 ` [PATCH v3 01/13] perf parse-events: Restrict core PMU bypass to --cputype option Ian Rogers 2026-06-16 16:48 ` [PATCH v3 02/13] perf test: Truncate test description to fit terminal width Ian Rogers @ 2026-06-16 16:48 ` Ian Rogers 2026-06-16 16:48 ` [PATCH v3 04/13] perf tests: Add robust record retry helper and use subsecond workloads Ian Rogers ` (9 subsequent siblings) 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 16:48 UTC (permalink / raw) To: irogers, acme, namhyung Cc: adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht Currently, the noploop and thloop workloads only support sleep durations in integer seconds because they parse the argument using atoi() and use alarm() for timer signaling. To support much shorter execution times in tests (speeding up test suites and allowing faster retries), change the input parsing to use atof() for double floating-point seconds. Use ualarm() for fractional durations less than 1.0 seconds, and fall back to alarm() for durations of 1.0 second or more. Assisted-by: Antigravity:gemini-3.1-pro Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/workloads/noploop.c | 17 ++++++++++++++--- tools/perf/tests/workloads/thloop.c | 16 +++++++++++----- 2 files changed, 25 insertions(+), 8 deletions(-) diff --git a/tools/perf/tests/workloads/noploop.c b/tools/perf/tests/workloads/noploop.c index 656e472e6188..ca9f871e136f 100644 --- a/tools/perf/tests/workloads/noploop.c +++ b/tools/perf/tests/workloads/noploop.c @@ -15,15 +15,26 @@ static void sighandler(int sig __maybe_unused) static int noploop(int argc, const char **argv) { - int sec = 1; + double sec = 1.0; pthread_setname_np(pthread_self(), "perf-noploop"); if (argc > 0) - sec = atoi(argv[0]); + sec = atof(argv[0]); + + if (!(sec > 0.0)) { + fprintf(stderr, "Error: seconds (%f) must be > 0\n", sec); + return 1; + } signal(SIGINT, sighandler); signal(SIGALRM, sighandler); - alarm(sec); + + if (sec < 1.0) { + useconds_t usecs = (useconds_t)(sec * 1000000.0); + + ualarm(usecs > 0 ? usecs : 1, 0); + } else + alarm((unsigned int)sec); while (!done) continue; diff --git a/tools/perf/tests/workloads/thloop.c b/tools/perf/tests/workloads/thloop.c index bd8168f883fb..c830d739489f 100644 --- a/tools/perf/tests/workloads/thloop.c +++ b/tools/perf/tests/workloads/thloop.c @@ -31,14 +31,15 @@ static void *thfunc(void *arg) static int thloop(int argc, const char **argv) { - int nt = 2, sec = 1, err = 1; + int nt = 2, err = 1; + double sec = 1.0; pthread_t *thread_list = NULL; if (argc > 0) - sec = atoi(argv[0]); + sec = atof(argv[0]); - if (sec <= 0) { - fprintf(stderr, "Error: seconds (%d) must be >= 1\n", sec); + if (!(sec > 0.0)) { + fprintf(stderr, "Error: seconds (%f) must be > 0\n", sec); return 1; } @@ -67,7 +68,12 @@ static int thloop(int argc, const char **argv) goto out; } } - alarm(sec); + if (sec < 1.0) { + useconds_t usecs = (useconds_t)(sec * 1000000.0); + + ualarm(usecs > 0 ? usecs : 1, 0); + } else + alarm((unsigned int)sec); test_loop(); err = 0; out: -- 2.54.0.1189.g8c84645362-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v3 04/13] perf tests: Add robust record retry helper and use subsecond workloads 2026-06-16 16:48 ` [PATCH v3 00/13] perf tests: Robustness and performance improvements Ian Rogers ` (2 preceding siblings ...) 2026-06-16 16:48 ` [PATCH v3 03/13] perf tests workloads: Support sub-second durations in noploop and thloop Ian Rogers @ 2026-06-16 16:48 ` Ian Rogers 2026-06-16 16:48 ` [PATCH v3 05/13] perf tests: Skip metrics validation if system-wide recording lacks permission Ian Rogers ` (8 subsequent siblings) 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 16:48 UTC (permalink / raw) To: irogers, acme, namhyung Cc: adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht Introduce `perf_record_with_retry` and `perf_record_cleanup` in a shared library `tests/shell/lib/perf_record.sh` to prevent record test failures caused by transient recording or workload delays. Update `record.sh`, `record_lbr.sh`, `pipe_test.sh`, `kvm.sh`, and `stat_all_pfm.sh` to use this robust record retry logic. These tests now start with very short durations (e.g. 0.01 seconds) and scale up if the initial recording failed to capture samples, significantly improving test execution speed on success while remaining resilient to slow systems. Assisted-by: Antigravity:gemini-3.1-pro Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/shell/kvm.sh | 64 +++++--- tools/perf/tests/shell/lib/perf_record.sh | 53 +++++++ tools/perf/tests/shell/pipe_test.sh | 4 +- tools/perf/tests/shell/record.sh | 173 +++++++++++----------- tools/perf/tests/shell/record_lbr.sh | 50 +++++-- tools/perf/tests/shell/stat_all_pfm.sh | 2 +- 6 files changed, 218 insertions(+), 128 deletions(-) create mode 100644 tools/perf/tests/shell/lib/perf_record.sh diff --git a/tools/perf/tests/shell/kvm.sh b/tools/perf/tests/shell/kvm.sh index f88e859025c4..2f67e1eb4c78 100755 --- a/tools/perf/tests/shell/kvm.sh +++ b/tools/perf/tests/shell/kvm.sh @@ -39,17 +39,28 @@ skip() { test_kvm_stat() { echo "Testing perf kvm stat" - echo "Recording kvm events for pid ${qemu_pid}..." - if ! perf kvm stat record -p "${qemu_pid}" -o "${perfdata}" sleep 1; then - echo "Failed to record kvm events" - err=1 - return - fi + local duration + local success=false + for duration in 1 2 4 8; do + echo "Recording kvm events for pid ${qemu_pid} (duration ${duration}s)..." + rm -f "${perfdata}" "${perfdata}".old + if ! perf kvm stat record -p "${qemu_pid}" -o "${perfdata}" \ + sleep ${duration} >/dev/null 2>&1; then + echo "perf kvm stat record failed, retrying..." + continue + fi - echo "Reporting kvm events..." - if ! perf kvm -i "${perfdata}" stat report 2>&1 | grep -q "VM-EXIT"; then + if [ -e "${perfdata}" ] && \ + perf kvm -i "${perfdata}" stat report 2>&1 | grep -q "VM-EXIT"; then + success=true + break + fi + echo "No VM-EXIT events found, retrying..." + done + + if [ "$success" = false ]; then echo "Failed to find VM-EXIT in report" - perf kvm -i "${perfdata}" stat report 2>&1 + perf kvm -i "${perfdata}" stat report 2>&1 || true err=1 return fi @@ -60,22 +71,29 @@ test_kvm_stat() { test_kvm_record_report() { echo "Testing perf kvm record/report" - echo "Recording kvm profile for pid ${qemu_pid}..." - # Use --host to avoid needing guest symbols/mounts for this simple test - # We just want to verify the command runs and produces data - # We run in background and kill it because 'perf kvm record' appends options - # after the command, which breaks 'sleep' (e.g. it gets '-e cycles'). - perf kvm --host record -p "${qemu_pid}" -o "${perfdata}" & - rec_pid=$! - sleep 1 - kill -INT "${rec_pid}" - wait "${rec_pid}" || true + local duration + local success=false + for duration in 1 2 4 8; do + echo "Recording kvm profile for pid ${qemu_pid} (duration ${duration}s)..." + rm -f "${perfdata}" "${perfdata}".old + + perf kvm --host record -p "${qemu_pid}" -o "${perfdata}" & + local rec_pid=$! + sleep ${duration} + kill -INT "${rec_pid}" + wait "${rec_pid}" || true + + if [ -e "${perfdata}" ] && \ + perf kvm -i "${perfdata}" report --stdio 2>&1 | grep -q "Event count"; then + success=true + break + fi + echo "No samples or report failed, retrying..." + done - echo "Reporting kvm profile..." - # Check for some standard output from report - if ! perf kvm -i "${perfdata}" report --stdio 2>&1 | grep -q "Event count"; then + if [ "$success" = false ]; then echo "Failed to report kvm profile" - perf kvm -i "${perfdata}" report --stdio 2>&1 + perf kvm -i "${perfdata}" report --stdio 2>&1 || true err=1 return fi diff --git a/tools/perf/tests/shell/lib/perf_record.sh b/tools/perf/tests/shell/lib/perf_record.sh new file mode 100644 index 000000000000..e137fa75370d --- /dev/null +++ b/tools/perf/tests/shell/lib/perf_record.sh @@ -0,0 +1,53 @@ +# SPDX-License-Identifier: GPL-2.0 + +PERF_RECORD_LOGS=() + +perf_record_with_retry() { + local perfdata="$1" + local check_cmd="$2" + local testprog_base="$3" + shift 3 + + local logfile + logfile=$(mktemp /tmp/__perf_record_retry.XXXXXX) + PERF_RECORD_LOGS+=("$logfile") + + # Save the e flag state and disable it + local save_e + if [[ $- == *e* ]]; then + save_e="set -e" + else + save_e="set +e" + fi + set +e + + local duration + local first_run=true + local ret=1 + for duration in 0.01 0.1 0.3 1.0 2.0; do + rm -f "${perfdata}".old + perf record "$@" -o "${perfdata}" ${testprog_base} ${duration} > "$logfile" 2>&1 + local record_exit=$? + + if [ "$first_run" = true ] && [ $record_exit -ne 0 ]; then + ret=2 + break + fi + first_run=false + + if [ -e "${perfdata}" ] && eval "${check_cmd}"; then + ret=0 + break + fi + done + + eval "$save_e" + return $ret +} + +perf_record_cleanup() { + for logfile in "${PERF_RECORD_LOGS[@]}"; do + rm -f "$logfile" + done + PERF_RECORD_LOGS=() +} diff --git a/tools/perf/tests/shell/pipe_test.sh b/tools/perf/tests/shell/pipe_test.sh index e459aa99a951..ce68d850c983 100755 --- a/tools/perf/tests/shell/pipe_test.sh +++ b/tools/perf/tests/shell/pipe_test.sh @@ -12,8 +12,8 @@ skip_test_missing_symbol ${sym} data=$(mktemp /tmp/perf.data.XXXXXX) data2=$(mktemp /tmp/perf.data2.XXXXXX) -prog="perf test -w noploop" -[ "$(uname -m)" = "s390x" ] && prog="$prog 3" +prog="perf test -w noploop 0.1" +[ "$(uname -m)" = "s390x" ] && prog="perf test -w noploop 3" err=0 set -e diff --git a/tools/perf/tests/shell/record.sh b/tools/perf/tests/shell/record.sh index 7cb81cf3444a..dd90fef2088b 100755 --- a/tools/perf/tests/shell/record.sh +++ b/tools/perf/tests/shell/record.sh @@ -1,10 +1,13 @@ #!/bin/bash -# perf record tests (exclusive) # SPDX-License-Identifier: GPL-2.0 +# perf record tests set -e shelldir=$(dirname "$0") +. "${shelldir}"/lib/perf_record.sh + + # shellcheck source=lib/waiting.sh . "${shelldir}"/lib/waiting.sh @@ -39,6 +42,7 @@ cleanup() { rm -f "${perfdata}" rm -f "${perfdata}".old rm -f "${script_output}" + perf_record_cleanup trap - EXIT TERM INT } @@ -50,22 +54,20 @@ trap_cleanup() { } trap trap_cleanup EXIT TERM INT +check_per_thread() { + perf report -i "${perfdata}" -q | grep -q "${testsym}" +} + test_per_thread() { echo "Basic --per-thread mode test" - if ! perf record -o /dev/null --quiet ${testprog} 2> /dev/null - then + local ret=0 + perf_record_with_retry "${perfdata}" "check_per_thread" "perf test -w thloop" \ + --per-thread || ret=$? + if [ $ret -eq 2 ]; then echo "Per-thread record [Skipped event not supported]" return - fi - if ! perf record --per-thread -o "${perfdata}" ${testprog} 2> /dev/null - then - echo "Per-thread record [Failed record]" - err=1 - return - fi - if ! perf report -i "${perfdata}" -q | grep -q "${testsym}" - then - echo "Per-thread record [Failed missing output]" + elif [ $ret -eq 1 ]; then + echo "Per-thread record [Failed record or missing output]" err=1 return fi @@ -96,6 +98,10 @@ test_per_thread() { echo "Basic --per-thread mode test [Success]" } +check_register_capture() { + perf script -F ip,sym,iregs -i "${perfdata}" 2>/dev/null | grep -q "DI:" +} + test_register_capture() { echo "Register capture test" if ! perf list pmu | grep -q 'br_inst_retired.near_call' @@ -108,11 +114,12 @@ test_register_capture() { echo "Register capture test [Skipped missing registers]" return fi - if ! perf record -o - --intr-regs=di,r8,dx,cx -e br_inst_retired.near_call \ - -c 1000 --per-thread ${testprog} 2> /dev/null \ - | perf script -F ip,sym,iregs -i - 2> /dev/null \ - | grep -q "DI:" - then + + local ret=0 + perf_record_with_retry "${perfdata}" "check_register_capture" "perf test -w thloop" \ + --intr-regs=di,r8,dx,cx -e br_inst_retired.near_call -c 1000 --per-thread || ret=$? + + if [ $ret -ne 0 ]; then echo "Register capture test [Failed missing output]" err=1 return @@ -120,65 +127,66 @@ test_register_capture() { echo "Register capture test [Success]" } +check_system_wide() { + perf report -i "${perfdata}" -q | grep -q "${testsym}" +} + test_system_wide() { echo "Basic --system-wide mode test" - if ! perf record -aB --synth=no -o "${perfdata}" ${testprog} 2> /dev/null - then + local ret=0 + perf_record_with_retry "${perfdata}" "check_system_wide" "perf test -w thloop" \ + -aB --synth=no || ret=$? + if [ $ret -eq 2 ]; then echo "System-wide record [Skipped not supported]" return - fi - if ! perf report -i "${perfdata}" -q | grep -q "${testsym}" - then + elif [ $ret -eq 1 ]; then echo "System-wide record [Failed missing output]" err=1 return fi - if ! perf record -aB --synth=no -e cpu-clock,cs --threads=cpu \ - -o "${perfdata}" ${testprog} 2> /dev/null - then - echo "System-wide record [Failed record --threads option]" - err=1 - return - fi - if ! perf report -i "${perfdata}" -q | grep -q "${testsym}" - then - echo "System-wide record [Failed --threads missing output]" + + ret=0 + perf_record_with_retry "${perfdata}" "check_system_wide" "perf test -w thloop" \ + -aB --synth=no -e cpu-clock,cs --threads=cpu || ret=$? + if [ $ret -ne 0 ]; then + echo "System-wide record [Failed record --threads option or missing output]" err=1 return fi echo "Basic --system-wide mode test [Success]" } +check_workload() { + perf report -i "${perfdata}" -q | grep -q "${testsym}" +} + test_workload() { echo "Basic target workload test" - if ! perf record -o "${perfdata}" ${testprog} 2> /dev/null - then - echo "Workload record [Failed record]" + local ret=0 + perf_record_with_retry "${perfdata}" "check_workload" "perf test -w thloop" || ret=$? + if [ $ret -ne 0 ]; then + echo "Workload record [Failed record or missing output]" err=1 return fi - if ! perf report -i "${perfdata}" -q | grep -q "${testsym}" - then - echo "Workload record [Failed missing output]" - err=1 - return - fi - if ! perf record -e cpu-clock,cs --threads=package \ - -o "${perfdata}" ${testprog} 2> /dev/null - then - echo "Workload record [Failed record --threads option]" - err=1 - return - fi - if ! perf report -i "${perfdata}" -q | grep -q "${testsym}" - then - echo "Workload record [Failed --threads missing output]" + + ret=0 + perf_record_with_retry "${perfdata}" "check_workload" "perf test -w thloop" \ + -e cpu-clock,cs --threads=package || ret=$? + if [ $ret -ne 0 ]; then + echo "Workload record [Failed record --threads option or missing output]" err=1 return fi echo "Basic target workload test [Success]" } +check_branch_counter() { + perf report -i "${perfdata}" -D -q 2>/dev/null | grep -q "$br_cntr_output" && \ + perf script -i "${perfdata}" -F +brstackinsn,+brcntr 2>/dev/null | \ + grep -q "$br_cntr_script_output" +} + test_branch_counter() { echo "Branch counter test" # Check if the branch counter feature is supported @@ -190,67 +198,60 @@ test_branch_counter() { return fi done - if ! perf record -o "${perfdata}" -e "{branches:p,instructions}" -j any,counter ${testprog} 2> /dev/null - then - echo "Branch counter record test [Failed record]" - err=1 - return - fi - if ! perf report -i "${perfdata}" -D -q | grep -q "$br_cntr_output" - then - echo "Branch counter report test [Failed missing output]" - err=1 - return - fi - if ! perf script -i "${perfdata}" -F +brstackinsn,+brcntr | grep -q "$br_cntr_script_output" - then - echo " Branch counter script test [Failed missing output]" + local ret=0 + perf_record_with_retry "${perfdata}" "check_branch_counter" "perf test -w thloop" \ + -e "{branches:p,instructions}" -j any,counter || ret=$? + if [ $ret -ne 0 ]; then + echo "Branch counter test [Failed record or missing output]" err=1 return fi echo "Branch counter test [Success]" } +check_cgroup() { + perf report -i "${perfdata}" -D 2>/dev/null | grep -q "CGROUP" && \ + perf script -i "${perfdata}" -F cgroup 2>/dev/null | grep -q -v "unknown" +} + test_cgroup() { echo "Cgroup sampling test" - if ! perf record -aB --synth=cgroup --all-cgroups -o "${perfdata}" ${testprog} 2> /dev/null - then + local ret=0 + perf_record_with_retry "${perfdata}" "check_cgroup" "perf test -w thloop" \ + -aB --synth=cgroup --all-cgroups || ret=$? + if [ $ret -eq 2 ]; then echo "Cgroup sampling [Skipped not supported]" return - fi - if ! perf report -i "${perfdata}" -D | grep -q "CGROUP" - then + elif [ $ret -eq 1 ]; then echo "Cgroup sampling [Failed missing output]" err=1 return fi - if ! perf script -i "${perfdata}" -F cgroup | grep -q -v "unknown" - then - echo "Cgroup sampling [Failed cannot resolve cgroup names]" - err=1 - return - fi echo "Cgroup sampling test [Success]" } +check_uid() { + perf report -i "${perfdata}" -q | grep -q "${testsym}" +} + test_uid() { echo "Uid sampling test" - if ! perf record -aB --synth=no --uid "$(id -u)" -o "${perfdata}" ${testprog} \ - > "${script_output}" 2>&1 - then - if grep -q "libbpf.*EPERM" "${script_output}" + local ret=0 + perf_record_with_retry "${perfdata}" "check_uid" "perf test -w thloop" \ + -aB --synth=no --uid "$(id -u)" || ret=$? + if [ $ret -eq 2 ]; then + local logfile="${PERF_RECORD_LOGS[${#PERF_RECORD_LOGS[@]}-1]}" + if grep -q -E "libbpf.*EPERM|Access to performance monitoring" "$logfile" || \ + grep -q -E "Permission denied|Failure to open any events" "$logfile" then echo "Uid sampling [Skipped permissions]" return else echo "Uid sampling [Failed to record]" err=1 - # cat "${script_output}" return fi - fi - if ! perf report -i "${perfdata}" -q | grep -q "${testsym}" - then + elif [ $ret -eq 1 ]; then echo "Uid sampling [Failed missing output]" err=1 return diff --git a/tools/perf/tests/shell/record_lbr.sh b/tools/perf/tests/shell/record_lbr.sh index 78a02e90ece1..8d51afeb437b 100755 --- a/tools/perf/tests/shell/record_lbr.sh +++ b/tools/perf/tests/shell/record_lbr.sh @@ -1,9 +1,12 @@ #!/bin/bash -# perf record LBR tests (exclusive) # SPDX-License-Identifier: GPL-2.0 +# perf record LBR tests set -e +shelldir=$(dirname "$0") +. "${shelldir}"/lib/perf_record.sh + ParanoidAndNotRoot() { [ "$(id -u)" != 0 ] && [ "$(cat /proc/sys/kernel/perf_event_paranoid)" -gt $1 ] } @@ -22,6 +25,7 @@ cleanup() { rm -rf "${perfdata}" rm -rf "${perfdata}".old rm -rf "${perfdata}".txt + perf_record_cleanup trap - EXIT TERM INT } @@ -34,22 +38,28 @@ trap_cleanup() { trap trap_cleanup EXIT TERM INT +check_lbr_callgraph() { + perf report --stitch-lbr -i "${perfdata}" > "${perfdata}".txt 2>&1 +} + lbr_callgraph_test() { test="LBR callgraph" echo "$test" - if ! perf record -e cycles --call-graph lbr -o "${perfdata}" perf test -w thloop - then + set +e + perf_record_with_retry "${perfdata}" "check_lbr_callgraph" "perf test -w thloop" \ + -e cycles --call-graph lbr + local ret=$? + set -e + + if [ $ret -eq 2 ]; then echo "$test [Failed support missing]" if [ $err -eq 0 ] then err=2 fi return - fi - - if ! perf report --stitch-lbr -i "${perfdata}" > "${perfdata}".txt - then + elif [ $ret -eq 1 ]; then cat "${perfdata}".txt echo "$test [Failed in perf report]" err=1 @@ -59,6 +69,12 @@ lbr_callgraph_test() { echo "$test [Success]" } +check_lbr_samples() { + local out + out=$(perf report -D -i "${perfdata}" 2> /dev/null | grep -A1 'PERF_RECORD_SAMPLE') + [ "$(echo "$out" | grep -c 'PERF_RECORD_SAMPLE' || true)" -gt 0 ] +} + lbr_test() { local branch_flags=$1 local test="LBR $2 test" @@ -70,25 +86,27 @@ lbr_test() { local r echo "$test" - if ! perf record -e cycles $branch_flags -o "${perfdata}" perf test -w thloop - then + set +e + perf_record_with_retry "${perfdata}" "check_lbr_samples" "perf test -w thloop" \ + -e cycles $branch_flags + local ret=$? + set -e + + if [ $ret -eq 2 ]; then echo "$test [Failed support missing]" - perf record -e cycles $branch_flags -o "${perfdata}" perf test -w thloop || true if [ $err -eq 0 ] then err=2 fi return - fi - - out=$(perf report -D -i "${perfdata}" 2> /dev/null | grep -A1 'PERF_RECORD_SAMPLE') - sam_nr=$(echo "$out" | grep -c 'PERF_RECORD_SAMPLE' || true) - if [ $sam_nr -eq 0 ] - then + elif [ $ret -eq 1 ]; then echo "$test [Failed no samples captured]" err=1 return fi + + out=$(perf report -D -i "${perfdata}" 2> /dev/null | grep -A1 'PERF_RECORD_SAMPLE') + sam_nr=$(echo "$out" | grep -c 'PERF_RECORD_SAMPLE' || true) echo "$test: $sam_nr samples" bs_nr=$(echo "$out" | grep -c 'branch stack: nr:' || true) diff --git a/tools/perf/tests/shell/stat_all_pfm.sh b/tools/perf/tests/shell/stat_all_pfm.sh index c08c186af2c4..ec262c6b1674 100755 --- a/tools/perf/tests/shell/stat_all_pfm.sh +++ b/tools/perf/tests/shell/stat_all_pfm.sh @@ -32,7 +32,7 @@ do then # We failed to see the event and it is supported. Possibly the workload was # too small so retry with something longer. - result=$(perf stat --pfm-events "$p" perf bench internals synthesize 2>&1) + result=$(perf stat --pfm-events "$p" perf test -w noploop 0.1 2>&1) x=$? if test "$x" -ne "0" then -- 2.54.0.1189.g8c84645362-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v3 05/13] perf tests: Skip metrics validation if system-wide recording lacks permission 2026-06-16 16:48 ` [PATCH v3 00/13] perf tests: Robustness and performance improvements Ian Rogers ` (3 preceding siblings ...) 2026-06-16 16:48 ` [PATCH v3 04/13] perf tests: Add robust record retry helper and use subsecond workloads Ian Rogers @ 2026-06-16 16:48 ` Ian Rogers 2026-06-16 16:48 ` [PATCH v3 06/13] perf tests: Fix Python JIT dump profiling test failure Ian Rogers ` (7 subsequent siblings) 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 16:48 UTC (permalink / raw) To: irogers, acme, namhyung Cc: adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht The metrics value validation test requires system-wide recording (`-a`), which can fail on systems without root permissions or where paranoid levels restrict tracing. Add a check to skip the test if `-a` is not supported. Also fix false negatives during validation by updating parse error string patterns and resolving issues in metric list generation. Fixes: 3ad7092f5145 ("perf test: Add metric value validation test") Assisted-by: Antigravity:gemini-3.1-pro Signed-off-by: Ian Rogers <irogers@google.com> --- .../tests/shell/lib/perf_metric_validation.py | 11 ++- tools/perf/tests/shell/stat_all_metrics.sh | 75 ++++++++++++------- tools/perf/tests/shell/stat_metrics_values.sh | 7 ++ 3 files changed, 60 insertions(+), 33 deletions(-) diff --git a/tools/perf/tests/shell/lib/perf_metric_validation.py b/tools/perf/tests/shell/lib/perf_metric_validation.py index dea8ef1977bf..3d52f94f22b9 100644 --- a/tools/perf/tests/shell/lib/perf_metric_validation.py +++ b/tools/perf/tests/shell/lib/perf_metric_validation.py @@ -383,10 +383,13 @@ class Validator: wl = workload.split() command.extend(wl) print(" ".join(command)) - cmd = subprocess.run(command, stderr=subprocess.PIPE, encoding='utf-8') - data = [x+'}' for x in cmd.stderr.split('}\n') if x] - if data[0][0] != '{': - data[0] = data[0][data[0].find('{'):] + cmd = subprocess.run(command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, encoding='utf-8') + lines = cmd.stderr.splitlines() + cmd.stdout.splitlines() + data = [] + for line in lines: + line = line.strip() + if line.startswith('{') and line.endswith('}'): + data.append(line) return data def collect_perf(self, workload: str): diff --git a/tools/perf/tests/shell/stat_all_metrics.sh b/tools/perf/tests/shell/stat_all_metrics.sh index b582d23f28c9..feeb34c6fa6d 100755 --- a/tools/perf/tests/shell/stat_all_metrics.sh +++ b/tools/perf/tests/shell/stat_all_metrics.sh @@ -12,38 +12,65 @@ system_wide_flag="-a" if ParanoidAndNotRoot 0 then system_wide_flag="" - test_prog="perf test -w noploop" + test_prog="perf test -w noploop 0.01" fi +check_metric() { + local output="$1" + local status="$2" + local metric="$3" + + if [[ $status -ne 0 || ! "$output" =~ ${metric:0:50} ]]; then + return 1 + fi + + if [[ "$output" =~ "<not counted>" || "$output" =~ "<not supported>" ]]; then + return 1 + fi + + return 0 +} + skip=0 err=3 for m in $(perf list --raw-dump metrics); do echo "Testing $m" result=$(perf stat -M "$m" $system_wide_flag -- $test_prog 2>&1) result_err=$? - if [[ $result_err -eq 0 && "$result" =~ ${m:0:50} ]] - then - # No error result and metric shown. + + if check_metric "$result" $result_err "$m"; then if [[ "$err" -ne 1 ]] then err=0 fi continue fi - if [[ "$result" =~ "Cannot resolve IDs for" || "$result" =~ "No supported events found" ]] + + if [[ "$result" =~ "Access to performance monitoring and observability operations is limited" || \ + "$result" =~ "in per-thread mode, enable system wide" || \ + "$result" =~ "<not supported>" || \ + "$result" =~ "Cannot resolve IDs for" || \ + "$result" =~ "No supported events found" || \ + "$result" =~ "FP_ARITH" || \ + "$result" =~ "AMX" || \ + "$result" =~ "PMM" ]] then - if [[ $(perf list --raw-dump $m) == "Default"* ]] - then - echo "[Ignored $m] failed but as a Default metric this can be expected" - echo $result + true + else + result=$(perf stat -M "$m" $system_wide_flag -- perf test -w noploop 0.1 2>&1) + result_err=$? + + if check_metric "$result" $result_err "$m"; then + if [[ "$err" -ne 1 ]] + then + err=0 + fi continue fi - echo "[Failed $m] Metric contains missing events" - echo $result - err=1 # Fail - continue - elif [[ "$result" =~ \ - "Access to performance monitoring and observability operations is limited" ]] + fi + + # If retry also failed, determine if we skip, ignore, or fail + if [[ "$result" =~ "Access to performance monitoring and observability operations is limited" ]] then echo "[Skipped $m] Permission failure" echo $result @@ -61,7 +88,9 @@ for m in $(perf list --raw-dump metrics); do skip=1 fi continue - elif [[ "$result" =~ "<not supported>" ]] + elif [[ "$result" =~ "<not supported>" || \ + "$result" =~ "Cannot resolve IDs for" || \ + "$result" =~ "No supported events found" ]] then if [[ $(perf list --raw-dump $m) == "Default"* ]] then @@ -105,19 +134,7 @@ for m in $(perf list --raw-dump metrics); do continue fi - # Failed, possibly the workload was too small so retry with something longer. - result=$(perf stat -M "$m" $system_wide_flag -- perf bench internals synthesize 2>&1) - result_err=$? - if [[ $result_err -eq 0 && "$result" =~ ${m:0:50} ]] - then - # No error result and metric shown. - if [[ "$err" -ne 1 ]] - then - err=0 - fi - continue - fi - echo "[Failed $m] has non-zero error '$result_err' or not printed in:" + echo "[Failed $m] has non-zero error '$result_err' or not printed/counted in:" echo "$result" err=1 done diff --git a/tools/perf/tests/shell/stat_metrics_values.sh b/tools/perf/tests/shell/stat_metrics_values.sh index 30566f0b5427..76f1e99d1273 100755 --- a/tools/perf/tests/shell/stat_metrics_values.sh +++ b/tools/perf/tests/shell/stat_metrics_values.sh @@ -8,6 +8,13 @@ shelldir=$(dirname "$0") grep -q GenuineIntel /proc/cpuinfo || { echo Skipping non-Intel; exit 2; } +# Skip if no permission to record system-wide events +if ! perf stat -a -e instructions sleep 0.01 >/dev/null 2>&1; then + echo "Skipping: no permission to record system-wide events (-a)" + exit 2 +fi + + pythonvalidator=$(dirname $0)/lib/perf_metric_validation.py rulefile=$(dirname $0)/lib/perf_metric_validation_rules.json tmpdir=$(mktemp -d /tmp/__perf_test.program.XXXXX) -- 2.54.0.1189.g8c84645362-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v3 06/13] perf tests: Fix Python JIT dump profiling test failure 2026-06-16 16:48 ` [PATCH v3 00/13] perf tests: Robustness and performance improvements Ian Rogers ` (4 preceding siblings ...) 2026-06-16 16:48 ` [PATCH v3 05/13] perf tests: Skip metrics validation if system-wide recording lacks permission Ian Rogers @ 2026-06-16 16:48 ` Ian Rogers 2026-06-16 16:48 ` [PATCH v3 07/13] perf tests: Fix flakiness in trace record and replay test Ian Rogers ` (6 subsequent siblings) 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 16:48 UTC (permalink / raw) To: irogers, acme, namhyung Cc: adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht The `python profiling with jitdump` test failed due to: 1. Target PID extraction resolving to duplicate space-separated values, which broke the buildid-cache loops. 2. The default workload duration being too short to capture JIT stack trampoline samples, resulting in 0 matching JIT symbols. Fix the PID parsing by sorting and retrieving a unique single-line value. Implement a robust retry loop starting at 1M python loop iterations and scaling up to 100M iterations until JIT symbols are successfully captured and verified. Fixes: c9cd0c7e529e ("perf test: Add python JIT dump test") Assisted-by: Antigravity:gemini-3.1-pro Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/shell/jitdump-python.sh | 79 ++++++++++++++++-------- 1 file changed, 53 insertions(+), 26 deletions(-) diff --git a/tools/perf/tests/shell/jitdump-python.sh b/tools/perf/tests/shell/jitdump-python.sh index ae86203b14a2..05aaa3bd900b 100755 --- a/tools/perf/tests/shell/jitdump-python.sh +++ b/tools/perf/tests/shell/jitdump-python.sh @@ -16,11 +16,15 @@ if [ "${HAS_PERF_JIT}" != "True" ]; then exit 2 fi -PERF_DATA=$(mktemp /tmp/__perf_test.perf.data.XXXXXX) +PERF_DATA_DIR=$(mktemp -d /tmp/__perf_test.perf.data.dir.XXXXXX) +PERF_DATA="${PERF_DATA_DIR}/perf.data" cleanup() { echo "Cleaning up files..." - rm -f ${PERF_DATA} ${PERF_DATA}.jit /tmp/jit-${PID}.dump /tmp/jitted-${PID}-*.so 2> /dev/null + rm -rf ${PERF_DATA_DIR} 2> /dev/null + for p in ${ALL_PIDS}; do + rm -f /tmp/jit-${p}.dump /tmp/jitted-${p}-*.so 2> /dev/null + done trap - EXIT TERM INT } @@ -33,9 +37,16 @@ trap_cleanup() { trap trap_cleanup EXIT TERM INT -echo "Run python with -Xperf_jit" -cat <<EOF | perf record -k 1 -g --call-graph dwarf -o "${PERF_DATA}" \ - -- ${PYTHON} -Xperf_jit +ALL_PIDS="" +NUM=0 +for iterations in 1000000 10000000 50000000 100000000; do + echo "Running with $iterations iterations..." + rm -f "${PERF_DATA}.pid" + cat <<EOF | perf record -k 1 -g --call-graph dwarf -o "${PERF_DATA}" -- ${PYTHON} -Xperf_jit +import os +with open("${PERF_DATA}.pid", "w") as f: + f.write(str(os.getpid())) + def foo(n): result = 0 for _ in range(n): @@ -49,29 +60,45 @@ def baz(n): bar(n) if __name__ == "__main__": - baz(1000000) + baz($iterations) EOF -# extract PID of the target process from the data -_PID=$(perf report -i "${PERF_DATA}" -F pid -q -g none | cut -d: -f1 -s) -PID=$(echo -n $_PID) # remove newlines - -echo "Generate JIT-ed DSOs using perf inject" -DEBUGINFOD_URLS='' perf inject -i "${PERF_DATA}" -j -o "${PERF_DATA}.jit" - -echo "Add JIT-ed DSOs to the build-ID cache" -for F in /tmp/jitted-${PID}-*.so; do - perf buildid-cache -a "${F}" -done - -echo "Check the symbol containing the function/module name" -NUM=$(perf report -i "${PERF_DATA}.jit" -s sym | grep -cE 'py::(foo|bar|baz):<stdin>') - -echo "Found ${NUM} matching lines" - -echo "Remove JIT-ed DSOs from the build-ID cache" -for F in /tmp/jitted-${PID}-*.so; do - perf buildid-cache -r "${F}" + if [ -f "${PERF_DATA}.pid" ]; then + REAL_PID=$(cat "${PERF_DATA}.pid") + ALL_PIDS="${ALL_PIDS} ${REAL_PID}" + fi + + # extract PID of the target process from the data + PID=$(perf report -i "${PERF_DATA}" --stdio -F pid -q -g none | \ + cut -d: -f1 -s | sort -u | head -n 1 | tr -d ' ') + if [ -z "${PID}" ]; then + echo "Failed to get PID, retrying..." + continue + fi + ALL_PIDS="${ALL_PIDS} ${PID}" + + echo "Generate JIT-ed DSOs using perf inject" + DEBUGINFOD_URLS='' perf inject -i "${PERF_DATA}" -j -o "${PERF_DATA}.jit" + + echo "Add JIT-ed DSOs to the build-ID cache" + for F in /tmp/jitted-${PID}-*.so; do + perf buildid-cache -a "${F}" + done + + echo "Check the symbol containing the function/module name" + NUM=$(perf report -i "${PERF_DATA}.jit" -s sym --stdio | grep -cE 'py::(foo|bar|baz):<stdin>') + + echo "Remove JIT-ed DSOs from the build-ID cache" + for F in /tmp/jitted-${PID}-*.so; do + perf buildid-cache -r "${F}" + done + rm -f /tmp/jitted-${PID}-*.so /tmp/jit-${PID}.dump 2>/dev/null + + if [ "${NUM}" -gt 0 ]; then + echo "Success: found ${NUM} matching lines" + break + fi + echo "No matching lines found, retrying with more iterations..." done cleanup -- 2.54.0.1189.g8c84645362-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v3 07/13] perf tests: Fix flakiness in trace record and replay test 2026-06-16 16:48 ` [PATCH v3 00/13] perf tests: Robustness and performance improvements Ian Rogers ` (5 preceding siblings ...) 2026-06-16 16:48 ` [PATCH v3 06/13] perf tests: Fix Python JIT dump profiling test failure Ian Rogers @ 2026-06-16 16:48 ` Ian Rogers 2026-06-16 16:48 ` [PATCH v3 08/13] perf tests: Fix flakiness in BPF counters test on hybrid systems Ian Rogers ` (5 subsequent siblings) 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 16:48 UTC (permalink / raw) To: irogers, acme, namhyung Cc: adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht The `perf trace record and replay` test fails intermittently on slow or virtualized hosts because the default recording workload (`sleep 1`) occasionally completes without scheduling the target `nanosleep` or `clock_nanosleep` system calls inside the recorded sample window, resulting in the error: `Failed: cannot find *nanosleep syscall`. Generalize the `perf_record_with_retry` helper in `tests/shell/lib/perf_record.sh` to support a custom record command prefix via the `PERF_RECORD_CMD` environment variable (defaulting to "perf record"). Update `trace_record_replay.sh` to use this robust retry loop running with `PERF_RECORD_CMD="perf trace record"` and a base workload of `sleep`. The test will automatically retry with scaled sleep durations (from 0.01s up to 2.0s) until the required `nanosleep` event is successfully captured. Fixes: 15bcfb96d0dd ("perf test: Add trace record and replay test") Assisted-by: Antigravity:gemini-3.1-pro Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/shell/lib/perf_record.sh | 7 ++++++- tools/perf/tests/shell/trace_record_replay.sh | 18 ++++++++++++++---- 2 files changed, 20 insertions(+), 5 deletions(-) diff --git a/tools/perf/tests/shell/lib/perf_record.sh b/tools/perf/tests/shell/lib/perf_record.sh index e137fa75370d..2b9e11b66dc7 100644 --- a/tools/perf/tests/shell/lib/perf_record.sh +++ b/tools/perf/tests/shell/lib/perf_record.sh @@ -24,9 +24,14 @@ perf_record_with_retry() { local duration local first_run=true local ret=1 + local cmd_prefix="perf record" + if [ -n "${PERF_RECORD_CMD}" ]; then + cmd_prefix="${PERF_RECORD_CMD}" + fi + for duration in 0.01 0.1 0.3 1.0 2.0; do rm -f "${perfdata}".old - perf record "$@" -o "${perfdata}" ${testprog_base} ${duration} > "$logfile" 2>&1 + ${cmd_prefix} "$@" -o "${perfdata}" ${testprog_base} ${duration} > "$logfile" 2>&1 local record_exit=$? if [ "$first_run" = true ] && [ $record_exit -ne 0 ]; then diff --git a/tools/perf/tests/shell/trace_record_replay.sh b/tools/perf/tests/shell/trace_record_replay.sh index 88d30a03dcec..f27e32b18697 100755 --- a/tools/perf/tests/shell/trace_record_replay.sh +++ b/tools/perf/tests/shell/trace_record_replay.sh @@ -6,16 +6,26 @@ # shellcheck source=lib/probe.sh . "$(dirname $0)"/lib/probe.sh +# shellcheck source=lib/perf_record.sh +. "$(dirname $0)"/lib/perf_record.sh skip_if_no_perf_trace || exit 2 [ "$(id -u)" = 0 ] || exit 2 file=$(mktemp /tmp/temporary_file.XXXXX) -perf trace record -o ${file} sleep 1 || exit 1 -if ! perf trace -i ${file} 2>&1 | grep nanosleep; then +check_nanosleep() { + perf trace -i "${file}" 2>&1 | grep -q nanosleep +} + +PERF_RECORD_CMD="perf trace record" perf_record_with_retry "${file}" "check_nanosleep" "sleep" +err=$? + +perf_record_cleanup +rm -f ${file} + +if [ $err -ne 0 ]; then echo "Failed: cannot find *nanosleep syscall" exit 1 fi - -rm -f ${file} +exit 0 -- 2.54.0.1189.g8c84645362-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v3 08/13] perf tests: Fix flakiness in BPF counters test on hybrid systems 2026-06-16 16:48 ` [PATCH v3 00/13] perf tests: Robustness and performance improvements Ian Rogers ` (6 preceding siblings ...) 2026-06-16 16:48 ` [PATCH v3 07/13] perf tests: Fix flakiness in trace record and replay test Ian Rogers @ 2026-06-16 16:48 ` Ian Rogers 2026-06-16 16:48 ` [PATCH v3 09/13] perf tests: Fix flakiness in branch stack sampling tests Ian Rogers ` (4 subsequent siblings) 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 16:48 UTC (permalink / raw) To: irogers, acme, namhyung Cc: adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht The `perf stat --bpf-counters test` fails intermittently on hybrid architectures or systems with dynamic frequency scaling (DVFS). This happens because the test workload (`sqrtloop`) runs for a fixed 1-second duration, and the CPU frequency can scale dynamically between idle and maximum frequency. As the first run runs on a cold CPU and the second run runs on a warmed-up CPU (or vice versa), the number of instructions executed in 1 second differs by up to 2.2x, violating the comparison tolerance. Also, when running as root, BPF tracepoints and scheduling programs trigger frequently. Since standard `perf stat -e instructions` measures both user and kernel space instructions, it counts BPF helper and program execution overheads, whereas the BPF counters themselves do not self- measure. This introduces a large kernel-space instruction count discrepancy between standard and BPF counters. Fix these issues by: 1. Switching the workload to a strictly deterministic, iteration-based workload: `awk 'BEGIN { for (i=0; i<10000000; i++) sum+=i }'`. We pin the workload to a single random allowed CPU using `taskset -c $CPU` via a bash array. 2. Restricting the counted event to user-space only (`instructions:u` or `/u`). 3. Tightening the comparison tolerance from 20% to 15%. These modifications isolate the measurements to user-space instructions of the deterministic loop, which executes a virtually identical number of instructions on both runs (with less than 0.001% variation), eliminating Dynamic Frequency Scaling (DVFS), kernel scheduling noise, and BPF helper self-measurement overheads. Fixes: 2c0cb9f56020 ("perf test: Add a shell test for 'perf stat --bpf-counters' new option") Assisted-by: Antigravity:gemini-3.1-pro Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/shell/stat_bpf_counters.sh | 28 +++++++++++++-------- 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/tools/perf/tests/shell/stat_bpf_counters.sh b/tools/perf/tests/shell/stat_bpf_counters.sh index 35463358b273..11de77ee38ad 100755 --- a/tools/perf/tests/shell/stat_bpf_counters.sh +++ b/tools/perf/tests/shell/stat_bpf_counters.sh @@ -4,21 +4,26 @@ set -e -workload="perf test -w sqrtloop" +# Get the first allowed CPU +CPU=$(taskset -c -p $$ | awk -F': ' '{print $2}' | awk -F'[,-]' '{print $1}') +if [ -z "$CPU" ]; then + CPU=0 +fi +workload=(taskset -c "$CPU" awk 'BEGIN { for (i=0; i<10000000; i++) sum+=i }') -# check whether $2 is within +/- 20% of $1 +# check whether $2 is within +/- 15% of $1 compare_number() { first_num=$1 second_num=$2 - # upper bound is first_num * 120% - upper=$(expr $first_num + $first_num / 5 ) - # lower bound is first_num * 80% - lower=$(expr $first_num - $first_num / 5 ) + # upper bound is first_num * 115% + upper=$(expr $first_num + $first_num / 20 \* 3 ) + # lower bound is first_num * 85% + lower=$(expr $first_num - $first_num / 20 \* 3 ) if [ $second_num -gt $upper ] || [ $second_num -lt $lower ]; then - echo "The difference between $first_num and $second_num are greater than 20%." + echo "The difference between $first_num and $second_num are greater than 15%." exit 1 fi } @@ -41,11 +46,12 @@ check_counts() test_bpf_counters() { printf "Testing --bpf-counters " - base_instructions=$(perf stat --no-big-num -e instructions -- $workload 2>&1 | \ + base_instructions=$(perf stat --no-big-num -e instructions:u -- "${workload[@]}" 2>&1 | \ awk -v i=0 -v c=0 '/instructions/ { \ if ($1 != "<not") { i++; c += $1 } \ } END { if (i > 0) printf "%.0f", c; else print "<not" }') - bpf_instructions=$(perf stat --no-big-num --bpf-counters -e instructions -- $workload 2>&1 | \ + bpf_instructions=$(perf stat --no-big-num --bpf-counters -e instructions:u \ + -- "${workload[@]}" 2>&1 | \ awk -v i=0 -v c=0 '/instructions/ { \ if ($1 != "<not") { i++; c += $1 } \ } END { if (i > 0) printf "%.0f", c; else print "<not" }') @@ -57,7 +63,9 @@ test_bpf_counters() test_bpf_modifier() { printf "Testing bpf event modifier " - stat_output=$(perf stat --no-big-num -e instructions/name=base_instructions/,instructions/name=bpf_instructions/b -- $workload 2>&1) + stat_output=$(perf stat --no-big-num \ + -e instructions/name=base_instructions/u,instructions/name=bpf_instructions/bu \ + -- "${workload[@]}" 2>&1) base_instructions=$(echo "$stat_output"| \ awk -v i=0 -v c=0 '/base_instructions/ { \ if ($1 != "<not") { i++; c += $1 } \ -- 2.54.0.1189.g8c84645362-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v3 09/13] perf tests: Fix flakiness in branch stack sampling tests 2026-06-16 16:48 ` [PATCH v3 00/13] perf tests: Robustness and performance improvements Ian Rogers ` (7 preceding siblings ...) 2026-06-16 16:48 ` [PATCH v3 08/13] perf tests: Fix flakiness in BPF counters test on hybrid systems Ian Rogers @ 2026-06-16 16:48 ` Ian Rogers 2026-06-16 16:48 ` [PATCH v3 10/13] perf tests: Speed up off-cpu profiling tests Ian Rogers ` (3 subsequent siblings) 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 16:48 UTC (permalink / raw) To: irogers, acme, namhyung Cc: adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht The branch stack sampling test (test 130) runs short iteration-based workloads to verify syscall, kernel, and trap branch stack sampling. Specifically, `test_syscall()` and `test_kernel_branches()` run `perf bench syscall basic` with loop counts of 8000 and 1000, and `test_trap_eret_branches()` runs `traploop` with 1000 iterations. Because these loop limits are extremely small, the total benchmark runtimes last only a few milliseconds (or less). Under high load, virtualization, or coarse sampling conditions, PMU cycle sampling fails to capture enough samples inside the brief benchmark loops. This leads to false negatives where the script output lacks the expected syscall, kernel, or trap branch entries (e.g. "ERROR: Branches missing getppid[^ ]*/SYSCALL/"). Fix this by increasing the workload loop counts to 100,000 across all three test sections. Running 100,000 loops still finishes virtually instantaneously (less than 0.1 seconds), but generates enough iterations to guarantee robust branch stack capture. Fixes: b55878c90ab9 ("perf test: Add test for branch stack sampling") Assisted-by: Antigravity:gemini-3.1-pro Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/shell/test_brstack.sh | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/tools/perf/tests/shell/test_brstack.sh b/tools/perf/tests/shell/test_brstack.sh index eb5837f82e39..be46a68daf76 100755 --- a/tools/perf/tests/shell/test_brstack.sh +++ b/tools/perf/tests/shell/test_brstack.sh @@ -112,7 +112,7 @@ test_trap_eret_branches() { start_err=$err err=0 perf record -o $TMPDIR/perf.data --branch-filter any,save_type,u,k -- \ - perf test -w traploop 1000 > "$TMPDIR/record.txt" 2>&1 + perf test -w traploop 100000 > "$TMPDIR/record.txt" 2>&1 perf script -i $TMPDIR/perf.data --fields brstacksym | \ tr ' ' '\n' > $TMPDIR/perf.script @@ -137,7 +137,7 @@ test_kernel_branches() { start_err=$err err=0 perf record -o $TMPDIR/perf.data --branch-filter any,k -- \ - perf bench syscall basic --loop 1000 > "$TMPDIR/record.txt" 2>&1 + perf bench syscall basic --loop 100000 > "$TMPDIR/record.txt" 2>&1 perf script -i $TMPDIR/perf.data --fields brstack | \ tr ' ' '\n' > $TMPDIR/perf.script @@ -209,7 +209,7 @@ test_syscall() { err=0 perf record -o $TMPDIR/perf.data --branch-filter \ any_call,save_type,u,k -c 10007 -- \ - perf bench syscall basic --loop 8000 > "$TMPDIR/record.txt" 2>&1 + perf bench syscall basic --loop 100000 > "$TMPDIR/record.txt" 2>&1 perf script -i $TMPDIR/perf.data --fields brstacksym | \ tr ' ' '\n' > $TMPDIR/perf.script -- 2.54.0.1189.g8c84645362-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v3 10/13] perf tests: Speed up off-cpu profiling tests 2026-06-16 16:48 ` [PATCH v3 00/13] perf tests: Robustness and performance improvements Ian Rogers ` (8 preceding siblings ...) 2026-06-16 16:48 ` [PATCH v3 09/13] perf tests: Fix flakiness in branch stack sampling tests Ian Rogers @ 2026-06-16 16:48 ` Ian Rogers 2026-06-16 16:48 ` [PATCH v3 11/13] perf tests: Speed up lock contention analysis shell test Ian Rogers ` (2 subsequent siblings) 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 16:48 UTC (permalink / raw) To: irogers, acme, namhyung Cc: adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht The off-cpu profiling test suite runs multiple recording commands with a default workload of `sleep 1` to test the off-cpu threshold configurations (specifically, above 999ms and below 1200ms). This adds a mandatory 3.0 seconds of sleep overhead. Optimize this by scaling down the thresholds and workload durations by a factor of 10: - Use `sleep 0.1` as the workload duration. - Change the above-threshold test to use `--off-cpu-thresh 50` and `sleep 0.1`. - Change the below-threshold test to use `--off-cpu-thresh 500` and `sleep 0.1`. - Update the awk period check in the above-threshold test to look for a period greater than 50,000,000 ns (50ms) instead of 999,000,000 ns (999ms). This reduces raw test sleep overhead from 3.0s down to 0.3s, yielding a ~2.7 second speedup for this test. Assisted-by: Antigravity:gemini-3.1-pro Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/shell/record_offcpu.sh | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/tools/perf/tests/shell/record_offcpu.sh b/tools/perf/tests/shell/record_offcpu.sh index 860a2d6f4b75..8ee2112b7778 100755 --- a/tools/perf/tests/shell/record_offcpu.sh +++ b/tools/perf/tests/shell/record_offcpu.sh @@ -45,7 +45,7 @@ test_offcpu_priv() { test_offcpu_basic() { echo "Basic off-cpu test" - if ! perf record --off-cpu -e dummy -o ${perfdata} sleep 1 2> /dev/null + if ! perf record --off-cpu -e dummy -o ${perfdata} sleep 0.1 2> /dev/null then echo "Basic off-cpu test [Failed record]" err=1 @@ -98,8 +98,8 @@ test_offcpu_child() { test_offcpu_above_thresh() { echo "${test_above_thresh}" - # collect direct off-cpu samples for tasks blocked for more than 999ms - if ! perf record -e dummy --off-cpu --off-cpu-thresh 999 -o ${perfdata} -- sleep 1 2> /dev/null + # collect direct off-cpu samples for tasks blocked for more than 50ms + if ! perf record -e dummy --off-cpu --off-cpu-thresh 50 -o ${perfdata} -- sleep 0.1 2> /dev/null then echo "${test_above_thresh} [Failed record]" err=1 @@ -115,7 +115,7 @@ test_offcpu_above_thresh() { fi # there should only be one direct sample, and its period should be higher than off-cpu-thresh if ! perf script --time "0, ${dummy_timestamp}" -i ${perfdata} -F period | \ - awk '{ if (int($1) > 999000000) exit 0; else exit 1; }' + awk '{ if (int($1) > 50000000) exit 0; else exit 1; }' then echo "${test_above_thresh} [Failed off-cpu time too short]" err=1 @@ -128,8 +128,8 @@ test_offcpu_above_thresh() { test_offcpu_below_thresh() { echo "${test_below_thresh}" - # collect direct off-cpu samples for tasks blocked for more than 1.2s - if ! perf record -e dummy --off-cpu --off-cpu-thresh 1200 -o ${perfdata} -- sleep 1 2> /dev/null + # collect direct off-cpu samples for tasks blocked for more than 500ms + if ! perf record -e dummy --off-cpu --off-cpu-thresh 500 -o ${perfdata} -- sleep 0.1 2> /dev/null then echo "${test_below_thresh} [Failed record]" err=1 -- 2.54.0.1189.g8c84645362-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v3 11/13] perf tests: Speed up lock contention analysis shell test 2026-06-16 16:48 ` [PATCH v3 00/13] perf tests: Robustness and performance improvements Ian Rogers ` (9 preceding siblings ...) 2026-06-16 16:48 ` [PATCH v3 10/13] perf tests: Speed up off-cpu profiling tests Ian Rogers @ 2026-06-16 16:48 ` Ian Rogers 2026-06-16 16:48 ` [PATCH v3 12/13] perf tests: Speed up metrics checking shell tests Ian Rogers 2026-06-16 16:48 ` [PATCH v3 13/13] perf tests: Include error output for skipped tests in JUnit XML Ian Rogers 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 16:48 UTC (permalink / raw) To: irogers, acme, namhyung Cc: adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht The lock contention analysis test suite (`lock_contention.sh`) performs a series of 13 separate profiling checks to verify various aggregation and filtering parameters of `perf lock contention`. Each of these checks runs the `perf bench sched messaging` messaging benchmark as its workload. By default, `sched messaging` runs 10 groups of 40 processes (400 processes total) generating substantial task scheduling, context switching, and IPC message passing. When traced system-wide for lock events, the tracing overhead (handling millions of lock acquisitions and releases) slows execution down significantly, causing the test suite to take over 80 seconds. Optimize this by introducing a scaled-down messaging benchmark workload: `perf bench sched messaging -g 1 -p`. Running 1 group (40 processes) takes only 0.01 seconds natively (instead of 0.08 seconds), drastically reduces the sheer volume of lock acquire/release trace events, and reduces CPU context switching during tracing while still generating sufficient lock events to fully exercise the BPF/record filters. Assisted-by: Antigravity:gemini-3.1-pro Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/shell/lock_contention.sh | 32 ++++++++++++++--------- 1 file changed, 19 insertions(+), 13 deletions(-) diff --git a/tools/perf/tests/shell/lock_contention.sh b/tools/perf/tests/shell/lock_contention.sh index 52e8b9db9fbd..953483d987e9 100755 --- a/tools/perf/tests/shell/lock_contention.sh +++ b/tools/perf/tests/shell/lock_contention.sh @@ -9,6 +9,10 @@ perfdata=$(mktemp /tmp/__perf_test.perf.data.XXXXX) result=$(mktemp /tmp/__perf_test.result.XXXXX) errout=$(mktemp /tmp/__perf_test.errout.XXXXX) +# Workload to generate lock contention. +# Using 1 group (-g 1) keeps runtime low while generating sufficient lock events. +msg_workload="perf bench sched messaging -g 1 -p" + cleanup() { rm -f ${perfdata} rm -f ${result} @@ -50,7 +54,7 @@ check() { test_record() { echo "Testing perf lock record and perf lock contention" - perf lock record -o ${perfdata} -- perf bench sched messaging -p > /dev/null 2>&1 + perf lock record -o ${perfdata} -- ${msg_workload} > /dev/null 2>&1 # the output goes to the stderr and we expect only 1 output (-E 1) perf lock contention -i ${perfdata} -E 1 -q 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then @@ -70,7 +74,7 @@ test_bpf() fi # the perf lock contention output goes to the stderr - perf lock con -a -b -E 1 -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b -E 1 -q -- ${msg_workload} > /dev/null 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then echo "[Fail] BPF result count is not 1:" "$(cat "${result}" | wc -l)" err=1 @@ -81,7 +85,7 @@ test_bpf() test_record_concurrent() { echo "Testing perf lock record and perf lock contention at the same time" - perf lock record -o- -- perf bench sched messaging -p 2> ${errout} | \ + perf lock record -o- -- ${msg_workload} 2> ${errout} | \ perf lock contention -i- -E 1 -q 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then echo "[Fail] Recorded result count is not 1:" "$(cat "${result}" | wc -l)" @@ -107,7 +111,7 @@ test_aggr_task() fi # the perf lock contention output goes to the stderr - perf lock con -a -b -t -E 1 -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b -t -E 1 -q -- ${msg_workload} > /dev/null 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then echo "[Fail] BPF result count is not 1:" "$(cat "${result}" | wc -l)" err=1 @@ -130,7 +134,7 @@ test_aggr_addr() fi # the perf lock contention output goes to the stderr - perf lock con -a -b -l -E 1 -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b -l -E 1 -q -- ${msg_workload} > /dev/null 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then echo "[Fail] BPF result count is not 1:" "$(cat "${result}" | wc -l)" err=1 @@ -148,7 +152,7 @@ test_aggr_cgroup() fi # the perf lock contention output goes to the stderr - perf lock con -a -b --lock-cgroup -E 1 -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b --lock-cgroup -E 1 -q -- ${msg_workload} > /dev/null 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then echo "[Fail] BPF result count is not 1:" "$(cat "${result}" | wc -l)" err=1 @@ -170,7 +174,7 @@ test_type_filter() return fi - perf lock con -a -b -Y spinlock -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b -Y spinlock -q -- ${msg_workload} > /dev/null 2> ${result} if [ "$(grep -c -v spinlock "${result}")" != "0" ]; then echo "[Fail] BPF result should not have non-spinlocks:" "$(cat "${result}")" err=1 @@ -202,7 +206,7 @@ test_lock_filter() return fi - perf lock con -a -b -L tasklist_lock -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b -L tasklist_lock -q -- ${msg_workload} > /dev/null 2> ${result} if [ "$(grep -c -v "${test_lock_filter_type}" "${result}")" != "0" ]; then echo "[Fail] BPF result should not have non-${test_lock_filter_type} locks:" "$(cat "${result}")" err=1 @@ -241,7 +245,7 @@ test_stack_filter() return fi - perf lock con -a -b -S unix_stream -E 1 -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b -S unix_stream -E 1 -q -- ${msg_workload} > /dev/null 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then echo "[Fail] BPF result should have a lock from unix_stream:" "$(cat "${result}")" err=1 @@ -269,7 +273,7 @@ test_aggr_task_stack_filter() return fi - perf lock con -a -b -t -S unix_stream -E 1 -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b -t -S unix_stream -E 1 -q -- ${msg_workload} > /dev/null 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then echo "[Fail] BPF result should have a task from unix_stream:" "$(cat "${result}")" err=1 @@ -285,7 +289,8 @@ test_cgroup_filter() return fi - perf lock con -a -b --lock-cgroup -E 1 -F wait_total -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b --lock-cgroup -E 1 -F wait_total -q \ + -- ${msg_workload} > /dev/null 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then echo "[Fail] BPF result should have a cgroup result:" "$(cat "${result}")" err=1 @@ -293,7 +298,8 @@ test_cgroup_filter() fi cgroup=$(cat "${result}" | awk '{ print $3 }') - perf lock con -a -b --lock-cgroup -E 1 -G "${cgroup}" -q -- perf bench sched messaging -p > /dev/null 2> ${result} + perf lock con -a -b --lock-cgroup -E 1 -G "${cgroup}" -q \ + -- ${msg_workload} > /dev/null 2> ${result} if [ "$(cat "${result}" | wc -l)" != "1" ]; then echo "[Fail] BPF result should have a result with cgroup filter:" "$(cat "${cgroup}")" err=1 @@ -328,7 +334,7 @@ test_csv_output() fi # the perf lock contention output goes to the stderr - perf lock con -a -b -E 1 -x , --output ${result} -- perf bench sched messaging -p > /dev/null 2>&1 + perf lock con -a -b -E 1 -x , --output ${result} -- ${msg_workload} > /dev/null 2>&1 output=$(grep -v "^#" ${result} | tr -d -c , | wc -c) if [ "${header}" != "${output}" ]; then echo "[Fail] BPF result does not match the number of commas: ${header} != ${output}" -- 2.54.0.1189.g8c84645362-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v3 12/13] perf tests: Speed up metrics checking shell tests 2026-06-16 16:48 ` [PATCH v3 00/13] perf tests: Robustness and performance improvements Ian Rogers ` (10 preceding siblings ...) 2026-06-16 16:48 ` [PATCH v3 11/13] perf tests: Speed up lock contention analysis shell test Ian Rogers @ 2026-06-16 16:48 ` Ian Rogers 2026-06-16 16:48 ` [PATCH v3 13/13] perf tests: Include error output for skipped tests in JUnit XML Ian Rogers 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 16:48 UTC (permalink / raw) To: irogers, acme, namhyung Cc: adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht Optimize the execution of the metric validation and metric listing shell test suites: 1. `stat_metrics_values.sh`: The Python metric validator runs the `perf bench futex hash` workload for each validated metric relationship. Reduce the benchmark runtime limit from `-r 2` (2 seconds) to `-r 1` (1 second). This cuts the workload duration in half while still generating sufficient PMU events to satisfy non-zero threshold metric validations. 2. `stat_all_metrics.sh`: The metric checking test runs `perf stat` sequentially across all 433+ listed metrics. Change the default workload for system-wide runs from `sleep 0.01` to `true`. This avoids the 10ms sleep delay on each sequential metric invocation, saving over 4 seconds of total wall time during full test suite runs. Assisted-by: Antigravity:gemini-3.1-pro Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/shell/stat_all_metrics.sh | 2 +- tools/perf/tests/shell/stat_metrics_values.sh | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/perf/tests/shell/stat_all_metrics.sh b/tools/perf/tests/shell/stat_all_metrics.sh index feeb34c6fa6d..df5fa27b21ae 100755 --- a/tools/perf/tests/shell/stat_all_metrics.sh +++ b/tools/perf/tests/shell/stat_all_metrics.sh @@ -7,7 +7,7 @@ ParanoidAndNotRoot() [ "$(id -u)" != 0 ] && [ "$(cat /proc/sys/kernel/perf_event_paranoid)" -gt $1 ] } -test_prog="sleep 0.01" +test_prog="true" system_wide_flag="-a" if ParanoidAndNotRoot 0 then diff --git a/tools/perf/tests/shell/stat_metrics_values.sh b/tools/perf/tests/shell/stat_metrics_values.sh index 76f1e99d1273..86c5c7f70933 100755 --- a/tools/perf/tests/shell/stat_metrics_values.sh +++ b/tools/perf/tests/shell/stat_metrics_values.sh @@ -18,7 +18,7 @@ fi pythonvalidator=$(dirname $0)/lib/perf_metric_validation.py rulefile=$(dirname $0)/lib/perf_metric_validation_rules.json tmpdir=$(mktemp -d /tmp/__perf_test.program.XXXXX) -workload="perf bench futex hash -r 2 -s" +workload="perf bench futex hash -r 1 -s" # Add -debug, save data file and full rule file echo "Launch python validation script $pythonvalidator" -- 2.54.0.1189.g8c84645362-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH v3 13/13] perf tests: Include error output for skipped tests in JUnit XML 2026-06-16 16:48 ` [PATCH v3 00/13] perf tests: Robustness and performance improvements Ian Rogers ` (11 preceding siblings ...) 2026-06-16 16:48 ` [PATCH v3 12/13] perf tests: Speed up metrics checking shell tests Ian Rogers @ 2026-06-16 16:48 ` Ian Rogers 12 siblings, 0 replies; 43+ messages in thread From: Ian Rogers @ 2026-06-16 16:48 UTC (permalink / raw) To: irogers, acme, namhyung Cc: adrian.hunter, james.clark, jolsa, leo.yan, linux-kernel, linux-perf-users, mingo, peterz, thomas.falcon, tmricht The JUnit XML output correctly captures the stderr/stdout output of failed tests inside the <failure> element. However, for skipped tests, the output was completely discarded and the XML only received a self-closing <skipped message="reason"/> tag. This expands the <skipped> element to include the test's err_output when available, which is extremely helpful for debugging why a test was skipped (e.g. diagnosing missing prerequisites or unexpected environment states that triggered the skip) directly from CI systems parsing the XML report. Assisted-by: Antigravity:gemini-3.1-pro Signed-off-by: Ian Rogers <irogers@google.com> --- tools/perf/tests/builtin-test.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c index 1dcaeb8505b7..02a80efcdd66 100644 --- a/tools/perf/tests/builtin-test.c +++ b/tools/perf/tests/builtin-test.c @@ -535,8 +535,14 @@ static int print_test_result(struct test_suite *t, int curr_suite, int curr_test const char *reason = skip_reason(t, curr_test_case); char *escaped_reason = xml_escape(reason ? reason : "Skip"); - strbuf_addf(&junit_xml_buf, " <skipped message=\"%s\"/>\n", - escaped_reason); + if (err_output && *err_output) { + strbuf_addf(&junit_xml_buf, + " <skipped message=\"%s\">\n%s\n </skipped>\n", + escaped_reason, escaped_err); + } else { + strbuf_addf(&junit_xml_buf, " <skipped message=\"%s\"/>\n", + escaped_reason); + } free(escaped_reason); } strbuf_addstr(&junit_xml_buf, " </testcase>\n"); -- 2.54.0.1189.g8c84645362-goog ^ permalink raw reply related [flat|nested] 43+ messages in thread
end of thread, other threads:[~2026-06-16 16:48 UTC | newest] Thread overview: 43+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2026-06-16 1:27 [PATCH v1 00/12] perf tests: Enhancements, speedups, and flakiness fixes Ian Rogers 2026-06-16 1:27 ` [PATCH v1 01/12] perf parse-events: Restrict core PMU bypass to --cputype option Ian Rogers 2026-06-16 1:27 ` [PATCH v1 02/12] perf test: Truncate test description to fit terminal width Ian Rogers 2026-06-16 1:27 ` [PATCH v1 03/12] perf tests workloads: Support sub-second durations in noploop and thloop Ian Rogers 2026-06-16 1:27 ` [PATCH v1 04/12] perf tests: Add robust record retry helper and use subsecond workloads Ian Rogers 2026-06-16 1:27 ` [PATCH v1 05/12] perf tests: Skip metrics validation if system-wide recording lacks permission Ian Rogers 2026-06-16 1:27 ` [PATCH v1 06/12] perf tests: Fix Python JIT dump profiling test failure Ian Rogers 2026-06-16 1:27 ` [PATCH v1 07/12] perf tests: Fix flakiness in trace record and replay test Ian Rogers 2026-06-16 1:27 ` [PATCH v1 08/12] perf tests: Fix flakiness in BPF counters test on hybrid systems Ian Rogers 2026-06-16 1:27 ` [PATCH v1 09/12] perf tests: Fix flakiness in branch stack sampling tests Ian Rogers 2026-06-16 1:27 ` [PATCH v1 10/12] perf tests: Speed up off-cpu profiling tests Ian Rogers 2026-06-16 1:27 ` [PATCH v1 11/12] perf tests: Speed up lock contention analysis shell test Ian Rogers 2026-06-16 1:27 ` [PATCH v1 12/12] perf tests: Speed up metrics checking shell tests Ian Rogers 2026-06-16 6:13 ` [PATCH v2 00/12] perf tests: Enhance robustness, speed up execution, and fix flakiness Ian Rogers 2026-06-16 6:13 ` [PATCH v2 01/12] perf parse-events: Restrict core PMU bypass to --cputype option Ian Rogers 2026-06-16 15:14 ` Arnaldo Carvalho de Melo 2026-06-16 15:17 ` Arnaldo Carvalho de Melo 2026-06-16 6:13 ` [PATCH v2 02/12] perf test: Truncate test description to fit terminal width Ian Rogers 2026-06-16 15:25 ` Arnaldo Carvalho de Melo 2026-06-16 6:13 ` [PATCH v2 03/12] perf tests workloads: Support sub-second durations in noploop and thloop Ian Rogers 2026-06-16 6:13 ` [PATCH v2 04/12] perf tests: Add robust record retry helper and use subsecond workloads Ian Rogers 2026-06-16 6:13 ` [PATCH v2 05/12] perf tests: Skip metrics validation if system-wide recording lacks permission Ian Rogers 2026-06-16 6:13 ` [PATCH v2 06/12] perf tests: Fix Python JIT dump profiling test failure Ian Rogers 2026-06-16 6:13 ` [PATCH v2 07/12] perf tests: Fix flakiness in trace record and replay test Ian Rogers 2026-06-16 6:14 ` [PATCH v2 08/12] perf tests: Fix flakiness in BPF counters test on hybrid systems Ian Rogers 2026-06-16 6:14 ` [PATCH v2 09/12] perf tests: Fix flakiness in branch stack sampling tests Ian Rogers 2026-06-16 6:14 ` [PATCH v2 10/12] perf tests: Speed up off-cpu profiling tests Ian Rogers 2026-06-16 6:14 ` [PATCH v2 11/12] perf tests: Speed up lock contention analysis shell test Ian Rogers 2026-06-16 6:14 ` [PATCH v2 12/12] perf tests: Speed up metrics checking shell tests Ian Rogers 2026-06-16 16:48 ` [PATCH v3 00/13] perf tests: Robustness and performance improvements Ian Rogers 2026-06-16 16:48 ` [PATCH v3 01/13] perf parse-events: Restrict core PMU bypass to --cputype option Ian Rogers 2026-06-16 16:48 ` [PATCH v3 02/13] perf test: Truncate test description to fit terminal width Ian Rogers 2026-06-16 16:48 ` [PATCH v3 03/13] perf tests workloads: Support sub-second durations in noploop and thloop Ian Rogers 2026-06-16 16:48 ` [PATCH v3 04/13] perf tests: Add robust record retry helper and use subsecond workloads Ian Rogers 2026-06-16 16:48 ` [PATCH v3 05/13] perf tests: Skip metrics validation if system-wide recording lacks permission Ian Rogers 2026-06-16 16:48 ` [PATCH v3 06/13] perf tests: Fix Python JIT dump profiling test failure Ian Rogers 2026-06-16 16:48 ` [PATCH v3 07/13] perf tests: Fix flakiness in trace record and replay test Ian Rogers 2026-06-16 16:48 ` [PATCH v3 08/13] perf tests: Fix flakiness in BPF counters test on hybrid systems Ian Rogers 2026-06-16 16:48 ` [PATCH v3 09/13] perf tests: Fix flakiness in branch stack sampling tests Ian Rogers 2026-06-16 16:48 ` [PATCH v3 10/13] perf tests: Speed up off-cpu profiling tests Ian Rogers 2026-06-16 16:48 ` [PATCH v3 11/13] perf tests: Speed up lock contention analysis shell test Ian Rogers 2026-06-16 16:48 ` [PATCH v3 12/13] perf tests: Speed up metrics checking shell tests Ian Rogers 2026-06-16 16:48 ` [PATCH v3 13/13] perf tests: Include error output for skipped tests in JUnit XML Ian Rogers
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox