Re: [PATCH v4 04/13] perf tests: Add robust record retry helper and use subsecond workloads

From: sashiko-bot@kernel.org
To: "Ian Rogers" <irogers@google.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: [PATCH v4 04/13] perf tests: Add robust record retry helper and use subsecond workloads
Date: Tue, 23 Jun 2026 21:28:59 +0000
Message-ID: <20260623212859.A63D31F000E9@smtp.kernel.org>
In-Reply-To: <20260623012758.2291858-5-irogers@google.com>

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [Medium] Commit message inaccuracy and incomplete refactoring regarding `pipe_test.sh` and `stat_all_pfm.sh`.
- [Medium] Synchronous command execution in `test_kvm_record_report` lacks error checking, causing fatal script aborts under `set -e`.
- [Medium] `perf_record_with_retry` permanently leaks and disables the global `set -e` (errexit) flag when invoked within an OR list.
--

commit d89e4e460291f1a110feac7d87fda40226859958
Author: Ian Rogers <irogers@google.com>

perf tests: Add robust record retry helper and use subsecond workloads

Introduce perf_record_with_retry and perf_record_cleanup in a shared
library tests/shell/lib/perf_record.sh to prevent record test failures
caused by transient recording or workload delays.

> Update record.sh, record_lbr.sh, pipe_test.sh, kvm.sh, and
> stat_all_pfm.sh to use this robust record retry logic. These tests now
> start with very short durations (e.g. 0.01 seconds) and scale up if the
> initial recording failed to capture samples, significantly improving test
> execution speed on success while remaining resilient to slow systems.

[Severity: Medium]
This isn't a bug, but the commit message states that stat_all_pfm.sh is updated 
to use the robust retry logic, yet stat_all_pfm.sh isn't modified in this patch.

> diff --git a/tools/perf/tests/shell/kvm.sh b/tools/perf/tests/shell/kvm.sh
> index f88e859025c42..a5396f8e6fe5c 100755
> --- a/tools/perf/tests/shell/kvm.sh
> +++ b/tools/perf/tests/shell/kvm.sh
[ ... ]
> @@ -60,22 +71,26 @@ test_kvm_stat() {
>  test_kvm_record_report() {
>  	echo "Testing perf kvm record/report"
>  
> -	echo "Recording kvm profile for pid ${qemu_pid}..."
[ ... ]
> +	local duration
> +	local success=false
> +	for duration in 1 2 4 8; do
> +		echo "Recording kvm profile for pid ${qemu_pid} (duration ${duration}s)..."
> +		rm -f "${perfdata}" "${perfdata}".old
> +
> +		perf kvm --host record -p "${qemu_pid}" -o "${perfdata}" \
> +			-e cpu-clock sleep ${duration}

[Severity: Medium]
Does this unguarded execution bypass the retry loop if it fails?

kvm.sh runs with `set -e` enabled globally, so if `perf kvm record` returns a
non-zero exit code here, the script aborts immediately and the remaining
durations are never tried.

Should this be guarded, perhaps by appending `|| true`, so that the retry loop
can continue?
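
A minimal sketch of such a guard, assuming a stand-in function `flaky_record`
(hypothetical, not part of the patch) in place of the `perf kvm record` call.
It fails for the first duration and succeeds afterwards, so the loop only
reaches its second iteration because the failure is guarded. Using `|| rc=$?`
instead of `|| true` is one possible variant that also keeps the exit status
for logging:

```shell
#!/bin/sh
# flaky_record is a hypothetical stand-in for the `perf kvm ... record` call:
# it fails for durations below 2 and succeeds otherwise.
flaky_record() {
  [ "$1" -ge 2 ]
}

# Run the loop in a command substitution so `set -e` stays local to it.
retry_log=$(
  set -e
  for duration in 1 2 4 8; do
    # Without `|| rc=$?`, a non-zero status here would end the shell under set -e.
    rc=0
    flaky_record "${duration}" || rc=$?
    if [ "${rc}" -eq 0 ]; then
      echo "recorded with duration ${duration}s"
      break
    fi
    echo "record failed (rc=${rc}) with duration ${duration}s, retrying..."
  done
)
printf '%s\n' "${retry_log}"
```

In kvm.sh itself the existing `[ -e "${perfdata}" ]` and report check would
still decide whether the attempt produced usable samples.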

> +
> +		if [ -e "${perfdata}" ] && \
> +			perf kvm -i "${perfdata}" report --stdio 2>&1 | grep -q "Event count"; then
> +			success=true
> +			break
> +		fi
> +		echo "No samples or report failed, retrying..."
> +	done

> diff --git a/tools/perf/tests/shell/lib/perf_record.sh b/tools/perf/tests/shell/lib/perf_record.sh
> new file mode 100644
> index 0000000000000..e137fa75370de
> --- /dev/null
> +++ b/tools/perf/tests/shell/lib/perf_record.sh
> @@ -0,0 +1,53 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +PERF_RECORD_LOGS=()
> +
> +perf_record_with_retry() {
[ ... ]
> +  # Save the e flag state and disable it
> +  local save_e
> +  if [[ $- == *e* ]]; then
> +    save_e="set -e"
> +  else
> +    save_e="set +e"
> +  fi

[Severity: Medium]
Does this logic permanently disable the `set -e` flag in the caller when 
invoked within an OR list?

In bash, a function executed on the left side of an `||` operator runs with
`errexit` ignored. If the `e` flag is also absent from `$-` in that context
(worth confirming on the bash versions these tests target), the save logic
misreads the caller's state.

For example, when `record.sh` invokes this function:
    perf_record_with_retry ... || ret=$?

`[[ $- == *e* ]]` would evaluate to false, so the function would save the state
as `set +e`. When the function finishes and evaluates this saved state,
`set -e` would stay disabled for the remainder of the calling script, silently
masking all subsequent command failures.
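
The premise can be checked directly. The bash manual documents that `-e` is
ignored inside a function called on the left side of `||`. What `$-` reports
there is easiest to confirm by running a small probe on the target shell. A
minimal sketch, where `probe` is a hypothetical helper and not part of the
patch:

```shell
#!/usr/bin/env bash
# probe is a hypothetical helper used only for this experiment.
probe() {
  echo "flags inside function: $-"
  false   # ignored here: the function runs on the left side of ||
  echo "continued after failure"
}

# The experiment runs in a command substitution so that `set -e`
# cannot affect the current shell.
probe_output=$(
  set -e
  probe || true
  echo "flags after call: $-"
)
printf '%s\n' "${probe_output}"
```

If the first output line still lists `e`, the `[[ $- == *e* ]]` test in
`perf_record_with_retry` takes the `set -e` branch and the restore is harmless.
If it does not, the concern above applies.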

> +  set +e
> +
> +  local duration
> +  local first_run=true
> +  local ret=1

> diff --git a/tools/perf/tests/shell/pipe_test.sh b/tools/perf/tests/shell/pipe_test.sh
> index e459aa99a9515..ce68d850c9838 100755
> --- a/tools/perf/tests/shell/pipe_test.sh
> +++ b/tools/perf/tests/shell/pipe_test.sh
> @@ -12,8 +12,8 @@ skip_test_missing_symbol ${sym}
>  
>  data=$(mktemp /tmp/perf.data.XXXXXX)
>  data2=$(mktemp /tmp/perf.data2.XXXXXX)
> -prog="perf test -w noploop"
> -[ "$(uname -m)" = "s390x" ] && prog="$prog 3"
> +prog="perf test -w noploop 0.1"
> +[ "$(uname -m)" = "s390x" ] && prog="perf test -w noploop 3"

[Severity: Medium]
Will this hardcoded shorter duration make the test more brittle?

The commit message indicates this script was updated to use the robust retry 
logic, but it appears to only use a hardcoded 0.1s duration here. Relying on a 
single 0.1s run without retries might lead to transient failures on slow systems.
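
A rough sketch of what escalating durations could look like for this script.
The quoted diff elides the body of `perf_record_with_retry`, so this does not
use it. `pipe_check`, `run_with_escalation`, and the duration list are all
hypothetical stand-ins chosen for illustration:

```shell
#!/bin/sh
# pipe_check is a hypothetical stand-in for one record/report pipeline run.
# It pretends samples are found only when the workload ran for >= 1 second.
pipe_check() {
  case "$1" in
    0.*) return 1 ;;   # sub-second run: pretend no samples were captured
    *)   return 0 ;;
  esac
}

# Try short durations first and fall back to longer ones.
run_with_escalation() {
  for duration in "$@"; do
    prog="perf test -w noploop ${duration}"
    if pipe_check "${duration}"; then
      echo "ok: ${prog}"
      return 0
    fi
    echo "no samples: ${prog}"
  done
  return 1
}

escalation_log=$(run_with_escalation 0.1 0.5 1 3) && escalation_rc=0 || escalation_rc=$?
printf '%s\n' "${escalation_log}"
```

The fast path still costs a single 0.1s run, while a slow system gets further
attempts instead of failing outright.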

>  err=0
>  
>  set -e

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260623012758.2291858-1-irogers@google.com?part=4
