Re: [PATCH v4 04/13] perf tests: Add robust record retry helper and use subsecond workloads

From: sashiko-bot@kernel.org
To: "Ian Rogers" <irogers@google.com>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: [PATCH v4 04/13] perf tests: Add robust record retry helper and use subsecond workloads
Date: Tue, 23 Jun 2026 21:28:59 +0000
Message-ID: <20260623212859.A63D31F000E9@smtp.kernel.org>
In-Reply-To: <20260623012758.2291858-5-irogers@google.com>

Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
- [Medium] Commit message inaccuracy and incomplete refactoring regarding `pipe_test.sh` and `stat_all_pfm.sh`.
- [Medium] Synchronous command execution in `test_kvm_record_report` lacks error checking, causing fatal script aborts under `set -e`.
- [Medium] `perf_record_with_retry` permanently leaks and disables the global `set -e` (errexit) flag when invoked within an OR list.
--

commit d89e4e460291f1a110feac7d87fda40226859958
Author: Ian Rogers <irogers@google.com>

perf tests: Add robust record retry helper and use subsecond workloads

Introduce perf_record_with_retry and perf_record_cleanup in a shared
library tests/shell/lib/perf_record.sh to prevent record test failures
caused by transient recording or workload delays.

> Update record.sh, record_lbr.sh, pipe_test.sh, kvm.sh, and
> stat_all_pfm.sh to use this robust record retry logic. These tests now
> start with very short durations (e.g. 0.01 seconds) and scale up if the
> initial recording failed to capture samples, significantly improving test
> execution speed on success while remaining resilient to slow systems.

[Severity: Medium]
This isn't a bug, but the commit message states that stat_all_pfm.sh is updated 
to use the robust retry logic, yet stat_all_pfm.sh isn't modified in this patch.

> diff --git a/tools/perf/tests/shell/kvm.sh b/tools/perf/tests/shell/kvm.sh
> index f88e859025c42..a5396f8e6fe5c 100755
> --- a/tools/perf/tests/shell/kvm.sh
> +++ b/tools/perf/tests/shell/kvm.sh
[ ... ]
> @@ -60,22 +71,26 @@ test_kvm_stat() {
>  test_kvm_record_report() {
>  	echo "Testing perf kvm record/report"
>  
> -	echo "Recording kvm profile for pid ${qemu_pid}..."
[ ... ]
> +	local duration
> +	local success=false
> +	for duration in 1 2 4 8; do
> +		echo "Recording kvm profile for pid ${qemu_pid} (duration ${duration}s)..."
> +		rm -f "${perfdata}" "${perfdata}".old
> +
> +		perf kvm --host record -p "${qemu_pid}" -o "${perfdata}" \
> +			-e cpu-clock sleep ${duration}

[Severity: Medium]
Does this unguarded execution bypass the retry loop if it fails?

Since kvm.sh runs with `set -e` enabled globally, a non-zero exit code from
`perf kvm record` here aborts the script immediately, so the remaining
durations are never tried.

Should this be guarded, perhaps by appending `|| true`, so that the retry loop
can continue?
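
The guard suggested above could look roughly like the following. This is a
minimal sketch, not the patch's code: `record_stub` is a hypothetical stand-in
for the `perf kvm --host record` invocation, and `have_samples` stands in for
the later `perf kvm report` check.

```shell
#!/bin/sh
set -e

attempts=0

# Hypothetical stand-in for the record command: it fails on the first two
# calls and only "captures samples" from the third call onwards.
record_stub() {
	attempts=$((attempts + 1))
	[ "${attempts}" -ge 3 ] || return 1
	have_samples=true
}

success=false
for duration in 1 2 4 8; do
	have_samples=false
	# `|| true` keeps errexit from aborting the script when recording
	# fails, so control reaches the check below and the loop can retry.
	record_stub "${duration}" || true

	if [ "${have_samples}" = true ]; then
		success=true
		break
	fi
	echo "No samples for duration ${duration}s, retrying..."
done

echo "success=${success} attempts=${attempts} duration=${duration}"
```

Testing the command directly in an `if` condition would have the same effect
on errexit; `|| true` simply keeps the existing report check as the only
success criterion.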

> +
> +		if [ -e "${perfdata}" ] && \
> +			perf kvm -i "${perfdata}" report --stdio 2>&1 | grep -q "Event count"; then
> +			success=true
> +			break
> +		fi
> +		echo "No samples or report failed, retrying..."
> +	done

> diff --git a/tools/perf/tests/shell/lib/perf_record.sh b/tools/perf/tests/shell/lib/perf_record.sh
> new file mode 100644
> index 0000000000000..e137fa75370de
> --- /dev/null
> +++ b/tools/perf/tests/shell/lib/perf_record.sh
> @@ -0,0 +1,53 @@
> +# SPDX-License-Identifier: GPL-2.0
> +
> +PERF_RECORD_LOGS=()
> +
> +perf_record_with_retry() {
[ ... ]
> +  # Save the e flag state and disable it
> +  local save_e
> +  if [[ $- == *e* ]]; then
> +    save_e="set -e"
> +  else
> +    save_e="set +e"
> +  fi

[Severity: Medium]
Can this logic leave `set -e` disabled in the caller when the function is
invoked within an OR list?

In bash, a function on the left side of an `||` operator runs with `errexit`
ignored for its whole body. If the `e` flag is also absent from `$-` at that
point, the check above takes the `set +e` branch.

For example, when `record.sh` invokes this function:
    perf_record_with_retry ... || ret=$?

`[[ $- == *e* ]]` would evaluate to false, so the function saves the state as
`set +e`. When the function finishes, it evaluates this saved state, leaving
`set -e` disabled for the remainder of the calling script and silently masking
all subsequent command failures.
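
The OR-list behaviour can be checked with a small script. This is only a
sketch: `helper` is a hypothetical stand-in for `perf_record_with_retry` that
does not save or restore any shell options. The value of `$-` printed inside
the helper is worth inspecting on the bash version in use, since the
save/restore logic above depends on it.

```shell
#!/bin/sh
set -e

reached=no

# Hypothetical stand-in for perf_record_with_retry: it leaves the shell
# options alone and reports failure through its return status.
helper() {
	echo "flags inside helper: $-"
	false	# not fatal: errexit is ignored for the whole function body here
	reached=yes
	return 3
}

ret=0
helper || ret=$?

echo "reached=${reached} ret=${ret}"
echo "flags in caller: $-"
```

Because errexit is already ignored inside a function called this way, a
helper that is always invoked as `helper || ret=$?` may not need to toggle
`set -e` at all, which would avoid the restore problem entirely.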

> +  set +e
> +
> +  local duration
> +  local first_run=true
> +  local ret=1

> diff --git a/tools/perf/tests/shell/pipe_test.sh b/tools/perf/tests/shell/pipe_test.sh
> index e459aa99a9515..ce68d850c9838 100755
> --- a/tools/perf/tests/shell/pipe_test.sh
> +++ b/tools/perf/tests/shell/pipe_test.sh
> @@ -12,8 +12,8 @@ skip_test_missing_symbol ${sym}
>  
>  data=$(mktemp /tmp/perf.data.XXXXXX)
>  data2=$(mktemp /tmp/perf.data2.XXXXXX)
> -prog="perf test -w noploop"
> -[ "$(uname -m)" = "s390x" ] && prog="$prog 3"
> +prog="perf test -w noploop 0.1"
> +[ "$(uname -m)" = "s390x" ] && prog="perf test -w noploop 3"

[Severity: Medium]
Will this hardcoded shorter duration make the test more brittle?

The commit message indicates this script was updated to use the robust retry 
logic, but it appears to only use a hardcoded 0.1s duration here. Relying on a 
single 0.1s run without retries might lead to transient failures on slow systems.
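
One possible shape for a retrying variant is sketched below, under stated
assumptions: `run_pipe_check` is a hypothetical stand-in for one pipe-mode
record/report check, the duration list is illustrative, and the s390x special
case from the patch is left out.

```shell
#!/bin/sh

# Hypothetical stand-in for one pipe-mode check; in this sketch it only
# "finds samples" once the workload runs for 0.5 seconds or longer.
run_pipe_check() {
	case "$1" in
	0.01|0.1) return 1 ;;
	*) return 0 ;;
	esac
}

err=1
for duration in 0.01 0.1 0.5 2; do
	# Rebuild the workload command line for each attempt.
	prog="perf test -w noploop ${duration}"
	if run_pipe_check "${duration}"; then
		err=0
		break
	fi
	echo "No samples with '${prog}', retrying with a longer run..."
done

echo "err=${err} prog=${prog}"
```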

>  err=0
>  
>  set -e

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260623012758.2291858-1-irogers@google.com?part=4
