Re: [PATCH v3 5/5] check: add -L <n> parameter to rerun failed tests

All of lore.kernel.org
 help / color / mirror / Atom feed

From: David Disseldorp <ddiss@suse.de>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: fstests@vger.kernel.org, tytso@mit.edu
Subject: Re: [PATCH v3 5/5] check: add -L <n> parameter to rerun failed tests
Date: Wed, 6 Jul 2022 23:54:52 +0200	[thread overview]
Message-ID: <20220706235452.694341f0@suse.de> (raw)
In-Reply-To: <YsXbt85xBNJIOwZu@magnolia>

Thanks for the follow-up feedback, Darrick...

On Wed, 6 Jul 2022 12:00:07 -0700, Darrick J. Wong wrote:

> On Wed, Jul 06, 2022 at 01:23:12PM +0200, David Disseldorp wrote:
> > If check is run with -L <n>, then a failed test will be rerun <n> times
> > before proceeding to the next test. Following completion of the rerun
> > loop, aggregate pass/fail statistics are printed.
> > 
> > Rerun tests will be tracked as a single failure in overall pass/fail
> > metrics (via @try and @bad), with .out.bad, .dmesg and .full saved using
> > a .rerun# suffix.
> > 
> > Suggested-by: Theodore Ts'o <tytso@mit.edu>
> > Link: https://lwn.net/Articles/897061/
> > Signed-off-by: David Disseldorp <ddiss@suse.de>
> > ---
> >  check | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 53 insertions(+), 3 deletions(-)
> > 
> > diff --git a/check b/check
> > index 6dbdb2a8..46fca6e6 100755
> > --- a/check
> > +++ b/check
> > @@ -26,6 +26,7 @@ do_report=false
> >  DUMP_OUTPUT=false
> >  iterations=1
> >  istop=false
> > +loop_on_fail=0
> >  
> >  # This is a global variable used to pass test failure text to reporting gunk
> >  _err_msg=""
> > @@ -78,6 +79,7 @@ check options
> >      --large-fs		optimise scratch device for large filesystems
> >      -s section		run only specified section from config file
> >      -S section		exclude the specified section from the config file
> > +    -L <n>		loop tests <n> times following a failure, measuring aggregate pass/fail metrics
> >  
> >  testlist options
> >      -g group[,group...]	include tests from these groups
> > @@ -336,6 +338,9 @@ while [ $# -gt 0 ]; do
> >  		;;
> >  	--large-fs) export LARGE_SCRATCH_DEV=yes ;;
> >  	--extra-space=*) export SCRATCH_DEV_EMPTY_SPACE=${r#*=} ;;
> > +	-L)	[[ $2 =~ ^[0-9]+$ ]] || usage
> > +		loop_on_fail=$2; shift
> > +		;;
> >  
> >  	-*)	usage ;;
> >  	*)	# not an argument, we've got tests now.
> > @@ -553,6 +558,18 @@ _expunge_test()
> >  	return 0
> >  }
> >  
> > +# retain files which would be overwritten in subsequent reruns of the same test
> > +_stash_fail_loop_files() {
> > +	local test_seq="$1"
> > +	local suffix="$2"
> > +
> > +	for i in "${REPORT_DIR}/${test_seq}.full" \
> > +		 "${REPORT_DIR}/${test_seq}.dmesg" \
> > +		 "${REPORT_DIR}/${test_seq}.out.bad"; do
> > +		[ -f "$i" ] && cp "$i" "${i}${suffix}"  
> 
> I wonder, is there any particular reason to copy the output file and let
> it get overwritten instead of simply mv'ing it?

The copy is left over from an earlier version I had where xunit report
generation was done after the copy. Looking closer:
- .full is removed in _begin_fstest()
- _check_dmesg() overwrites .dmesg and retains on failure or KEEP_DMESG
- out.bad is removed in the main check loop prior to seq invocation
- .notrun, .core and .hints are also removed in the check loop at
  various places before seq (.hints again in _begin_fstest())

One concern I have in changing this to a move is that external scripts
may check for presence / parse these files after check invocation. I'd
considered moving and then copying / symlinking back the .rerun0 files
on rerun-on-failure loop completion but that's also pretty ugly. IMO
leaving this as a copy, with the non-suffix file state left to reflect
the results of the last rerun-on-failure loop, would make the most
sense for now.

> > +	done
> > +}
> > +
> >  # Retain in @bad / @notrun the result of the just-run @test_seq. @try array
> >  # entries are added prior to execution.
> >  _stash_test_status() {
> > @@ -564,8 +581,35 @@ _stash_test_status() {
> >  				      "$test_status" "$((stop - start))"
> >  	fi
> >  
> > +	if ((${#loop_status[*]} > 0)); then
> > +		# continuing or completing rerun-on-failure loop
> > +		_stash_fail_loop_files "$test_seq" ".rerun${#loop_status[*]}"
> > +		loop_status+=("$test_status")
> > +		if ((${#loop_status[*]} > loop_on_fail)); then
> > +			printf "%s aggregate results across %d runs: " \
> > +				"$test_seq" "${#loop_status[*]}"
> > +			awk "BEGIN {
> > +				n=split(\"${loop_status[*]}\", arr);"'
> > +				for (i = 1; i <= n; i++)
> > +					stats[arr[i]]++;
> > +				for (x in stats)
> > +					printf("%s=%d (%.1f%%)",  
> 
> Hmm, if I parse this correctly, do you end up with something like:
> 
> "xfs/555 aggregate results across 15 runs: pass=5 (33.3%) fail=10 (66.7%)" ?

Yes, with a comma in between "... (33.3%), fail=10 ...".

> > +					       (i-- > n ? x : ", " x),
> > +					       stats[x], 100 * stats[x] / n);
> > +				}'
> > +			echo
> > +			loop_status=()
> > +		fi
> > +		return	# only stash @bad result for initial failure in loop
> > +	fi
> > +
> >  	case "$test_status" in
> >  	fail)
> > +		if ((loop_on_fail > 0)); then
> > +			# initial failure, start rerun-on-failure loop
> > +			_stash_fail_loop_files "$test_seq" ".rerun0"
> > +			loop_status+=("$test_status")  
> 
> So if I'm reading this right, the length of the $loop_status array is
> what gates us moving on or retrying, right?  If the length is zero, then
> we move on to the next test; otherwise, that loopy logic in
> _stash_test_result above will keep the same test running until the
> length exceeds loop_on_fail, at which point we print the aggregation
> report, empty out $loop_status, and then ix increments and we move on to
> the next test?

Yes, exactly.

Cheers, David

next prev parent reply	other threads:[~2022-07-06 21:54 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-06 11:23 [PATCH v3 0/5] check: add option to rerun failed tests David Disseldorp
2022-07-06 11:23 ` [PATCH v3 1/5] report: use array for REPORT_ENV_LIST David Disseldorp
2022-07-06 11:23 ` [PATCH v3 2/5] report: pass through most details as function parameters David Disseldorp
2022-07-06 11:23 ` [PATCH v3 3/5] check: make a few variables local David Disseldorp
2022-07-06 11:23 ` [PATCH v3 4/5] check: append bad / notrun arrays in helper function David Disseldorp
2022-07-06 18:33   ` Darrick J. Wong
2022-07-06 11:23 ` [PATCH v3 5/5] check: add -L <n> parameter to rerun failed tests David Disseldorp
2022-07-06 19:00   ` Darrick J. Wong
2022-07-06 21:54     ` David Disseldorp [this message]
2022-07-07 18:09       ` David Disseldorp
2022-07-08  0:34         ` Zorro Lang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220706235452.694341f0@suse.de \
    --to=ddiss@suse.de \
    --cc=djwong@kernel.org \
    --cc=fstests@vger.kernel.org \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.