From: David Disseldorp <ddiss@suse.de>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: fstests@vger.kernel.org, tytso@mit.edu
Subject: Re: [PATCH v3 5/5] check: add -L <n> parameter to rerun failed tests
Date: Wed, 6 Jul 2022 23:54:52 +0200 [thread overview]
Message-ID: <20220706235452.694341f0@suse.de> (raw)
In-Reply-To: <YsXbt85xBNJIOwZu@magnolia>
Thanks for the follow-up feedback, Darrick...
On Wed, 6 Jul 2022 12:00:07 -0700, Darrick J. Wong wrote:
> On Wed, Jul 06, 2022 at 01:23:12PM +0200, David Disseldorp wrote:
> > If check is run with -L <n>, then a failed test will be rerun <n> times
> > before proceeding to the next test. Following completion of the rerun
> > loop, aggregate pass/fail statistics are printed.
> >
> > Rerun tests will be tracked as a single failure in overall pass/fail
> > metrics (via @try and @bad), with .out.bad, .dmesg and .full saved using
> > a .rerun# suffix.
> >
> > Suggested-by: Theodore Ts'o <tytso@mit.edu>
> > Link: https://lwn.net/Articles/897061/
> > Signed-off-by: David Disseldorp <ddiss@suse.de>
> > ---
> > check | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++---
> > 1 file changed, 53 insertions(+), 3 deletions(-)
> >
> > diff --git a/check b/check
> > index 6dbdb2a8..46fca6e6 100755
> > --- a/check
> > +++ b/check
> > @@ -26,6 +26,7 @@ do_report=false
> > DUMP_OUTPUT=false
> > iterations=1
> > istop=false
> > +loop_on_fail=0
> >
> > # This is a global variable used to pass test failure text to reporting gunk
> > _err_msg=""
> > @@ -78,6 +79,7 @@ check options
> > --large-fs optimise scratch device for large filesystems
> > -s section run only specified section from config file
> > -S section exclude the specified section from the config file
> > + -L <n> loop tests <n> times following a failure, measuring aggregate pass/fail metrics
> >
> > testlist options
> > -g group[,group...] include tests from these groups
> > @@ -336,6 +338,9 @@ while [ $# -gt 0 ]; do
> > ;;
> > --large-fs) export LARGE_SCRATCH_DEV=yes ;;
> > --extra-space=*) export SCRATCH_DEV_EMPTY_SPACE=${r#*=} ;;
> > + -L) [[ $2 =~ ^[0-9]+$ ]] || usage
> > + loop_on_fail=$2; shift
> > + ;;
> >
> > -*) usage ;;
> > *) # not an argument, we've got tests now.
> > @@ -553,6 +558,18 @@ _expunge_test()
> > return 0
> > }
> >
> > +# retain files which would be overwritten in subsequent reruns of the same test
> > +_stash_fail_loop_files() {
> > + local test_seq="$1"
> > + local suffix="$2"
> > +
> > + for i in "${REPORT_DIR}/${test_seq}.full" \
> > + "${REPORT_DIR}/${test_seq}.dmesg" \
> > + "${REPORT_DIR}/${test_seq}.out.bad"; do
> > + [ -f "$i" ] && cp "$i" "${i}${suffix}"
>
> I wonder, is there any particular reason to copy the output file and let
> it get overwritten instead of simply mv'ing it?
The copy is left over from an earlier version I had where xunit report
generation was done after the copy. Looking closer:
- .full is removed in _begin_fstest()
- _check_dmesg() overwrites .dmesg and retains on failure or KEEP_DMESG
- out.bad is removed in the main check loop prior to seq invocation
- .notrun, .core and .hints are also removed in the check loop at
various places before seq (.hints again in _begin_fstest())
One concern I have in changing this to a move is that external scripts
may check for presence / parse these files after check invocation. I'd
considered moving and then copying / symlinking back the .rerun0 files
on rerun-on-failure loop completion but that's also pretty ugly. IMO
leaving this as a copy, with the non-suffix file state left to reflect
the results of the last rerun-on-failure loop, would make the most
sense for now.
> > + done
> > +}
> > +
> > # Retain in @bad / @notrun the result of the just-run @test_seq. @try array
> > # entries are added prior to execution.
> > _stash_test_status() {
> > @@ -564,8 +581,35 @@ _stash_test_status() {
> > "$test_status" "$((stop - start))"
> > fi
> >
> > + if ((${#loop_status[*]} > 0)); then
> > + # continuing or completing rerun-on-failure loop
> > + _stash_fail_loop_files "$test_seq" ".rerun${#loop_status[*]}"
> > + loop_status+=("$test_status")
> > + if ((${#loop_status[*]} > loop_on_fail)); then
> > + printf "%s aggregate results across %d runs: " \
> > + "$test_seq" "${#loop_status[*]}"
> > + awk "BEGIN {
> > + n=split(\"${loop_status[*]}\", arr);"'
> > + for (i = 1; i <= n; i++)
> > + stats[arr[i]]++;
> > + for (x in stats)
> > + printf("%s=%d (%.1f%%)",
>
> Hmm, if I parse this correctly, do you end up with something like:
>
> "xfs/555 aggregate results across 15 runs: pass=5 (33.3%) fail=10 (66.7%)" ?
Yes, with a comma in between "... (33.3%), fail=10 ...".
> > + (i-- > n ? x : ", " x),
> > + stats[x], 100 * stats[x] / n);
> > + }'
> > + echo
> > + loop_status=()
> > + fi
> > + return # only stash @bad result for initial failure in loop
> > + fi
> > +
> > case "$test_status" in
> > fail)
> > + if ((loop_on_fail > 0)); then
> > + # initial failure, start rerun-on-failure loop
> > + _stash_fail_loop_files "$test_seq" ".rerun0"
> > + loop_status+=("$test_status")
>
> So if I'm reading this right, the length of the $loop_status array is
> what gates us moving on or retrying, right? If the length is zero, then
> we move on to the next test; otherwise, that loopy logic in
> _stash_test_result above will keep the same test running until the
> length exceeds loop_on_fail, at which point we print the aggregation
> report, empty out $loop_status, and then ix increments and we move on to
> the next test?
Yes, exactly.
Cheers, David
next prev parent reply other threads:[~2022-07-06 21:54 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-07-06 11:23 [PATCH v3 0/5] check: add option to rerun failed tests David Disseldorp
2022-07-06 11:23 ` [PATCH v3 1/5] report: use array for REPORT_ENV_LIST David Disseldorp
2022-07-06 11:23 ` [PATCH v3 2/5] report: pass through most details as function parameters David Disseldorp
2022-07-06 11:23 ` [PATCH v3 3/5] check: make a few variables local David Disseldorp
2022-07-06 11:23 ` [PATCH v3 4/5] check: append bad / notrun arrays in helper function David Disseldorp
2022-07-06 18:33 ` Darrick J. Wong
2022-07-06 11:23 ` [PATCH v3 5/5] check: add -L <n> parameter to rerun failed tests David Disseldorp
2022-07-06 19:00 ` Darrick J. Wong
2022-07-06 21:54 ` David Disseldorp [this message]
2022-07-07 18:09 ` David Disseldorp
2022-07-08 0:34 ` Zorro Lang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20220706235452.694341f0@suse.de \
--to=ddiss@suse.de \
--cc=djwong@kernel.org \
--cc=fstests@vger.kernel.org \
--cc=tytso@mit.edu \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox