Date: Wed, 6 Jul 2022 23:54:52 +0200
From: David Disseldorp
To: "Darrick J. Wong"
Cc: fstests@vger.kernel.org, tytso@mit.edu
Subject: Re: [PATCH v3 5/5] check: add -L parameter to rerun failed tests
Message-ID: <20220706235452.694341f0@suse.de>
References: <20220706112312.4349-1-ddiss@suse.de> <20220706112312.4349-6-ddiss@suse.de>
X-Mailing-List: fstests@vger.kernel.org

Thanks for the follow-up feedback, Darrick...

On Wed, 6 Jul 2022 12:00:07 -0700, Darrick J. Wong wrote:

> On Wed, Jul 06, 2022 at 01:23:12PM +0200, David Disseldorp wrote:
> > If check is run with -L <n>, then a failed test will be rerun <n> times
> > before proceeding to the next test. Following completion of the rerun
> > loop, aggregate pass/fail statistics are printed.
> >
> > Rerun tests will be tracked as a single failure in overall pass/fail
> > metrics (via @try and @bad), with .out.bad, .dmesg and .full saved using
> > a .rerun# suffix.
> >
> > Suggested-by: Theodore Ts'o
> > Link: https://lwn.net/Articles/897061/
> > Signed-off-by: David Disseldorp
> > ---
> >  check | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 53 insertions(+), 3 deletions(-)
> >
> > diff --git a/check b/check
> > index 6dbdb2a8..46fca6e6 100755
> > --- a/check
> > +++ b/check
> > @@ -26,6 +26,7 @@ do_report=false
> >  DUMP_OUTPUT=false
> >  iterations=1
> >  istop=false
> > +loop_on_fail=0
> >
> >  # This is a global variable used to pass test failure text to reporting gunk
> >  _err_msg=""
> > @@ -78,6 +79,7 @@ check options
> >      --large-fs		optimise scratch device for large filesystems
> >      -s section		run only specified section from config file
> >      -S section		exclude the specified section from the config file
> > +    -L <n>		loop tests <n> times following a failure, measuring aggregate pass/fail metrics
> >
> >  testlist options
> >      -g group[,group...]	include tests from these groups
> > @@ -336,6 +338,9 @@ while [ $# -gt 0 ]; do
> >  		;;
> >  	--large-fs) export LARGE_SCRATCH_DEV=yes ;;
> >  	--extra-space=*) export SCRATCH_DEV_EMPTY_SPACE=${r#*=} ;;
> > +	-L)	[[ $2 =~ ^[0-9]+$ ]] || usage
> > +		loop_on_fail=$2; shift
> > +		;;
> >
> >  	-*)	usage ;;
> >  	*)	# not an argument, we've got tests now.
> > @@ -553,6 +558,18 @@ _expunge_test()
> >  	return 0
> >  }
> >
> > +# retain files which would be overwritten in subsequent reruns of the same test
> > +_stash_fail_loop_files() {
> > +	local test_seq="$1"
> > +	local suffix="$2"
> > +
> > +	for i in "${REPORT_DIR}/${test_seq}.full" \
> > +		 "${REPORT_DIR}/${test_seq}.dmesg" \
> > +		 "${REPORT_DIR}/${test_seq}.out.bad"; do
> > +		[ -f "$i" ] && cp "$i" "${i}${suffix}"
>
> I wonder, is there any particular reason to copy the output file and let
> it get overwritten instead of simply mv'ing it?

The copy is left over from an earlier version I had where xunit report
generation was done after the copy.
Looking closer:
- .full is removed in _begin_fstest()
- _check_dmesg() overwrites .dmesg and retains it on failure or KEEP_DMESG
- .out.bad is removed in the main check loop prior to seq invocation
- .notrun, .core and .hints are also removed in the check loop at various
  places before seq (.hints again in _begin_fstest())

One concern I have in changing this to a move is that external scripts
may check for presence / parse these files after check invocation. I'd
considered moving and then copying / symlinking back the .rerun0 files on
rerun-on-failure loop completion but that's also pretty ugly.
IMO leaving this as a copy, with the non-suffix file state left to
reflect the results of the last rerun-on-failure loop, would make the
most sense for now.

> > +	done
> > +}
> > +
> >  # Retain in @bad / @notrun the result of the just-run @test_seq. @try array
> >  # entries are added prior to execution.
> >  _stash_test_status() {
> > @@ -564,8 +581,35 @@ _stash_test_status() {
> >  			"$test_status" "$((stop - start))"
> >  	fi
> >
> > +	if ((${#loop_status[*]} > 0)); then
> > +		# continuing or completing rerun-on-failure loop
> > +		_stash_fail_loop_files "$test_seq" ".rerun${#loop_status[*]}"
> > +		loop_status+=("$test_status")
> > +		if ((${#loop_status[*]} > loop_on_fail)); then
> > +			printf "%s aggregate results across %d runs: " \
> > +				"$test_seq" "${#loop_status[*]}"
> > +			awk "BEGIN {
> > +				n=split(\"${loop_status[*]}\", arr);"'
> > +				for (i = 1; i <= n; i++)
> > +					stats[arr[i]]++;
> > +				for (x in stats)
> > +					printf("%s=%d (%.1f%%)",
>
> Hmm, if I parse this correctly, do you end up with something like:
>
> "xfs/555 aggregate results across 15 runs: pass=5 (33.3%) fail=10 (66.7%)"
>
> ?

Yes, with a comma in between "... (33.3%), fail=10 ...".

> > +					       (i-- > n ? x : ", " x),
> > +					       stats[x], 100 * stats[x] / n);
> > +				}'
> > +			echo
> > +			loop_status=()
> > +		fi
> > +		return	# only stash @bad result for initial failure in loop
> > +	fi
> > +
> >  	case "$test_status" in
> >  	fail)
> > +		if ((loop_on_fail > 0)); then
> > +			# initial failure, start rerun-on-failure loop
> > +			_stash_fail_loop_files "$test_seq" ".rerun0"
> > +			loop_status+=("$test_status")
>
> So if I'm reading this right, the length of the $loop_status array is
> what gates us moving on or retrying, right?  If the length is zero, then
> we move on to the next test; otherwise, that loopy logic in
> _stash_test_result above will keep the same test running until the
> length exceeds loop_on_fail, at which point we print the aggregation
> report, empty out $loop_status, and then ix increments and we move on to
> the next test?

Yes, exactly.

Cheers, David
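
For readers following the thread, the gating confirmed above can be
sketched as a standalone bash script. This is only an illustration under
stated assumptions, not the patch itself: stash_status, print_aggregate
and the scripted fake_results are hypothetical names, the rerun decision
is modelled here as a function return value rather than in the main loop
of check, and the comma separator is produced with a plain sep variable
instead of the patch's (i-- > n ? ...) expression. Stashing of the
.rerun# files is left out.

```shell
#!/bin/bash
# Sketch of the rerun-on-failure gating; all names below are hypothetical.

loop_on_fail=2      # as if check had been invoked with "-L 2"
loop_status=()      # empty array: no rerun loop in progress

# Print per-status counts and percentages for the recorded runs.
print_aggregate() {
    printf "%s aggregate results across %d runs: " "$1" "${#loop_status[*]}"
    awk -v statuses="${loop_status[*]}" 'BEGIN {
        n = split(statuses, arr, " ")
        for (i = 1; i <= n; i++)
            stats[arr[i]]++
        sep = ""
        for (x in stats) {
            printf("%s%s=%d (%.1f%%)", sep, x, stats[x], 100 * stats[x] / n)
            sep = ", "
        }
        print ""
    }'
}

# Record one result. Returns 0 if the same test should run again,
# 1 if the caller should move on to the next test.
stash_status() {
    local test_seq="$1" test_status="$2"

    if ((${#loop_status[*]} > 0)); then
        # continuing or completing a rerun loop
        loop_status+=("$test_status")
        if ((${#loop_status[*]} > loop_on_fail)); then
            print_aggregate "$test_seq"
            loop_status=()
            return 1
        fi
        return 0
    fi

    if [ "$test_status" = fail ] && ((loop_on_fail > 0)); then
        # initial failure starts the loop
        loop_status+=("$test_status")
        return 0
    fi
    return 1
}

# Scripted outcomes standing in for real test runs: the initial
# failure followed by two reruns.
fake_results=(fail pass fail)
run=0
while stash_status "xfs/555" "${fake_results[$run]}"; do
    run=$((run + 1))
done
```

With loop_on_fail=2 the hypothetical test runs three times in total (the
initial failure plus two reruns) and a single aggregate line is printed.
The order of the pass/fail entries on that line follows awk's array
iteration order, which is not fixed.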