From: "Theodore Ts'o" <tytso@mit.edu>
To: "Darrick J. Wong" <djwong@kernel.org>
Cc: zlang@redhat.com, linux-xfs@vger.kernel.org,
fstests@vger.kernel.org, guan@eryu.me, leah.rumancik@gmail.com,
quwenruo.btrfs@gmx.com
Subject: Re: [PATCH 1/8] check: generate section reports between tests
Date: Mon, 19 Dec 2022 22:16:43 -0500
Message-ID: <Y6EpG8cpQDH0XuGz@mit.edu>
In-Reply-To: <167149446946.332657.17186597494532662986.stgit@magnolia>
On Mon, Dec 19, 2022 at 04:01:09PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> Generate the section report between tests so that the summary report
> always reflects the outcome of the most recent test. Two usecases are
> envisioned here -- if a cluster-based test runner anticipates that the
> testrun could crash the VM, they can set REPORT_DIR to (say) an NFS
> mount to preserve the intermediate results. If the VM does indeed
> crash, the scheduler can examine the state of the crashed VM and move
> the tests to another VM. The second usecase is a reporting agent that
> runs in the VM to upload live results to a test dashboard.
Leah has been working on adding crash recovery for gce-xfstests.
It'll be interesting to see how her work dovetails with your patches.
The basic design we've worked out has the test framework recognize
whether the VM had previously been running tests. We keep track of
the last test that was run by hooking into $LOGGER_PROG. After a
crash, a Python script[1] appends to the xunit file a result for the
test that was running at the time of the crash, with the result set
to "error", and then we resume running tests from where we left off.
[1] https://github.com/lrumancik/xfstests-bld/blob/ltm-auto-resume-new/test-appliance/files/usr/local/bin/add_error_xunit
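As a rough illustration of the "append an error result" step (this is
not the add_error_xunit script linked above, just a minimal sketch),
something like the following would do it. It assumes the xunit file
has a single <testsuite> root element carrying "tests" and "errors"
counters; the function name and the default message are hypothetical.

```python
import xml.etree.ElementTree as ET


def add_error_testcase(xunit_path, test_name, message="test crashed the VM"):
    """Append an <error> result for test_name to an xunit results file."""
    tree = ET.parse(xunit_path)
    suite = tree.getroot()  # assumption: the root element is <testsuite>

    # Record the interrupted test as a testcase whose result is "error".
    case = ET.SubElement(suite, "testcase", name=test_name, time="0")
    ET.SubElement(case, "error", message=message)

    # Keep the summary counters in step with the new entry.
    for attr in ("tests", "errors"):
        suite.set(attr, str(int(suite.get(attr, "0")) + 1))

    tree.write(xunit_path, encoding="utf-8", xml_declaration=True)
```

The test name would come from whatever the $LOGGER_PROG hook last
recorded before the crash.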
To deal with cases where the kernel has deadlocked, the LTM server
monitors any test VM that it launches. If the LTM server notices that
the test VM has failed to make forward progress within a set time, it
will force the test VM to reboot, at which point the recovery process
described above kicks in.
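The forward-progress check can be sketched as below. This is only an
illustration of the idea, not the LTM server's code: the class name,
the notion of a polled "status" string (for example, the name of the
test currently running), and the timeout are all assumptions.

```python
import time


class ProgressMonitor:
    """Flag a test VM as stalled when its status stops changing."""

    def __init__(self, timeout_secs, clock=time.monotonic):
        self.timeout_secs = timeout_secs
        self.clock = clock  # injectable so the logic can be tested
        self.last_status = None
        self.last_change = clock()

    def observe(self, status):
        """Record the latest status; return True if the VM looks stalled."""
        now = self.clock()
        if status != self.last_status:
            # Forward progress: remember the new status and restart the timer.
            self.last_status = status
            self.last_change = now
            return False
        return now - self.last_change >= self.timeout_secs
```

A monitoring loop would call observe() each time it polls the VM and
force a reboot when it returns True.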
Eventually, we'll have the LTM server examine the serial console of
the test VM, looking for indications of kernel panics and RCU / soft
lockup warnings, so we can more quickly force a reboot when the system
under test is clearly unhappy.
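Since this part doesn't exist yet, the following is purely a sketch
of what scanning the serial console might look like. The patterns are
examples of messages the kernel commonly prints; the exact wording
varies between kernel versions, so the list is an assumption and
would need tuning.

```python
import re

# Example patterns only; real console output differs across kernel versions.
TROUBLE_PATTERNS = [
    ("panic", re.compile(r"Kernel panic - not syncing")),
    ("soft lockup", re.compile(r"BUG: soft lockup")),
    ("rcu stall", re.compile(r"rcu\w* (self-)?detected stall")),
]


def scan_console(lines):
    """Return (label, line) pairs for console lines that indicate trouble."""
    hits = []
    for line in lines:
        for label, pattern in TROUBLE_PATTERNS:
            if pattern.search(line):
                hits.append((label, line.rstrip("\n")))
                break  # one label per line is enough
    return hits
```

Any hit would be a reason to force the reboot right away instead of
waiting for the forward-progress timeout to expire.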
The advantage of this design is that it doesn't require using NFS to
store the results, and in theory we don't even need a separate
monitoring VM; we could just use software and kernel watchdogs to
notice when the tests have stopped making forward progress.
- Ted
P.S. We're not using section reporting since we generally launch a
separate VM for each "section" so we can speed up the test run by
sharding across those VMs. We then have the LTM server merge the
results together into a single test run report.
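For what it's worth, the merge step is conceptually simple. Here is a
minimal sketch, assuming each shard produces an xunit file with a
single <testsuite> root and the usual counter attributes; the function
name and the merged suite name are hypothetical, and this is not the
LTM server's actual implementation.

```python
import xml.etree.ElementTree as ET

COUNTERS = ("tests", "failures", "errors", "skipped")


def merge_xunit(paths, name="merged"):
    """Combine per-shard xunit files into one <testsuite> tree."""
    merged = ET.Element("testsuite", name=name)
    totals = dict.fromkeys(COUNTERS, 0)
    for path in paths:
        suite = ET.parse(path).getroot()  # assumption: root is <testsuite>
        for counter in COUNTERS:
            totals[counter] += int(suite.get(counter, "0"))
        # Carry every shard's testcases over unchanged.
        merged.extend(suite.findall("testcase"))
    for counter, value in totals.items():
        merged.set(counter, str(value))
    return ET.ElementTree(merged)
```

The returned tree can be saved with its write() method.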