From: "Theodore Ts'o" <tytso@mit.edu>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: Leah Rumancik <leah.rumancik@gmail.com>,
Amir Goldstein <amir73il@gmail.com>,
Josef Bacik <josef@toxicpanda.com>,
Chuck Lever <chuck.lever@oracle.com>,
chandanrmail@gmail.com,
Sweet Tea Dorminy <sweettea-kernel@dorminy.me>,
Pankaj Raghav <pankydev8@gmail.com>,
linux-xfs@vger.kernel.org, fstests <fstests@vger.kernel.org>
Subject: Re: [PATCH 5.15 CANDIDATE v2 0/8] xfs stable candidate patches for 5.15.y (part 1)
Date: Wed, 22 Jun 2022 17:44:30 -0400
Message-ID: <YrONPrBgopZQ2EUj@mit.edu>
In-Reply-To: <YrJdLhHBsolF83Rq@bombadil.infradead.org>
On Tue, Jun 21, 2022 at 05:07:10PM -0700, Luis Chamberlain wrote:
> On Thu, Jun 16, 2022 at 11:27:41AM -0700, Leah Rumancik wrote:
> > https://gist.github.com/lrumancik/5a9d85d2637f878220224578e173fc23.
>
> The coverage for XFS is using profiles which seem to come inspired
> by ext4's different mkfs configurations.
That's not correct, actually. It's using the gce-xfstests test
framework which is part of the xfstests-bld[1][2] system that I
maintain, yes. However, the actual config profiles were obtained via
discussions with Darrick and represent the actual configs which the
XFS maintainer uses to test the upstream XFS tree before deciding to
push to Linus. We figure if it's good enough for the XFS Maintainer,
it's good enough for us. :-)
[1] https://thunk.org/gce-xfstests
[2] https://github.com/tytso/xfstests-bld
If you think the XFS Maintainer should be running more configs, I
invite you to have that conversation with Darrick.
> GCE is supported as well, so is Azure and OpenStack, and even custom
> openstack solutions...
The way kdevops works is quite different from how gce-xfstests works,
since gce-xfstests is a VM-native solution. Which is to say, when we
kick off a test, VM's are launched, one per config, which provides
better parallelization, and then once everything is completed, the
VM's are automatically shut down and they go away; so it's far more
efficient in terms of using cloud resources. The Lightweight Test
Manager will then take the JUnit XML files, plus all of the test
artifacts, and these get combined into a single test report.
The Lightweight Test Manager (LTM) runs in a small VM, and this is the
only VM which is consuming resources until we ask it to do some work.
For example:
    gce-xfstests ltm -c xfs --repo stable.git --commit v5.18.6 -c xfs/all -g auto
That single command will result in the LTM launching a large builder
VM which quickly builds the kernel. (And it uses ccache, and a
persistent cache disk, but even if we've never built the kernel, it
can complete the build in a few minutes.) Then we launch 12 VM's, one
for each config, and since they don't need to be optimized for fast
builds, we can run most of the VM's with a smaller amount of memory,
to better stress test the file system. (But for the dax config, we'll
launch a VM with more memory, since we need to simulate the PMEM
device using raw memory.) Once each VM completes its test run, it
uploads its test artifacts and results XML file to Google Cloud
Storage. When all of the VM's complete, the LTM VM will download all
of the results files from GCS, combine them together into a single
result file, and then send e-mail with a summary of the results.
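To make the final merge step concrete, here is a minimal Python sketch
of combining per-config JUnit-style XML results into one report with a
summary line. This is not the LTM's actual code: the function names,
the sample config names, and the inline XML documents are all
assumptions made up for illustration, and real result files would be
fetched from storage rather than embedded as strings.

```python
import xml.etree.ElementTree as ET


def merge_junit(xml_docs):
    """Combine several JUnit-style XML documents under one <testsuites> root."""
    root = ET.Element("testsuites")
    totals = {"tests": 0, "failures": 0, "skipped": 0}
    for doc in xml_docs:
        parsed = ET.fromstring(doc)
        # A document may be a bare <testsuite> or a <testsuites> wrapper.
        suites = [parsed] if parsed.tag == "testsuite" else parsed.findall("testsuite")
        for suite in suites:
            root.append(suite)
            for key in totals:
                totals[key] += int(suite.get(key, "0"))
    for key, value in totals.items():
        root.set(key, str(value))
    return root


def summarize(root):
    """One-line summary of the merged counters, e.g. for an e-mail body."""
    return "{tests} tests, {failures} failures, {skipped} skipped".format(**root.attrib)


# Hypothetical per-config results (names and contents are placeholders).
RESULT_A = (
    '<testsuite name="xfs/4k" tests="3" failures="1" skipped="0">'
    '<testcase name="generic/001"/>'
    '<testcase name="generic/002"><failure/></testcase>'
    '<testcase name="generic/003"/>'
    "</testsuite>"
)
RESULT_B = (
    '<testsuite name="xfs/1k" tests="2" failures="0" skipped="1">'
    '<testcase name="generic/001"/>'
    '<testcase name="generic/002"><skipped/></testcase>'
    "</testsuite>"
)

merged = merge_junit([RESULT_A, RESULT_B])
print(summarize(merged))  # 5 tests, 1 failures, 1 skipped
```

Keeping each config's <testsuite> element intact under a common root
means the per-config breakdown survives the merge, while the totals on
the root are enough to drive a short summary.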
It's optimized for developers, and for our use cases. I'm sure
kdevops is much more general, since it can work for hardware-based
test machines, as well as many other cloud stacks, and it's also
optimized for the QA department --- which is not surprising, given
where kdevops has come from.
> Also, I see on the above URL you posted there is a TODO in the gist which
> says, "find a better route for publishing these". If you were to use
> kdevops for this it would have the immediate gain in that kdevops users
> could reproduce your findings and help augment it.
Sure, but with our system, kvm-xfstests and gce-xfstests users can
*easily* reproduce our findings and can help augment it. :-)
As far as sharing expunge files, as I've observed before, these files
tend to be very specific to the test configuration --- the number of
CPU's, and the amount of memory, the characteristics of the storage
device, etc. So what works for one developer's test setup will not
necessarily work for others --- and I'm not convinced that trying to
get everyone standardized on the One True Test Setup is actually an
advantage. Some people may be using large RAID Arrays; some might be
using fast flash; some might be using some kind of emulated log
structured block device; some might be using eMMC flash. And that's a
*good* thing.
We also have a very different philosophy about how to use expunge
files. In particular, if there is a test which is only failing 0.5% of
the time, I don't think it makes sense to put that test into an
expunge file.
In general, we are only placing a test into an expunge file when
it causes the system under test to crash, or it takes *WAAAY* too
long, or it's a clear test bug that is too hard to fix for real, so we
just suppress the test for that config for now. (Example: tests in
xfstests for quota don't understand clustered allocation.)
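The expunge policy described above can be written down as a small
predicate. The following Python sketch is only an illustration of that
rule: the Observation class, its field names, the one-hour cutoff, and
the test names are all hypothetical, and nothing here is taken from
xfstests-bld itself.

```python
from dataclasses import dataclass

# Assumed cutoff for "takes way too long"; the email gives no number.
MAX_RUNTIME_SECS = 3600.0


@dataclass
class Observation:
    """What is known about one test on one config (all fields hypothetical)."""
    name: str
    crashes_system: bool = False
    runtime_secs: float = 0.0
    unfixable_test_bug: bool = False
    failure_rate: float = 0.0  # recorded, but deliberately not a reason to expunge


def should_expunge(obs, max_runtime=MAX_RUNTIME_SECS):
    """Expunge only for crashes, excessive runtime, or a known test bug."""
    return (obs.crashes_system
            or obs.runtime_secs > max_runtime
            or obs.unfixable_test_bug)


# Placeholder test names, not real xfstests results.
flaky = Observation("generic/000", failure_rate=0.005)
crasher = Observation("generic/999", crashes_system=True)
print(should_expunge(flaky))    # False
print(should_expunge(crasher))  # True
```

Note that failure_rate never enters the decision: a test that fails
0.5% of the time keeps running, which is the point of the policy.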
So we want to run the tests, even if we know they will fail, and have a
way of annotating that a test is known to fail for a particular kernel
version, or if it's a flaky test, what the expected flake percentage
is for that particular test. For flaky tests, we'd like to be able to
automatically retry running the test, so that we can flag when a flaky
test has become a hard failure, or a flaky test has radically changed
how often it fails. We haven't implemented all of this yet, but we're
exploring the design space for this at the moment.
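One possible shape for the flaky-test idea above, sketched in Python
under stated assumptions: a test is retried a fixed number of times,
and the observed failure count is compared against the annotated flake
rate using an exact binomial tail probability. The function names, the
alpha threshold, and the choice of a binomial test are assumptions for
illustration only; the email says this design is still being explored
and does not describe an implementation.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k failures in n independent runs at rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def count_failures(run_once, runs):
    """Retry a test `runs` times; run_once() returns True on pass."""
    return sum(0 if run_once() else 1 for _ in range(runs))


def classify(expected_rate, failures, runs, alpha=0.01):
    """Compare observed failures against the annotated flake rate."""
    if runs <= 0 or not 0 <= failures <= runs:
        raise ValueError("need runs > 0 and 0 <= failures <= runs")
    # How surprising is a count this high, or this low, at the expected rate?
    p_at_least = sum(binom_pmf(k, runs, expected_rate) for k in range(failures, runs + 1))
    p_at_most = sum(binom_pmf(k, runs, expected_rate) for k in range(0, failures + 1))
    if min(p_at_least, p_at_most) < alpha:
        return "hard-failure" if failures == runs else "rate-changed"
    return "as-expected"


# A test annotated as failing 0.5% of the time, retried 20 times.
print(classify(0.005, 0, 20))   # as-expected
print(classify(0.005, 5, 20))   # rate-changed
print(classify(0.005, 20, 20))  # hard-failure
```

A plain ratio check would misfire on rare flakes, since zero failures
in 20 runs is the normal outcome at a 0.5% rate; the tail probability
accounts for how many retries were actually made.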
More generally, I think competition is a good thing, and for areas
where we are still exploring the best way to automate tests, not just
from a QA department's perspective, but from a file system developer's
perspective, having multiple systems where we can explore these ideas
can be a good thing.
Cheers,
- Ted