From: "Darrick J. Wong" <djwong@kernel.org>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: kdevops@lists.linux.dev, Chandan Babu R <chandan.babu@oracle.com>,
Amir Goldstein <amir73il@gmail.com>,
p.raghav@samsung.com,
"Pankaj Raghav (Samsung)" <kernel@pankajraghav.com>,
zlang@redhat.com, Daniel Gomez <da.gomez@samsung.com>,
Dave Chinner <david@fromorbit.com>,
Matthew Wilcox <willy@infradead.org>,
Kent Overstreet <kent.overstreet@linux.dev>,
fstests@vger.kernel.org, gost.dev@samsung.com
Subject: Re: [PATCH kdevops] fstests: provide kconfig guidance for SOAK_DURATION
Date: Thu, 25 Jan 2024 14:29:56 -0800
Message-ID: <20240125222956.GD6188@frogsfrogsfrogs>
In-Reply-To: <ZbLcXWI9bCdc9PZO@bombadil.infradead.org>

On Thu, Jan 25, 2024 at 02:10:37PM -0800, Luis Chamberlain wrote:
> The kdevops test runner has supported a custom SOAK_DURATION for
> fstests, however we were not providing any guidance. This means folks
> likely disable this. Throw a bone and provide some basic guidance and
> use 2.5 hours as the default value. There are about 46 tests today
> which use soak duration, this means if you are testing serially it
> increase total test time by about 5 days than the previously known
> total test time.
>
> Note that if you are using kernel-ci and using a max loop goal of 100
> that means 500 days extra, so about 1.3 years extra total test time.
> If enabling soak duration you may want to then re-evaluate your loop
> target goal for kernel-ci for kdevops.

Yikes, I wouldn't combine multiple runs with a large SOAK_DURATION. ;)
You all might consider kicking all the soak tests over to a separate VM
or VMs so that the long soak tests do not hold up the rest of the run.
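
To put rough numbers on that, here is a back-of-the-envelope sketch. The
function name and the even split of tests across VMs are illustrative
assumptions, not something kdevops implements; the count of 46 tests is the
one quoted in the patch description.

```python
import math

SOAK_TESTS = 46          # count quoted in the patch description
SECONDS_PER_DAY = 86400


def extra_days(soak_seconds, tests=SOAK_TESTS, loops=1, vms=1):
    """Added wall-clock days when each soak test runs for soak_seconds.

    Assumes every soak test runs for the full duration and that the tests
    are spread evenly over `vms` machines, each running its share serially.
    """
    tests_per_vm = math.ceil(tests / vms)
    return tests_per_vm * soak_seconds * loops / SECONDS_PER_DAY


if __name__ == "__main__":
    two_and_a_half_hours = 2.5 * 3600
    print(f"serial, 1 loop:    {extra_days(two_and_a_half_hours):.1f} days")
    print(f"serial, 100 loops: {extra_days(two_and_a_half_hours, loops=100):.0f} days")
    print(f"46 VMs, 1 loop:    {extra_days(two_and_a_half_hours, vms=46):.2f} days")
```

With 2.5 hours this gives roughly 4.8 days for one serial pass and about 479
days for 100 loops, in line with the estimates above. With one soak test per
VM, a single pass costs only the soak duration itself.
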
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>
> Chandan, Amir, lemme know what you think of a default 2.5 hours default
> if soak duration is enabled. The only thing is the math indicates that
> if you are going to enable kernel-ci we won't finish this year.
>
> To be clear, we've picked up testing with soak duration seriously for
> our LBS testing. It is why we've been able to find pretty hard to
> reproduce issues even on the page cache for the baseline [0], ie, without
> LBS. While folks have seemed to have found value in adopting 2.5 hours
> and of the results we have found, it obviously means a scaling issue
> to consider to decide when we're done with testing our baseline.
>
> At first I wrote this patch just to provide basic guidance for kdevops,
> but after doing a bit of the math on how it also extends total test
> time, *with* our kernel-ci effort, it reveals clearly we should probably
> reconsider lowering the kernel-ci threshold a bit if adopting soak
> duration.
>
> CC'ing a bit wider audience so to get a bit better idea of what folks
> might consider a sensible value for your own testing too. From what
> we've been observing, SOAK_DURATION allows us to catch bugs faster than
> just increasing the kernel-ci count, however, using both let's us catch
> even more bugs too.
>
> To help *reduce* the amount of time to test we've deployed many kdevops
> XFS clusters to help test the baseline. This is why our count time on
> kernel-ci no is about 50-60 with a soak duration of about 2.5 hours.
>
> Also please not that the reported bugs so far are the ones with crashes,
> there are other failures too, but we just haven't had the time to disect
> and report failures which are non-fatal (crashes) as crashes have been
> our priority.
>
> [0] https://github.com/linux-kdevops/kdevops/blob/master/docs/xfs-bugs.md
>
> playbooks/roles/fstests/defaults/main.yml | 3 +
> workflows/fstests/Kconfig | 89 ++++++++++++++++++++---
> workflows/fstests/Makefile.sparsefiles | 4 +
> 3 files changed, 87 insertions(+), 9 deletions(-)
>
> diff --git a/playbooks/roles/fstests/defaults/main.yml b/playbooks/roles/fstests/defaults/main.yml
> index 2f70f9549cde..4a1f5dec5827 100644
> --- a/playbooks/roles/fstests/defaults/main.yml
> +++ b/playbooks/roles/fstests/defaults/main.yml
> @@ -30,6 +30,9 @@ fstests_test_logdev_mkfs_opts: "/dev/null"
> fstests_test_dev_zns: "/dev/null"
> fstests_zns_enabled: False
>
> +fstests_soak_duration_enable: False
> +fstests_soak_duration: 0
> +
> fstests_uses_no_devices: False
> fstests_generate_simple_config_enable: False
> fstests_generate_nvme_live_config_enable: False
> diff --git a/workflows/fstests/Kconfig b/workflows/fstests/Kconfig
> index 985a7847b6c7..bbd8927b3cd3 100644
> --- a/workflows/fstests/Kconfig
> +++ b/workflows/fstests/Kconfig
> @@ -760,15 +760,23 @@ config FSTESTS_RUN_LARGE_DISK_TESTS
> to run. The "large disk" requirement is test dependent, but
> typically, it means a disk with capacity of at several 10G.
>
> -config FSTESTS_SOAK_DURATION
> - int "Custom Soak duration to be used"
> - default 0
> +config FSTESTS_ENABLE_SOAK_DURATION
> + bool "Enable custom soak duration time"
> help
> - Custom Soak duration to be used during test execution. If you set this
> - to a non-zero value then fstests will increase the amount of time it
> - takes to run certain tests which are time based and support using
> - SOAK_DURATION. A moderate high value setting for this is 9900 which is
> - 2.5 hours.
> + Enable soak duration to be used during test execution. If you are not
> + interested in extending your testing then leave this disabled.
> +
> + Using a custom soak duration to a non-zero value then fstests will
> + increase the amount of time it takes to run certain tests which are
> + time based and support using SOAK_DURATION. A moderate high value
> + setting for this is 9900 which is 2.5 hours.
"A moderately high setting for this is "2.5h" for 2.5 hours."
FWIW, the ./check parser translates floating point numbers with suffixes
to integer seconds; see soak_duration.awk.

The part I don't know is whether kdevops merely passes the value through
as a string or actually treats it as an integer. If the latter, then
please ignore my comment.
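
For illustration only, the kind of translation meant here could look like the
sketch below. This is not the awk script itself: the function name is made
up, only the "h" and "w" suffixes appear in this thread, and the other
suffixes and the rounding behaviour are assumptions.

```python
import re

# Seconds per unit. "h" and "w" appear in this thread; the remaining
# suffixes are assumptions about what a parser of this kind accepts.
UNIT_SECONDS = {"": 1, "s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}

_DURATION_RE = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*([smhdw]?)\s*$")


def soak_seconds(value):
    """Convert a value such as "2.5h" or "9900" into whole seconds."""
    match = _DURATION_RE.match(str(value))
    if match is None:
        raise ValueError(f"unrecognized soak duration: {value!r}")
    number, unit = match.groups()
    # Round to the nearest second (an assumption; truncation is also plausible).
    return round(float(number) * UNIT_SECONDS[unit])
```

By this arithmetic `soak_seconds("2.5h")` is 9000, so the 9900 seconds used
in the patch would correspond to 2.75 hours rather than 2.5.
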
> +
> + Note that we have 46 tests today which will be able to use soak
> + duration if set. This means your test time will increase by the
> + soak duration * these number of tests. When soak duration is
> + enabled the test specific watchdog fstests_watchdog.py will be
> + aware of tests which require soak duration and consider before
> + reporting a possible hang.
>
> As of 2023-10-31 that consists of the following tests which use either
> fsstress or fsx or fio. Tests either use SOAK_DURATION directly or they
> @@ -786,7 +794,7 @@ config FSTESTS_SOAK_DURATION
> - generic/648 - fsstress + disk failures on loopback
> - generic/650 - fsstress - multithreaded write + CPU hotplug
>
> - The tests below use _scratch_xfs_stress_scrub() to stress
> + All the tests below use _scratch_xfs_stress_scrub() to stress
> test an with fsstress with scrub or an alternate xfs_db operation.
>
> - xfs/285
> @@ -825,4 +833,67 @@ config FSTESTS_SOAK_DURATION
> - xfs/729
> - xfs/800
>
> +if FSTESTS_ENABLE_SOAK_DURATION
> +
> +choice
> + prompt "Soak duration value to use"
> + default FSTESTS_SOAK_DURATION_HIGH
> +
> +config FSTESTS_SOAK_DURATION_CUSTOM
> + bool "Custom"
> + help
> + You want to specify the value yourself.
> +
> +config FSTESTS_SOAK_DURATION_PATHALOGICAL
"PATHOLOGICAL", and yes that high a setting is pathological. ;)
(Unless you're allocating one soak-fstest per VM in which case "1w"
might be appropriate.)
> + bool "High (48 hours)"
> + help
> + Use 48 hours for soak duration.
> +
> + Using this with 46 tests known to use soak duration means your test
> + time will increase by about 92 days, or a bit over 3 months if run
> + serially.
> +
> +config FSTESTS_SOAK_DURATION_HIGH
> + bool "High (2.5 hours)"
> + help
> + Use 2.5 hours for soak duration.
> +
> + Using this with 46 tests known to use soak duration means your test
> + time will increase by about 5 days if run serially.
> +
> +config FSTESTS_SOAK_DURATION_MID
> + bool "Mid (1 hour)"
> + help
> + Use 1 hour for soak duration.
> +
> + Using this with 46 tests known to use soak duration means your test
> + time will increase by about 2 days if run serially.

I wonder, is there any way to scan the number of soak tests to generate
these figures automatically at configure time? I'd guess no, since
kdevops kconfig comes before pulling and compiling, right?
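
If an fstests checkout were available at that point, a scan could be as
simple as the sketch below. It assumes the tests/<group>/<number> layout and
treats a plain substring match on SOAK_DURATION or
_scratch_xfs_stress_scrub as good enough; tests that reach soak duration
through other helpers would be missed, and the function name is made up.

```python
from pathlib import Path

# Strings whose presence marks a test as soak-aware. The helper name comes
# from the Kconfig help text; treating a substring match as sufficient is
# an assumption.
MARKERS = ("SOAK_DURATION", "_scratch_xfs_stress_scrub")


def count_soak_tests(fstests_dir, markers=MARKERS):
    """Count test scripts under <fstests_dir>/tests that mention a marker."""
    count = 0
    for path in Path(fstests_dir, "tests").glob("*/[0-9]*"):
        # Skip golden output files such as 476.out; keep only the scripts.
        if path.suffix or not path.is_file():
            continue
        text = path.read_text(errors="ignore")
        if any(marker in text for marker in markers):
            count += 1
    return count
```

The result could then feed the help text figures instead of the hardcoded 46.
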
--D
> +
> +config FSTESTS_SOAK_DURATION_LOW
> + bool "Low (30 minutes)
> + help
> + Use 30 minutes for soak duration.
> +
> + Using this with 46 tests known to use soak duration means your test
> + time will increase by about 1 day if run serially.
> +
> +endchoice
> +
> +config FSTESTS_SOAK_DURATION_CUSTOM_VAL
> + int "Custom soak duration value (seconds)"
> + default 0
> + depends on FSTESTS_SOAK_DURATION_CUSTOM
> + help
> + Enter your custom soak duration value in seconds.
> +
> +endif # FSTESTS_ENABLE_SOAK_DURATION
> +
> +config FSTESTS_SOAK_DURATION
> + default 0 if !FSTESTS_ENABLE_SOAK_DURATION
> + default FSTESTS_SOAK_DURATION_CUSTOM_VAL if FSTESTS_SOAK_DURATION_CUSTOM
> + default 1800 if FSTESTS_SOAK_DURATION_LOW
> + default 3600 if FSTESTS_SOAK_DURATION_MID
> + default 9900 if FSTESTS_SOAK_DURATION_HIGH
> + default 172800 if FSTESTS_SOAK_DURATION_PATHALOGICAL
> +
> endif # KDEVOPS_WORKFLOW_ENABLE_FSTESTS
> diff --git a/workflows/fstests/Makefile.sparsefiles b/workflows/fstests/Makefile.sparsefiles
> index c5ca20a9c462..7dd129c4f9cc 100644
> --- a/workflows/fstests/Makefile.sparsefiles
> +++ b/workflows/fstests/Makefile.sparsefiles
> @@ -44,6 +44,10 @@ FSTESTS_ARGS += run_large_disk_tests='$(FSTESTS_RUN_LARGE_DISK_TESTS)'
> FSTESTS_ARGS += run_auto_group_tests='$(FSTESTS_RUN_AUTO_GROUP_TESTS)'
> FSTESTS_ARGS += run_custom_group_tests='$(FSTESTS_RUN_CUSTOM_GROUP_TESTS)'
> FSTESTS_ARGS += exclude_test_groups='$(CONFIG_FSTESTS_EXCLUDE_TEST_GROUPS)'
> +
> +ifeq (y,$(CONFIG_FSTESTS_ENABLE_SOAK_DURATION))
> +FSTESTS_ARGS += fstests_soak_duration_enable='True'
> +endif
> FSTESTS_ARGS += fstests_soak_duration='$(CONFIG_FSTESTS_SOAK_DURATION)'
>
> ifeq (y,$(CONFIG_FSTESTS_ENABLE_RUN_CUSTOM_TESTS))
> --
> 2.42.0
>
>
Thread overview: 5+ messages
2024-01-25 22:10 [PATCH kdevops] fstests: provide kconfig guidance for SOAK_DURATION Luis Chamberlain
2024-01-25 22:29 ` Darrick J. Wong [this message]
2024-01-29 20:49 ` Luis Chamberlain
2024-01-25 23:24 ` Matthew Wilcox
2024-01-29 20:57 ` Luis Chamberlain