public inbox for kdevops@lists.linux.dev
From: "Darrick J. Wong" <djwong@kernel.org>
To: Luis Chamberlain <mcgrof@kernel.org>
Cc: kdevops@lists.linux.dev, Chandan Babu R <chandan.babu@oracle.com>,
	Amir Goldstein <amir73il@gmail.com>,
	p.raghav@samsung.com,
	"Pankaj Raghav (Samsung)" <kernel@pankajraghav.com>,
	zlang@redhat.com, Daniel Gomez <da.gomez@samsung.com>,
	Dave Chinner <david@fromorbit.com>,
	Matthew Wilcox <willy@infradead.org>,
	Kent Overstreet <kent.overstreet@linux.dev>,
	fstests@vger.kernel.org, gost.dev@samsung.com
Subject: Re: [PATCH kdevops] fstests: provide kconfig guidance for SOAK_DURATION
Date: Thu, 25 Jan 2024 14:29:56 -0800
Message-ID: <20240125222956.GD6188@frogsfrogsfrogs>
In-Reply-To: <ZbLcXWI9bCdc9PZO@bombadil.infradead.org>

On Thu, Jan 25, 2024 at 02:10:37PM -0800, Luis Chamberlain wrote:
> The kdevops test runner has supported a custom SOAK_DURATION for
> fstests, however we were not providing any guidance. This means folks
> likely disable this. Throw a bone and provide some basic guidance and
> use 2.5 hours as the default value. There are about 46 tests today
> which use soak duration, this means if you are testing serially it
> increases total test time by about 5 days over the previously known
> total test time.
> 
> Note that if you are using kernel-ci and using a max loop goal of 100
> that means 500 days extra, so about 1.3 years extra total test time.
> If enabling soak duration you may want to then re-evaluate your loop
> target goal for kernel-ci for kdevops.

Yikes, I wouldn't combine multiple runs with large SOAK_DURATION. ;)

You all might consider kicking all the soak tests over to a separate VM
or VMs so that the long soak tests do not hold up the rest of the run.
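
For a rough sense of the numbers, here is a minimal Python sketch of the
arithmetic. It assumes every soak test runs for exactly SOAK_DURATION and
ignores setup and teardown; the function name and the even-split model for
multiple VMs are illustrative assumptions, not anything kdevops provides.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def extra_soak_days(soak_seconds, num_tests=46, loops=1, vms=1):
    """Estimate the extra wall-clock days added by soak tests.

    Assumes each soak test runs for exactly soak_seconds and that the
    tests are split as evenly as possible across vms (hypothetical model).
    """
    # The busiest VM ends up with ceil(num_tests / vms) tests per loop.
    tests_per_vm = math.ceil(num_tests / vms)
    return tests_per_vm * soak_seconds * loops / SECONDS_PER_DAY


if __name__ == "__main__":
    soak = int(2.5 * 3600)  # 2.5 hours in seconds
    print(extra_soak_days(soak))             # one serial run
    print(extra_soak_days(soak, loops=100))  # kernel-ci loop goal of 100
    print(extra_soak_days(soak, vms=46))     # one soak test per VM
```

With one soak test per VM the added wall-clock time collapses to a single
SOAK_DURATION per loop, which is the point of moving them off the main run.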

> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
> 
> Chandan, Amir, lemme know what you think of a default of 2.5 hours
> if soak duration is enabled. The only thing is the math indicates that
> if you are going to enable kernel-ci we won't finish this year.
> 
> To be clear, we've picked up testing with soak duration seriously for
> our LBS testing. It is why we've been able to find pretty hard to
> reproduce issues even on the page cache for the baseline [0], ie, without
> LBS. While folks seem to have found value in adopting 2.5 hours
> and in the results we have found, it obviously raises a scaling issue
> to consider when deciding when we're done with testing our baseline.
> 
> At first I wrote this patch just to provide basic guidance for kdevops,
> but after doing a bit of the math on how it also extends total test
> time, *with* our kernel-ci effort, it clearly reveals we should probably
> consider lowering the kernel-ci threshold a bit if adopting soak
> duration.
> 
> CC'ing a wider audience to get a better idea of what folks
> might consider a sensible value for your own testing too. From what
> we've been observing, SOAK_DURATION allows us to catch bugs faster than
> just increasing the kernel-ci count, however, using both lets us catch
> even more bugs too.
> 
> To help *reduce* the amount of time to test we've deployed many kdevops
> XFS clusters to help test the baseline. This is why our count on
> kernel-ci is now about 50-60 with a soak duration of about 2.5 hours.
> 
> Also please note that the bugs reported so far are the ones with crashes.
> There are other failures too, but we just haven't had the time to dissect
> and report the non-fatal failures, as crashes have been
> our priority.
> 
> [0] https://github.com/linux-kdevops/kdevops/blob/master/docs/xfs-bugs.md
> 
>  playbooks/roles/fstests/defaults/main.yml |  3 +
>  workflows/fstests/Kconfig                 | 89 ++++++++++++++++++++---
>  workflows/fstests/Makefile.sparsefiles    |  4 +
>  3 files changed, 87 insertions(+), 9 deletions(-)
> 
> diff --git a/playbooks/roles/fstests/defaults/main.yml b/playbooks/roles/fstests/defaults/main.yml
> index 2f70f9549cde..4a1f5dec5827 100644
> --- a/playbooks/roles/fstests/defaults/main.yml
> +++ b/playbooks/roles/fstests/defaults/main.yml
> @@ -30,6 +30,9 @@ fstests_test_logdev_mkfs_opts: "/dev/null"
>  fstests_test_dev_zns: "/dev/null"
>  fstests_zns_enabled: False
>  
> +fstests_soak_duration_enable: False
> +fstests_soak_duration: 0
> +
>  fstests_uses_no_devices: False
>  fstests_generate_simple_config_enable: False
>  fstests_generate_nvme_live_config_enable: False
> diff --git a/workflows/fstests/Kconfig b/workflows/fstests/Kconfig
> index 985a7847b6c7..bbd8927b3cd3 100644
> --- a/workflows/fstests/Kconfig
> +++ b/workflows/fstests/Kconfig
> @@ -760,15 +760,23 @@ config FSTESTS_RUN_LARGE_DISK_TESTS
>  	  to run. The "large disk" requirement is test dependent, but
>  	  typically, it means a disk with capacity of at several 10G.
>  
> -config FSTESTS_SOAK_DURATION
> -	int "Custom Soak duration to be used"
> -	default 0
> +config FSTESTS_ENABLE_SOAK_DURATION
> +	bool "Enable custom soak duration time"
>  	help
> -	  Custom Soak duration to be used during test execution. If you set this
> -	  to a non-zero value then fstests will increase the amount of time it
> -	  takes to run certain tests which are time based and support using
> -	  SOAK_DURATION. A moderate high value setting for this is 9900 which is
> -	  2.5 hours.
> +	  Enable soak duration to be used during test execution. If you are not
> +	  interested in extending your testing then leave this disabled.
> +
> +	  Using a custom soak duration to a non-zero value then fstests will
> +	  increase the amount of time it takes to run certain tests which are
> +	  time based and support using SOAK_DURATION. A moderate high value
> +	  setting for this is 9900 which is 2.5 hours.

"A moderately high setting for this is "2.5h" for 2.5 hours."

FWIW, the ./check parser translates floating point numbers with suffixes
to integer seconds; see soak_duration.awk.

What I don't know is whether kdevops merely passes the value through as a
string or actually treats it as an integer.  If the latter, then
please ignore my comment.
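
To illustrate the kind of conversion meant here, a minimal Python sketch
follows. It is not the soak_duration.awk script itself; the accepted suffix
set (s, m, h, d, w) and the function name are assumptions for illustration.

```python
import re

# Seconds per unit; this suffix set is an assumption for illustration.
_UNIT_SECONDS = {"": 1, "s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}

# A non-negative decimal number followed by an optional one-letter suffix.
_DURATION_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([smhdw]?)\s*$")


def duration_to_seconds(text):
    """Convert a string such as "2.5h" or "9000" to integer seconds."""
    match = _DURATION_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    value, unit = match.groups()
    # round() avoids off-by-one results from floating point multiplication.
    return round(float(value) * _UNIT_SECONDS[unit])


if __name__ == "__main__":
    for sample in ("2.5h", "9900", "1w"):
        print(sample, duration_to_seconds(sample))
```

Under this conversion "2.5h" works out to 9000 seconds, whereas 9900 seconds
corresponds to 2.75 hours, so the integer quoted in the help text may be
worth double-checking.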

> +
> +	  Note that we have 46 tests today which will be able to use soak
> +	  duration if set. This means your test time will increase by the
> +	  soak duration * these number of tests. When soak duration is
> +	  enabled the test specific watchdog fstests_watchdog.py will be
> +	  aware of tests which require soak duration and consider before
> +	  reporting a possible hang.
>  
>  	  As of 2023-10-31 that consists of the following tests which use either
>  	  fsstress or fsx or fio. Tests either use SOAK_DURATION directly or they
> @@ -786,7 +794,7 @@ config FSTESTS_SOAK_DURATION
>  	  - generic/648 - fsstress + disk failures on loopback
>  	  - generic/650 - fsstress - multithreaded write + CPU hotplug
>  
> -	  The tests below use _scratch_xfs_stress_scrub() to stress
> +	  All the tests below use _scratch_xfs_stress_scrub() to stress
>  	  test an with fsstress with scrub or an alternate xfs_db operation.
>  
>  	  - xfs/285
> @@ -825,4 +833,67 @@ config FSTESTS_SOAK_DURATION
>  	  - xfs/729
>  	  - xfs/800
>  
> +if FSTESTS_ENABLE_SOAK_DURATION
> +
> +choice
> +	prompt "Soak duration value to use"
> +	default FSTESTS_SOAK_DURATION_HIGH
> +
> +config FSTESTS_SOAK_DURATION_CUSTOM
> +	bool "Custom"
> +	help
> +	  You want to specify the value yourself.
> +
> +config FSTESTS_SOAK_DURATION_PATHALOGICAL

"PATHOLOGICAL", and yes that high a setting is pathological. ;)

(Unless you're allocating one soak-fstest per VM in which case "1w"
might be appropriate.)

> +	bool "High (48 hours)"
> +	help
> +	  Use 48 hours for soak duration.
> +
> +	  Using this with 46 tests known to use soak duration means your test
> +	  time will increase by about 92 days, or a bit over 3 months if run
> +	  serially.
> +
> +config FSTESTS_SOAK_DURATION_HIGH
> +	bool "High (2.5 hours)"
> +	help
> +	  Use 2.5 hours for soak duration.
> +
> +	  Using this with 46 tests known to use soak duration means your test
> +	  time will increase by about 5 days if run serially.
> +
> +config FSTESTS_SOAK_DURATION_MID
> +	bool "Mid (1 hour)"
> +	help
> +	  Use 1 hour for soak duration.
> +
> +	  Using this with 46 tests known to use soak duration means your test
> +	  time will increase by about 2 days if run serially.

I wonder, is there any way to scan the number of soak tests to generate
these figures automatically at configure time?  I'd guess no, since
kdevops kconfig comes before pulling and compiling, right?
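
If an fstests checkout were available at that point, something like the
following Python sketch could produce the figures. The marker strings, the
assumption that test scripts are the all-digit filenames under tests/, and
both function names are hypothetical; nothing here is an existing kdevops
or fstests interface.

```python
import os
import re

# Hypothetical markers: the variable itself and the helper named in the
# Kconfig help text. A real scan might need more patterns than these.
SOAK_MARKERS = re.compile(r"SOAK_DURATION|_scratch_xfs_stress_scrub")


def count_soak_tests(tests_dir):
    """Count test scripts under tests_dir that mention a soak marker."""
    count = 0
    for root, _dirs, files in os.walk(tests_dir):
        for name in files:
            # Assumes test scripts are named by number, e.g. tests/xfs/285;
            # this skips companion files such as 285.out.
            if not name.isdigit():
                continue
            path = os.path.join(root, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                if SOAK_MARKERS.search(handle.read()):
                    count += 1
    return count


def help_line(num_tests, soak_seconds):
    """Format the serial-run estimate used in the help text."""
    days = num_tests * soak_seconds / 86400
    return f"{num_tests} soak tests add about {days:.1f} days if run serially"
```

Calling help_line(count_soak_tests("tests"), 3600) from the top of a checkout
would then give the one-hour figure without hardcoding the test count.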

--D

> +
> +config FSTESTS_SOAK_DURATION_LOW
> +	bool "Low (30 minutes)"
> +	help
> +	  Use 30 minutes for soak duration.
> +
> +	  Using this with 46 tests known to use soak duration means your test
> +	  time will increase by about 1 day if run serially.
> +
> +endchoice
> +
> +config FSTESTS_SOAK_DURATION_CUSTOM_VAL
> +	int "Custom soak duration value (seconds)"
> +	default 0
> +	depends on FSTESTS_SOAK_DURATION_CUSTOM
> +	help
> +	  Enter your custom soak duration value in seconds.
> +
> +endif # FSTESTS_ENABLE_SOAK_DURATION
> +
> +config FSTESTS_SOAK_DURATION
> +	default 0 if !FSTESTS_ENABLE_SOAK_DURATION
> +	default FSTESTS_SOAK_DURATION_CUSTOM_VAL if FSTESTS_SOAK_DURATION_CUSTOM
> +	default 1800 if FSTESTS_SOAK_DURATION_LOW
> +	default 3600 if FSTESTS_SOAK_DURATION_MID
> +	default 9900 if FSTESTS_SOAK_DURATION_HIGH
> +	default 172800 if FSTESTS_SOAK_DURATION_PATHALOGICAL
> +
>  endif # KDEVOPS_WORKFLOW_ENABLE_FSTESTS
> diff --git a/workflows/fstests/Makefile.sparsefiles b/workflows/fstests/Makefile.sparsefiles
> index c5ca20a9c462..7dd129c4f9cc 100644
> --- a/workflows/fstests/Makefile.sparsefiles
> +++ b/workflows/fstests/Makefile.sparsefiles
> @@ -44,6 +44,10 @@ FSTESTS_ARGS += run_large_disk_tests='$(FSTESTS_RUN_LARGE_DISK_TESTS)'
>  FSTESTS_ARGS += run_auto_group_tests='$(FSTESTS_RUN_AUTO_GROUP_TESTS)'
>  FSTESTS_ARGS += run_custom_group_tests='$(FSTESTS_RUN_CUSTOM_GROUP_TESTS)'
>  FSTESTS_ARGS += exclude_test_groups='$(CONFIG_FSTESTS_EXCLUDE_TEST_GROUPS)'
> +
> +ifeq (y,$(CONFIG_FSTESTS_ENABLE_SOAK_DURATION))
> +FSTESTS_ARGS += fstests_soak_duration_enable='True'
> +endif
>  FSTESTS_ARGS += fstests_soak_duration='$(CONFIG_FSTESTS_SOAK_DURATION)'
>  
>  ifeq (y,$(CONFIG_FSTESTS_ENABLE_RUN_CUSTOM_TESTS))
> -- 
> 2.42.0
> 
> 
