From: "Darrick J. Wong" <djwong@kernel.org>
To: Boris Burkov <boris@bur.io>
Cc: lsf-pc@lists.linux-foundation.org, linux-fsdevel@vger.kernel.org
Subject: Re: [LSF/MM/BPF TOPIC] Long Duration Stress Testing Filesystems
Date: Mon, 3 Feb 2025 11:53:43 -0800
Message-ID: <20250203195343.GA134490@frogsfrogsfrogs>
In-Reply-To: <20250203185519.GA2888598@zen.localdomain>

On Mon, Feb 03, 2025 at 10:55:19AM -0800, Boris Burkov wrote:
> At Meta, we currently primarily rely on fstests 'auto' runs for
> validating Btrfs as a general purpose filesystem for all of our root
> drives. While this has obviously proven to be a very useful test suite
> with rich collaboration across teams and filesystems, we have observed a
> recent trend in our production filesystem issues that makes us question
> if it is sufficient.
> 
> Over the last few years, we have had a number of issues (primarily in
> Btrfs, but at least one notable one in Xfs) that have been detected in
> production, then reproduced with an unreliable non-specific stressor
> that takes hours or even days to trigger the issue.
> Examples:
> - Btrfs relocation bugs
> https://lore.kernel.org/linux-btrfs/68766e66ed15ca2e7550585ed09434249db912a2.1727212293.git.josef@toxicpanda.com/
> https://lore.kernel.org/linux-btrfs/fc61fb63e534111f5837c204ec341c876637af69.1731513908.git.josef@toxicpanda.com/
> - Btrfs extent map merging corruption
> https://lore.kernel.org/linux-btrfs/9b98ba80e2cf32f6fb3b15dae9ee92507a9d59c7.1729537596.git.boris@bur.io/
> - Btrfs dio data corruptions from bio splitting
> (mostly our internal errors trying to make minimal backports of
> https://lore.kernel.org/linux-btrfs/cover.1679512207.git.boris@bur.io/
> and Christoph's related series)
> - Xfs large folios 
> https://lore.kernel.org/linux-fsdevel/effc0ec7-cf9d-44dc-aee5-563942242522@meta.com/
> 
> In my view, the common threads between these are that:
> - we used fstests to validate these systems, in some cases even with
>   specific regression tests for highly related bugs, but still missed
>   the bugs until they hit us during our production release process. In
>   all cases, we had passing 'fstests -g auto' runs.
> - we were able to reproduce the bugs with a predictable concoction of "run
>   a workload and some known nasty btrfs operations in parallel". The most
>   common form of this was running 'fsstress' and 'btrfs balance', but it
>   wasn't quite universal. Sometimes we needed reflink threads, or
>   drop_caches, or memory pressure, etc. to trigger a bug.
> - The relatively generic stressing reproducers took hours or days to
>   produce an issue; the investigating engineer could then try to tweak and
>   tune them by trial and error to bring that time down for a particular bug.
> 
> This leads me to the conclusion that there is some room for improvement in
> stress testing filesystems (at least Btrfs).
> 
> I attempted to study the prior art on this and so far have found:
> - fsstress/fsx and the attendant tests in fstests/. There are ~150-200
>   tests using fsstress and fsx in fstests/. Most of them are xfs and
>   btrfs tests following the aforementioned pattern of racing fsstress
>   with some scary operations. Most of them tend to run for 30s, though
>   some are longer (and of course subject to TIME_FACTOR configuration)
> - Similar duration error injection tests in fstests (e.g. generic/475)
> - The NFSv4 Test Project
>   https://www.kernel.org/doc/ols/2006/ols2006v2-pages-275-294.pdf 
>   A choice quote regarding stress testing:
>   "One year after we started using FSSTRESS (in April 2005) Linux NFSv4
>   was able to sustain the concurrent load of 10 processes during 24
>   hours, without any problem. Three months later, NFSv4 reached 72 hours
>   of stress under FSSTRESS, without any bugs. From this date, NFSv4
>   filesystem tree manipulation is considered to be stable."
> 
> 
> I would like to discuss:
> - Am I missing other strategies people are employing? Apologies if there
>   are obvious ones, but I tried to hunt around for a few days :)

At the moment I start six VMs per "configuration", which each run one of:

generic/521	(directio)
generic/522	(bufferedio)
generic/476	(fsstress)
generic/388	(fsstress + log recovery)
xfs/285		(online fsck)
xfs/286		(online metadata rebuild)

with SOAK_DURATION=6.5d so that they wrap up right around the time that
each rc release drops.  I also set FSSTRESS_AVOID="-m 16" so that we
don't end up with gigantic quota files.
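
To spell out the timing, here is a rough sketch in Python.  It assumes
the "d" suffix means days and that rc releases land about seven days
apart; soak_seconds() is a made-up helper for illustration, not
something that exists in fstests.

```python
# Hypothetical helper (not part of fstests): turn a duration string such as
# "6.5d" into seconds, assuming s/m/h/d/w suffixes and bare numbers = seconds.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def soak_seconds(spec: str) -> float:
    spec = spec.strip()
    if spec and spec[-1] in UNIT_SECONDS:
        return float(spec[:-1]) * UNIT_SECONDS[spec[-1]]
    return float(spec)


week = soak_seconds("1w")      # assumed gap between rc releases
soak = soak_seconds("6.5d")    # the SOAK_DURATION quoted above
slack_hours = (week - soak) / 3600

print(f"soak run: {soak:.0f} s, slack before the next rc: {slack_hours:.0f} h")
```

Under those assumptions the soak run takes 561600 seconds and leaves
about twelve hours per week for collecting results and restarting the
VMs on the new rc.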

There are two "configurations" per kernel tree.  The cross product of
trees and configurations is:

djwong-dev:
-m metadir=1,autofsck=1,uquota,gquota,pquota,
-m metadir=1,autofsck=1,uquota,gquota,pquota, -d rtinherit=1,

tot mainline:
-m autofsck=1, -d rtinherit=1,
-m autofsck=1,

for-next:
-m metadir=1,autofsck=1,uquota,gquota,pquota,
-m metadir=1,autofsck=1,uquota,gquota,pquota, -d rtinherit=1,

Actually, I just realized that with 6.14 I need to update the tot
mainline configuration to have metadir=1.
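
For a sense of scale, the whole matrix can be enumerated in a few lines
of Python.  This is only a sketch: the dictionary layout and variable
names are invented here, the option strings are copied verbatim from the
lists above, and it assumes every tree runs both of its configurations
at the same time.

```python
from itertools import product

# The six soak profiles listed earlier in this mail.
SOAK_TESTS = [
    "generic/521",  # directio
    "generic/522",  # bufferedio
    "generic/476",  # fsstress
    "generic/388",  # fsstress + log recovery
    "xfs/285",      # online fsck
    "xfs/286",      # online metadata rebuild
]

# Two configurations per kernel tree, copied as written above.
CONFIGS = {
    "djwong-dev": [
        "-m metadir=1,autofsck=1,uquota,gquota,pquota,",
        "-m metadir=1,autofsck=1,uquota,gquota,pquota, -d rtinherit=1,",
    ],
    "tot mainline": [
        "-m autofsck=1, -d rtinherit=1,",
        "-m autofsck=1,",
    ],
    "for-next": [
        "-m metadir=1,autofsck=1,uquota,gquota,pquota,",
        "-m metadir=1,autofsck=1,uquota,gquota,pquota, -d rtinherit=1,",
    ],
}

# One VM per (tree, configuration, test) combination.
jobs = [
    (tree, opts, test)
    for tree, cfgs in CONFIGS.items()
    for opts, test in product(cfgs, SOAK_TESTS)
]

print(len(jobs), "VMs")
for tree, opts, test in jobs[:3]:
    print(tree, "|", opts, "|", test)
```

With three trees, two configurations each, and six tests per
configuration, that comes to 36 VMs if everything runs concurrently.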

> - What is the universe of interesting stressors (e.g., reflink, scrub,
>   online repair, balance, etc.)

Prodding djwong and everyone else into loading up fsx/fsstress with
all their weird new file io calls. ;)

> - What is the universe of interesting validation conditions (e.g.,
>   kernel panic, read only fs, fsck failure, data integrity error, etc.)
> - Is there any interest in automating longer running fsstress runs? Are
>   people already doing this with varying TIME_FACTOR configurations in
>   fstests?

I don't run with SOAK_DURATION > 14 days because I generally haven't
found larger values to be useful in finding bugs.  However, these weekly
long soak test runs have been going since 2016.

FWIW that actually started because we had a lot of customer complaints
in that era about log recovery failures in xfs, and only later did I
spread it beyond generic/388 to the six profiles above.

> - There is relatively less testing with fsx than fsstress in fstests.
>   I believe this creates gaps for data corruption bugs rather than
>   "feature logic" issues that the fsstress feature set tends to hit.

Probably.  I wonder how much we're really flexing io_uring?

--D

> - Can we standardize on some modular "stressors" and stress durations
>   to run to validate file systems?
> 
> In the short term, I have been working on these ideas in a separate
> barebones stress testing framework which I am happy to share, but isn't
> particularly interesting in and of itself. It is basically just a
> skeleton for running some "stressors" concurrently and then
> validating the fs with some generic "validators". I plan to run it
> internally just to see if I can get some useful results on our next few
> major kernel releases.
> 
> And of course, I would love to discuss anything else of interest to
> people who like stress testing filesystems!
> 
> Thanks,
> Boris
> 
