* Re: [LSF/MM/BPF TOPIC] Long Duration Stress Testing Filesystems
  [not found] <20250203185519.GA2888598@zen.localdomain>
@ 2025-02-03 19:12 ` Amir Goldstein
  2025-02-04  0:57 ` Dave Chinner
  0 siblings, 1 reply; 4+ messages in thread

From: Amir Goldstein @ 2025-02-03 19:12 UTC
To: Boris Burkov; +Cc: lsf-pc, linux-fsdevel, fstests

CC fstests

On Mon, Feb 3, 2025 at 7:54 PM Boris Burkov <boris@bur.io> wrote:
>
> At Meta, we currently rely primarily on fstests 'auto' runs for validating Btrfs as a general purpose filesystem for all of our root drives. While this has obviously proven to be a very useful test suite with rich collaboration across teams and filesystems, we have observed a recent trend in our production filesystem issues that makes us question whether it is sufficient.
>
> Over the last few years, we have had a number of issues (primarily in Btrfs, but at least one notable one in XFS) that have been detected in production, then reproduced with an unreliable, non-specific stressor that takes hours or even days to trigger the issue.
> Examples:
> - Btrfs relocation bugs
>   https://lore.kernel.org/linux-btrfs/68766e66ed15ca2e7550585ed09434249db912a2.1727212293.git.josef@toxicpanda.com/
>   https://lore.kernel.org/linux-btrfs/fc61fb63e534111f5837c204ec341c876637af69.1731513908.git.josef@toxicpanda.com/
> - Btrfs extent map merging corruption
>   https://lore.kernel.org/linux-btrfs/9b98ba80e2cf32f6fb3b15dae9ee92507a9d59c7.1729537596.git.boris@bur.io/
> - Btrfs dio data corruptions from bio splitting (mostly our internal errors trying to make minimal backports of https://lore.kernel.org/linux-btrfs/cover.1679512207.git.boris@bur.io/ and Christoph's related series)
> - XFS large folios
>   https://lore.kernel.org/linux-fsdevel/effc0ec7-cf9d-44dc-aee5-563942242522@meta.com/
>
> In my view, the common threads between these are:
> - We used fstests to validate these systems, in some cases even with specific regression tests for highly related bugs, but still missed the bugs until they hit us during our production release process. In all cases, we had passing 'fstests -g auto' runs.
> - We were able to reproduce the bugs with a predictable concoction of "run a workload and some known nasty btrfs operations in parallel". The most common form of this was running 'fsstress' and 'btrfs balance', but it wasn't quite universal. Sometimes we needed reflink threads, or drop_caches, or memory pressure, etc. to trigger a bug.
> - The relatively generic stressing reproducers took hours or days to produce an issue; the investigating engineer could then tweak and tune them by trial and error to bring that time down for a particular bug.
>
> This leads me to the conclusion that there is some room for improvement in stress testing filesystems (at least Btrfs).
>
> I attempted to study the prior art on this and so far have found:
> - fsstress/fsx and the attendant tests in fstests/. There are ~150-200 tests using fsstress and fsx in fstests/. Most of them are xfs and btrfs tests following the aforementioned pattern of racing fsstress with some scary operations. Most of them tend to run for 30s, though some are longer (and of course subject to TIME_FACTOR configuration).
> - Similar duration error injection tests in fstests (e.g. generic/475)
> - The NFSv4 Test Project: https://www.kernel.org/doc/ols/2006/ols2006v2-pages-275-294.pdf
>   A choice quote regarding stress testing: "One year after we started using FSSTRESS (in April 2005) Linux NFSv4 was able to sustain the concurrent load of 10 processes during 24 hours, without any problem. Three months later, NFSv4 reached 72 hours of stress under FSSTRESS, without any bugs. From this date, NFSv4 filesystem tree manipulation is considered to be stable."
>
> I would like to discuss:
> - Am I missing other strategies people are employing? Apologies if there are obvious ones, but I tried to hunt around for a few days :)
> - What is the universe of interesting stressors (e.g., reflink, scrub, online repair, balance, etc.)?
> - What is the universe of interesting validation conditions (e.g., kernel panic, read-only fs, fsck failure, data integrity error, etc.)?
> - Is there any interest in automating longer running fsstress runs? Are people already doing this with varying TIME_FACTOR configurations in fstests?
> - There is relatively less testing with fsx than fsstress in fstests. I believe this creates gaps for data corruption bugs, as opposed to the "feature logic" issues that the fsstress feature set tends to hit.
> - Can we standardize on some modular "stressors" and stress durations to run to validate file systems?
>
> In the short term, I have been working on these ideas in a separate barebones stress testing framework which I am happy to share, but which isn't particularly interesting in and of itself. It is basically just a skeleton for running some concurrent "stressors" and then validating the fs with some generic "validators". I plan to run it internally just to see if I can get some useful results on our next few major kernel releases.
>
> And of course, I would love to discuss anything else of interest to people who like stress testing filesystems!
>
> Thanks,
> Boris
* Re: [LSF/MM/BPF TOPIC] Long Duration Stress Testing Filesystems
  2025-02-03 19:12 ` [LSF/MM/BPF TOPIC] Long Duration Stress Testing Filesystems Amir Goldstein
@ 2025-02-04  0:57 ` Dave Chinner
  2025-02-04 19:58 ` Boris Burkov
  0 siblings, 1 reply; 4+ messages in thread

From: Dave Chinner @ 2025-02-04  0:57 UTC
To: Amir Goldstein; +Cc: Boris Burkov, lsf-pc, linux-fsdevel, fstests

On Mon, Feb 03, 2025 at 08:12:59PM +0100, Amir Goldstein wrote:
> CC fstests
>
> On Mon, Feb 3, 2025 at 7:54 PM Boris Burkov <boris@bur.io> wrote:
> >
> > At Meta, we currently rely primarily on fstests 'auto' runs for validating Btrfs as a general purpose filesystem for all of our root drives. While this has obviously proven to be a very useful test suite with rich collaboration across teams and filesystems, we have observed a recent trend in our production filesystem issues that makes us question whether it is sufficient.
> >
> > Over the last few years, we have had a number of issues (primarily in Btrfs, but at least one notable one in XFS) that have been detected in production, then reproduced with an unreliable, non-specific stressor that takes hours or even days to trigger the issue.
> > Examples:
> > - Btrfs relocation bugs
> >   https://lore.kernel.org/linux-btrfs/68766e66ed15ca2e7550585ed09434249db912a2.1727212293.git.josef@toxicpanda.com/
> >   https://lore.kernel.org/linux-btrfs/fc61fb63e534111f5837c204ec341c876637af69.1731513908.git.josef@toxicpanda.com/
> > - Btrfs extent map merging corruption
> >   https://lore.kernel.org/linux-btrfs/9b98ba80e2cf32f6fb3b15dae9ee92507a9d59c7.1729537596.git.boris@bur.io/
> > - Btrfs dio data corruptions from bio splitting (mostly our internal errors trying to make minimal backports of https://lore.kernel.org/linux-btrfs/cover.1679512207.git.boris@bur.io/ and Christoph's related series)
> > - XFS large folios
> >   https://lore.kernel.org/linux-fsdevel/effc0ec7-cf9d-44dc-aee5-563942242522@meta.com/
> >
> > In my view, the common threads between these are:
> > - We used fstests to validate these systems, in some cases even with specific regression tests for highly related bugs, but still missed the bugs until they hit us during our production release process. In all cases, we had passing 'fstests -g auto' runs.

Have you considered the 'soak' test group with a long SOAK_DURATION and then increasing the load using LOAD_FACTOR? Also there is a 'stress' group that TIME_FACTOR acts on.

For XFS, there's also a bunch of fuzzing tests (in the dangerous_fuzzers group) that use the same SOAK_DURATION infrastructure via common/fuzzy.

> > - We were able to reproduce the bugs with a predictable concoction of "run a workload and some known nasty btrfs operations in parallel". The most common form of this was running 'fsstress' and 'btrfs balance', but it wasn't quite universal. Sometimes we needed reflink threads, or drop_caches, or memory pressure, etc. to trigger a bug.

That's pretty much what check-parallel does to a system. Loads of tests run things like drop_caches, memory compaction, CPU hotplug, etc. check-parallel essentially exposes every test to these sorts of background perturbations rather than just the one test that is running that perturbation. IOWs, even the most basic correctness test now gets exercised while cpu hotplug and memory compaction are going on in the background....
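
To be concrete, the perturbations I mean are trivial loops like these - an illustrative composite, not code lifted from any one test; the knobs themselves are standard kernel interfaces:

    # background noise of the sort individual fstests generate today
    while :; do
        echo 3 > /proc/sys/vm/drop_caches       # toss clean page/slab caches
        echo 1 > /proc/sys/vm/compact_memory    # force memory compaction
        sleep 5
    done &

    # cpu hotplug churn; the glob only matches CPUs that can be offlined
    while :; do
        for cpu in /sys/devices/system/cpu/cpu[0-9]*/online; do
            echo 0 > $cpu; sleep 1; echo 1 > $cpu
        done
    done &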
Eventually, I plan to implement these background perturbations as separate control tasks for check-parallel so we don't need specific tests that run a background perturbation whilst the rest of the system is under test.

> > - The relatively generic stressing reproducers took hours or days to produce an issue; the investigating engineer could then tweak and tune them by trial and error to bring that time down for a particular bug.
> >
> > This leads me to the conclusion that there is some room for improvement in stress testing filesystems (at least Btrfs).
> >
> > I attempted to study the prior art on this and so far have found:
> > - fsstress/fsx and the attendant tests in fstests/. There are ~150-200 tests using fsstress and fsx in fstests/. Most of them are xfs and btrfs tests following the aforementioned pattern of racing fsstress with some scary operations. Most of them tend to run for 30s, though some are longer (and of course subject to TIME_FACTOR configuration).

As per above, SOAK_DURATION.

> > - Similar duration error injection tests in fstests (e.g. generic/475)
> > - The NFSv4 Test Project: https://www.kernel.org/doc/ols/2006/ols2006v2-pages-275-294.pdf
> >   A choice quote regarding stress testing: "One year after we started using FSSTRESS (in April 2005) Linux NFSv4 was able to sustain the concurrent load of 10 processes during 24 hours, without any problem. Three months later, NFSv4 reached 72 hours of stress under FSSTRESS, without any bugs. From this date, NFSv4 filesystem tree manipulation is considered to be stable."
> >
> > I would like to discuss:
> > - Am I missing other strategies people are employing? Apologies if there are obvious ones, but I tried to hunt around for a few days :)

check-parallel.

> > - What is the universe of interesting stressors (e.g., reflink, scrub, online repair, balance, etc.)?

memory compaction, cpu hotplug, random reflinks of the underlying loop device image files to simulate dynamic VM image file snapshots, etc.

> > - What is the universe of interesting validation conditions (e.g., kernel panic, read-only fs, fsck failure, data integrity error, etc.)?

All of them. That's the point of check-parallel - it uses simple, existing filesystem correctness tests to generate a massively stressful load on the system...

> > - Is there any interest in automating longer running fsstress runs? Are people already doing this with varying TIME_FACTOR configurations in fstests?

At least for XFS, Darrick is already doing that, and I think Carlos may be as well.

> > - There is relatively less testing with fsx than fsstress in fstests. I believe this creates gaps for data corruption bugs, as opposed to the "feature logic" issues that the fsstress feature set tends to hit.
> > - Can we standardize on some modular "stressors" and stress durations to run to validate file systems?

I think we already have that with the "soak" and "stress" groups...

> > In the short term, I have been working on these ideas in a separate barebones stress testing framework which I am happy to share, but which isn't particularly interesting in and of itself. It is basically just a skeleton for running some concurrent "stressors" and then validating the fs with some generic "validators". I plan to run it internally just to see if I can get some useful results on our next few major kernel releases.

check-parallel is effectively a massive concurrent stress workload for the system.
It does this by running many individual correctness tests concurrently.

Run it on a 64p system or larger, and it will hammer both the test filesystems and the base filesystem that all the loop device image files are laid out on. I'm seeing it generate 5-6GB/s of IO load, use 40-50GB of memory, consistently burn >90% of the CPU in the system, and stress the scheduler at over half a million context switches/s.

> > And of course, I would love to discuss anything else of interest to people who like stress testing filesystems!

Filesystem stress testing by itself isn't really interesting to me. Using filesystem correctness tests to create massively stressful workloads, OTOH, attacks the problem from multiple angles and exercises the system well outside the bounds of just filesystem code.

-Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [LSF/MM/BPF TOPIC] Long Duration Stress Testing Filesystems
  2025-02-04  0:57 ` Dave Chinner
@ 2025-02-04 19:58 ` Boris Burkov
  2025-02-04 21:14 ` Dave Chinner
  0 siblings, 1 reply; 4+ messages in thread

From: Boris Burkov @ 2025-02-04 19:58 UTC
To: Dave Chinner; +Cc: Amir Goldstein, lsf-pc, linux-fsdevel, fstests

On Tue, Feb 04, 2025 at 11:57:09AM +1100, Dave Chinner wrote:
> On Mon, Feb 03, 2025 at 08:12:59PM +0100, Amir Goldstein wrote:
> > CC fstests
> >
> > On Mon, Feb 3, 2025 at 7:54 PM Boris Burkov <boris@bur.io> wrote:
> > >
> > > At Meta, we currently rely primarily on fstests 'auto' runs for validating Btrfs as a general purpose filesystem for all of our root drives. While this has obviously proven to be a very useful test suite with rich collaboration across teams and filesystems, we have observed a recent trend in our production filesystem issues that makes us question whether it is sufficient.
> > >
> > > Over the last few years, we have had a number of issues (primarily in Btrfs, but at least one notable one in XFS) that have been detected in production, then reproduced with an unreliable, non-specific stressor that takes hours or even days to trigger the issue.
> > > Examples:
> > > - Btrfs relocation bugs
> > >   https://lore.kernel.org/linux-btrfs/68766e66ed15ca2e7550585ed09434249db912a2.1727212293.git.josef@toxicpanda.com/
> > >   https://lore.kernel.org/linux-btrfs/fc61fb63e534111f5837c204ec341c876637af69.1731513908.git.josef@toxicpanda.com/
> > > - Btrfs extent map merging corruption
> > >   https://lore.kernel.org/linux-btrfs/9b98ba80e2cf32f6fb3b15dae9ee92507a9d59c7.1729537596.git.boris@bur.io/
> > > - Btrfs dio data corruptions from bio splitting (mostly our internal errors trying to make minimal backports of https://lore.kernel.org/linux-btrfs/cover.1679512207.git.boris@bur.io/ and Christoph's related series)
> > > - XFS large folios
> > >   https://lore.kernel.org/linux-fsdevel/effc0ec7-cf9d-44dc-aee5-563942242522@meta.com/
> > >
> > > In my view, the common threads between these are:
> > > - We used fstests to validate these systems, in some cases even with specific regression tests for highly related bugs, but still missed the bugs until they hit us during our production release process. In all cases, we had passing 'fstests -g auto' runs.
>
> Have you considered the 'soak' test group with a long SOAK_DURATION and then increasing the load using LOAD_FACTOR? Also there is a 'stress' group that TIME_FACTOR acts on.
>
> For XFS, there's also a bunch of fuzzing tests (in the dangerous_fuzzers group) that use the same SOAK_DURATION infrastructure via common/fuzzy.

I hadn't realized people were running these for multi-day durations. Thanks for pointing them out, and for your other inline answers to my questions.

> > > - We were able to reproduce the bugs with a predictable concoction of "run a workload and some known nasty btrfs operations in parallel". The most common form of this was running 'fsstress' and 'btrfs balance', but it wasn't quite universal. Sometimes we needed reflink threads, or drop_caches, or memory pressure, etc. to trigger a bug.
>
> That's pretty much what check-parallel does to a system. Loads of tests run things like drop_caches, memory compaction, CPU hotplug, etc. check-parallel essentially exposes every test to these sorts of background perturbations rather than just the one test that is running that perturbation.
> IOWs, even the most basic correctness test now gets exercised while cpu hotplug and memory compaction are going on in the background....
>
> Eventually, I plan to implement these background perturbations as separate control tasks for check-parallel so we don't need specific tests that run a background perturbation whilst the rest of the system is under test.

I think that a framework for introducing background perturbations while running tests is definitely what I'm getting at. If check-parallel is a good version of that, then that sounds great to me. I am particularly excited about your point that it will smash together *every* stimulus with *every* test. I do have some questions in my head about how that would work in practice.

My main questions/concerns are:

How much do you randomize the interleaving of tests? Does check-parallel run them in a random order?

Similarly, the tests' durations are not at all tuned to maximize interesting interactions. If test X and test Y would collide on some faulty interaction, but test X runs only once and finishes in a second, then you would likely never see test X interfere with some interesting moment during test Y. Are you considering feeding the tests back into the run-queue as they finish for these stress-style runs?

It seems that the two objectives of the test harness are somewhat in tension when using check-parallel to stress things. On the one hand you want tests to independently succeed or fail, and on the other hand you want noise from one test to disturb the others. I fear more of the failures will turn out to be "Oh, well, when THAT happens, we would expect this condition to be violated" - especially for the more "unit test" style fstests that carefully use sync to check specific conditions during a run.

This variant also feels like it would make it extremely difficult to distill a failure into a reproducer.

> > > - The relatively generic stressing reproducers took hours or days to produce an issue; the investigating engineer could then tweak and tune them by trial and error to bring that time down for a particular bug.
> > >
> > > This leads me to the conclusion that there is some room for improvement in stress testing filesystems (at least Btrfs).
> > >
> > > I attempted to study the prior art on this and so far have found:
> > > - fsstress/fsx and the attendant tests in fstests/. There are ~150-200 tests using fsstress and fsx in fstests/. Most of them are xfs and btrfs tests following the aforementioned pattern of racing fsstress with some scary operations. Most of them tend to run for 30s, though some are longer (and of course subject to TIME_FACTOR configuration).
>
> As per above, SOAK_DURATION.
>
> > > - Similar duration error injection tests in fstests (e.g. generic/475)
> > > - The NFSv4 Test Project: https://www.kernel.org/doc/ols/2006/ols2006v2-pages-275-294.pdf
> > >   A choice quote regarding stress testing: "One year after we started using FSSTRESS (in April 2005) Linux NFSv4 was able to sustain the concurrent load of 10 processes during 24 hours, without any problem. Three months later, NFSv4 reached 72 hours of stress under FSSTRESS, without any bugs. From this date, NFSv4 filesystem tree manipulation is considered to be stable."
> > >
> > > I would like to discuss:
> > > - Am I missing other strategies people are employing? Apologies if there are obvious ones, but I tried to hunt around for a few days :)
>
> check-parallel.
> > > - What is the universe of interesting stressors (e.g., reflink, scrub, online repair, balance, etc.)?
>
> memory compaction, cpu hotplug, random reflinks of the underlying loop device image files to simulate dynamic VM image file snapshots, etc.
>
> > > - What is the universe of interesting validation conditions (e.g., kernel panic, read-only fs, fsck failure, data integrity error, etc.)?
>
> All of them. That's the point of check-parallel - it uses simple, existing filesystem correctness tests to generate a massively stressful load on the system...
>
> > > - Is there any interest in automating longer running fsstress runs? Are people already doing this with varying TIME_FACTOR configurations in fstests?
>
> At least for XFS, Darrick is already doing that, and I think Carlos may be as well.
>
> > > - There is relatively less testing with fsx than fsstress in fstests. I believe this creates gaps for data corruption bugs, as opposed to the "feature logic" issues that the fsstress feature set tends to hit.
> > > - Can we standardize on some modular "stressors" and stress durations to run to validate file systems?
>
> I think we already have that with the "soak" and "stress" groups...
>
> > > In the short term, I have been working on these ideas in a separate barebones stress testing framework which I am happy to share, but which isn't particularly interesting in and of itself. It is basically just a skeleton for running some concurrent "stressors" and then validating the fs with some generic "validators". I plan to run it internally just to see if I can get some useful results on our next few major kernel releases.
>
> check-parallel is effectively a massive concurrent stress workload for the system. It does this by running many individual correctness tests concurrently.
>
> Run it on a 64p system or larger, and it will hammer both the test filesystems and the base filesystem that all the loop device image files are laid out on. I'm seeing it generate 5-6GB/s of IO load, use 40-50GB of memory, consistently burn >90% of the CPU in the system, and stress the scheduler at over half a million context switches/s.

I will definitely invest some time into getting check-parallel to run with btrfs, and hopefully it turns up some interesting stuff.

> > > And of course, I would love to discuss anything else of interest to people who like stress testing filesystems!
>
> Filesystem stress testing by itself isn't really interesting to me. Using filesystem correctness tests to create massively stressful workloads, OTOH, attacks the problem from multiple angles and exercises the system well outside the bounds of just filesystem code.

From what I see, today we have a handful of tests which race fsx or fsstress with 0-2 operations under test, and you are proposing using check-parallel to hammer the computer with the entirety of all 1000 tests in parallel (awesome). I think I am proposing something in between, where we run fsx AND fsstress AND ~10 known scary operations. That has proven to dredge up bugs in btrfs (where the simpler "fsstress plus one thing" doesn't). I think check-parallel will be more stressful, but this "mega fsstress run" will be more predictable and easier to tune and to get reproducers out of.
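
Concretely, the skeleton is roughly this - a deliberately simplified sketch rather than our actual framework, with illustrative device paths, durations, and op mix:

    #!/bin/bash
    # One workload generator plus a pile of nasty btrfs ops, all racing
    # on the same filesystem, followed by generic validators.
    DEV=/dev/vdb; MNT=/mnt/scratch
    DURATION=$((4 * 3600))

    mkdir -p "$MNT/stress"
    fsstress -d "$MNT/stress" -p 16 -n 10000000 &     # metadata-heavy load
    fsx -N 10000000 "$MNT/fsx.dat" &                  # data integrity load

    (while :; do btrfs balance start --full-balance "$MNT"; done) &
    (while :; do btrfs scrub start -B "$MNT"; done) &
    (while :; do echo 3 > /proc/sys/vm/drop_caches; sleep 10; done) &
    # ...reflink threads, memory pressure, etc. slot in the same way

    sleep "$DURATION"
    kill $(jobs -p) 2>/dev/null
    wait

    # validators: did the fs flip read-only? does fsck pass?
    findmnt -n -o OPTIONS "$MNT" | grep -q '^ro' && echo "fs went readonly"
    umount "$MNT"
    btrfs check "$DEV" || echo "fsck failed"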
Thanks again for your thoughts,
Boris

> -Dave.
> --
> Dave Chinner
> david@fromorbit.com

* Re: [LSF/MM/BPF TOPIC] Long Duration Stress Testing Filesystems
  2025-02-04 19:58 ` Boris Burkov
@ 2025-02-04 21:14 ` Dave Chinner
  0 siblings, 0 replies; 4+ messages in thread

From: Dave Chinner @ 2025-02-04 21:14 UTC
To: Boris Burkov; +Cc: Amir Goldstein, lsf-pc, linux-fsdevel, fstests

On Tue, Feb 04, 2025 at 11:58:46AM -0800, Boris Burkov wrote:
> On Tue, Feb 04, 2025 at 11:57:09AM +1100, Dave Chinner wrote:
> > > > - We were able to reproduce the bugs with a predictable concoction of "run a workload and some known nasty btrfs operations in parallel". The most common form of this was running 'fsstress' and 'btrfs balance', but it wasn't quite universal. Sometimes we needed reflink threads, or drop_caches, or memory pressure, etc. to trigger a bug.
> >
> > That's pretty much what check-parallel does to a system. Loads of tests run things like drop_caches, memory compaction, CPU hotplug, etc. check-parallel essentially exposes every test to these sorts of background perturbations rather than just the one test that is running that perturbation. IOWs, even the most basic correctness test now gets exercised while cpu hotplug and memory compaction are going on in the background....
> >
> > Eventually, I plan to implement these background perturbations as separate control tasks for check-parallel so we don't need specific tests that run a background perturbation whilst the rest of the system is under test.
>
> I think that a framework for introducing background perturbations while running tests is definitely what I'm getting at. If check-parallel is a good version of that, then that sounds great to me. I am particularly excited about your point that it will smash together *every* stimulus with *every* test. I do have some questions in my head about how that would work in practice.
>
> My main questions/concerns are:
>
> How much do you randomize the interleaving of tests? Does check-parallel run them in a random order?

Same as check - the "-r" option will randomise the test run order. The run order is also somewhat randomised by default, in that tests are sorted based on their runtime in the previous test run. Hence the test run order is not static - it generally runs long running tests before short running tests, but the exact order is not fixed.

> Similarly, the tests' durations are not at all tuned to maximize interesting interactions. If test X and test Y would collide on some faulty interaction, but test X runs only once and finishes in a second, then you would likely never see test X interfere with some interesting moment during test Y. Are you considering feeding the tests back into the run-queue as they finish for these stress-style runs?

Not yet - the infrastructure to directly manage and run tests from check-parallel is not yet in place. It currently generates a test list for each runner thread, then executes that via a check instance per runner thread. I plan to have check-parallel execute tests individually itself by factoring the run loop out of check (similar to how I'm doing the test list parsing). Once there is direct control of the test execution, stuff like dynamic test queues - where runners just pull the next test to run off the queue and keep going until the queue is empty - will be possible.
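
(Functionally something like this, though the real thing will live inside check-parallel rather than in a shell one-liner - illustrative only:)

    # N runners each pull the next test off a shared queue until it is empty
    xargs -P "$NR_RUNNERS" -n 1 ./run-one-test < tests.list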
> It seems that the two objectives of the test harness are somewhat in tension when using check-parallel to stress things. On the one hand you want tests to independently succeed or fail, and on the other hand you want noise from one test to disturb the others.

Yes. Tests are largely written such that they don't interfere with each other.

> I fear more of the failures will turn out to be "Oh, well, when THAT happens, we would expect this condition to be violated" - especially for the more "unit test" style fstests that carefully use sync to check specific conditions during a run.

That's why I currently have an "unreliable_in_parallel" test group definition, and check-parallel excludes that test group. There are about 20 tests I've classified this way, most of them xfs specific tests that rely on exact fragmentation patterns being created. These tests are perturbed by things like sync(1) calls from other tests, which result in a different fragmentation pattern than the test expects to see.

In each case, there is a comment in the test explaining the condition that makes the test unreliable in parallel, so we have some idea of what needs fixing to be able to remove it from the unreliable_in_parallel group. Essentially, I'm using this as a marker and note for future improvements once all the (more important) infrastructure work is done and solid.

> This variant also feels like it would make it extremely difficult to distill a failure into a reproducer.

It's pretty obvious when a test is doing something that is influenced by an outside event. The biggest problem for debugging comes when the test failures appear to be real bugs (e.g. all the weird and whacky off-by-one quota failures that check-parallel triggers on XFS) but cannot be reproduced when the tests are run serially.

.....

> > > > And of course, I would love to discuss anything else of interest to people who like stress testing filesystems!
> >
> > Filesystem stress testing by itself isn't really interesting to me. Using filesystem correctness tests to create massively stressful workloads, OTOH, attacks the problem from multiple angles and exercises the system well outside the bounds of just filesystem code.
>
> From what I see, today we have a handful of tests which race fsx or fsstress with 0-2 operations under test, and you are proposing using check-parallel to hammer the computer with the entirety of all 1000 tests in parallel (awesome).

It's currently running one test per CPU in parallel, not all at once. Many tests run lots of stuff in parallel themselves, too, and some of them hammer large CPU count machines really hard just by themselves, let alone when there are another 63 tests running concurrently....

> I think I am proposing something in between, where we run fsx AND fsstress AND ~10 known scary operations.

Write a set of tests that do this for btrfs and put them in the auto/stress/soak groups. Then run 'check-parallel -g soak,stress ....'
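
Such a test is mostly boilerplate. An untested sketch using standard fstests helpers - the tunings are illustrative, and the scary op mix is the part you'd curate:

    #! /bin/bash
    # SPDX-License-Identifier: GPL-2.0
    #
    # btrfs/NNN: race fsstress against a rotation of scary btrfs
    # operations for SOAK_DURATION. Untested sketch only.
    . ./common/preamble
    _begin_fstest auto soak stress

    _require_scratch

    _scratch_mkfs > /dev/null 2>&1
    _scratch_mount

    mkdir $SCRATCH_MNT/stress
    $FSSTRESS_PROG -d $SCRATCH_MNT/stress -p $((4 * LOAD_FACTOR)) \
        -n 10000000 >> $seqres.full 2>&1 &
    stress_pid=$!

    end=$((SECONDS + ${SOAK_DURATION:-30}))
    while ((SECONDS < end)); do
        $BTRFS_UTIL_PROG balance start --full-balance $SCRATCH_MNT \
            >> $seqres.full 2>&1
        $BTRFS_UTIL_PROG scrub start -B $SCRATCH_MNT >> $seqres.full 2>&1
    done
    # fsx, reflink threads, etc. would slot in the same way

    kill $stress_pid 2>/dev/null
    wait

    # the harness's post-test fsck of the scratch fs is the real validator
    echo Silence is golden
    status=0
    exit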
-Dave.
--
Dave Chinner
david@fromorbit.com