* [PATCH 1/4] generic/251: fix infinite looping if fstrim_loop configuration fails
2025-05-21 22:40 [PATCHSET 2/2] fstests: more random fixes for v2025.05.11 Darrick J. Wong
@ 2025-05-21 22:42 ` Darrick J. Wong
2025-05-23 5:13 ` Christoph Hellwig
2025-05-21 22:42 ` [PATCH 2/4] generic/251: skip this test if fstrim geometry detection fails Darrick J. Wong
` (2 subsequent siblings)
3 siblings, 1 reply; 12+ messages in thread
From: Darrick J. Wong @ 2025-05-21 22:42 UTC (permalink / raw)
To: zlang, djwong; +Cc: fstests, fstests, linux-xfs
From: Darrick J. Wong <djwong@kernel.org>
In generic/251, the topmost scope of the test creates a run file
($tmp.fstrim_loop), starts fstrim_loop as a background shell, and then
waits for the file to disappear. Unfortunately, the set_*_constraints
helpers called by fstrim_loop can abort the subshell by invoking
_notrun, in which case the loop never deletes the runfile and the upper
scope waits forever.
Fix this by amending _destroy_fstrim to delete the runfile always, and
move the trap call to the top of the function with a note about why it
must always be called.
Oh but wait, there's a second runfile related bug in run_process -- if
the fstrim loop exits while run_process is looping, it'll keep looping
even though there isn't anything else going on. Break out of the loopin
this case.
Oh but wait, there's a /third/ runfile bug -- if the fstrim_loop exits
before the run_process children, it will delete the runfile. Then the
top level scope, in trying to empty out the runfile to get the
fstrim_loop to exit, recreates the runfile and waits forever for nobody
to delete the run file.
I hate process management in bash.
Cc: <fstests@vger.kernel.org> # v2024.12.01
Fixes: 2d6e7681acff1e ("generic/251: use sentinel files to kill the fstrim loop")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
tests/generic/251 | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/tests/generic/251 b/tests/generic/251
index ec486c277c6828..644adf07cc42d5 100755
--- a/tests/generic/251
+++ b/tests/generic/251
@@ -40,8 +40,9 @@ _destroy()
_destroy_fstrim()
{
- kill $fpid 2> /dev/null
- wait $fpid 2> /dev/null
+ test -n "$fpid" && kill $fpid 2> /dev/null
+ test -n "$fpid" && wait $fpid 2> /dev/null
+ rm -f $tmp.fstrim_loop
}
_fail()
@@ -61,14 +62,14 @@ set_minlen_constraints()
done
test $mmlen -gt 0 || \
_notrun "could not determine maximum FSTRIM minlen param"
- FSTRIM_MAX_MINLEN=$mmlen
+ export FSTRIM_MAX_MINLEN=$mmlen
for ((mmlen = 1; mmlen < FSTRIM_MAX_MINLEN; mmlen *= 2)); do
$FSTRIM_PROG -l $(($mmlen*2))k -m ${mmlen}k $SCRATCH_MNT &> /dev/null && break
done
test $mmlen -le $FSTRIM_MAX_MINLEN || \
_notrun "could not determine minimum FSTRIM minlen param"
- FSTRIM_MIN_MINLEN=$mmlen
+ export FSTRIM_MIN_MINLEN=$mmlen
}
# Set FSTRIM_{MIN,MAX}_LEN to the lower and upper bounds of the -l(ength)
@@ -82,14 +83,14 @@ set_length_constraints()
done
test $mmlen -gt 0 || \
_notrun "could not determine maximum FSTRIM length param"
- FSTRIM_MAX_LEN=$mmlen
+ export FSTRIM_MAX_LEN=$mmlen
for ((mmlen = 1; mmlen < FSTRIM_MAX_LEN; mmlen *= 2)); do
$FSTRIM_PROG -l ${mmlen}k $SCRATCH_MNT &> /dev/null && break
done
test $mmlen -le $FSTRIM_MAX_LEN || \
_notrun "could not determine minimum FSTRIM length param"
- FSTRIM_MIN_LEN=$mmlen
+ export FSTRIM_MIN_LEN=$mmlen
}
##
@@ -99,12 +100,14 @@ set_length_constraints()
##
fstrim_loop()
{
+ # always remove the $tmp.fstrim_loop file on exit
+ trap "_destroy_fstrim; exit \$status" 2 15 EXIT
+
set_minlen_constraints
set_length_constraints
echo "MINLEN max=$FSTRIM_MAX_MINLEN min=$FSTRIM_MIN_MINLEN" >> $seqres.full
echo "LENGTH max=$FSTRIM_MAX_LEN min=$FSTRIM_MIN_LEN" >> $seqres.full
- trap "_destroy_fstrim; exit \$status" 2 15
fsize=$(_discard_max_offset_kb "$SCRATCH_MNT" "$SCRATCH_DEV")
while true ; do
@@ -168,6 +171,7 @@ function run_process() {
cp -axT $content/ $SCRATCH_MNT/$p/
check_sums
+ test -e $tmp.fstrim_loop || break
done
}
@@ -197,7 +201,7 @@ done
echo "done."
wait $pids
-truncate -s 0 $tmp.fstrim_loop
+test -e "$tmp.fstrim_loop" && truncate -s 0 $tmp.fstrim_loop
while test -e $tmp.fstrim_loop; do
sleep 1
done
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH 2/4] generic/251: skip this test if fstrim geometry detection fails
2025-05-21 22:40 [PATCHSET 2/2] fstests: more random fixes for v2025.05.11 Darrick J. Wong
2025-05-21 22:42 ` [PATCH 1/4] generic/251: fix infinite looping if fstrim_loop configuration fails Darrick J. Wong
@ 2025-05-21 22:42 ` Darrick J. Wong
2025-05-23 5:13 ` Christoph Hellwig
2025-05-21 22:42 ` [PATCH 3/4] check: unbreak iam Darrick J. Wong
2025-05-21 22:42 ` [PATCH 4/4] check: check and fix the test filesystem after failed tests Darrick J. Wong
3 siblings, 1 reply; 12+ messages in thread
From: Darrick J. Wong @ 2025-05-21 22:42 UTC (permalink / raw)
To: zlang, djwong; +Cc: fstests, fstests, linux-xfs
From: Darrick J. Wong <djwong@kernel.org>
There's no reason to wait until fstrim_loop to detect the length
constraints on the FITRIM call since failing to detect any constraints
just leads to _notrun. Run that stuff from the top level scope so that
we don't start up a bunch of exercisers for no good reason.
Cc: <fstests@vger.kernel.org> # v2023.10.29
Fixes: 95b0db739400c2 ("generic/251: check min and max length and minlen for FSTRIM")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
tests/generic/251 | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/tests/generic/251 b/tests/generic/251
index 644adf07cc42d5..1fb26f4caa3732 100755
--- a/tests/generic/251
+++ b/tests/generic/251
@@ -103,11 +103,6 @@ fstrim_loop()
# always remove the $tmp.fstrim_loop file on exit
trap "_destroy_fstrim; exit \$status" 2 15 EXIT
- set_minlen_constraints
- set_length_constraints
- echo "MINLEN max=$FSTRIM_MAX_MINLEN min=$FSTRIM_MIN_MINLEN" >> $seqres.full
- echo "LENGTH max=$FSTRIM_MAX_LEN min=$FSTRIM_MIN_LEN" >> $seqres.full
-
fsize=$(_discard_max_offset_kb "$SCRATCH_MNT" "$SCRATCH_DEV")
while true ; do
@@ -177,6 +172,12 @@ function run_process() {
nproc=$((4 * LOAD_FACTOR))
+# Discover the fstrim constraints before we do anything else
+set_minlen_constraints
+set_length_constraints
+echo "MINLEN max=$FSTRIM_MAX_MINLEN min=$FSTRIM_MIN_MINLEN" >> $seqres.full
+echo "LENGTH max=$FSTRIM_MAX_LEN min=$FSTRIM_MIN_LEN" >> $seqres.full
+
# Use fsstress to create a directory tree with some variability
content=$SCRATCH_MNT/orig
mkdir -p $content
^ permalink raw reply related [flat|nested] 12+ messages in thread* [PATCH 3/4] check: unbreak iam
2025-05-21 22:40 [PATCHSET 2/2] fstests: more random fixes for v2025.05.11 Darrick J. Wong
2025-05-21 22:42 ` [PATCH 1/4] generic/251: fix infinite looping if fstrim_loop configuration fails Darrick J. Wong
2025-05-21 22:42 ` [PATCH 2/4] generic/251: skip this test if fstrim geometry detection fails Darrick J. Wong
@ 2025-05-21 22:42 ` Darrick J. Wong
2025-05-23 5:13 ` Christoph Hellwig
2025-05-23 13:36 ` Zorro Lang
2025-05-21 22:42 ` [PATCH 4/4] check: check and fix the test filesystem after failed tests Darrick J. Wong
3 siblings, 2 replies; 12+ messages in thread
From: Darrick J. Wong @ 2025-05-21 22:42 UTC (permalink / raw)
To: zlang, djwong; +Cc: fstests, fstests, linux-xfs
From: Darrick J. Wong <djwong@kernel.org>
I don't know why this change was made:
iam=check
to
iam=check.$$
The only users of this variable are:
check:36:iam=check
check:52:rm -f $tmp.list $tmp.tmp $tmp.grep $here/$iam.out $tmp.report.* $tmp.arglist
common/btrfs:216: if [ "$iam" != "check" ]; then
common/overlay:407: if [ "$iam" != "check" ]; then
common/rc:3565: if [ "$iam" != "check" ]; then
common/xfs:1021: if [ "$iam" != "check" ]; then
new:9:iam=new
None of them were ported to notice the pid. Consequently,
_check_generic_filesystem (aka _check_test_fs on an ext4 filesystem)
failing will cause ./check to exit the entire test suite when the test
filesystem is corrupt. That's not what we wanted, particularly since
Leah added a patch to repair the test filesystem between tests.
Cc: <fstests@vger.kernel.org> # v2024.12.08
Fixes: fa0e9712283f0b ("fstests: check-parallel")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
check | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/check b/check
index ede54f6987bcc3..826641268f8b52 100755
--- a/check
+++ b/check
@@ -33,7 +33,7 @@ exclude_tests=()
_err_msg=""
# start the initialisation work now
-iam=check.$$
+iam=check
# mkfs.xfs uses the presence of both of these variables to enable formerly
# supported tiny filesystem configurations that fstests use for fuzz testing
^ permalink raw reply related [flat|nested] 12+ messages in thread* Re: [PATCH 3/4] check: unbreak iam
2025-05-21 22:42 ` [PATCH 3/4] check: unbreak iam Darrick J. Wong
@ 2025-05-23 5:13 ` Christoph Hellwig
2025-05-23 13:36 ` Zorro Lang
1 sibling, 0 replies; 12+ messages in thread
From: Christoph Hellwig @ 2025-05-23 5:13 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: zlang, fstests, linux-xfs
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 3/4] check: unbreak iam
2025-05-21 22:42 ` [PATCH 3/4] check: unbreak iam Darrick J. Wong
2025-05-23 5:13 ` Christoph Hellwig
@ 2025-05-23 13:36 ` Zorro Lang
1 sibling, 0 replies; 12+ messages in thread
From: Zorro Lang @ 2025-05-23 13:36 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: fstests, linux-xfs
On Wed, May 21, 2025 at 03:42:38PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> I don't know why this change was made:
>
> iam=check
>
> to
>
> iam=check.$$
>
> The only users of this variable are:
>
> check:36:iam=check
> check:52:rm -f $tmp.list $tmp.tmp $tmp.grep $here/$iam.out $tmp.report.* $tmp.arglist
> common/btrfs:216: if [ "$iam" != "check" ]; then
> common/overlay:407: if [ "$iam" != "check" ]; then
> common/rc:3565: if [ "$iam" != "check" ]; then
> common/xfs:1021: if [ "$iam" != "check" ]; then
> new:9:iam=new
>
> None of them were ported to notice the pid. Consequently,
> _check_generic_filesystem (aka _check_test_fs on an ext4 filesystem)
> failing will cause ./check to exit the entire test suite when the test
> filesystem is corrupt. That's not what we wanted, particularly since
> Leah added a patch to repair the test filesystem between tests.
>
> Cc: <fstests@vger.kernel.org> # v2024.12.08
> Fixes: fa0e9712283f0b ("fstests: check-parallel")
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> ---
I should revert this line too, when I removed privatens and run_setsid from check.
Thanks for this fixing.
Reviewed-by: Zorro Lang <zlang@redhat.com>
> check | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>
> diff --git a/check b/check
> index ede54f6987bcc3..826641268f8b52 100755
> --- a/check
> +++ b/check
> @@ -33,7 +33,7 @@ exclude_tests=()
> _err_msg=""
>
> # start the initialisation work now
> -iam=check.$$
> +iam=check
>
> # mkfs.xfs uses the presence of both of these variables to enable formerly
> # supported tiny filesystem configurations that fstests use for fuzz testing
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH 4/4] check: check and fix the test filesystem after failed tests
2025-05-21 22:40 [PATCHSET 2/2] fstests: more random fixes for v2025.05.11 Darrick J. Wong
` (2 preceding siblings ...)
2025-05-21 22:42 ` [PATCH 3/4] check: unbreak iam Darrick J. Wong
@ 2025-05-21 22:42 ` Darrick J. Wong
2025-05-23 5:14 ` Christoph Hellwig
3 siblings, 1 reply; 12+ messages in thread
From: Darrick J. Wong @ 2025-05-21 22:42 UTC (permalink / raw)
To: zlang, djwong; +Cc: fstests, fstests, linux-xfs
From: Darrick J. Wong <djwong@kernel.org>
Currently, ./check calls _check_filesystems after a test passes to make
sure that the test and scratch filesystems are ok, and repairs the test
filesystem if it's not ok.
However, we don't do this for failed tests. If a test fails /and/
corrupts the test filesystem, every subsequent passing test will be
marked as a failure because of latent corruptions on the test
filesystem.
This is a little silly, so let's call _check_filesystems on the test
filesystem after a test fail so that the badness doesn't spread.
Cc: <fstests@vger.kernel.org> # v2023.05.01
Fixes: 4a444bc19a836f ("check: _check_filesystems for errors even if test failed")
Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
---
check | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/check b/check
index 826641268f8b52..818ce44da28f65 100755
--- a/check
+++ b/check
@@ -986,8 +986,13 @@ function run_section()
_dump_err_cont "[failed, exit status $sts]"
_test_unmount 2> /dev/null
_scratch_unmount 2> /dev/null
- rm -f ${RESULT_DIR}/require_test*
rm -f ${RESULT_DIR}/require_scratch*
+
+ # Make sure the test filesystem is ready to go since
+ # we don't call _check_filesystems for failed tests
+ (_adjust_oom_score 250; _check_filesystems) || tc_status="fail"
+
+ rm -f ${RESULT_DIR}/require_test*
# Even though we failed, there may be something interesting in
# dmesg which can help debugging.
_check_dmesg
^ permalink raw reply related [flat|nested] 12+ messages in thread* Re: [PATCH 4/4] check: check and fix the test filesystem after failed tests
2025-05-21 22:42 ` [PATCH 4/4] check: check and fix the test filesystem after failed tests Darrick J. Wong
@ 2025-05-23 5:14 ` Christoph Hellwig
2025-05-23 13:39 ` Zorro Lang
0 siblings, 1 reply; 12+ messages in thread
From: Christoph Hellwig @ 2025-05-23 5:14 UTC (permalink / raw)
To: Darrick J. Wong; +Cc: zlang, fstests, linux-xfs
On Wed, May 21, 2025 at 03:42:54PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> Currently, ./check calls _check_filesystems after a test passes to make
> sure that the test and scratch filesystems are ok, and repairs the test
> filesystem if it's not ok.
>
> However, we don't do this for failed tests. If a test fails /and/
> corrupts the test filesystem, every subsequent passing test will be
> marked as a failure because of latent corruptions on the test
> filesystem.
>
> This is a little silly, so let's call _check_filesystems on the test
> filesystem after a test fail so that the badness doesn't spread.
>
> Cc: <fstests@vger.kernel.org> # v2023.05.01
> Fixes: 4a444bc19a836f ("check: _check_filesystems for errors even if test failed")
> Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> ---
> check | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
>
> diff --git a/check b/check
> index 826641268f8b52..818ce44da28f65 100755
> --- a/check
> +++ b/check
> @@ -986,8 +986,13 @@ function run_section()
> _dump_err_cont "[failed, exit status $sts]"
> _test_unmount 2> /dev/null
> _scratch_unmount 2> /dev/null
> - rm -f ${RESULT_DIR}/require_test*
> rm -f ${RESULT_DIR}/require_scratch*
> +
> + # Make sure the test filesystem is ready to go since
> + # we don't call _check_filesystems for failed tests
> + (_adjust_oom_score 250; _check_filesystems) || tc_status="fail"
Maybe break the line after the || to improve readability?
Otherwise looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH 4/4] check: check and fix the test filesystem after failed tests
2025-05-23 5:14 ` Christoph Hellwig
@ 2025-05-23 13:39 ` Zorro Lang
2025-05-23 13:48 ` Zorro Lang
0 siblings, 1 reply; 12+ messages in thread
From: Zorro Lang @ 2025-05-23 13:39 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Darrick J. Wong, fstests, linux-xfs
On Thu, May 22, 2025 at 10:14:08PM -0700, Christoph Hellwig wrote:
> On Wed, May 21, 2025 at 03:42:54PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> >
> > Currently, ./check calls _check_filesystems after a test passes to make
> > sure that the test and scratch filesystems are ok, and repairs the test
> > filesystem if it's not ok.
> >
> > However, we don't do this for failed tests. If a test fails /and/
> > corrupts the test filesystem, every subsequent passing test will be
> > marked as a failure because of latent corruptions on the test
> > filesystem.
> >
> > This is a little silly, so let's call _check_filesystems on the test
> > filesystem after a test fail so that the badness doesn't spread.
> >
> > Cc: <fstests@vger.kernel.org> # v2023.05.01
> > Fixes: 4a444bc19a836f ("check: _check_filesystems for errors even if test failed")
> > Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> > ---
> > check | 7 ++++++-
> > 1 file changed, 6 insertions(+), 1 deletion(-)
> >
> >
> > diff --git a/check b/check
> > index 826641268f8b52..818ce44da28f65 100755
> > --- a/check
> > +++ b/check
> > @@ -986,8 +986,13 @@ function run_section()
> > _dump_err_cont "[failed, exit status $sts]"
> > _test_unmount 2> /dev/null
> > _scratch_unmount 2> /dev/null
> > - rm -f ${RESULT_DIR}/require_test*
> > rm -f ${RESULT_DIR}/require_scratch*
> > +
> > + # Make sure the test filesystem is ready to go since
> > + # we don't call _check_filesystems for failed tests
> > + (_adjust_oom_score 250; _check_filesystems) || tc_status="fail"
>
> Maybe break the line after the || to improve readability?
If you mean this:
(_adjust_oom_score 250; _check_filesystems) || \
tc_status="fail"
I'll help that when I merge it.
Thanks,
Zorro
>
> Otherwise looks good:
>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
>
>
^ permalink raw reply [flat|nested] 12+ messages in thread* Re: [PATCH 4/4] check: check and fix the test filesystem after failed tests
2025-05-23 13:39 ` Zorro Lang
@ 2025-05-23 13:48 ` Zorro Lang
0 siblings, 0 replies; 12+ messages in thread
From: Zorro Lang @ 2025-05-23 13:48 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Darrick J. Wong, fstests, linux-xfs
On Fri, May 23, 2025 at 09:39:00PM +0800, Zorro Lang wrote:
> On Thu, May 22, 2025 at 10:14:08PM -0700, Christoph Hellwig wrote:
> > On Wed, May 21, 2025 at 03:42:54PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > >
> > > Currently, ./check calls _check_filesystems after a test passes to make
> > > sure that the test and scratch filesystems are ok, and repairs the test
> > > filesystem if it's not ok.
> > >
> > > However, we don't do this for failed tests. If a test fails /and/
> > > corrupts the test filesystem, every subsequent passing test will be
> > > marked as a failure because of latent corruptions on the test
> > > filesystem.
> > >
> > > This is a little silly, so let's call _check_filesystems on the test
> > > filesystem after a test fail so that the badness doesn't spread.
> > >
> > > Cc: <fstests@vger.kernel.org> # v2023.05.01
> > > Fixes: 4a444bc19a836f ("check: _check_filesystems for errors even if test failed")
> > > Signed-off-by: "Darrick J. Wong" <djwong@kernel.org>
> > > ---
> > > check | 7 ++++++-
> > > 1 file changed, 6 insertions(+), 1 deletion(-)
> > >
> > >
> > > diff --git a/check b/check
> > > index 826641268f8b52..818ce44da28f65 100755
> > > --- a/check
> > > +++ b/check
> > > @@ -986,8 +986,13 @@ function run_section()
> > > _dump_err_cont "[failed, exit status $sts]"
> > > _test_unmount 2> /dev/null
> > > _scratch_unmount 2> /dev/null
> > > - rm -f ${RESULT_DIR}/require_test*
> > > rm -f ${RESULT_DIR}/require_scratch*
> > > +
> > > + # Make sure the test filesystem is ready to go since
> > > + # we don't call _check_filesystems for failed tests
> > > + (_adjust_oom_score 250; _check_filesystems) || tc_status="fail"
> >
> > Maybe break the line after the || to improve readability?
>
> If you mean this:
> (_adjust_oom_score 250; _check_filesystems) || \
> tc_status="fail"
>
> I'll help that when I merge it.
Hmm... I just found there's another line write as:
(_adjust_oom_score 250; _check_filesystems) || tc_status="fail"
nearby above patch. So I think we change both or keep consistency temporarily:)
Thanks,
Zorro
>
> Thanks,
> Zorro
>
> >
> > Otherwise looks good:
> >
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> >
> >
^ permalink raw reply [flat|nested] 12+ messages in thread