linux-fsdevel.vger.kernel.org archive mirror
* [PATCH 0/5] fstests: add some new LBS inspired tests
@ 2024-06-11  3:01 Luis Chamberlain
  2024-06-11  3:01 ` [PATCH 1/5] common: move mread() to generic helper _mread() Luis Chamberlain
                   ` (4 more replies)
  0 siblings, 5 replies; 22+ messages in thread
From: Luis Chamberlain @ 2024-06-11  3:01 UTC (permalink / raw)
  To: patches, fstests
  Cc: linux-xfs, linux-mm, linux-fsdevel, akpm, ziy, vbabka, seanjc,
	willy, david, hughd, linmiaohe, muchun.song, osalvador, p.raghav,
	da.gomez, hare, john.g.garry, mcgrof

While working on LBS we've come across some existing issues; some of them
affect existing kernels without LBS, while the only new corner case specific
to LBS is the xarray bug Willy fixed to help with truncation to larger order
folios and races with writeback.

This adds 3 new tests to help reproduce these issues right away. One test
reproduces an otherwise extremely difficult to reproduce deadlock; we have one
fix already merged to help with that deadlock, however the test also gives us
more homework to do, as more deadlocks are still possible with that test even
on v6.10-rc2.

The mmap page boundary test lets us discover that a patch in the LBS series
fixes the mmap page boundary restriction when huge pages are enabled on tmpfs.
This is a corner case POSIX semantics issue, so likely not critical to most
users.

The fsstress + compaction test reproduces an otherwise really difficult to
reproduce hang which is possible without some recent fixes; however, the test
reveals there is yet more work left to do.

The stress truncation + writeback test is the only test in this series
specific to LBS, but it will likely be useful later for other work in the
kernel.

Luis Chamberlain (5):
  common: move mread() to generic helper _mread()
  fstests: add mmap page boundary tests
  fstests: add fsstress + compaction test
  _require_debugfs(): simplify and fix for debian
  fstests: add stress truncation + writeback test

 common/rc             |  54 +++++++++-
 tests/generic/574     |  36 +------
 tests/generic/749     | 238 ++++++++++++++++++++++++++++++++++++++++++
 tests/generic/749.out |   2 +
 tests/generic/750     |  62 +++++++++++
 tests/generic/750.out |   2 +
 tests/generic/751     | 127 ++++++++++++++++++++++
 tests/generic/751.out |   2 +
 8 files changed, 490 insertions(+), 33 deletions(-)
 create mode 100755 tests/generic/749
 create mode 100644 tests/generic/749.out
 create mode 100755 tests/generic/750
 create mode 100644 tests/generic/750.out
 create mode 100755 tests/generic/751
 create mode 100644 tests/generic/751.out

-- 
2.43.0


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 1/5] common: move mread() to generic helper _mread()
  2024-06-11  3:01 [PATCH 0/5] fstests: add some new LBS inspired tests Luis Chamberlain
@ 2024-06-11  3:01 ` Luis Chamberlain
  2024-06-11 14:28   ` Darrick J. Wong
  2024-06-11  3:01 ` [PATCH 2/5] fstests: add mmap page boundary tests Luis Chamberlain
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 22+ messages in thread
From: Luis Chamberlain @ 2024-06-11  3:01 UTC (permalink / raw)
  To: patches, fstests
  Cc: linux-xfs, linux-mm, linux-fsdevel, akpm, ziy, vbabka, seanjc,
	willy, david, hughd, linmiaohe, muchun.song, osalvador, p.raghav,
	da.gomez, hare, john.g.garry, mcgrof, Darrick J. Wong

We want a shared way to use mmap() so that tests can check for SIGBUS;
provide a shared routine which other tests can leverage.
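
A hypothetical caller (mirroring how generic/574 uses the helper below, with
the test's usual $fsv_file, $file_len and $tmp variables) would do:

	_mread $fsv_file 0 $file_len >/dev/null 2>$tmp.err
	if ! grep -q 'Bus error' $tmp.err; then
		echo "Didn't see SIGBUS when reading file via mmap"
	fi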

Suggested-by: "Darrick J. Wong" <djwong@kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 common/rc         | 28 ++++++++++++++++++++++++++++
 tests/generic/574 | 36 ++++--------------------------------
 2 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/common/rc b/common/rc
index 163041fea5b9..fa7942809d6c 100644
--- a/common/rc
+++ b/common/rc
@@ -52,6 +52,34 @@ _pwrite_byte() {
 	$XFS_IO_PROG $xfs_io_args -f -c "pwrite -S $pattern $offset $len" "$file"
 }
 
+_round_up_to_page_boundary()
+{
+	local n=$1
+	local page_size=$(_get_page_size)
+
+	echo $(( (n + page_size - 1) & ~(page_size - 1) ))
+}
+
+_mread()
+{
+	local file=$1
+	local offset=$2
+	local length=$3
+	local map_len=$(_round_up_to_page_boundary $(_get_filesize $file))
+
+	# Some callers expect xfs_io to crash with SIGBUS due to the mread,
+	# causing the shell to print "Bus error" to stderr.  To allow this
+	# message to be redirected, execute xfs_io in a new shell instance.
+	# However, for this to work reliably, we also need to prevent the new
+	# shell instance from optimizing out the fork and directly exec'ing
+	# xfs_io.  The easiest way to do that is to append 'true' to the
+	# commands, so that xfs_io is no longer the last command the shell sees.
+	# Don't let it write core files to the filesystem.
+	bash -c "trap '' SIGBUS; ulimit -c 0; $XFS_IO_PROG -r $file \
+		-c 'mmap -r 0 $map_len' \
+		-c 'mread -v $offset $length'; true"
+}
+
 # mmap-write a byte into a range of a file
 _mwrite_byte() {
 	local pattern="$1"
diff --git a/tests/generic/574 b/tests/generic/574
index cb42baaa67aa..d44c23e5abc2 100755
--- a/tests/generic/574
+++ b/tests/generic/574
@@ -52,34 +52,6 @@ setup_zeroed_file()
 	cmp $fsv_orig_file $fsv_file
 }
 
-round_up_to_page_boundary()
-{
-	local n=$1
-	local page_size=$(_get_page_size)
-
-	echo $(( (n + page_size - 1) & ~(page_size - 1) ))
-}
-
-mread()
-{
-	local file=$1
-	local offset=$2
-	local length=$3
-	local map_len=$(round_up_to_page_boundary $(_get_filesize $file))
-
-	# Some callers expect xfs_io to crash with SIGBUS due to the mread,
-	# causing the shell to print "Bus error" to stderr.  To allow this
-	# message to be redirected, execute xfs_io in a new shell instance.
-	# However, for this to work reliably, we also need to prevent the new
-	# shell instance from optimizing out the fork and directly exec'ing
-	# xfs_io.  The easiest way to do that is to append 'true' to the
-	# commands, so that xfs_io is no longer the last command the shell sees.
-	# Don't let it write core files to the filesystem.
-	bash -c "trap '' SIGBUS; ulimit -c 0; $XFS_IO_PROG -r $file \
-		-c 'mmap -r 0 $map_len' \
-		-c 'mread -v $offset $length'; true"
-}
-
 corruption_test()
 {
 	local block_size=$1
@@ -142,7 +114,7 @@ corruption_test()
 	fi
 
 	# Reading the full file via mmap should fail.
-	mread $fsv_file 0 $file_len >/dev/null 2>$tmp.err
+	_mread $fsv_file 0 $file_len >/dev/null 2>$tmp.err
 	if ! grep -q 'Bus error' $tmp.err; then
 		echo "Didn't see SIGBUS when reading file via mmap"
 		cat $tmp.err
@@ -150,7 +122,7 @@ corruption_test()
 
 	# Reading just the corrupted part via mmap should fail.
 	if ! $is_merkle_tree; then
-		mread $fsv_file $zap_offset $zap_len >/dev/null 2>$tmp.err
+		_mread $fsv_file $zap_offset $zap_len >/dev/null 2>$tmp.err
 		if ! grep -q 'Bus error' $tmp.err; then
 			echo "Didn't see SIGBUS when reading corrupted part via mmap"
 			cat $tmp.err
@@ -174,10 +146,10 @@ corrupt_eof_block_test()
 	head -c $zap_len /dev/zero | tr '\0' X \
 		| _fsv_scratch_corrupt_bytes $fsv_file $file_len
 
-	mread $fsv_file $file_len $zap_len >$tmp.out 2>$tmp.err
+	_mread $fsv_file $file_len $zap_len >$tmp.out 2>$tmp.err
 
 	head -c $file_len /dev/zero >$tmp.zeroes
-	mread $tmp.zeroes $file_len $zap_len >$tmp.zeroes_out
+	_mread $tmp.zeroes $file_len $zap_len >$tmp.zeroes_out
 
 	grep -q 'Bus error' $tmp.err || diff $tmp.out $tmp.zeroes_out
 }
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 2/5] fstests: add mmap page boundary tests
  2024-06-11  3:01 [PATCH 0/5] fstests: add some new LBS inspired tests Luis Chamberlain
  2024-06-11  3:01 ` [PATCH 1/5] common: move mread() to generic helper _mread() Luis Chamberlain
@ 2024-06-11  3:01 ` Luis Chamberlain
  2024-06-11 16:48   ` Darrick J. Wong
  2024-06-12  8:06   ` Zorro Lang
  2024-06-11  3:02 ` [PATCH 3/5] fstests: add fsstress + compaction test Luis Chamberlain
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 22+ messages in thread
From: Luis Chamberlain @ 2024-06-11  3:01 UTC (permalink / raw)
  To: patches, fstests
  Cc: linux-xfs, linux-mm, linux-fsdevel, akpm, ziy, vbabka, seanjc,
	willy, david, hughd, linmiaohe, muchun.song, osalvador, p.raghav,
	da.gomez, hare, john.g.garry, mcgrof

POSIX says mmap() should zero-fill data beyond the file size up to the page
boundary, and issue a SIGBUS if we go beyond that. While fsx helps us test
zero-fill sometimes, fsstress also lets us sometimes test for SIGBUS; however,
that is based on a random value and it's not likely we always test it.
Dedicate a specific test to this to make testing this situation easier and to
easily expand to other corner cases.

The only filesystem currently known to fail is tmpfs with huge pages,
but the pending upstream patch "filemap: cap PTE range to be created to
allowed zero fill in folio_map_range()" fixes this issue for tmpfs and
also for LBS support.
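
For illustration only (not part of the test): assuming a 4k page size and a
5 byte file named "file", the semantics being checked look like this with
xfs_io:

	# bytes in the same page past EOF read back as zeroes
	$XFS_IO_PROG -r file -c 'mmap -r 0 4096' -c 'mread -v 5 4091'
	# the next page is entirely past EOF, so touching it raises SIGBUS
	$XFS_IO_PROG -r file -c 'mmap -r 0 8192' -c 'mread -v 4096 10'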

Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 common/rc             |   5 +-
 tests/generic/749     | 238 ++++++++++++++++++++++++++++++++++++++++++
 tests/generic/749.out |   2 +
 3 files changed, 244 insertions(+), 1 deletion(-)
 create mode 100755 tests/generic/749
 create mode 100644 tests/generic/749.out

diff --git a/common/rc b/common/rc
index fa7942809d6c..e812a2f7cc67 100644
--- a/common/rc
+++ b/common/rc
@@ -60,12 +60,15 @@ _round_up_to_page_boundary()
 	echo $(( (n + page_size - 1) & ~(page_size - 1) ))
 }
 
+# You can override $map_len but it is optional; by default we use the
+# max allowed size. If you use a length greater than the default you can
+# expect a SIGBUS and test for it.
 _mread()
 {
 	local file=$1
 	local offset=$2
 	local length=$3
-	local map_len=$(_round_up_to_page_boundary $(_get_filesize $file))
+	local map_len=${4:-$(_round_up_to_page_boundary $(_get_filesize $file)) }
 
 	# Some callers expect xfs_io to crash with SIGBUS due to the mread,
 	# causing the shell to print "Bus error" to stderr.  To allow this
diff --git a/tests/generic/749 b/tests/generic/749
new file mode 100755
index 000000000000..c1d3a2028549
--- /dev/null
+++ b/tests/generic/749
@@ -0,0 +1,238 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) Luis Chamberlain. All Rights Reserved.
+#
+# FS QA Test 749
+#
+# As per POSIX NOTES mmap(2) maps multiples of the system page size, but if the
+# data mapped is not multiples of the page size the remaining bytes are zeroed
+# out when mapped and modifications to that region are not written to the file.
+# On Linux when you write data to such partial page after the end of the
+# object, the data stays in the page cache even after the file is closed and
+# unmapped and  even  though  the data  is never written to the file itself,
+# subsequent mappings may see the modified content. If you go *beyond* this
+# page, you should get a SIGBUS. This test verifies we zero-fill to page
+# boundary and ensures we get a SIGBUS if we write to data beyond the system
+# page size even if the block size is greater than the system page size.
+. ./common/preamble
+. ./common/rc
+_begin_fstest auto quick prealloc
+
+# Import common functions.
+. ./common/filter
+
+# real QA test starts here
+_supported_fs generic
+_require_scratch_nocheck
+_require_test
+_require_xfs_io_command "truncate"
+_require_xfs_io_command "falloc"
+
+# _fixed_by_git_commit kernel <pending-upstream> \
+#        "filemap: cap PTE range to be created to allowed zero fill in folio_map_range()"
+
+filter_xfs_io_data_unique()
+{
+    _filter_xfs_io_offset | sed -e 's| |\n|g' | grep -E -v "\.|XX|\*" | \
+	sort -u | tr -d '\n'
+}
+
+
+setup_zeroed_file()
+{
+	local file_len=$1
+	local sparse=$2
+
+	if $sparse; then
+		$XFS_IO_PROG -f -c "truncate $file_len" $test_file
+	else
+		$XFS_IO_PROG -f -c "falloc 0 $file_len" $test_file
+	fi
+}
+
+mwrite()
+{
+       local file=$1
+       local offset=$2
+       local length=$3
+       local map_len=${4:-$(_round_up_to_page_boundary $(_get_filesize $file)) }
+
+       # Some callers expect xfs_io to crash with SIGBUS due to the mread,
+       # causing the shell to print "Bus error" to stderr.  To allow this
+       # message to be redirected, execute xfs_io in a new shell instance.
+       # However, for this to work reliably, we also need to prevent the new
+       # shell instance from optimizing out the fork and directly exec'ing
+       # xfs_io.  The easiest way to do that is to append 'true' to the
+       # commands, so that xfs_io is no longer the last command the shell sees.
+       bash -c "trap '' SIGBUS; ulimit -c 0; \
+		$XFS_IO_PROG $file \
+               -c 'mmap -w 0 $map_len' \
+               -c 'mwrite $offset $length'; \
+	       true"
+}
+
+do_mmap_tests()
+{
+	local block_size=$1
+	local file_len=$2
+	local offset=$3
+	local len=$4
+	local use_sparse_file=${5:-false}
+	local new_filelen=0
+	local map_len=0
+	local csum=0
+	local fs_block_size=$(_get_file_block_size $SCRATCH_MNT)
+
+	echo -en "\n\n==> Testing blocksize $block_size " >> $seqres.full
+	echo -en "file_len: $file_len offset: $offset " >> $seqres.full
+	echo -e "len: $len sparse: $use_sparse_file" >> $seqres.full
+
+	if ((fs_block_size != block_size)); then
+		_fail "Block size created ($block_size) doesn't match _get_file_block_size on mount ($fs_block_size)"
+	fi
+
+	rm -rf "${SCRATCH_MNT:?}"/*
+
+	# This lets us also test against sparse files
+	setup_zeroed_file $file_len $use_sparse_file
+
+	# This will overwrite the old data, the file size is the
+	# delta between offset and len now.
+	$XFS_IO_PROG -f -c "pwrite -S 0xaa -b 512 $offset $len" \
+		$test_file >> $seqres.full
+
+	sync
+	new_filelen=$(_get_filesize $test_file)
+	map_len=$(_round_up_to_page_boundary $new_filelen)
+	csum_orig="$(_md5_checksum $test_file)"
+
+	# A couple of mmap() tests:
+	#
+	# We are allowed to mmap() up to the boundary of the page size of a
+	# data object, but there a few rules to follow we must check for:
+	#
+	# a) zero-fill test for the data: POSIX says we should zero fill any
+	#    partial page after the end of the object. Verify zero-fill.
+	# b) do not write this bogus data to disk: on Linux, if we write data
+	#    to a partially filled page, it will stay in the page cache even
+	#    after the file is closed and unmapped even if it never reaches the
+	#    file. Subsequent mappings *may* see the modified content, but it
+	#    also can get other data. Since the data read after the actual
+	#    object data can vary we just verify the filesize does not change.
+	if [[ $map_len -gt $new_filelen ]]; then
+		zero_filled_data_len=$((map_len - new_filelen))
+		_scratch_cycle_mount
+		expected_zero_data="00"
+		zero_filled_data=$($XFS_IO_PROG -r $test_file \
+			-c "mmap -r 0 $map_len" \
+			-c "mread -v $new_filelen $zero_filled_data_len" \
+			-c "munmap" | \
+			filter_xfs_io_data_unique)
+		if [[ "$zero_filled_data" != "$expected_zero_data" ]]; then
+			echo "Expected data: $expected_zero_data"
+			echo "  Actual data: $zero_filled_data"
+			_fail "Zero-fill expectations with mmap() not respected"
+		fi
+
+		_scratch_cycle_mount
+		$XFS_IO_PROG $test_file \
+			-c "mmap -w 0 $map_len" \
+			-c "mwrite $new_filelen $zero_filled_data_len" \
+			-c "munmap"
+		sync
+		csum_post="$(_md5_checksum $test_file)"
+		if [[ "$csum_orig" != "$csum_post" ]]; then
+			echo "Expected csum: $csum_orig"
+			echo " Actual  csum: $csum_post"
+			_fail "mmap() write up to page boundary should not change actual file contents"
+		fi
+
+		local filelen_test=$(_get_filesize $test_file)
+		if [[ "$filelen_test" != "$new_filelen" ]]; then
+			echo "Expected file length: $new_filelen"
+			echo " Actual  file length: $filelen_test"
+			_fail "mmap() write up to page boundary should not change actual file size"
+		fi
+	fi
+
+	# Now let's ensure we get SIGBUS when we go beyond the page boundary
+	_scratch_cycle_mount
+	new_filelen=$(_get_filesize $test_file)
+	map_len=$(_round_up_to_page_boundary $new_filelen)
+	csum_orig="$(_md5_checksum $test_file)"
+	_mread $test_file 0 $map_len >> $seqres.full  2>$tmp.err
+	if grep -q 'Bus error' $tmp.err; then
+		cat $tmp.err
+		_fail "Not expecting SIGBUS when reading up to page boundary"
+	fi
+
+	# This should just work
+	_mread $test_file 0 $map_len >> $seqres.full  2>$tmp.err
+	if [[ $? -ne 0 ]]; then
+		_fail "mmap() read up to page boundary should work"
+	fi
+
+	# This should just work
+	mwrite $test_file 0 $map_len >> $seqres.full  2>$tmp.err
+	if [[ $? -ne 0 ]]; then
+		_fail "mmap() write up to page boundary should work"
+	fi
+
+	# If we mmap() on the boundary but try to read beyond it just
+	# fails, we don't get a SIGBUS
+	$XFS_IO_PROG -r $test_file \
+		-c "mmap -r 0 $map_len" \
+		-c "mread 0 $((map_len + 10))" >> $seqres.full  2>$tmp.err
+	local mread_err=$?
+	if [[ $mread_err -eq 0 ]]; then
+		_fail "mmap() to page boundary works as expected but reading beyond should fail: $mread_err"
+	fi
+
+	$XFS_IO_PROG -w $test_file \
+		-c "mmap -w 0 $map_len" \
+		-c "mwrite 0 $((map_len + 10))" >> $seqres.full  2>$tmp.err
+	local mwrite_err=$?
+	if [[ $mwrite_err -eq 0 ]]; then
+		_fail "mmap() to page boundary works as expected but writing beyond should fail: $mwrite_err"
+	fi
+
+	# Now let's go beyond the allowed mmap() page boundary
+	_mread $test_file 0 $((map_len + 10)) $((map_len + 10)) >> $seqres.full  2>$tmp.err
+	if ! grep -q 'Bus error' $tmp.err; then
+		_fail "Expected SIGBUS when mmap() reading beyond page boundary"
+	fi
+
+	mwrite $test_file 0 $((map_len + 10)) $((map_len + 10))  >> $seqres.full  2>$tmp.err
+	if ! grep -q 'Bus error' $tmp.err; then
+		_fail "Expected SIGBUS when mmap() writing beyond page boundary"
+	fi
+
+	local filelen_test=$(_get_filesize $test_file)
+	if [[ "$filelen_test" != "$new_filelen" ]]; then
+		echo "Expected file length: $new_filelen"
+		echo " Actual  file length: $filelen_test"
+		_fail "reading or writing beyond file size up to mmap() page boundary should not change file size"
+	fi
+}
+
+test_block_size()
+{
+	local block_size=$1
+
+	do_mmap_tests $block_size 512 3 5
+	do_mmap_tests $block_size 11k 0 $((4096 * 3 + 3))
+	do_mmap_tests $block_size 16k 0 $((16384+3))
+	do_mmap_tests $block_size 16k $((16384-10)) $((16384+20))
+	do_mmap_tests $block_size 64k 0 $((65536+3))
+	do_mmap_tests $block_size 4k 4090 30 true
+}
+
+_scratch_mkfs >> $seqres.full 2>&1 || _fail "mkfs failed"
+_scratch_mount
+test_file=$SCRATCH_MNT/file
+block_size=$(_get_file_block_size "$SCRATCH_MNT")
+test_block_size $block_size
+
+echo "Silence is golden"
+status=0
+exit
diff --git a/tests/generic/749.out b/tests/generic/749.out
new file mode 100644
index 000000000000..24658deddb99
--- /dev/null
+++ b/tests/generic/749.out
@@ -0,0 +1,2 @@
+QA output created by 749
+Silence is golden
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 3/5] fstests: add fsstress + compaction test
  2024-06-11  3:01 [PATCH 0/5] fstests: add some new LBS inspired tests Luis Chamberlain
  2024-06-11  3:01 ` [PATCH 1/5] common: move mread() to generic helper _mread() Luis Chamberlain
  2024-06-11  3:01 ` [PATCH 2/5] fstests: add mmap page boundary tests Luis Chamberlain
@ 2024-06-11  3:02 ` Luis Chamberlain
  2024-06-11 14:48   ` Darrick J. Wong
  2024-06-12  8:00   ` Zorro Lang
  2024-06-11  3:02 ` [PATCH 4/5] _require_debugfs(): simplify and fix for debian Luis Chamberlain
  2024-06-11  3:02 ` [PATCH 5/5] fstests: add stress truncation + writeback test Luis Chamberlain
  4 siblings, 2 replies; 22+ messages in thread
From: Luis Chamberlain @ 2024-06-11  3:02 UTC (permalink / raw)
  To: patches, fstests
  Cc: linux-xfs, linux-mm, linux-fsdevel, akpm, ziy, vbabka, seanjc,
	willy, david, hughd, linmiaohe, muchun.song, osalvador, p.raghav,
	da.gomez, hare, john.g.garry, mcgrof

Running compaction while we run fsstress can crash older kernels, as per
korg#218227 [0]; the fix for that was merged in v6.9-rc6 as commit
d99e3140a4d3 ("mm: turn folio_test_hugetlb into a PageType"). However, even on
v6.10-rc2, where this kernel commit is already merged, we can still deadlock
when running fsstress while triggering compaction at the same time. This is a
new issue being reported now through this patch, which also serves as a
reproducer with high confidence: it always deadlocks. If you enable
CONFIG_PROVE_LOCKING with the defaults you will end up with a complaint about
increasing MAX_LOCKDEP_CHAIN_HLOCKS [1]; if you adjust that you then end up
with a few soft lockup complaints and some possible deadlock candidates to
evaluate [2].

Provide a simple reproducer and pave the way so we keep on testing this.

Without lockdep enabled we silently deadlock on the first run of the
test without the fix applied. With lockdep enabled you get a splat about
the possible deadlock on the first run of the test.

[0] https://bugzilla.kernel.org/show_bug.cgi?id=218227
[1] https://gist.github.com/mcgrof/824913b645892214effeb1631df75072
[2] https://gist.github.com/mcgrof/926e183d21c5c4c55d74ec90197bd77a
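
For reference, outside of fstests the same race can be provoked by hand with
something along these lines (mount point, op count and process count are
illustrative only):

	# force compaction every few seconds while a write-heavy fsstress runs
	while true; do
		echo 1 > /proc/sys/vm/compact_memory
		sleep 5
	done &
	compact_pid=$!
	$FSSTRESS_PROG -w -d /mnt/scratch -n 100000 -p 8
	kill $compact_pid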

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 common/rc             |  7 +++++
 tests/generic/750     | 62 +++++++++++++++++++++++++++++++++++++++++++
 tests/generic/750.out |  2 ++
 3 files changed, 71 insertions(+)
 create mode 100755 tests/generic/750
 create mode 100644 tests/generic/750.out

diff --git a/common/rc b/common/rc
index e812a2f7cc67..18ad25662d5c 100644
--- a/common/rc
+++ b/common/rc
@@ -151,6 +151,13 @@ _require_hugepages()
 		_notrun "Kernel does not report huge page size"
 }
 
+# Requires CONFIG_COMPACTION
+_require_vm_compaction()
+{
+	if [ ! -f /proc/sys/vm/compact_memory ]; then
+	    _notrun "Need compaction enabled CONFIG_COMPACTION=y"
+	fi
+}
 # Get hugepagesize in bytes
 _get_hugepagesize()
 {
diff --git a/tests/generic/750 b/tests/generic/750
new file mode 100755
index 000000000000..334ab011dfa0
--- /dev/null
+++ b/tests/generic/750
@@ -0,0 +1,62 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2024 Luis Chamberlain.  All Rights Reserved.
+#
+# FS QA Test 750
+#
+# fsstress + memory compaction test
+#
+. ./common/preamble
+_begin_fstest auto rw long_rw stress soak smoketest
+
+_cleanup()
+{
+	cd /
+	rm -f $runfile
+	rm -f $tmp.*
+	kill -9 $trigger_compaction_pid > /dev/null 2>&1
+	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
+
+	wait > /dev/null 2>&1
+}
+
+# Import common functions.
+
+# real QA test starts here
+
+_supported_fs generic
+
+_require_scratch
+_require_vm_compaction
+_require_command "$KILLALL_PROG" "killall"
+
+# We still deadlock with this test on v6.10-rc2, we need more work.
+# but the below makes things better.
+_fixed_by_git_commit kernel d99e3140a4d3 \
+	"mm: turn folio_test_hugetlb into a PageType"
+
+echo "Silence is golden"
+
+_scratch_mkfs > $seqres.full 2>&1
+_scratch_mount >> $seqres.full 2>&1
+
+nr_cpus=$((LOAD_FACTOR * 4))
+nr_ops=$((25000 * nr_cpus * TIME_FACTOR))
+fsstress_args=(-w -d $SCRATCH_MNT -n $nr_ops -p $nr_cpus)
+
+# start a background trigger for memory compaction
+runfile="$tmp.compaction"
+touch $runfile
+while [ -e $runfile ]; do
+	echo 1 > /proc/sys/vm/compact_memory
+	sleep 5
+done &
+trigger_compaction_pid=$!
+
+test -n "$SOAK_DURATION" && fsstress_args+=(--duration="$SOAK_DURATION")
+
+$FSSTRESS_PROG $FSSTRESS_AVOID "${fsstress_args[@]}" >> $seqres.full
+wait > /dev/null 2>&1
+
+status=0
+exit
diff --git a/tests/generic/750.out b/tests/generic/750.out
new file mode 100644
index 000000000000..bd79507b632e
--- /dev/null
+++ b/tests/generic/750.out
@@ -0,0 +1,2 @@
+QA output created by 750
+Silence is golden
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 4/5] _require_debugfs(): simplify and fix for debian
  2024-06-11  3:01 [PATCH 0/5] fstests: add some new LBS inspired tests Luis Chamberlain
                   ` (2 preceding siblings ...)
  2024-06-11  3:02 ` [PATCH 3/5] fstests: add fsstress + compaction test Luis Chamberlain
@ 2024-06-11  3:02 ` Luis Chamberlain
  2024-06-11 14:35   ` Darrick J. Wong
  2024-06-12  7:51   ` Zorro Lang
  2024-06-11  3:02 ` [PATCH 5/5] fstests: add stress truncation + writeback test Luis Chamberlain
  4 siblings, 2 replies; 22+ messages in thread
From: Luis Chamberlain @ 2024-06-11  3:02 UTC (permalink / raw)
  To: patches, fstests
  Cc: linux-xfs, linux-mm, linux-fsdevel, akpm, ziy, vbabka, seanjc,
	willy, david, hughd, linmiaohe, muchun.song, osalvador, p.raghav,
	da.gomez, hare, john.g.garry, mcgrof

Using findmnt with the -S debugfs argument does not output anything on
Debian, and the argument is not needed; fix that.
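
For example, assuming debugfs is mounted at the usual /sys/kernel/debug (the
default $DEBUGFS_MNT), the first form below printed nothing on such a system
while the second prints the fstype we are after:

	$ findmnt -rncv -T /sys/kernel/debug -S debugfs -o FSTYPE
	$ findmnt -rncv -T /sys/kernel/debug -o FSTYPE
	debugfs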

Fixes: 8e8fb3da709e ("fstests: fix _require_debugfs and call it properly")
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 common/rc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/common/rc b/common/rc
index 18ad25662d5c..30beef4e5c02 100644
--- a/common/rc
+++ b/common/rc
@@ -3025,7 +3025,7 @@ _require_debugfs()
 	local type
 
 	if [ -d "$DEBUGFS_MNT" ];then
-		type=$(findmnt -rncv -T $DEBUGFS_MNT -S debugfs -o FSTYPE)
+		type=$(findmnt -rncv -T $DEBUGFS_MNT -o FSTYPE)
 		[ "$type" = "debugfs" ] && return 0
 	fi
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH 5/5] fstests: add stress truncation + writeback test
  2024-06-11  3:01 [PATCH 0/5] fstests: add some new LBS inspired tests Luis Chamberlain
                   ` (3 preceding siblings ...)
  2024-06-11  3:02 ` [PATCH 4/5] _require_debugfs(): simplify and fix for debian Luis Chamberlain
@ 2024-06-11  3:02 ` Luis Chamberlain
  2024-06-11 14:45   ` Darrick J. Wong
  4 siblings, 1 reply; 22+ messages in thread
From: Luis Chamberlain @ 2024-06-11  3:02 UTC (permalink / raw)
  To: patches, fstests
  Cc: linux-xfs, linux-mm, linux-fsdevel, akpm, ziy, vbabka, seanjc,
	willy, david, hughd, linmiaohe, muchun.song, osalvador, p.raghav,
	da.gomez, hare, john.g.garry, mcgrof

Stress test folio splits by using the new debugfs interface to target a new
smaller folio order while triggering writeback at the same time.

This is known to create a crash only with min order enabled, for example with
a 16k block size XFS test profile; an xarray fix for that is already merged.
The issue is fixed by kernel commit 2a0774c2886d ("XArray: set the marks
correctly when splitting an entry").

If you want to inspect this more closely, enable the following on your kernel
boot command line:

	dyndbg='file mm/huge_memory.c +p'

Since we want to race with large folio splits, we also augment the full test
output log $seqres.full with the test-specific number of successful splits
from the vmstat counters thp_split_page and thp_split_page_failed. The larger
thp_split_page is, the more we stress this path.

This test immediately reproduces an otherwise really hard to reproduce crash.
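
As a rough sketch (not part of the test, and assuming debugfs is mounted at
/sys/kernel/debug), the knob and the counters this test relies on can be
driven and observed by hand like so:

	# ask the kernel to try to split every candidate huge folio,
	# including anonymous memory, on the whole system
	echo 1 > /sys/kernel/debug/split_huge_pages

	# measure how many splits happened while a workload ran
	before=$(awk '/^thp_split_page /{print $2}' /proc/vmstat)
	#   ... run the buffered fio write workload here ...
	after=$(awk '/^thp_split_page /{print $2}' /proc/vmstat)
	echo "thp_split_page delta: $((after - before))"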

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 common/rc             |  14 +++++
 tests/generic/751     | 127 ++++++++++++++++++++++++++++++++++++++++++
 tests/generic/751.out |   2 +
 3 files changed, 143 insertions(+)
 create mode 100755 tests/generic/751
 create mode 100644 tests/generic/751.out

diff --git a/common/rc b/common/rc
index 30beef4e5c02..60f572108818 100644
--- a/common/rc
+++ b/common/rc
@@ -158,6 +158,20 @@ _require_vm_compaction()
 	    _notrun "Need compaction enabled CONFIG_COMPACTION=y"
 	fi
 }
+
+# Requires CONFIG_DEBUGFS and truncation knobs
+_require_split_debugfs()
+{
+       if [ ! -f $DEBUGFS_MNT/split_huge_pages ]; then
+           _notrun "Needs CONFIG_DEBUGFS and split_huge_pages"
+       fi
+}
+
+_split_huge_pages_all()
+{
+	echo 1 > $DEBUGFS_MNT/split_huge_pages
+}
+
 # Get hugepagesize in bytes
 _get_hugepagesize()
 {
diff --git a/tests/generic/751 b/tests/generic/751
new file mode 100755
index 000000000000..7cc96054a5a9
--- /dev/null
+++ b/tests/generic/751
@@ -0,0 +1,127 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (C) 2024 Luis Chamberlain. All Rights Reserved.
+#
+# FS QA Test No. 751
+#
+# stress page cache truncation + writeback
+#
+# This aims at trying to reproduce a difficult to reproduce bug found with
+# min order. The issue was root caused to an xarray bug when we split folios
+# to another order other than 0. This functionality is used to support min
+# order. The crash:
+#
+# https://gist.github.com/mcgrof/d12f586ec6ebe32b2472b5d634c397df
+#
+# This may also find future truncation bugs, as truncating any mapped file
+# through the collateral of using echo 1 > split_huge_pages will always
+# respect the min order. Truncating to a larger order is then exercised when
+# this test is run against any filesystem LBS profile or an LBS device.
+#
+# If you're enabling this and want to check underneath the hood you may want to
+# enable:
+#
+# dyndbg='file mm/huge_memory.c +p'
+#
+# This test aims at increasing the rate of successful truncations so we want
+# to increase the value of thp_split_page in $seqres.full. Using echo 1 >
+# split_huge_pages is extremely aggressive, and even accounts for anonymous
+# memory on a system, however we accept that tradeoff for the efficiency of
+# doing the work in-kernel for any mapped file too. Our general goal here is to
+# race with folio truncation + writeback.
+
+. ./common/preamble
+
+_begin_fstest auto long_rw stress soak smoketest
+
+# Override the default cleanup function.
+_cleanup()
+{
+	cd /
+	rm -f $tmp.*
+	rm -f $runfile
+	kill -9 $split_huge_pages_files_pid > /dev/null 2>&1
+}
+
+fio_config=$tmp.fio
+fio_out=$tmp.fio.out
+
+# real QA test starts here
+_supported_fs generic
+_require_test
+_require_scratch
+_require_debugfs
+_require_split_debugfs
+_require_command "$KILLALL_PROG" "killall"
+_fixed_by_git_commit kernel 2a0774c2886d \
+	"XArray: set the marks correctly when splitting an entry"
+
+# we need buffered IO to force truncation races with writeback in the
+# page cache
+cat >$fio_config <<EOF
+[force_large_large_folio_parallel_writes]
+nrfiles=10
+direct=0
+bs=4M
+group_reporting=1
+filesize=1GiB
+readwrite=write
+fallocate=none
+numjobs=$(nproc)
+directory=$SCRATCH_MNT
+runtime=100*${TIME_FACTOR}
+time_based
+EOF
+
+_require_fio $fio_config
+
+echo "Silence is golden"
+
+_scratch_mkfs >>$seqres.full 2>&1
+_scratch_mount >> $seqres.full 2>&1
+
+# used to let our loops know when to stop
+runfile="$tmp.keep.running.loop"
+touch $runfile
+
+# The background ops are out of bounds, the goal is to race with fsstress.
+
+# Force folio split if possible, this seems to be screaming for MADV_NOHUGEPAGE
+# for large folios.
+while [ -e $runfile ]; do
+	_split_huge_pages_all >/dev/null 2>&1
+done &
+split_huge_pages_files_pid=$!
+
+split_count_before=0
+split_count_failed_before=0
+
+if grep -q thp_split_page /proc/vmstat; then
+	split_count_before=$(grep ^thp_split_page /proc/vmstat | head -1 | awk '{print $2}')
+	split_count_failed_before=$(grep ^thp_split_page_failed /proc/vmstat | head -1 | awk '{print $2}')
+else
+	echo "no thp_split_page in /proc/vmstat" >> $seqres.full
+fi
+
+# we blast away with large writes to force large folio writes when
+# possible.
+echo -e "Running fio with config:\n" >> $seqres.full
+cat $fio_config >> $seqres.full
+$FIO_PROG $fio_config --alloc-size=$(( $(nproc) * 8192 )) --output=$fio_out
+
+rm -f $runfile
+
+wait > /dev/null 2>&1
+
+if grep -q thp_split_page /proc/vmstat; then
+	split_count_after=$(grep ^thp_split_page /proc/vmstat | head -1 | awk '{print $2}')
+	split_count_failed_after=$(grep ^thp_split_page_failed /proc/vmstat | head -1 | awk '{print $2}')
+	thp_split_page=$((split_count_after - split_count_before))
+	thp_split_page_failed=$((split_count_failed_after - split_count_failed_before))
+
+	echo "vmstat thp_split_page: $thp_split_page" >> $seqres.full
+	echo "vmstat thp_split_page_failed: $thp_split_page_failed" >> $seqres.full
+fi
+
+status=0
+exit
diff --git a/tests/generic/751.out b/tests/generic/751.out
new file mode 100644
index 000000000000..6479fa6f1404
--- /dev/null
+++ b/tests/generic/751.out
@@ -0,0 +1,2 @@
+QA output created by 751
+Silence is golden
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH 1/5] common: move mread() to generic helper _mread()
  2024-06-11  3:01 ` [PATCH 1/5] common: move mread() to generic helper _mread() Luis Chamberlain
@ 2024-06-11 14:28   ` Darrick J. Wong
  0 siblings, 0 replies; 22+ messages in thread
From: Darrick J. Wong @ 2024-06-11 14:28 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: patches, fstests, linux-xfs, linux-mm, linux-fsdevel, akpm, ziy,
	vbabka, seanjc, willy, david, hughd, linmiaohe, muchun.song,
	osalvador, p.raghav, da.gomez, hare, john.g.garry

On Mon, Jun 10, 2024 at 08:01:58PM -0700, Luis Chamberlain wrote:
> We want a shared way to use mmap() so that tests can check for SIGBUS;
> provide a shared routine which other tests can leverage.
> 
> Suggested-by: "Darrick J. Wong" <djwong@kernel.org>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>

Looks ok,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  common/rc         | 28 ++++++++++++++++++++++++++++
>  tests/generic/574 | 36 ++++--------------------------------
>  2 files changed, 32 insertions(+), 32 deletions(-)
> 
> diff --git a/common/rc b/common/rc
> index 163041fea5b9..fa7942809d6c 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -52,6 +52,34 @@ _pwrite_byte() {
>  	$XFS_IO_PROG $xfs_io_args -f -c "pwrite -S $pattern $offset $len" "$file"
>  }
>  
> +_round_up_to_page_boundary()
> +{
> +	local n=$1
> +	local page_size=$(_get_page_size)
> +
> +	echo $(( (n + page_size - 1) & ~(page_size - 1) ))
> +}
> +
> +_mread()
> +{
> +	local file=$1
> +	local offset=$2
> +	local length=$3
> +	local map_len=$(_round_up_to_page_boundary $(_get_filesize $file))
> +
> +	# Some callers expect xfs_io to crash with SIGBUS due to the mread,
> +	# causing the shell to print "Bus error" to stderr.  To allow this
> +	# message to be redirected, execute xfs_io in a new shell instance.
> +	# However, for this to work reliably, we also need to prevent the new
> +	# shell instance from optimizing out the fork and directly exec'ing
> +	# xfs_io.  The easiest way to do that is to append 'true' to the
> +	# commands, so that xfs_io is no longer the last command the shell sees.
> +	# Don't let it write core files to the filesystem.
> +	bash -c "trap '' SIGBUS; ulimit -c 0; $XFS_IO_PROG -r $file \
> +		-c 'mmap -r 0 $map_len' \
> +		-c 'mread -v $offset $length'; true"
> +}
> +
>  # mmap-write a byte into a range of a file
>  _mwrite_byte() {
>  	local pattern="$1"
> diff --git a/tests/generic/574 b/tests/generic/574
> index cb42baaa67aa..d44c23e5abc2 100755
> --- a/tests/generic/574
> +++ b/tests/generic/574
> @@ -52,34 +52,6 @@ setup_zeroed_file()
>  	cmp $fsv_orig_file $fsv_file
>  }
>  
> -round_up_to_page_boundary()
> -{
> -	local n=$1
> -	local page_size=$(_get_page_size)
> -
> -	echo $(( (n + page_size - 1) & ~(page_size - 1) ))
> -}
> -
> -mread()
> -{
> -	local file=$1
> -	local offset=$2
> -	local length=$3
> -	local map_len=$(round_up_to_page_boundary $(_get_filesize $file))
> -
> -	# Some callers expect xfs_io to crash with SIGBUS due to the mread,
> -	# causing the shell to print "Bus error" to stderr.  To allow this
> -	# message to be redirected, execute xfs_io in a new shell instance.
> -	# However, for this to work reliably, we also need to prevent the new
> -	# shell instance from optimizing out the fork and directly exec'ing
> -	# xfs_io.  The easiest way to do that is to append 'true' to the
> -	# commands, so that xfs_io is no longer the last command the shell sees.
> -	# Don't let it write core files to the filesystem.
> -	bash -c "trap '' SIGBUS; ulimit -c 0; $XFS_IO_PROG -r $file \
> -		-c 'mmap -r 0 $map_len' \
> -		-c 'mread -v $offset $length'; true"
> -}
> -
>  corruption_test()
>  {
>  	local block_size=$1
> @@ -142,7 +114,7 @@ corruption_test()
>  	fi
>  
>  	# Reading the full file via mmap should fail.
> -	mread $fsv_file 0 $file_len >/dev/null 2>$tmp.err
> +	_mread $fsv_file 0 $file_len >/dev/null 2>$tmp.err
>  	if ! grep -q 'Bus error' $tmp.err; then
>  		echo "Didn't see SIGBUS when reading file via mmap"
>  		cat $tmp.err
> @@ -150,7 +122,7 @@ corruption_test()
>  
>  	# Reading just the corrupted part via mmap should fail.
>  	if ! $is_merkle_tree; then
> -		mread $fsv_file $zap_offset $zap_len >/dev/null 2>$tmp.err
> +		_mread $fsv_file $zap_offset $zap_len >/dev/null 2>$tmp.err
>  		if ! grep -q 'Bus error' $tmp.err; then
>  			echo "Didn't see SIGBUS when reading corrupted part via mmap"
>  			cat $tmp.err
> @@ -174,10 +146,10 @@ corrupt_eof_block_test()
>  	head -c $zap_len /dev/zero | tr '\0' X \
>  		| _fsv_scratch_corrupt_bytes $fsv_file $file_len
>  
> -	mread $fsv_file $file_len $zap_len >$tmp.out 2>$tmp.err
> +	_mread $fsv_file $file_len $zap_len >$tmp.out 2>$tmp.err
>  
>  	head -c $file_len /dev/zero >$tmp.zeroes
> -	mread $tmp.zeroes $file_len $zap_len >$tmp.zeroes_out
> +	_mread $tmp.zeroes $file_len $zap_len >$tmp.zeroes_out
>  
>  	grep -q 'Bus error' $tmp.err || diff $tmp.out $tmp.zeroes_out
>  }
> -- 
> 2.43.0
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/5] _require_debugfs(): simplify and fix for debian
  2024-06-11  3:02 ` [PATCH 4/5] _require_debugfs(): simplify and fix for debian Luis Chamberlain
@ 2024-06-11 14:35   ` Darrick J. Wong
  2024-06-12  7:51   ` Zorro Lang
  1 sibling, 0 replies; 22+ messages in thread
From: Darrick J. Wong @ 2024-06-11 14:35 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: patches, fstests, linux-xfs, linux-mm, linux-fsdevel, akpm, ziy,
	vbabka, seanjc, willy, david, hughd, linmiaohe, muchun.song,
	osalvador, p.raghav, da.gomez, hare, john.g.garry

On Mon, Jun 10, 2024 at 08:02:01PM -0700, Luis Chamberlain wrote:
> Using findmnt with the -S debugfs argument does not output anything on
> Debian, and the argument is not needed; fix that.

Ah, right, -S filters the "source" column in /proc/mounts.  That's
unimportant for fstests, because we really just want to know that the
*fstype* is debugfs, which is what -o FSTYPE does.

Ignoring my previous questions on this matter,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D


> Fixes: 8e8fb3da709e ("fstests: fix _require_debugfs and call it properly")
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>  common/rc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/common/rc b/common/rc
> index 18ad25662d5c..30beef4e5c02 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -3025,7 +3025,7 @@ _require_debugfs()
>  	local type
>  
>  	if [ -d "$DEBUGFS_MNT" ];then
> -		type=$(findmnt -rncv -T $DEBUGFS_MNT -S debugfs -o FSTYPE)
> +		type=$(findmnt -rncv -T $DEBUGFS_MNT -o FSTYPE)
>  		[ "$type" = "debugfs" ] && return 0
>  	fi
>  
> -- 
> 2.43.0
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/5] fstests: add stress truncation + writeback test
  2024-06-11  3:02 ` [PATCH 5/5] fstests: add stress truncation + writeback test Luis Chamberlain
@ 2024-06-11 14:45   ` Darrick J. Wong
  2024-06-11 18:15     ` Luis Chamberlain
  0 siblings, 1 reply; 22+ messages in thread
From: Darrick J. Wong @ 2024-06-11 14:45 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: patches, fstests, linux-xfs, linux-mm, linux-fsdevel, akpm, ziy,
	vbabka, seanjc, willy, david, hughd, linmiaohe, muchun.song,
	osalvador, p.raghav, da.gomez, hare, john.g.garry

On Mon, Jun 10, 2024 at 08:02:02PM -0700, Luis Chamberlain wrote:
> Stress test folio splits by using the new debugfs interface to target a new
> smaller folio order while triggering writeback at the same time.
> 
> This is known to create a crash only with min order enabled, for example with
> a 16k block size XFS test profile; an xarray fix for that is already merged.
> The issue is fixed by kernel commit 2a0774c2886d ("XArray: set the marks
> correctly when splitting an entry").
> 
> If you want to inspect this more closely, enable the following on your kernel
> boot command line:
> 
> 	dyndbg='file mm/huge_memory.c +p'
> 
> Since we want to race with large folio splits, we also augment the full test
> output log $seqres.full with the test-specific number of successful splits
> from the vmstat counters thp_split_page and thp_split_page_failed. The larger
> thp_split_page is, the more we stress this path.
> 
> This test immediately reproduces an otherwise really hard to reproduce crash.
> 
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>  common/rc             |  14 +++++
>  tests/generic/751     | 127 ++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/751.out |   2 +
>  3 files changed, 143 insertions(+)
>  create mode 100755 tests/generic/751
>  create mode 100644 tests/generic/751.out
> 
> diff --git a/common/rc b/common/rc
> index 30beef4e5c02..60f572108818 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -158,6 +158,20 @@ _require_vm_compaction()
>  	    _notrun "Need compaction enabled CONFIG_COMPACTION=y"
>  	fi
>  }
> +
> +# Requires CONFIG_DEBUGFS and truncation knobs
> +_require_split_debugfs()

Er... I thought "split" referred to debugfs itself.

_require_split_huge_pages_knob?

> +{
> +       if [ ! -f $DEBUGFS_MNT/split_huge_pages ]; then
> +           _notrun "Needs CONFIG_DEBUGFS and split_huge_pages"
> +       fi
> +}
> +
> +_split_huge_pages_all()
> +{
> +	echo 1 > $DEBUGFS_MNT/split_huge_pages
> +}
> +
>  # Get hugepagesize in bytes
>  _get_hugepagesize()
>  {
> diff --git a/tests/generic/751 b/tests/generic/751
> new file mode 100755
> index 000000000000..7cc96054a5a9
> --- /dev/null
> +++ b/tests/generic/751
> @@ -0,0 +1,127 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (C) 2024 Luis Chamberlain. All Rights Reserved.
> +#
> +# FS QA Test No. 751
> +#
> +# stress page cache truncation + writeback
> +#
> +# This aims at trying to reproduce a difficult to reproduce bug found with
> +# min order. The issue was root caused to an xarray bug when we split folios
> +# to another order other than 0. This functionality is used to support min
> +# order. The crash:
> +#
> +# https://gist.github.com/mcgrof/d12f586ec6ebe32b2472b5d634c397df

You might want to paste the stacktrace in here directly, in case the
gist ever goes away.

> +#
> +# This may also find future truncation bugs, as truncating any mapped file
> +# through the collateral of using echo 1 > split_huge_pages will always
> +# respect the min order. Truncating to a larger order is then exercised when
> +# this test is run against any filesystem LBS profile or an LBS device.
> +#
> +# If you're enabling this and want to check underneath the hood you may want to
> +# enable:
> +#
> +# dyndbg='file mm/huge_memory.c +p'
> +#
> +# This test aims at increasing the rate of successful truncations so we want
> +# to increase the value of thp_split_page in $seqres.full. Using echo 1 >
> +# split_huge_pages is extremely aggressive, and even accounts for anonymous
> +# memory on a system, however we accept that tradeoff for the efficiency of
> +# doing the work in-kernel for any mapped file too. Our general goal here is to
> +# race with folio truncation + writeback.
> +
> +. ./common/preamble
> +
> +_begin_fstest auto long_rw stress soak smoketest
> +
> +# Override the default cleanup function.
> +_cleanup()
> +{
> +	cd /
> +	rm -f $tmp.*
> +	rm -f $runfile
> +	kill -9 $split_huge_pages_files_pid > /dev/null 2>&1
> +}
> +
> +fio_config=$tmp.fio
> +fio_out=$tmp.fio.out
> +
> +# real QA test starts here
> +_supported_fs generic
> +_require_test
> +_require_scratch
> +_require_debugfs
> +_require_split_debugfs
> +_require_command "$KILLALL_PROG" "killall"
> +_fixed_by_git_commit kernel 2a0774c2886d \
> +	"XArray: set the marks correctly when splitting an entry"
> +
> +# we need buffered IO to force truncation races with writeback in the
> +# page cache
> +cat >$fio_config <<EOF
> +[force_large_large_folio_parallel_writes]
> +nrfiles=10
> +direct=0
> +bs=4M
> +group_reporting=1
> +filesize=1GiB
> +readwrite=write
> +fallocate=none
> +numjobs=$(nproc)
> +directory=$SCRATCH_MNT
> +runtime=100*${TIME_FACTOR}
> +time_based
> +EOF
> +
> +_require_fio $fio_config
> +
> +echo "Silence is golden"
> +
> +_scratch_mkfs >>$seqres.full 2>&1
> +_scratch_mount >> $seqres.full 2>&1
> +
> +# used to let our loops know when to stop
> +runfile="$tmp.keep.running.loop"
> +touch $runfile
> +
> +# The background ops are out of bounds, the goal is to race with fsstress.
> +
> +# Force folio split if possible, this seems to be screaming for MADV_NOHUGEPAGE
> +# for large folios.
> +while [ -e $runfile ]; do
> +	_split_huge_pages_all >/dev/null 2>&1
> +done &
> +split_huge_pages_files_pid=$!
> +
> +split_count_before=0
> +split_count_failed_before=0
> +
> +if grep -q thp_split_page /proc/vmstat; then
> +	split_count_before=$(grep ^thp_split_page /proc/vmstat | head -1 | awk '{print $2}')
> +	split_count_failed_before=$(grep ^thp_split_page_failed /proc/vmstat | head -1 | awk '{print $2}')
> +else
> +	echo "no thp_split_page in /proc/vmstat" >> $seqres.full
> +fi
> +
> +# we blast away with large writes to force large folio writes when
> +# possible.
> +echo -e "Running fio with config:\n" >> $seqres.full
> +cat $fio_config >> $seqres.full
> +$FIO_PROG $fio_config --alloc-size=$(( $(nproc) * 8192 )) --output=$fio_out
> +
> +rm -f $runfile
> +
> +wait > /dev/null 2>&1
> +
> +if grep -q thp_split_page /proc/vmstat; then
> +	split_count_after=$(grep ^thp_split_page /proc/vmstat | head -1 | awk '{print $2}')
> +	split_count_failed_after=$(grep ^thp_split_page_failed /proc/vmstat | head -1 | awk '{print $2}')

I think this ought to be a separate function for cleanliness?

_proc_vmstat()
{
	awk -v name="$1" '{if ($1 ~ name) {print($2)}}' /proc/vmstat
}

	split_count_after=$(_proc_vmstat thp_split_page)

Otherwise this test looks fine to me.

--D

> +	thp_split_page=$((split_count_after - split_count_before))
> +	thp_split_page_failed=$((split_count_failed_after - split_count_failed_before))
> +
> +	echo "vmstat thp_split_page: $thp_split_page" >> $seqres.full
> +	echo "vmstat thp_split_page_failed: $thp_split_page_failed" >> $seqres.full
> +fi
> +
> +status=0
> +exit
> diff --git a/tests/generic/751.out b/tests/generic/751.out
> new file mode 100644
> index 000000000000..6479fa6f1404
> --- /dev/null
> +++ b/tests/generic/751.out
> @@ -0,0 +1,2 @@
> +QA output created by 751
> +Silence is golden
> -- 
> 2.43.0
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/5] fstests: add fsstress + compaction test
  2024-06-11  3:02 ` [PATCH 3/5] fstests: add fsstress + compaction test Luis Chamberlain
@ 2024-06-11 14:48   ` Darrick J. Wong
  2024-06-12  8:00   ` Zorro Lang
  1 sibling, 0 replies; 22+ messages in thread
From: Darrick J. Wong @ 2024-06-11 14:48 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: patches, fstests, linux-xfs, linux-mm, linux-fsdevel, akpm, ziy,
	vbabka, seanjc, willy, david, hughd, linmiaohe, muchun.song,
	osalvador, p.raghav, da.gomez, hare, john.g.garry

On Mon, Jun 10, 2024 at 08:02:00PM -0700, Luis Chamberlain wrote:
> Running compaction while we run fsstress can crash older kernels, as per
> korg#218227 [0]; the fix for that was merged in v6.9-rc6 as commit
> d99e3140a4d3 ("mm: turn folio_test_hugetlb into a PageType"). However, even on
> v6.10-rc2, where this kernel commit is already merged, we can still deadlock
> when running fsstress while triggering compaction at the same time. This is a
> new issue being reported now through this patch, which also serves as a
> reproducer with high confidence: it always deadlocks. If you enable
> CONFIG_PROVE_LOCKING with the defaults you will end up with a complaint about
> increasing MAX_LOCKDEP_CHAIN_HLOCKS [1]; if you adjust that you then end up
> with a few soft lockup complaints and some possible deadlock candidates to
> evaluate [2].
> 
> Provide a simple reproducer and pave the way so we keep on testing this.
> 
> Without lockdep enabled we silently deadlock on the first run of the
> test without the fix applied. With lockdep enabled you get a splat about
> the possible deadlock on the first run of the test.
> 
> [0] https://bugzilla.kernel.org/show_bug.cgi?id=218227
> [1] https://gist.github.com/mcgrof/824913b645892214effeb1631df75072
> [2] https://gist.github.com/mcgrof/926e183d21c5c4c55d74ec90197bd77a
> 
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>  common/rc             |  7 +++++
>  tests/generic/750     | 62 +++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/750.out |  2 ++
>  3 files changed, 71 insertions(+)
>  create mode 100755 tests/generic/750
>  create mode 100644 tests/generic/750.out
> 
> diff --git a/common/rc b/common/rc
> index e812a2f7cc67..18ad25662d5c 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -151,6 +151,13 @@ _require_hugepages()
>  		_notrun "Kernel does not report huge page size"
>  }
>  
> +# Requires CONFIG_COMPACTION
> +_require_vm_compaction()
> +{
> +	if [ ! -f /proc/sys/vm/compact_memory ]; then
> +	    _notrun "Need compaction enabled CONFIG_COMPACTION=y"
> +	fi
> +}
>  # Get hugepagesize in bytes
>  _get_hugepagesize()
>  {
> diff --git a/tests/generic/750 b/tests/generic/750
> new file mode 100755
> index 000000000000..334ab011dfa0
> --- /dev/null
> +++ b/tests/generic/750
> @@ -0,0 +1,62 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2024 Luis Chamberlain.  All Rights Reserved.
> +#
> +# FS QA Test 750
> +#
> +# fsstress + memory compaction test
> +#
> +. ./common/preamble
> +_begin_fstest auto rw long_rw stress soak smoketest
> +
> +_cleanup()
> +{
> +	cd /
> +	rm -f $runfile
> +	rm -f $tmp.*
> +	kill -9 $trigger_compaction_pid > /dev/null 2>&1
> +	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> +
> +	wait > /dev/null 2>&1
> +}
> +
> +# Import common functions.
> +
> +# real QA test starts here
> +
> +_supported_fs generic
> +
> +_require_scratch
> +_require_vm_compaction
> +_require_command "$KILLALL_PROG" "killall"
> +
> +# We still deadlock with this test on v6.10-rc2, we need more work.
> +# but the below makes things better.
> +_fixed_by_git_commit kernel d99e3140a4d3 \
> +	"mm: turn folio_test_hugetlb into a PageType"
> +
> +echo "Silence is golden"
> +
> +_scratch_mkfs > $seqres.full 2>&1
> +_scratch_mount >> $seqres.full 2>&1
> +
> +nr_cpus=$((LOAD_FACTOR * 4))
> +nr_ops=$((25000 * nr_cpus * TIME_FACTOR))
> +fsstress_args=(-w -d $SCRATCH_MNT -n $nr_ops -p $nr_cpus)
> +
> +# start a background trigger for memory compaction
> +runfile="$tmp.compaction"
> +touch $runfile
> +while [ -e $runfile ]; do
> +	echo 1 > /proc/sys/vm/compact_memory
> +	sleep 5
> +done &
> +trigger_compaction_pid=$!
> +
> +test -n "$SOAK_DURATION" && fsstress_args+=(--duration="$SOAK_DURATION")

Maybe put this with the other fsstress_args definition above, but
otherwise this looks reasonable.

Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> +
> +$FSSTRESS_PROG $FSSTRESS_AVOID "${fsstress_args[@]}" >> $seqres.full
> +wait > /dev/null 2>&1
> +
> +status=0
> +exit
> diff --git a/tests/generic/750.out b/tests/generic/750.out
> new file mode 100644
> index 000000000000..bd79507b632e
> --- /dev/null
> +++ b/tests/generic/750.out
> @@ -0,0 +1,2 @@
> +QA output created by 750
> +Silence is golden
> -- 
> 2.43.0
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/5] fstests: add mmap page boundary tests
  2024-06-11  3:01 ` [PATCH 2/5] fstests: add mmap page boundary tests Luis Chamberlain
@ 2024-06-11 16:48   ` Darrick J. Wong
  2024-06-11 18:10     ` Luis Chamberlain
  2024-06-12  8:06   ` Zorro Lang
  1 sibling, 1 reply; 22+ messages in thread
From: Darrick J. Wong @ 2024-06-11 16:48 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: patches, fstests, linux-xfs, linux-mm, linux-fsdevel, akpm, ziy,
	vbabka, seanjc, willy, david, hughd, linmiaohe, muchun.song,
	osalvador, p.raghav, da.gomez, hare, john.g.garry

On Mon, Jun 10, 2024 at 08:01:59PM -0700, Luis Chamberlain wrote:
> POSIX says mmap() should zero-fill data beyond the file size up to the page
> boundary, and issue a SIGBUS if we go beyond that. While fsx helps us test
> zero-fill sometimes, fsstress also lets us sometimes test for SIGBUS; however,
> that is based on a random value and it's not likely we always test it.
> Dedicate a specific test to this to make testing this situation easier and to
> easily expand to other corner cases.
> 
> The only filesystem currently known to fail is tmpfs with huge pages,
> but the pending upstream patch "filemap: cap PTE range to be created to
> allowed zero fill in folio_map_range()" fixes this issue for tmpfs and
> also for LBS support.
> 
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>  common/rc             |   5 +-
>  tests/generic/749     | 238 ++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/749.out |   2 +
>  3 files changed, 244 insertions(+), 1 deletion(-)
>  create mode 100755 tests/generic/749
>  create mode 100644 tests/generic/749.out
> 
> diff --git a/common/rc b/common/rc
> index fa7942809d6c..e812a2f7cc67 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -60,12 +60,15 @@ _round_up_to_page_boundary()
>  	echo $(( (n + page_size - 1) & ~(page_size - 1) ))
>  }
>  
> +# You can override $map_len but it is optional; by default we use the
> +# max allowed size. If you use a length greater than the default you can
> +# expect a SIGBUS and test for it.
>  _mread()
>  {
>  	local file=$1
>  	local offset=$2
>  	local length=$3
> -	local map_len=$(_round_up_to_page_boundary $(_get_filesize $file))
> +	local map_len=${4:-$(_round_up_to_page_boundary $(_get_filesize $file)) }
>  
>  	# Some callers expect xfs_io to crash with SIGBUS due to the mread,
>  	# causing the shell to print "Bus error" to stderr.  To allow this
> diff --git a/tests/generic/749 b/tests/generic/749
> new file mode 100755
> index 000000000000..c1d3a2028549
> --- /dev/null
> +++ b/tests/generic/749
> @@ -0,0 +1,238 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) Luis Chamberlain. All Rights Reserved.
> +#
> +# FS QA Test 749
> +#
> +# As per POSIX NOTES mmap(2) maps multiples of the system page size, but if the
> +# data mapped is not multiples of the page size the remaining bytes are zeroed
> +# out when mapped and modifications to that region are not written to the file.
> +# On Linux when you write data to such partial page after the end of the
> +# object, the data stays in the page cache even after the file is closed and
> +# unmapped and  even  though  the data  is never written to the file itself,
> +# subsequent mappings may see the modified content. If you go *beyond* this

Does this happen (mwrite data beyond eof sticks around) with large
folios as well?

> +# page, you should get a SIGBUS. This test verifies we zero-fill to page
> +# boundary and ensures we get a SIGBUS if we write to data beyond the system
> +# page size even if the block size is greater than the system page size.
> +. ./common/preamble
> +. ./common/rc
> +_begin_fstest auto quick prealloc
> +
> +# Import common functions.
> +. ./common/filter
> +
> +# real QA test starts here
> +_supported_fs generic
> +_require_scratch_nocheck
> +_require_test
> +_require_xfs_io_command "truncate"
> +_require_xfs_io_command "falloc"
> +
> +# _fixed_by_git_commit kernel <pending-upstream> \
> +#        "filemap: cap PTE range to be created to allowed zero fill in folio_map_range()"
> +
> +filter_xfs_io_data_unique()
> +{
> +    _filter_xfs_io_offset | sed -e 's| |\n|g' | grep -E -v "\.|XX|\*" | \
> +	sort -u | tr -d '\n'
> +}
> +
> +
> +setup_zeroed_file()
> +{
> +	local file_len=$1
> +	local sparse=$2
> +
> +	if $sparse; then
> +		$XFS_IO_PROG -f -c "truncate $file_len" $test_file
> +	else
> +		$XFS_IO_PROG -f -c "falloc 0 $file_len" $test_file
> +	fi
> +}
> +
> +mwrite()
> +{
> +       local file=$1
> +       local offset=$2
> +       local length=$3
> +       local map_len=${4:-$(_round_up_to_page_boundary $(_get_filesize $file)) }
> +
> +       # Some callers expect xfs_io to crash with SIGBUS due to the mread,
> +       # causing the shell to print "Bus error" to stderr.  To allow this
> +       # message to be redirected, execute xfs_io in a new shell instance.
> +       # However, for this to work reliably, we also need to prevent the new
> +       # shell instance from optimizing out the fork and directly exec'ing
> +       # xfs_io.  The easiest way to do that is to append 'true' to the
> +       # commands, so that xfs_io is no longer the last command the shell sees.
> +       bash -c "trap '' SIGBUS; ulimit -c 0; \
> +		$XFS_IO_PROG $file \
> +               -c 'mmap -w 0 $map_len' \
> +               -c 'mwrite $offset $length'; \
> +	       true"
> +}
> +
> +do_mmap_tests()
> +{
> +	local block_size=$1
> +	local file_len=$2
> +	local offset=$3
> +	local len=$4
> +	local use_sparse_file=${5:-false}
> +	local new_filelen=0
> +	local map_len=0
> +	local csum=0
> +	local fs_block_size=$(_get_file_block_size $SCRATCH_MNT)
> +
> +	echo -en "\n\n==> Testing blocksize $block_size " >> $seqres.full
> +	echo -en "file_len: $file_len offset: $offset " >> $seqres.full
> +	echo -e "len: $len sparse: $use_sparse_file" >> $seqres.full
> +
> +	if ((fs_block_size != block_size)); then
> +		_fail "Block size created ($block_size) doesn't match _get_file_block_size on mount ($fs_block_size)"
> +	fi
> +
> +	rm -rf "${SCRATCH_MNT:?}"/*

rm -f $SCRATCH_MNT/file ?

> +
> +	# This lets us also test against sparse files
> +	setup_zeroed_file $file_len $use_sparse_file
> +
> +	# This will overwrite the old data, the file size is the
> +	# delta between offset and len now.
> +	$XFS_IO_PROG -f -c "pwrite -S 0xaa -b 512 $offset $len" \
> +		$test_file >> $seqres.full
> +
> +	sync
> +	new_filelen=$(_get_filesize $test_file)
> +	map_len=$(_round_up_to_page_boundary $new_filelen)
> +	csum_orig="$(_md5_checksum $test_file)"
> +
> +	# A couple of mmap() tests:
> +	#
> +	# We are allowed to mmap() up to the boundary of the page size of a
> +	# data object, but there a few rules to follow we must check for:
> +	#
> +	# a) zero-fill test for the data: POSIX says we should zero fill any
> +	#    partial page after the end of the object. Verify zero-fill.
> +	# b) do not write this bogus data to disk: on Linux, if we write data
> +	#    to a partially filled page, it will stay in the page cache even
> +	#    after the file is closed and unmapped even if it never reaches the
> +	#    file. Subsequent mappings *may* see the modified content, but it
> +	#    also can get other data. Since the data read after the actual

What other data?

> +	#    object data can vary we just verify the filesize does not change.
> +	if [[ $map_len -gt $new_filelen ]]; then
> +		zero_filled_data_len=$((map_len - new_filelen))
> +		_scratch_cycle_mount
> +		expected_zero_data="00"
> +		zero_filled_data=$($XFS_IO_PROG -r $test_file \
> +			-c "mmap -r 0 $map_len" \
> +			-c "mread -v $new_filelen $zero_filled_data_len" \
> +			-c "munmap" | \
> +			filter_xfs_io_data_unique)
> +		if [[ "$zero_filled_data" != "$expected_zero_data" ]]; then
> +			echo "Expected data: $expected_zero_data"
> +			echo "  Actual data: $zero_filled_data"
> +			_fail "Zero-fill expectations with mmap() not respected"
> +		fi
> +
> +		_scratch_cycle_mount
> +		$XFS_IO_PROG $test_file \
> +			-c "mmap -w 0 $map_len" \
> +			-c "mwrite $new_filelen $zero_filled_data_len" \
> +			-c "munmap"
> +		sync
> +		csum_post="$(_md5_checksum $test_file)"
> +		if [[ "$csum_orig" != "$csum_post" ]]; then
> +			echo "Expected csum: $csum_orig"
> +			echo " Actual  csum: $csum_post"
> +			_fail "mmap() write up to page boundary should not change actual file contents"

Do you really want to stop the test immediately?  Or keep going and see
what other errors fall out?  (i.e. s/_fail/echo/ here)

> +		fi
> +
> +		local filelen_test=$(_get_filesize $test_file)
> +		if [[ "$filelen_test" != "$new_filelen" ]]; then
> +			echo "Expected file length: $new_filelen"
> +			echo " Actual  file length: $filelen_test"
> +			_fail "mmap() write up to page boundary should not change actual file size"
> +		fi
> +	fi
> +
> +	# Now lets ensure we get SIGBUS when we go beyond the page boundary
> +	_scratch_cycle_mount
> +	new_filelen=$(_get_filesize $test_file)
> +	map_len=$(_round_up_to_page_boundary $new_filelen)
> +	csum_orig="$(_md5_checksum $test_file)"
> +	_mread $test_file 0 $map_len >> $seqres.full  2>$tmp.err
> +	if grep -q 'Bus error' $tmp.err; then
> +		cat $tmp.err
> +		_fail "Not expecting SIGBUS when reading up to page boundary"
> +	fi
> +
> +	# This should just work
> +	_mread $test_file 0 $map_len >> $seqres.full  2>$tmp.err
> +	if [[ $? -ne 0 ]]; then
> +		_fail "mmap() read up to page boundary should work"
> +	fi
> +
> +	# This should just work
> +	mwrite $test_file 0 $map_len >> $seqres.full  2>$tmp.err
> +	if [[ $? -ne 0 ]]; then
> +		_fail "mmap() write up to page boundary should work"
> +	fi
> +
> +	# If we mmap() on the boundary but try to read beyond it just
> +	# fails, we don't get a SIGBUS
> +	$XFS_IO_PROG -r $test_file \
> +		-c "mmap -r 0 $map_len" \
> +		-c "mread 0 $((map_len + 10))" >> $seqres.full  2>$tmp.err
> +	local mread_err=$?
> +	if [[ $mread_err -eq 0 ]]; then
> +		_fail "mmap() to page boundary works as expected but reading beyond should fail: $mread_err"
> +	fi
> +
> +	$XFS_IO_PROG -w $test_file \
> +		-c "mmap -w 0 $map_len" \
> +		-c "mwrite 0 $((map_len + 10))" >> $seqres.full  2>$tmp.err
> +	local mwrite_err=$?
> +	if [[ $mwrite_err -eq 0 ]]; then
> +		_fail "mmap() to page boundary works as expected but writing beyond should fail: $mwrite_err"
> +	fi
> +
> +	# Now let's go beyond the allowed mmap() page boundary
> +	_mread $test_file 0 $((map_len + 10)) $((map_len + 10)) >> $seqres.full  2>$tmp.err
> +	if ! grep -q 'Bus error' $tmp.err; then
> +		_fail "Expected SIGBUS when mmap() reading beyond page boundary"
> +	fi
> +
> +	mwrite $test_file 0 $((map_len + 10)) $((map_len + 10))  >> $seqres.full  2>$tmp.err
> +	if ! grep -q 'Bus error' $tmp.err; then
> +		_fail "Expected SIGBUS when mmap() writing beyond page boundary"
> +	fi
> +
> +	local filelen_test=$(_get_filesize $test_file)
> +	if [[ "$filelen_test" != "$new_filelen" ]]; then
> +		echo "Expected file length: $new_filelen"
> +		echo " Actual  file length: $filelen_test"
> +		_fail "reading or writing beyond file size up to mmap() page boundary should not change file size"
> +	fi
> +}
> +
> +test_block_size()
> +{
> +	local block_size=$1
> +
> +	do_mmap_tests $block_size 512 3 5
> +	do_mmap_tests $block_size 11k 0 $((4096 * 3 + 3))
> +	do_mmap_tests $block_size 16k 0 $((16384+3))
> +	do_mmap_tests $block_size 16k $((16384-10)) $((16384+20))
> +	do_mmap_tests $block_size 64k 0 $((65536+3))
> +	do_mmap_tests $block_size 4k 4090 30 true
> +}
> +
> +_scratch_mkfs >> $seqres.full 2>&1 || _fail "mkfs failed"
> +_scratch_mount
> +test_file=$SCRATCH_MNT/file
> +block_size=$(_get_file_block_size "$SCRATCH_MNT")
> +test_block_size $block_size
> +
> +echo "Silence is golden"
> +status=0
> +exit
> diff --git a/tests/generic/749.out b/tests/generic/749.out
> new file mode 100644
> index 000000000000..24658deddb99
> --- /dev/null
> +++ b/tests/generic/749.out
> @@ -0,0 +1,2 @@
> +QA output created by 749
> +Silence is golden
> -- 
> 2.43.0
> 
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/5] fstests: add mmap page boundary tests
  2024-06-11 16:48   ` Darrick J. Wong
@ 2024-06-11 18:10     ` Luis Chamberlain
  2024-06-11 18:46       ` Darrick J. Wong
  0 siblings, 1 reply; 22+ messages in thread
From: Luis Chamberlain @ 2024-06-11 18:10 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: patches, fstests, linux-xfs, linux-mm, linux-fsdevel, akpm, ziy,
	vbabka, seanjc, willy, david, hughd, linmiaohe, muchun.song,
	osalvador, p.raghav, da.gomez, hare, john.g.garry

On Tue, Jun 11, 2024 at 09:48:11AM -0700, Darrick J. Wong wrote:
> On Mon, Jun 10, 2024 at 08:01:59PM -0700, Luis Chamberlain wrote:
> > +# As per POSIX NOTES mmap(2) maps multiples of the system page size, but if the
> > +# data mapped is not multiples of the page size the remaining bytes are zeroed
> > +# out when mapped and modifications to that region are not written to the file.
> > +# On Linux when you write data to such partial page after the end of the
> > +# object, the data stays in the page cache even after the file is closed and
> > +# unmapped and  even  though  the data  is never written to the file itself,
> > +# subsequent mappings may see the modified content. If you go *beyond* this
> 
> Does this happen (mwrite data beyond eof sticks around) with large
> folios as well?

That corner case of checking to see if it stays is not tested by this
test, but we could / should extend this test later for that. But then
the question becomes, what is right, given we are in a grey area; if we
don't have any defined standard for it, it seems odd to test for it.

So the test currently only tests for correctness of what we expect for
POSIX and what we all have agreed for Linux.

Herding everyone to follow suit for the other corner cases is something
perhaps we should do. Do we have a "strict fail"? So that perhaps we can
later add a test case for it, and so that once and if we get consensus
on what we do we can enable say a "strict-Linux" mode where we are
pedantic about a new world order?

> > +	rm -rf "${SCRATCH_MNT:?}"/*
> 
> rm -f $SCRATCH_MNT/file ?

Sure.

> > +	# A couple of mmap() tests:
> > +	#
> > +	# We are allowed to mmap() up to the boundary of the page size of a
> > +	# data object, but there a few rules to follow we must check for:
> > +	#
> > +	# a) zero-fill test for the data: POSIX says we should zero fill any
> > +	#    partial page after the end of the object. Verify zero-fill.
> > +	# b) do not write this bogus data to disk: on Linux, if we write data
> > +	#    to a partially filled page, it will stay in the page cache even
> > +	#    after the file is closed and unmapped even if it never reaches the
> > +	#    file. Subsequent mappings *may* see the modified content, but it
> > +	#    also can get other data. Since the data read after the actual
> 
> What other data?

Beats me, got that from the man page bible on mmap. I think it's homework
for us to find out who is spewing that out, which gives a bit more value
to the idea of that strict-linux thing. How else will we find out?

> > +	#    object data can vary we just verify the filesize does not change.
> > +	if [[ $map_len -gt $new_filelen ]]; then
> > +		zero_filled_data_len=$((map_len - new_filelen))
> > +		_scratch_cycle_mount
> > +		expected_zero_data="00"
> > +		zero_filled_data=$($XFS_IO_PROG -r $test_file \
> > +			-c "mmap -r 0 $map_len" \
> > +			-c "mread -v $new_filelen $zero_filled_data_len" \
> > +			-c "munmap" | \
> > +			filter_xfs_io_data_unique)
> > +		if [[ "$zero_filled_data" != "$expected_zero_data" ]]; then
> > +			echo "Expected data: $expected_zero_data"
> > +			echo "  Actual data: $zero_filled_data"
> > +			_fail "Zero-fill expectations with mmap() not respected"
> > +		fi
> > +
> > +		_scratch_cycle_mount
> > +		$XFS_IO_PROG $test_file \
> > +			-c "mmap -w 0 $map_len" \
> > +			-c "mwrite $new_filelen $zero_filled_data_len" \
> > +			-c "munmap"
> > +		sync
> > +		csum_post="$(_md5_checksum $test_file)"
> > +		if [[ "$csum_orig" != "$csum_post" ]]; then
> > +			echo "Expected csum: $csum_orig"
> > +			echo " Actual  csum: $csum_post"
> > +			_fail "mmap() write up to page boundary should not change actual file contents"
> 
> Do you really want to stop the test immediately?  Or keep going and see
> what other errors fall out?  (i.e. s/_fail/echo/ here)

Good point. We could go on; I'll change it in the next v2.
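
For illustration, that check would then read roughly like this (a sketch
of the s/_fail/echo/ idea, not the final v2 code):

	csum_post="$(_md5_checksum $test_file)"
	if [[ "$csum_orig" != "$csum_post" ]]; then
		echo "Expected csum: $csum_orig"
		echo " Actual  csum: $csum_post"
		# Diverging from the golden output still fails the test, but
		# the later checks in do_mmap_tests() get a chance to run.
		echo "mmap() write up to page boundary should not change actual file contents"
	fi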

Thanks!

  Luis

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/5] fstests: add stress truncation + writeback test
  2024-06-11 14:45   ` Darrick J. Wong
@ 2024-06-11 18:15     ` Luis Chamberlain
  2024-06-11 18:29       ` Darrick J. Wong
  0 siblings, 1 reply; 22+ messages in thread
From: Luis Chamberlain @ 2024-06-11 18:15 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: patches, fstests, linux-xfs, linux-mm, linux-fsdevel, akpm, ziy,
	vbabka, seanjc, willy, david, hughd, linmiaohe, muchun.song,
	osalvador, p.raghav, da.gomez, hare, john.g.garry

On Tue, Jun 11, 2024 at 07:45:03AM -0700, Darrick J. Wong wrote:
> On Mon, Jun 10, 2024 at 08:02:02PM -0700, Luis Chamberlain wrote:
> > +# Requires CONFIG_DEBUGFS and truncation knobs
> > +_require_split_debugfs()
> 
> Er... I thought "split" referred to debugfs itself.
> 
> _require_split_huge_pages_knob?

Much better, thanks.
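
For reference, the renamed helper could look roughly like this (a sketch
assuming the standard THP debugfs knob; the actual helper in the patch
may differ):

	# Requires CONFIG_DEBUG_FS and the THP split_huge_pages knob
	_require_split_huge_pages_knob()
	{
		_require_debugfs
		if [ ! -f "$DEBUGFS_MNT/split_huge_pages" ]; then
			_notrun "kernel does not expose split_huge_pages in debugfs"
		fi
	}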

> > +# This aims at trying to reproduce a difficult to reproduce bug found with
> > +# min order. The issue was root caused to an xarray bug when we split folios
> > +# to another order other than 0. This functionality is used to support min
> > +# order. The crash:
> > +#
> > +# https://gist.github.com/mcgrof/d12f586ec6ebe32b2472b5d634c397df
> 
> You might want to paste the stacktrace in here directly, in case the
> gist ever goes away.

It's not a simple crash trace, it is pretty enormous considering I
decoded it, and it has all locking candidates. Even including it after
the "---" lines of the patch might make someone go: TLDR. Thoughts?

> > +if grep -q thp_split_page /proc/vmstat; then
> > +	split_count_after=$(grep ^thp_split_page /proc/vmstat | head -1 | awk '{print $2}')
> > +	split_count_failed_after=$(grep ^thp_split_page_failed /proc/vmstat | head -1 | awk '{print $2}')
> 
> I think this ought to be a separate function for cleanliness?
> 
> _proc_vmstat()
> {
> 	awk -v name="$1" '{if ($1 ~ name) {print($2)}}' /proc/vmstat
> }
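
For illustration, the two vmstat reads would then become something like
this (a sketch only; the anchored pattern avoids also matching the
_failed counter):

	split_count_after=$(_proc_vmstat '^thp_split_page$')
	split_count_failed_after=$(_proc_vmstat '^thp_split_page_failed$')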

> Otherwise this test looks fine to me.

Thanks!

  Luis

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/5] fstests: add stress truncation + writeback test
  2024-06-11 18:15     ` Luis Chamberlain
@ 2024-06-11 18:29       ` Darrick J. Wong
  2024-06-11 18:59         ` Luis Chamberlain
  0 siblings, 1 reply; 22+ messages in thread
From: Darrick J. Wong @ 2024-06-11 18:29 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: patches, fstests, linux-xfs, linux-mm, linux-fsdevel, akpm, ziy,
	vbabka, seanjc, willy, david, hughd, linmiaohe, muchun.song,
	osalvador, p.raghav, da.gomez, hare, john.g.garry

On Tue, Jun 11, 2024 at 11:15:52AM -0700, Luis Chamberlain wrote:
> On Tue, Jun 11, 2024 at 07:45:03AM -0700, Darrick J. Wong wrote:
> > On Mon, Jun 10, 2024 at 08:02:02PM -0700, Luis Chamberlain wrote:
> > > +# Requires CONFIG_DEBUGFS and truncation knobs
> > > +_require_split_debugfs()
> > 
> > Er... I thought "split" referred to debugfs itself.
> > 
> > _require_split_huge_pages_knob?
> 
> Much better, thanks.
> 
> > > +# This aims at trying to reproduce a difficult to reproduce bug found with
> > > +# min order. The issue was root caused to an xarray bug when we split folios
> > > +# to another order other than 0. This functionality is used to support min
> > > +# order. The crash:
> > > +#
> > > +# https://gist.github.com/mcgrof/d12f586ec6ebe32b2472b5d634c397df
> > 
> > You might want to paste the stacktrace in here directly, in case the
> > gist ever goes away.
> 
> It's not a simple crash trace, it is pretty enormous considering I
> decoded it, and it has all locking candidates. Even including it after
> the "---" lines of the patch might make someone go: TLDR. Thoughts?

I'd paste it in, even if it's quite lengthy.  I don't even think it's all that
much if you remove some of the less useful bits of the unwind:

"Crash excerpt is as follows:

"BUG: kernel NULL pointer dereference, address: 0000000000000036
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 2190 Comm: kworker/u38:5 Not tainted 6.9.0-rc5+ #14
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Workqueue: writeback wb_workfn (flush-7:5)
RIP: 0010:filemap_get_folios_tag+0xa9/0x200
Call Trace:
 <TASK>
 writeback_iter+0x17d/0x310
 write_cache_pages+0x42/0xa0
 iomap_writepages+0x33/0x50
 xfs_vm_writepages+0x63/0x90 [xfs]
 do_writepages+0xcc/0x260
 __writeback_single_inode+0x3d/0x340
 writeback_sb_inodes+0x1ed/0x4b0
 __writeback_inodes_wb+0x4c/0xe0
 wb_writeback+0x267/0x2d0
 wb_workfn+0x2a4/0x440
 process_one_work+0x189/0x3b0
 worker_thread+0x273/0x390
 kthread+0xda/0x110
 ret_from_fork+0x2d/0x50
 ret_from_fork_asm+0x1a/0x30
 </TASK>"

--D

> > > +if grep -q thp_split_page /proc/vmstat; then
> > > +	split_count_after=$(grep ^thp_split_page /proc/vmstat | head -1 | awk '{print $2}')
> > > +	split_count_failed_after=$(grep ^thp_split_page_failed /proc/vmstat | head -1 | awk '{print $2}')
> > 
> > I think this ought to be a separate function for cleanliness?
> > 
> > _proc_vmstat()
> > {
> > 	awk -v name="$1" '{if ($1 ~ name) {print($2)}}' /proc/vmstat
> > }
> 
> > Otherwise this test looks fine to me.
> 
> Thanks!
> 
>   Luis
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/5] fstests: add mmap page boundary tests
  2024-06-11 18:10     ` Luis Chamberlain
@ 2024-06-11 18:46       ` Darrick J. Wong
  2024-06-11 20:29         ` Luis Chamberlain
  0 siblings, 1 reply; 22+ messages in thread
From: Darrick J. Wong @ 2024-06-11 18:46 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: patches, fstests, linux-xfs, linux-mm, linux-fsdevel, akpm, ziy,
	vbabka, seanjc, willy, david, hughd, linmiaohe, muchun.song,
	osalvador, p.raghav, da.gomez, hare, john.g.garry

On Tue, Jun 11, 2024 at 11:10:13AM -0700, Luis Chamberlain wrote:
> On Tue, Jun 11, 2024 at 09:48:11AM -0700, Darrick J. Wong wrote:
> > On Mon, Jun 10, 2024 at 08:01:59PM -0700, Luis Chamberlain wrote:
> > > +# As per POSIX NOTES mmap(2) maps multiples of the system page size, but if the
> > > +# data mapped is not multiples of the page size the remaining bytes are zeroed
> > > +# out when mapped and modifications to that region are not written to the file.
> > > +# On Linux when you write data to such partial page after the end of the
> > > +# object, the data stays in the page cache even after the file is closed and
> > > +# unmapped and  even  though  the data  is never written to the file itself,
> > > +# subsequent mappings may see the modified content. If you go *beyond* this
> > 
> > Does this happen (mwrite data beyond eof sticks around) with large
> > folios as well?
> 
> That corner case of checking to see if it stays is not tested by this
> test, but we could / should extend this test later for that. But then
> the question becomes, what is right, given we are in a grey area; if we
> don't have any defined standard for it, it seems odd to test for it.
> 
> So the test currently only tests for correctness of what we expect for
> POSIX and what we all have agreed for Linux.
> 
> Herding everyone to follow suit for the other corner cases is something
> perhaps we should do. Do we have a "strict fail"? So that perhaps we can
> later add a test case for it, and so that once and if we get consensus
> on what we do we can enable say a "strict-Linux" mode where we are
> pedantic about a new world order?

I doubt there's an easy way to guarantee more than "initialized to zero,
contents may stay around in memory but will not be written to disk".
You could do asinine things like fault on every access and manually
inject zero bytes, but ... yuck.

That said -- let's say you have a 33k file, and a space mapping for
0-63k (e.g. it was preallocated).  Can the pagecache grab (say) a 64k
folio for the EOF part of the pagecache?  And can you mmap that whole
region?  And see even more grey area mmapping?  Or does mmap always cut
off the mapping at roundup(i_size_read(), PAGE_SIZE) ?

(I feel like I've asked this before, and forgotten the answer. :()

> > > +	rm -rf "${SCRATCH_MNT:?}"/*
> > 
> > rm -f $SCRATCH_MNT/file ?
> 
> Sure.
> 
> > > +	# A couple of mmap() tests:
> > > +	#
> > > +	# We are allowed to mmap() up to the boundary of the page size of a
> > > +	# data object, but there a few rules to follow we must check for:
> > > +	#
> > > +	# a) zero-fill test for the data: POSIX says we should zero fill any
> > > +	#    partial page after the end of the object. Verify zero-fill.
> > > +	# b) do not write this bogus data to disk: on Linux, if we write data
> > > +	#    to a partially filled page, it will stay in the page cache even
> > > +	#    after the file is closed and unmapped even if it never reaches the
> > > +	#    file. Subsequent mappings *may* see the modified content, but it
> > > +	#    also can get other data. Since the data read after the actual
> > 
> > What other data?
> 
> Beats me, got that from the man page bible on mmap. I think it's homework
> for us to find out who is spewing that out, which gives a bit more value
> to the idea of that strict-linux thing. How else will we find out?

Oh, ok.  I couldn't tell if *you* had seen "other" data emerging from
the murk, or if that was merely what a spec says.  Please cite the
particular bible you were reading. ;)

> > > +	#    object data can vary we just verify the filesize does not change.
> > > +	if [[ $map_len -gt $new_filelen ]]; then
> > > +		zero_filled_data_len=$((map_len - new_filelen))
> > > +		_scratch_cycle_mount
> > > +		expected_zero_data="00"
> > > +		zero_filled_data=$($XFS_IO_PROG -r $test_file \
> > > +			-c "mmap -r 0 $map_len" \
> > > +			-c "mread -v $new_filelen $zero_filled_data_len" \
> > > +			-c "munmap" | \
> > > +			filter_xfs_io_data_unique)
> > > +		if [[ "$zero_filled_data" != "$expected_zero_data" ]]; then
> > > +			echo "Expected data: $expected_zero_data"
> > > +			echo "  Actual data: $zero_filled_data"
> > > +			_fail "Zero-fill expectations with mmap() not respected"
> > > +		fi
> > > +
> > > +		_scratch_cycle_mount
> > > +		$XFS_IO_PROG $test_file \
> > > +			-c "mmap -w 0 $map_len" \
> > > +			-c "mwrite $new_filelen $zero_filled_data_len" \
> > > +			-c "munmap"
> > > +		sync
> > > +		csum_post="$(_md5_checksum $test_file)"
> > > +		if [[ "$csum_orig" != "$csum_post" ]]; then
> > > +			echo "Expected csum: $csum_orig"
> > > +			echo " Actual  csum: $csum_post"
> > > +			_fail "mmap() write up to page boundary should not change actual file contents"
> > 
> > Do you really want to stop the test immediately?  Or keep going and see
> > what other errors fall out?  (i.e. s/_fail/echo/ here)
> 
> Good point. We could go on, I'll change on the next v2.
> 
> Thanks!

NP.

--D

> 
>   Luis
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 5/5] fstests: add stress truncation + writeback test
  2024-06-11 18:29       ` Darrick J. Wong
@ 2024-06-11 18:59         ` Luis Chamberlain
  0 siblings, 0 replies; 22+ messages in thread
From: Luis Chamberlain @ 2024-06-11 18:59 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: patches, fstests, linux-xfs, linux-mm, linux-fsdevel, akpm, ziy,
	vbabka, seanjc, willy, david, hughd, linmiaohe, muchun.song,
	osalvador, p.raghav, da.gomez, hare, john.g.garry

On Tue, Jun 11, 2024 at 11:29:59AM -0700, Darrick J. Wong wrote:
> On Tue, Jun 11, 2024 at 11:15:52AM -0700, Luis Chamberlain wrote:
> > On Tue, Jun 11, 2024 at 07:45:03AM -0700, Darrick J. Wong wrote:
> > > On Mon, Jun 10, 2024 at 08:02:02PM -0700, Luis Chamberlain wrote:
> > > > +# Requires CONFIG_DEBUGFS and truncation knobs
> > > > +_require_split_debugfs()
> > > 
> > > Er... I thought "split" referred to debugfs itself.
> > > 
> > > _require_split_huge_pages_knob?
> > 
> > Much better, thanks.
> > 
> > > > +# This aims at trying to reproduce a difficult to reproduce bug found with
> > > > +# min order. The issue was root caused to an xarray bug when we split folios
> > > > +# to another order other than 0. This functionality is used to support min
> > > > +# order. The crash:
> > > > +#
> > > > +# https://gist.github.com/mcgrof/d12f586ec6ebe32b2472b5d634c397df
> > > 
> > > You might want to paste the stacktrace in here directly, in case the
> > > gist ever goes away.
> > 
> > Its not a simple crash trace, it is pretty enourmous considering I
> > decoded it, and it has all locking candidates. Even including it after
> > the "---" lines of the patch might make someone go: TLDR. Thoughts?
> 
> I'd paste it in, even if it's quite lengthy.  I don't even think it's all that
> much if you remove some of the less useful bits of the unwind:
> 
> "Crash excerpt is as follows:
> 
> "BUG: kernel NULL pointer dereference, address: 0000000000000036
> #PF: supervisor read access in kernel mode
> #PF: error_code(0x0000) - not-present page
> PGD 0 P4D 0
> Oops: 0000 [#1] PREEMPT SMP NOPTI
> CPU: 7 PID: 2190 Comm: kworker/u38:5 Not tainted 6.9.0-rc5+ #14
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> Workqueue: writeback wb_workfn (flush-7:5)
> RIP: 0010:filemap_get_folios_tag+0xa9/0x200
> Call Trace:
>  <TASK>
>  writeback_iter+0x17d/0x310
>  write_cache_pages+0x42/0xa0
>  iomap_writepages+0x33/0x50
>  xfs_vm_writepages+0x63/0x90 [xfs]
>  do_writepages+0xcc/0x260
>  __writeback_single_inode+0x3d/0x340
>  writeback_sb_inodes+0x1ed/0x4b0
>  __writeback_inodes_wb+0x4c/0xe0
>  wb_writeback+0x267/0x2d0
>  wb_workfn+0x2a4/0x440
>  process_one_work+0x189/0x3b0
>  worker_thread+0x273/0x390
>  kthread+0xda/0x110
>  ret_from_fork+0x2d/0x50
>  ret_from_fork_asm+0x1a/0x30
>  </TASK>"

Ah, sorry, yes, this crash dump is small; the other one is the one I was
thinking of, which we still deadlock on and for which we only have a
lockdep hint about what is likely going on. I'll include this dump in v2.

  Luis

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/5] fstests: add mmap page boundary tests
  2024-06-11 18:46       ` Darrick J. Wong
@ 2024-06-11 20:29         ` Luis Chamberlain
  0 siblings, 0 replies; 22+ messages in thread
From: Luis Chamberlain @ 2024-06-11 20:29 UTC (permalink / raw)
  To: Darrick J. Wong, hughd
  Cc: patches, fstests, linux-xfs, linux-mm, linux-fsdevel, akpm, ziy,
	vbabka, seanjc, willy, david, hughd, linmiaohe, muchun.song,
	osalvador, p.raghav, da.gomez, hare, john.g.garry

On Tue, Jun 11, 2024 at 11:46:03AM -0700, Darrick J. Wong wrote:
> On Tue, Jun 11, 2024 at 11:10:13AM -0700, Luis Chamberlain wrote:
> > On Tue, Jun 11, 2024 at 09:48:11AM -0700, Darrick J. Wong wrote:
> > > On Mon, Jun 10, 2024 at 08:01:59PM -0700, Luis Chamberlain wrote:
> > > > +# As per POSIX NOTES mmap(2) maps multiples of the system page size, but if the
> > > > +# data mapped is not multiples of the page size the remaining bytes are zeroed
> > > > +# out when mapped and modifications to that region are not written to the file.
> > > > +# On Linux when you write data to such partial page after the end of the
> > > > +# object, the data stays in the page cache even after the file is closed and
> > > > +# unmapped and  even  though  the data  is never written to the file itself,
> > > > +# subsequent mappings may see the modified content. If you go *beyond* this
> > > 
> > > Does this happen (mwrite data beyond eof sticks around) with large
> > > folios as well?
> > 
> > That corner case of checking to see if it stays is not tested by this
> > test, but we could / should extend this test later for that. But then
> > the question becomes, what is right, given we are in grey area, if we
> > don't have any defined standard for it, it seems odd to test for it.
> > 
> > So the test currently only tests for correctness of what we expect for
> > POSIX and what we all have agreed for Linux.
> > 
> > Hurding everyone to follow suit for the other corner cases is something
> > perhaps we should do. Do we have a "strict fail" ? So that perhaps we can
> > later add a test case for it and so that onnce and if we get consensus
> > on what we do we can enable say a "strict-Linux" mode where we are
> > pedantic about a new world order?
> 
> I doubt there's an easy way to guarantee more than "initialized to zero,
> contents may stay around in memory but will not be written to disk".
> You could do asinine things like fault on every access and manually
> inject zero bytes, but ... yuck.

Sure, but I suspect the real issue is whether it does something like leak
data which it should not. The test as-is does test to ensure the data
is zeroed.

If we want to add a test that closes the mmap and then ensures the data
beyond the file content up to PAGE_SIZE is still zeroed out, that's easy
to do; it's just that it seems that *could* end up with different results
depending on the filesystem.
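
Something along these lines, reusing the read-back the test already does
for its first zero-fill check (a sketch only, not part of this patch):

	# After the mwrite past EOF and a munmap, map the file again
	# read-only and check the tail up to the page boundary is still "00".
	$XFS_IO_PROG -r $test_file \
		-c "mmap -r 0 $map_len" \
		-c "mread -v $new_filelen $zero_filled_data_len" \
		-c "munmap" | filter_xfs_io_data_unique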

> That said -- let's say you have a 33k file, and a space mapping for
> 0-63k 

What block size is the filesystem in this example? If the length is 33k,
whether or not it was truncated does not matter; the file size is what
matters. The block size is what we use for the minimum order folio, and
since we start at offset 0, a 33k sized file on a 64k block size
filesystem will get a 64k folio. On a 32k block size filesystem, it will
get two 32k folios.

> (e.g. it was preallocated).

Do you mean sparse or what? Because if it's a sparse file it will still
change the size of the file, so I just wanted to check.

> Can the pagecache grab (say) a 64k folio for the EOF part of the pagecache?

It depends on the block size of the filesystem. If 4k, we'd go up to
36k, and 33k-36k would be zeroed.

With min order, we'd have a folio of 8k, 32k, or 64k. For 8k we'd have
5 folios of 8k size each, the last one having only 1k of data, and 3k
zeroed out. No PTEs would be assigned for that folio beyond the 36k
boundary, and so we'd SIGBUS on access beyond it. We test for this in
this test.
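
As a concrete illustration (a sketch only, assuming a 4k system page
size): with a 33k file the page-aligned EOF is 36k, so mapping a bit
past that and touching it is exactly what the test expects to SIGBUS on:

	$XFS_IO_PROG -f -c "truncate 33k" $test_file
	# Reads inside 33k-36k return zeroes; any access past 36k must fault.
	_mread $test_file 0 $((40 * 1024)) $((40 * 1024))	# expect "Bus error"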

> And can you mmap that whole region?

No, and we test for this here too. You can only mmap up to the file size
rounded up to PAGE_SIZE.

> And see even more grey area mmapping?

No, we limit it to PAGE_SIZE alignment.

> Or does mmap always cut
> off the mapping at roundup(i_size_read(), PAGE_SIZE) ?

That's right, we do this; without LBS this was implied, but with LBS
we have to be explicit about using the PAGE_SIZE alignment restriction.

This test checks for all that, and checks both the integrity of the contents
and the file size even if you muck with the extra fluff allowed by mmap().

> > > What other data?
> > 
> > Beats me, got that from the man page bible on mmap. I think its homework
> > for us to find out who is spewing that out, which gives a bit more value
> > to the idea of that strict-linux thing. How else will we find out?
> 
> Oh, ok.  I couldn't tell if *you* had seen "other" data emerging from
> the murk, or if that was merely what a spec says.  Please cite the
> particular bible you were reading. ;)

From the mmap(2) man page: "subsequent mappings may see the modified
content." I extended this with the implications of its use of *may*.

Speaking of the man page, I see also that huge pages are addressed there
and when a huge page is used it says:

"The system automatically aligns length to be a multiple of the
underlying huge page size"

And so I believe that means we need to check for the huge page in
filemap_map_pages(), and also adjust the test to align to the
specific huge page size if used...

Or just skip tmpfs / hugetlbfs for now...
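
The simplest guard for now could look like this (a sketch; the condition
and the skip are hypothetical, not from the patch):

	# If tmpfs is using huge pages, the mapping length follows the huge
	# page size rather than PAGE_SIZE, so the SIGBUS-at-page-boundary
	# expectations in this test would not apply.
	if [ "$FSTYP" = "tmpfs" ]; then
		_notrun "mmap page boundary semantics differ with tmpfs huge pages"
	fi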

  Luis

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 4/5] _require_debugfs(): simplify and fix for debian
  2024-06-11  3:02 ` [PATCH 4/5] _require_debugfs(): simplify and fix for debian Luis Chamberlain
  2024-06-11 14:35   ` Darrick J. Wong
@ 2024-06-12  7:51   ` Zorro Lang
  1 sibling, 0 replies; 22+ messages in thread
From: Zorro Lang @ 2024-06-12  7:51 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: patches, fstests, linux-xfs, linux-mm, linux-fsdevel, akpm, ziy,
	vbabka, seanjc, willy, david, hughd, linmiaohe, muchun.song,
	osalvador, p.raghav, da.gomez, hare, john.g.garry

On Mon, Jun 10, 2024 at 08:02:01PM -0700, Luis Chamberlain wrote:
> Using the findmnt -S debugfs argument does not really output anything on
> debian, and it is not needed; fix that.
> 
> Fixes: 8e8fb3da709e ("fstests: fix _require_debugfs and call it properly")
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---

Thanks for fixing it.

Reviewed-by: Zorro Lang <zlang@redhat.com>

>  common/rc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/common/rc b/common/rc
> index 18ad25662d5c..30beef4e5c02 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -3025,7 +3025,7 @@ _require_debugfs()
>  	local type
>  
>  	if [ -d "$DEBUGFS_MNT" ];then
> -		type=$(findmnt -rncv -T $DEBUGFS_MNT -S debugfs -o FSTYPE)
> +		type=$(findmnt -rncv -T $DEBUGFS_MNT -o FSTYPE)
>  		[ "$type" = "debugfs" ] && return 0
>  	fi
>  
> -- 
> 2.43.0
> 
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/5] fstests: add fsstress + compaction test
  2024-06-11  3:02 ` [PATCH 3/5] fstests: add fsstress + compaction test Luis Chamberlain
  2024-06-11 14:48   ` Darrick J. Wong
@ 2024-06-12  8:00   ` Zorro Lang
  2024-06-13 21:10     ` Luis Chamberlain
  1 sibling, 1 reply; 22+ messages in thread
From: Zorro Lang @ 2024-06-12  8:00 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: patches, fstests, linux-xfs, linux-mm, linux-fsdevel, akpm, ziy,
	vbabka, seanjc, willy, david, hughd, linmiaohe, muchun.song,
	osalvador, p.raghav, da.gomez, hare, john.g.garry

On Mon, Jun 10, 2024 at 08:02:00PM -0700, Luis Chamberlain wrote:
> Running compaction while we run fsstress can crash older kernels as per
> korg#218227 [0]; the fix for that [0] has been posted [1], and that patch
> was merged in v6.9-rc6 as commit d99e3140a4d3 ("mm: turn
> folio_test_hugetlb into a PageType"). However, even on v6.10-rc2, where
> this kernel commit is already merged, we can still deadlock when running
> fsstress and triggering compaction at the same time. This is a new
> issue being reported now through this patch, but this patch also
> serves as a reproducer with high confidence. It always deadlocks.
> If you enable CONFIG_PROVE_LOCKING with the defaults you will end up
> with a complaint about increasing MAX_LOCKDEP_CHAIN_HLOCKS [1]; if
> you adjust that you then end up with a few soft lockup complaints and
> some possible deadlock candidates to evaluate [2].
> 
> Provide a simple reproducer and pave the way so we keep on testing this.
> 
> Without lockdep enabled we silently deadlock on the first run of the
> test without the fix applied. With lockdep enabled you get a splat about
> the possible deadlock on the first run of the test.
> 
> [0] https://bugzilla.kernel.org/show_bug.cgi?id=218227
> [1] https://gist.github.com/mcgrof/824913b645892214effeb1631df75072
> [2] https://gist.github.com/mcgrof/926e183d21c5c4c55d74ec90197bd77a
> 
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>  common/rc             |  7 +++++
>  tests/generic/750     | 62 +++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/750.out |  2 ++
>  3 files changed, 71 insertions(+)
>  create mode 100755 tests/generic/750
>  create mode 100644 tests/generic/750.out
> 
> diff --git a/common/rc b/common/rc
> index e812a2f7cc67..18ad25662d5c 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -151,6 +151,13 @@ _require_hugepages()
>  		_notrun "Kernel does not report huge page size"
>  }
>  
> +# Requires CONFIG_COMPACTION
> +_require_vm_compaction()
> +{
> +	if [ ! -f /proc/sys/vm/compact_memory ]; then
> +	    _notrun "Need compaction enabled CONFIG_COMPACTION=y"
> +	fi
> +}
>  # Get hugepagesize in bytes
>  _get_hugepagesize()
>  {
> diff --git a/tests/generic/750 b/tests/generic/750
> new file mode 100755
> index 000000000000..334ab011dfa0
> --- /dev/null
> +++ b/tests/generic/750
> @@ -0,0 +1,62 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2024 Luis Chamberlain.  All Rights Reserved.
> +#
> +# FS QA Test 750
> +#
> +# fsstress + memory compaction test
> +#
> +. ./common/preamble
> +_begin_fstest auto rw long_rw stress soak smoketest
> +
> +_cleanup()
> +{
> +	cd /
> +	rm -f $runfile
> +	rm -f $tmp.*
> +	kill -9 $trigger_compaction_pid > /dev/null 2>&1
> +	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> +
> +	wait > /dev/null 2>&1
> +}
> +
> +# Import common functions.
> +
> +# real QA test starts here
> +
> +_supported_fs generic
> +
> +_require_scratch
> +_require_vm_compaction
> +_require_command "$KILLALL_PROG" "killall"
> +
> +# We still deadlock with this test on v6.10-rc2, we need more work.
> +# but the below makes things better.
> +_fixed_by_git_commit kernel d99e3140a4d3 \
> +	"mm: turn folio_test_hugetlb into a PageType"
> +
> +echo "Silence is golden"
> +
> +_scratch_mkfs > $seqres.full 2>&1
> +_scratch_mount >> $seqres.full 2>&1
> +
> +nr_cpus=$((LOAD_FACTOR * 4))
> +nr_ops=$((25000 * nr_cpus * TIME_FACTOR))
> +fsstress_args=(-w -d $SCRATCH_MNT -n $nr_ops -p $nr_cpus)
> +
> +# start a background trigger for memory compaction
> +runfile="$tmp.compaction"
> +touch $runfile
> +while [ -e $runfile ]; do
> +	echo 1 > /proc/sys/vm/compact_memory
> +	sleep 5
> +done &
> +trigger_compaction_pid=$!
> +
> +test -n "$SOAK_DURATION" && fsstress_args+=(--duration="$SOAK_DURATION")
> +
> +$FSSTRESS_PROG $FSSTRESS_AVOID "${fsstress_args[@]}" >> $seqres.full
> +wait > /dev/null 2>&1

Won't this "wait" wait forever (except a ctrl+C), due to no one removes
the $runfile?

Thanks,
Zorro

> +
> +status=0
> +exit
> diff --git a/tests/generic/750.out b/tests/generic/750.out
> new file mode 100644
> index 000000000000..bd79507b632e
> --- /dev/null
> +++ b/tests/generic/750.out
> @@ -0,0 +1,2 @@
> +QA output created by 750
> +Silence is golden
> -- 
> 2.43.0
> 
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/5] fstests: add mmap page boundary tests
  2024-06-11  3:01 ` [PATCH 2/5] fstests: add mmap page boundary tests Luis Chamberlain
  2024-06-11 16:48   ` Darrick J. Wong
@ 2024-06-12  8:06   ` Zorro Lang
  2024-06-13 21:05     ` Luis Chamberlain
  1 sibling, 1 reply; 22+ messages in thread
From: Zorro Lang @ 2024-06-12  8:06 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: patches, fstests, linux-xfs, linux-mm, linux-fsdevel, akpm, ziy,
	vbabka, seanjc, willy, david, hughd, linmiaohe, muchun.song,
	osalvador, p.raghav, da.gomez, hare, john.g.garry

On Mon, Jun 10, 2024 at 08:01:59PM -0700, Luis Chamberlain wrote:
> mmap() POSIX compliance says we should zero fill data beyond the file
> size up to the page boundary, and issue a SIGBUS if we go beyond. While fsx
> helps us test zero-fill sometimes, fsstress also lets us sometimes test
> for SIGBUS; however, that is based on a random value and it's not likely we
> always test it. Dedicate a specific test to this to make sure this
> situation is always tested and to easily expand on other corner cases.
> 
> The only filesystem currently known to fail is tmpfs with huge pages,
> but the pending upstream patch "filemap: cap PTE range to be created to
> allowed zero fill in folio_map_range()" fixes this issue for tmpfs and
> also for LBS support.
> 
> Suggested-by: Matthew Wilcox <willy@infradead.org>
> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com>
> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> ---
>  common/rc             |   5 +-
>  tests/generic/749     | 238 ++++++++++++++++++++++++++++++++++++++++++
>  tests/generic/749.out |   2 +
>  3 files changed, 244 insertions(+), 1 deletion(-)
>  create mode 100755 tests/generic/749
>  create mode 100644 tests/generic/749.out
> 
> diff --git a/common/rc b/common/rc
> index fa7942809d6c..e812a2f7cc67 100644
> --- a/common/rc
> +++ b/common/rc
> @@ -60,12 +60,15 @@ _round_up_to_page_boundary()
>  	echo $(( (n + page_size - 1) & ~(page_size - 1) ))
>  }
>  
> +# You can override the $map_len but it's optional; by default we use the
> +# max allowed size. If you use a length greater than the default you can
> +# expect a SIGBUS and test for it.
>  _mread()
>  {
>  	local file=$1
>  	local offset=$2
>  	local length=$3
> -	local map_len=$(_round_up_to_page_boundary $(_get_filesize $file))
> +	local map_len=${4:-$(_round_up_to_page_boundary $(_get_filesize $file)) }
>  
>  	# Some callers expect xfs_io to crash with SIGBUS due to the mread,
>  	# causing the shell to print "Bus error" to stderr.  To allow this
> diff --git a/tests/generic/749 b/tests/generic/749
> new file mode 100755
> index 000000000000..c1d3a2028549
> --- /dev/null
> +++ b/tests/generic/749
> @@ -0,0 +1,238 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) Luis Chamberlain. All Rights Reserved.
> +#
> +# FS QA Test 749
> +#
> +# As per POSIX NOTES mmap(2) maps multiples of the system page size, but if the
> +# data mapped is not multiples of the page size the remaining bytes are zeroed
> +# out when mapped and modifications to that region are not written to the file.
> +# On Linux when you write data to such partial page after the end of the
> +# object, the data stays in the page cache even after the file is closed and
> +# unmapped and  even  though  the data  is never written to the file itself,
> +# subsequent mappings may see the modified content. If you go *beyond* this
> +# page, you should get a SIGBUS. This test verifies we zero-fill to page
> +# boundary and ensures we get a SIGBUS if we write to data beyond the system
> +# page size even if the block size is greater than the system page size.
> +. ./common/preamble
> +. ./common/rc
> +_begin_fstest auto quick prealloc
> +
> +# Import common functions.
> +. ./common/filter
> +
> +# real QA test starts here
> +_supported_fs generic
> +_require_scratch_nocheck
> +_require_test
> +_require_xfs_io_command "truncate"
> +_require_xfs_io_command "falloc"
> +
> +# _fixed_by_git_commit kernel <pending-upstream> \
> +#        "filemap: cap PTE range to be created to allowed zero fill in folio_map_range()"
> +
> +filter_xfs_io_data_unique()
> +{
> +    _filter_xfs_io_offset | sed -e 's| |\n|g' | grep -E -v "\.|XX|\*" | \
> +	sort -u | tr -d '\n'
> +}
> +
> +
> +setup_zeroed_file()
> +{
> +	local file_len=$1
> +	local sparse=$2
> +
> +	if $sparse; then
> +		$XFS_IO_PROG -f -c "truncate $file_len" $test_file
> +	else
> +		$XFS_IO_PROG -f -c "falloc 0 $file_len" $test_file
> +	fi
> +}
> +
> +mwrite()
> +{
> +       local file=$1
> +       local offset=$2
> +       local length=$3
> +       local map_len=${4:-$(_round_up_to_page_boundary $(_get_filesize $file)) }
> +
> +       # Some callers expect xfs_io to crash with SIGBUS due to the mwrite,
> +       # causing the shell to print "Bus error" to stderr.  To allow this
> +       # message to be redirected, execute xfs_io in a new shell instance.
> +       # However, for this to work reliably, we also need to prevent the new
> +       # shell instance from optimizing out the fork and directly exec'ing
> +       # xfs_io.  The easiest way to do that is to append 'true' to the
> +       # commands, so that xfs_io is no longer the last command the shell sees.
> +       bash -c "trap '' SIGBUS; ulimit -c 0; \
> +		$XFS_IO_PROG $file \
> +               -c 'mmap -w 0 $map_len' \
> +               -c 'mwrite $offset $length'; \
> +	       true"
> +}

As you've moved the _mread to common/rc, why not do the same for this mwrite?

Thanks,
Zorro

> +
> +do_mmap_tests()
> +{
> +	local block_size=$1
> +	local file_len=$2
> +	local offset=$3
> +	local len=$4
> +	local use_sparse_file=${5:-false}
> +	local new_filelen=0
> +	local map_len=0
> +	local csum=0
> +	local fs_block_size=$(_get_file_block_size $SCRATCH_MNT)
> +
> +	echo -en "\n\n==> Testing blocksize $block_size " >> $seqres.full
> +	echo -en "file_len: $file_len offset: $offset " >> $seqres.full
> +	echo -e "len: $len sparse: $use_sparse_file" >> $seqres.full
> +
> +	if ((fs_block_size != block_size)); then
> +		_fail "Block size created ($block_size) doesn't match _get_file_block_size on mount ($fs_block_size)"
> +	fi
> +
> +	rm -rf "${SCRATCH_MNT:?}"/*
> +
> +	# This lets us also test against sparse files
> +	setup_zeroed_file $file_len $use_sparse_file
> +
> +	# This will overwrite the old data, the file size is the
> +	# delta between offset and len now.
> +	$XFS_IO_PROG -f -c "pwrite -S 0xaa -b 512 $offset $len" \
> +		$test_file >> $seqres.full
> +
> +	sync
> +	new_filelen=$(_get_filesize $test_file)
> +	map_len=$(_round_up_to_page_boundary $new_filelen)
> +	csum_orig="$(_md5_checksum $test_file)"
> +
> +	# A couple of mmap() tests:
> +	#
> +	# We are allowed to mmap() up to the boundary of the page size of a
> +	# data object, but there a few rules to follow we must check for:
> +	#
> +	# a) zero-fill test for the data: POSIX says we should zero fill any
> +	#    partial page after the end of the object. Verify zero-fill.
> +	# b) do not write this bogus data to disk: on Linux, if we write data
> +	#    to a partially filled page, it will stay in the page cache even
> +	#    after the file is closed and unmapped even if it never reaches the
> +	#    file. Subsequent mappings *may* see the modified content, but it
> +	#    also can get other data. Since the data read after the actual
> +	#    object data can vary we just verify the filesize does not change.
> +	if [[ $map_len -gt $new_filelen ]]; then
> +		zero_filled_data_len=$((map_len - new_filelen))
> +		_scratch_cycle_mount
> +		expected_zero_data="00"
> +		zero_filled_data=$($XFS_IO_PROG -r $test_file \
> +			-c "mmap -r 0 $map_len" \
> +			-c "mread -v $new_filelen $zero_filled_data_len" \
> +			-c "munmap" | \
> +			filter_xfs_io_data_unique)
> +		if [[ "$zero_filled_data" != "$expected_zero_data" ]]; then
> +			echo "Expected data: $expected_zero_data"
> +			echo "  Actual data: $zero_filled_data"
> +			_fail "Zero-fill expectations with mmap() not respected"
> +		fi
> +
> +		_scratch_cycle_mount
> +		$XFS_IO_PROG $test_file \
> +			-c "mmap -w 0 $map_len" \
> +			-c "mwrite $new_filelen $zero_filled_data_len" \
> +			-c "munmap"
> +		sync
> +		csum_post="$(_md5_checksum $test_file)"
> +		if [[ "$csum_orig" != "$csum_post" ]]; then
> +			echo "Expected csum: $csum_orig"
> +			echo " Actual  csum: $csum_post"
> +			_fail "mmap() write up to page boundary should not change actual file contents"
> +		fi
> +
> +		local filelen_test=$(_get_filesize $test_file)
> +		if [[ "$filelen_test" != "$new_filelen" ]]; then
> +			echo "Expected file length: $new_filelen"
> +			echo " Actual  file length: $filelen_test"
> +			_fail "mmap() write up to page boundary should not change actual file size"
> +		fi
> +	fi
> +
> +	# Now lets ensure we get SIGBUS when we go beyond the page boundary
> +	_scratch_cycle_mount
> +	new_filelen=$(_get_filesize $test_file)
> +	map_len=$(_round_up_to_page_boundary $new_filelen)
> +	csum_orig="$(_md5_checksum $test_file)"
> +	_mread $test_file 0 $map_len >> $seqres.full  2>$tmp.err
> +	if grep -q 'Bus error' $tmp.err; then
> +		cat $tmp.err
> +		_fail "Not expecting SIGBUS when reading up to page boundary"
> +	fi
> +
> +	# This should just work
> +	_mread $test_file 0 $map_len >> $seqres.full  2>$tmp.err
> +	if [[ $? -ne 0 ]]; then
> +		_fail "mmap() read up to page boundary should work"
> +	fi
> +
> +	# This should just work
> +	mwrite $test_file 0 $map_len >> $seqres.full  2>$tmp.err
> +	if [[ $? -ne 0 ]]; then
> +		_fail "mmap() write up to page boundary should work"
> +	fi
> +
> +	# If we mmap() on the boundary but try to read beyond it just
> +	# fails, we don't get a SIGBUS
> +	$XFS_IO_PROG -r $test_file \
> +		-c "mmap -r 0 $map_len" \
> +		-c "mread 0 $((map_len + 10))" >> $seqres.full  2>$tmp.err
> +	local mread_err=$?
> +	if [[ $mread_err -eq 0 ]]; then
> +		_fail "mmap() to page boundary works as expected but reading beyond should fail: $mread_err"
> +	fi
> +
> +	$XFS_IO_PROG -w $test_file \
> +		-c "mmap -w 0 $map_len" \
> +		-c "mwrite 0 $((map_len + 10))" >> $seqres.full  2>$tmp.err
> +	local mwrite_err=$?
> +	if [[ $mwrite_err -eq 0 ]]; then
> +		_fail "mmap() to page boundary works as expected but writing beyond should fail: $mwrite_err"
> +	fi
> +
> +	# Now let's go beyond the allowed mmap() page boundary
> +	_mread $test_file 0 $((map_len + 10)) $((map_len + 10)) >> $seqres.full  2>$tmp.err
> +	if ! grep -q 'Bus error' $tmp.err; then
> +		_fail "Expected SIGBUS when mmap() reading beyond page boundary"
> +	fi
> +
> +	mwrite $test_file 0 $((map_len + 10)) $((map_len + 10))  >> $seqres.full  2>$tmp.err
> +	if ! grep -q 'Bus error' $tmp.err; then
> +		_fail "Expected SIGBUS when mmap() writing beyond page boundary"
> +	fi
> +
> +	local filelen_test=$(_get_filesize $test_file)
> +	if [[ "$filelen_test" != "$new_filelen" ]]; then
> +		echo "Expected file length: $new_filelen"
> +		echo " Actual  file length: $filelen_test"
> +		_fail "reading or writing beyond file size up to mmap() page boundary should not change file size"
> +	fi
> +}
> +
> +test_block_size()
> +{
> +	local block_size=$1
> +
> +	do_mmap_tests $block_size 512 3 5
> +	do_mmap_tests $block_size 11k 0 $((4096 * 3 + 3))
> +	do_mmap_tests $block_size 16k 0 $((16384+3))
> +	do_mmap_tests $block_size 16k $((16384-10)) $((16384+20))
> +	do_mmap_tests $block_size 64k 0 $((65536+3))
> +	do_mmap_tests $block_size 4k 4090 30 true
> +}
> +
> +_scratch_mkfs >> $seqres.full 2>&1 || _fail "mkfs failed"
> +_scratch_mount
> +test_file=$SCRATCH_MNT/file
> +block_size=$(_get_file_block_size "$SCRATCH_MNT")
> +test_block_size $block_size
> +
> +echo "Silence is golden"
> +status=0
> +exit
> diff --git a/tests/generic/749.out b/tests/generic/749.out
> new file mode 100644
> index 000000000000..24658deddb99
> --- /dev/null
> +++ b/tests/generic/749.out
> @@ -0,0 +1,2 @@
> +QA output created by 749
> +Silence is golden
> -- 
> 2.43.0
> 
> 


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 2/5] fstests: add mmap page boundary tests
  2024-06-12  8:06   ` Zorro Lang
@ 2024-06-13 21:05     ` Luis Chamberlain
  0 siblings, 0 replies; 22+ messages in thread
From: Luis Chamberlain @ 2024-06-13 21:05 UTC (permalink / raw)
  To: Zorro Lang
  Cc: patches, fstests, linux-xfs, linux-mm, linux-fsdevel, akpm, ziy,
	vbabka, seanjc, willy, david, hughd, linmiaohe, muchun.song,
	osalvador, p.raghav, da.gomez, hare, john.g.garry

On Wed, Jun 12, 2024 at 04:06:34PM +0800, Zorro Lang wrote:
> On Mon, Jun 10, 2024 at 08:01:59PM -0700, Luis Chamberlain wrote:
> > +mwrite()
> > +{
> > +       local file=$1
> > +       local offset=$2
> > +       local length=$3
> > +       local map_len=${4:-$(_round_up_to_page_boundary $(_get_filesize $file)) }
> > +
> > +       # Some callers expect xfs_io to crash with SIGBUS due to the mread,
> > +       # causing the shell to print "Bus error" to stderr.  To allow this
> > +       # message to be redirected, execute xfs_io in a new shell instance.
> > +       # However, for this to work reliably, we also need to prevent the new
> > +       # shell instance from optimizing out the fork and directly exec'ing
> > +       # xfs_io.  The easiest way to do that is to append 'true' to the
> > +       # commands, so that xfs_io is no longer the last command the shell sees.
> > +       bash -c "trap '' SIGBUS; ulimit -c 0; \
> > +		$XFS_IO_PROG $file \
> > +               -c 'mmap -w 0 $map_len' \
> > +               -c 'mwrite $offset $length'; \
> > +	       true"
> > +}
> 
> As you've moved the _mread to common/rc, why not do the same for this mwrite?

I didn't move it as this mwrite() is only used by one test. Let me know
if you want me to move it.

  Luis

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 3/5] fstests: add fsstress + compaction test
  2024-06-12  8:00   ` Zorro Lang
@ 2024-06-13 21:10     ` Luis Chamberlain
  0 siblings, 0 replies; 22+ messages in thread
From: Luis Chamberlain @ 2024-06-13 21:10 UTC (permalink / raw)
  To: Zorro Lang
  Cc: patches, fstests, linux-xfs, linux-mm, linux-fsdevel, akpm, ziy,
	vbabka, seanjc, willy, david, hughd, linmiaohe, muchun.song,
	osalvador, p.raghav, da.gomez, hare, john.g.garry

On Wed, Jun 12, 2024 at 04:00:48PM +0800, Zorro Lang wrote:
> On Mon, Jun 10, 2024 at 08:02:00PM -0700, Luis Chamberlain wrote:
> > Running compaction while we run fsstress can crash older kernels as per
> > korg#218227 [0]; the fix for that [0] has been posted [1], and that patch
> > was merged in v6.9-rc6 as commit d99e3140a4d3 ("mm: turn
> > folio_test_hugetlb into a PageType"). However, even on v6.10-rc2, where
> > this kernel commit is already merged, we can still deadlock when running
> > fsstress and triggering compaction at the same time. This is a new
> > issue being reported now through this patch, but this patch also
> > serves as a reproducer with high confidence. It always deadlocks.
> > If you enable CONFIG_PROVE_LOCKING with the defaults you will end up
> > with a complaint about increasing MAX_LOCKDEP_CHAIN_HLOCKS [1]; if
> > you adjust that you then end up with a few soft lockup complaints and
> > some possible deadlock candidates to evaluate [2].
> > 
> > Provide a simple reproducer and pave the way so we keep on testing this.
> > 
> > Without lockdep enabled we silently deadlock on the first run of the
> > test without the fix applied. With lockdep enabled you get a splat about
> > the possible deadlock on the first run of the test.
> > 
> > [0] https://bugzilla.kernel.org/show_bug.cgi?id=218227
> > [1] https://gist.github.com/mcgrof/824913b645892214effeb1631df75072
> > [2] https://gist.github.com/mcgrof/926e183d21c5c4c55d74ec90197bd77a
> > 
> > Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
> > ---
> >  common/rc             |  7 +++++
> >  tests/generic/750     | 62 +++++++++++++++++++++++++++++++++++++++++++
> >  tests/generic/750.out |  2 ++
> >  3 files changed, 71 insertions(+)
> >  create mode 100755 tests/generic/750
> >  create mode 100644 tests/generic/750.out
> > 
> > diff --git a/common/rc b/common/rc
> > index e812a2f7cc67..18ad25662d5c 100644
> > --- a/common/rc
> > +++ b/common/rc
> > @@ -151,6 +151,13 @@ _require_hugepages()
> >  		_notrun "Kernel does not report huge page size"
> >  }
> >  
> > +# Requires CONFIG_COMPACTION
> > +_require_vm_compaction()
> > +{
> > +	if [ ! -f /proc/sys/vm/compact_memory ]; then
> > +	    _notrun "Need compaction enabled CONFIG_COMPACTION=y"
> > +	fi
> > +}
> >  # Get hugepagesize in bytes
> >  _get_hugepagesize()
> >  {
> > diff --git a/tests/generic/750 b/tests/generic/750
> > new file mode 100755
> > index 000000000000..334ab011dfa0
> > --- /dev/null
> > +++ b/tests/generic/750
> > @@ -0,0 +1,62 @@
> > +#! /bin/bash
> > +# SPDX-License-Identifier: GPL-2.0
> > +# Copyright (c) 2024 Luis Chamberlain.  All Rights Reserved.
> > +#
> > +# FS QA Test 750
> > +#
> > +# fsstress + memory compaction test
> > +#
> > +. ./common/preamble
> > +_begin_fstest auto rw long_rw stress soak smoketest
> > +
> > +_cleanup()
> > +{
> > +	cd /
> > +	rm -f $runfile
> > +	rm -f $tmp.*
> > +	kill -9 $trigger_compaction_pid > /dev/null 2>&1
> > +	$KILLALL_PROG -9 fsstress > /dev/null 2>&1
> > +
> > +	wait > /dev/null 2>&1
> > +}
> > +
> > +# Import common functions.
> > +
> > +# real QA test starts here
> > +
> > +_supported_fs generic
> > +
> > +_require_scratch
> > +_require_vm_compaction
> > +_require_command "$KILLALL_PROG" "killall"
> > +
> > +# We still deadlock with this test on v6.10-rc2, we need more work.
> > +# but the below makes things better.
> > +_fixed_by_git_commit kernel d99e3140a4d3 \
> > +	"mm: turn folio_test_hugetlb into a PageType"
> > +
> > +echo "Silence is golden"
> > +
> > +_scratch_mkfs > $seqres.full 2>&1
> > +_scratch_mount >> $seqres.full 2>&1
> > +
> > +nr_cpus=$((LOAD_FACTOR * 4))
> > +nr_ops=$((25000 * nr_cpus * TIME_FACTOR))
> > +fsstress_args=(-w -d $SCRATCH_MNT -n $nr_ops -p $nr_cpus)
> > +
> > +# start a background trigger for memory compaction
> > +runfile="$tmp.compaction"
> > +touch $runfile
> > +while [ -e $runfile ]; do
> > +	echo 1 > /proc/sys/vm/compact_memory
> > +	sleep 5
> > +done &
> > +trigger_compaction_pid=$!
> > +
> > +test -n "$SOAK_DURATION" && fsstress_args+=(--duration="$SOAK_DURATION")
> > +
> > +$FSSTRESS_PROG $FSSTRESS_AVOID "${fsstress_args[@]}" >> $seqres.full
> > +wait > /dev/null 2>&1
> 
> Won't this "wait" wait forever (except a ctrl+C), due to no one removes
> the $runfile?

Odd, pretty sure I tested it and it didn't wait forever, but I'll add
the rm after the FSSTRESS call.
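
Roughly like this (a sketch of the intended ordering, not the final v2):

	$FSSTRESS_PROG $FSSTRESS_AVOID "${fsstress_args[@]}" >> $seqres.full
	# Remove the run file so the background compaction loop exits, then
	# reap it along with any stray fsstress processes.
	rm -f $runfile
	wait > /dev/null 2>&1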

  Luis

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2024-06-13 21:10 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-11  3:01 [PATCH 0/5] fstests: add some new LBS inspired tests Luis Chamberlain
2024-06-11  3:01 ` [PATCH 1/5] common: move mread() to generic helper _mread() Luis Chamberlain
2024-06-11 14:28   ` Darrick J. Wong
2024-06-11  3:01 ` [PATCH 2/5] fstests: add mmap page boundary tests Luis Chamberlain
2024-06-11 16:48   ` Darrick J. Wong
2024-06-11 18:10     ` Luis Chamberlain
2024-06-11 18:46       ` Darrick J. Wong
2024-06-11 20:29         ` Luis Chamberlain
2024-06-12  8:06   ` Zorro Lang
2024-06-13 21:05     ` Luis Chamberlain
2024-06-11  3:02 ` [PATCH 3/5] fstests: add fsstress + compaction test Luis Chamberlain
2024-06-11 14:48   ` Darrick J. Wong
2024-06-12  8:00   ` Zorro Lang
2024-06-13 21:10     ` Luis Chamberlain
2024-06-11  3:02 ` [PATCH 4/5] _require_debugfs(): simplify and fix for debian Luis Chamberlain
2024-06-11 14:35   ` Darrick J. Wong
2024-06-12  7:51   ` Zorro Lang
2024-06-11  3:02 ` [PATCH 5/5] fstests: add stress truncation + writeback test Luis Chamberlain
2024-06-11 14:45   ` Darrick J. Wong
2024-06-11 18:15     ` Luis Chamberlain
2024-06-11 18:29       ` Darrick J. Wong
2024-06-11 18:59         ` Luis Chamberlain

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).